Introductory Textbook (Also Suitable for Self-Study)
Jurafsky, Daniel, and James H. Martin: Speech and Language Processing, 2008. Prentice Hall. (Second, Revised Edition; In Preparation). Draft On-Line Version.
Research Articles (Obligatory Readings)
Grefenstette, Gregory, and Pasi Tapanainen: "What is a Word, What is a Sentence? Problems of Tokenization" in Proceedings of the 3rd International Conference on Computational Lexicography, 1994. pp. 79–87. (On-Line Copy).
Karlsson, Fred: "Constraint Grammar as a Framework for Parsing Running Text" in Proceedings of the 13th International Conference on Computational Linguistics, 1990. pp. 168–173. (On-Line Copy).
Ratnaparkhi, Adwait: "A Maximum Entropy Model for Part-Of-Speech Tagging" in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1996. pp. 133–142. (On-Line Copy).
Samuelsson, Christer, and Atro Voutilainen: "Comparing a Linguistic and a Stochastic Tagger" in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, 1997. pp. 246–253. (On-Line Copy).
Reynar, Jeffrey C., and Adwait Ratnaparkhi: "A Maximum Entropy Approach to Identifying Sentence Boundaries" in Proceedings of the 5th Conference on Applied Natural Language Processing, 1997. pp. 16–19. (On-Line Copy).
Nivre, Joakim: "Two Strategies for Text Parsing" in A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday, 2006. pp. 440–448. (On-Line Copy).
Charniak, Eugene: "Statistical Techniques for Natural Language Parsing" in AI Magazine, 1997. (On-Line Copy).
Collins, Michael: "Head-Driven Statistical Models for Natural Language Parsing" in Computational Linguistics, 2003. (On-Line Copy).
Klein, Dan, and Christopher D. Manning: "Accurate Unlexicalized Parsing" in Proceedings of the 41st Meeting of the Association for Computational Linguistics, 2003. (On-Line Copy).
Charniak, Eugene: "A Maximum-Entropy-Inspired Parser" in Proceedings of the 1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics, 2000. (On-Line Copy).
Charniak, Eugene, and Mark Johnson: "Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking" in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 2005. (On-Line Copy).
Briscoe, Edward, and John A. Carroll: "Robust Accurate Statistical Annotation of General Text" in Proceedings of the 3rd International Conference on Language Resources and Evaluation, 2002. pp. 1499–1504. (On-Line Copy).
Carroll, John A., Edward Briscoe, and Antonio Sanfilippo: "Parser Evaluation. A Survey and a New Proposal" in Proceedings of the 1st International Conference on Language Resources and Evaluation, 1998. pp. 447–454. (On-Line Copy).
Briscoe, Edward, and John A. Carroll: "Evaluating the Accuracy of an Unlexicalized Statistical Parser on the PARC DepBank" in Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 2006. (On-Line Copy).
Nivre, Joakim, Johan Hall, Jens Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov, and E. Marsi: "MaltParser: A Language-Independent System for Data-Driven Dependency Parsing" in Natural Language Engineering, 2007. 13(2): pp. 95–135. (On-Line Copy).
Gildea, Daniel: "Corpus Variation and Parser Performance" in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, 2001. (On-Line Copy).
Kaplan, Ron, Stefan Riezler, Tracy King, John Maxwell, Alexander Vasserman, and Richard Crouch: "Speed and Accuracy in Shallow and Deep Stochastic Parsing" in Proceedings of the Human Language Technology Conference and the 4th Annual Meeting of the North American Chapter of the Association for Computational Linguistics, 2004. (On-Line Copy).
Toutanova, Kristina, Christopher D. Manning, Dan Flickinger, and Stephan Oepen: "Stochastic HPSG Parse Disambiguation using the Redwoods Corpus" in Research on Language and Computation, 2005. 3(1): pp. 83–105. (On-Line Copy).
Palmer, Martha, Daniel Gildea, and Paul Kingsbury: "The Proposition Bank: An Annotated Corpus of Semantic Roles" in Computational Linguistics, 2005. 31(1): pp. 71–106. (On-Line Copy).
Miyao, Yusuke, and Jun'ichi Tsujii: "Deep Linguistic Analysis for the Accurate Identification of Predicate-Argument Relations" in Proceedings of the 20th International Conference on Computational Linguistics, 2004. pp. 1392–1397. (On-Line Copy).
Background Reading (Optional or Project-Related)
Manning, Christopher D., and Hinrich Schütze: Foundations of Statistical Natural Language Processing, 1999. The MIT Press.
Hagen, Kristin, Janne Bondi Johannessen, and Anders Nøklestad: "A Constraint-Based Tagger for Norwegian" in Proceedings of the 17th Scandinavian Conference of Linguistics, 1998. (On-Line Copy).
Brants, Thorsten: "TnT. A Statistical Part-of-Speech Tagger" in Proceedings of the 6th Conference on Applied Natural Language Processing, 2000. (On-Line Copy).
Berger, Adam, Stephen Della Pietra, and Vincent Della Pietra: "A Maximum Entropy Approach to Natural Language Processing" in Computational Linguistics, 1996. (On-Line Copy).
Charniak, Eugene: "Statistical Parsing with a Context-Free Grammar and Word Statistics" in Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997. (On-Line Copy).
Marcus, Mitch, Beatrice Santorini, and Mary Ann Marcinkiewicz: "Building a Large Annotated Corpus of English. The Penn Treebank" in Computational Linguistics, 1993. (On-Line Copy).
Petrov, Slav, Leon Barrett, Romain Thibaux, and Dan Klein: "Learning Accurate, Compact, and Interpretable Tree Annotation" in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 2006. (On-Line Copy).
Riezler, Stefan, Tracy H. King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell, and Mark Johnson: "Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques" in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002. (On-Line Copy).