The syllabus consists of
- Weekly slides
- Weekly exercises
- Mandatory assignments
- Readings
The detailed syllabus for each week is listed on that week's page.
This is an overview of the mandatory readings so far. For recommended readings, exercises, etc., see the weekly pages.
Jurafsky and Martin, Speech and Language Processing, 3rd ed. (edition of 21 Sept 2021!)
- For chapters 1-6, there are only minor corrections in the 2021 edition compared to the edition of 30 Dec 2020
- Ch. 2 Regular expressions, etc.
- Sec. 2.0
- Sec. 2.2 Words
- Sec. 2.3 Corpora
- Sec. 2.4 Normalization, except 2.4.3 and the technical details of 2.4.1
- Sec. 2.5 Edit distance
- Ch. 3, "N-gram Language Models"
- Sections 3.0-3.4
- Ch. 4, "Na?ve Bayes Classification and Sentiment"
- Except (for now) section 4.9 Statistical significance testing
- Ch. 5, "Logistic Regression"
- Except some of the technicalities of sections 5.3, 5.4, 5.5, 5.8
- Ch. 6, "Vector Semantics and Embeddings", everything except
- Not section 6.6 Pointwise Mutual Information (PMI)
- Ch. 7 "Neural Networks and Neural Language Models"
- Ch. 8 "Sequence labeling"
- Sec 8.0-8.2
- Sec. 8.4 "HMM POS tagging"
- Except 8.4.5-8.4.6 "The Viterbi Algorithm"
- Sec. 8.5 CRF
- Sec. 8.7-8.8
- Ch. 9 Deep Learning Architectures for Sequence Processing
- Sec. 9.1-9.5
- Ch. 10 Machine Translation and Encoder-Decoder Models
- Sec. 10.0, 10.2-10.4
- Ch. 18, "Word Senses and Word Net"
- Sec. 18.0-18.3
- Ch. 24, "Dialogue systems and chatbots,
- Sections 24.1-24.6
- Ch. 25, "Phonetics"
- Sections 25.1-25.5 (excluding the details not discussed in class)
- Chap 26, "Speech Recognition and ASR"
- Sections 26.1 and 26.5 (excluding the part on statistical significance)
Bird, Klein and Loper, Natural Language Processing with Python (the NLTK book)
- Ch. 3, sec. 6 Normalizing Text
- Ch. 3, sec. 8 Segmentation
- Ch. 5, sec. 1 Using a tagger
- Ch. 5, sec. 2 Tagged corpora
Wikipedia
Other:
- Garrod, S., & Pickering, M. J. (2007). Alignment in dialogue. The Oxford handbook of psycholinguistics, 443-451.
- Section 24.6 from the Dialogue chapter of 2nd edition of Jurafsky & Martin (on MDPs).
- Hovy, D., & Spruit, S. L. (2016). The social impact of natural language processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (pp. 591-598).
- Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems (pp. 4349-4357). NB: you can skip the details of the experimental design and evaluation results.
- Ziyuan Zhong, "A Tutorial on Fairness in Machine Learning", Towards Data Science. NB: you can skip Section 5 of the text.
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). NB: you can skip Section 4 of the paper, as well as the details of the experimental design and evaluation results.
- Chapter 2 of Domingo-Ferrer, J., Sánchez, D., & Soria-Comas, J. (2016). Database anonymization: Privacy models, data utility, and microaggregation-based inter-model connections. Synthesis Lectures on Information Security, Privacy, & Trust, 8(1), 1-136. NB: You can skip the technical details on measuring information loss.