The syllabus consists of
- Weekly slides
- Weekly exercises
- Mandatory assignments
- Readings
The detailed syllabus for each week is listed on that week's page.
This is an overview of the mandatory readings so far. For recommended readings, exercises, etc., see the weekly pages.
Jurafsky and Martin, Speech and Language Processing, 3rd ed. (edition of 21 Sept 2021!)
- For chapters 1-6, there are only minor corrections in the 2021 edition compared to the edition of 30 Dec 2020
- Ch. 2 Regular expressions, etc.
- Sec. 2.0
- Sec. 2.2 Words
- Sec. 2.3 Corpora
- Sec. 2.4 Normalization, except 2.4.3 and the technical details of 2.4.1
- Sec. 2.5 Edit distance
- Ch. 3, "N-gram Language Models"
- Sections 3.0-3.4
- Ch. 4, "Na?ve Bayes Classification and Sentiment"
- Except (for now) section 4.9 Statistical significance testing
- Ch. 5, "Logistic Regression"
- Except some of the technicalities of sections 5.3, 5.4, 5.5, 5.8
- Ch. 6, "Vector Semantics and Embeddings", everything except
- Not section 6.6 Pointwise Mutual Information (PMI)
- Ch. 7 "Neural Networks and Neural Language Models"
- Ch. 8 "Sequence labeling"
- Sec 8.0-8.2
- Sec. 8.4 "HMM POS tagging"
- Except 8.4.5-8.4.6 "The Viterbi Algorithm"
- Sec. 8.5 CRF
- Sec. 8.7-8.8
- Ch. 9 Deep Learning Architectures for Sequence Processing
- Sec. 9.1-9.5
- Ch. 10 Machine Translation and Encoder-Decoder Models
- Sec. 10.0, 10.2-10.4
- Ch. 18, "Word Senses and Word Net"
- Sec. 18.0-18.3
- Ch. 24, "Dialogue systems and chatbots,
- Sections 24.1-24.6
- Ch. 25, "Phonetics"
- Sections 25.1-25.5 (excluding the details not discussed in class)
- Chap 26, "Speech Recognition and ASR"
- Sections 26.1 and 26.5 (excluding the part on statistical significance)
Bird, Klein and Loper, Natural Language Processing with Python (the NLTK book)
- Ch. 3, sec. 6 Normalizing Text
- Ch. 3, sec. 8 Segmentation
- Ch. 5, sec. 1 Using a tagger
- Ch. 5, sec. 2 Tagged corpora
Wikipedia
Other:
- Garrod, S., & Pickering, M. J. (2007). Alignment in dialogue. The Oxford handbook of psycholinguistics, 443-451.
- Section 24.6 from the Dialogue chapter of 2nd edition of Jurafsky & Martin (on MDPs).
- Hovy, D., & Spruit, S. L. (2016). The social impact of natural language processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (pp. 591-598).
- Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems (pp. 4349-4357). NB: you can skip the details of the experimental design and evaluation results.
- Ziyuan Zhong, "A Tutorial on Fairness in Machine Learning", Towards Data Science. NB: you can skip Section 5 of the text.
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). NB: you can skip Section 4 of the paper, as well as the details of the experimental design and evaluation results.
- Chapter 2 of Domingo-Ferrer, J., Sánchez, D., & Soria-Comas, J. (2016). Database anonymization: Privacy models, data utility, and microaggregation-based inter-model connections. Synthesis Lectures on Information Security, Privacy, & Trust, 8(1), 1-136. NB: You can skip the technical details on measuring information loss.