Background
Natural Language Processing is an interdisciplinary field that builds on insights from several disciplines, including
- Language and Linguistics
- Computer Science in general and programming in particular
- Statistics
- Machine Learning and "Data Science"
Students who come to this class have different backgrounds. Some are familiar with some of these fields, others with different ones. We will try to cover much of the background material, but not all; you might have to read some on your own. What we cover in class will be adapted to what can be assumed of first-year master's students in Informatics: Language and Computation, since this course is mandatory for them.
Here is some more on assumed background and recommendations on what to read.
Language and linguistics
You have to be familiar with some core concepts of linguistics, like "parts of speech" and "sentence structure". If you have not taken any courses in linguistics or NLP/Computational Linguistics you should consult some of the following.
- Chapter 3, "Linguistic Essentials", pp. 81-115, in Manning and Schütze: Foundations of Statistical Natural Language Processing. This is the best overview of what will be assumed in the course. Unfortunately, the book is not online, but you can find it in the library.
- We recommend that you acquire Jurafsky and Martin, Speech and Language Processing, in any case. Sections 3.1 and 12.1-12.3 introduce some of the key concepts of morphology and syntax.
- We also recommend reading sections 8.1-8.3 of the NLTK book: Natural Language Processing with Python, by Bird, Klein and Loper.
Programming in Python
We assume you have programming experience in some language(s), and that if you have no experience with Python, you are able to catch up. We have given some advice here, and will go into more detail in the first group session.
Statistics
Since we don't presuppose any background in statistics, we will give a crash course in the first three lectures. Do you need a book on statistics? We will cover all the concepts on the slides, so a book is not strictly required. But it could be useful to have more explanations and examples than we have time to cover in class.
- If you already own a book on statistics, that will probably suffice, e.g. the STK1000 book, Moore and McCabe, Introduction to the Practice of Statistics.
- I like Gonick and Smith's The Cartoon Guide to Statistics. It is mostly drawings, with not too many words, but it covers the essentials.
- Statistics in a Nutshell by Sarah Boslaugh covers what we need in not too many pages, and in roughly the same order as we will present the material.
- There are several free books and courses on statistics on the internet. I don't have any particular recommendations.
- Last time I gave the course, some students recommended Khan Academy.
Week to week
Week 1, 17 Aug |
Introduction
Recommended reading
Looking at data
Recommended reading
The following each cover the lecture
|
Week 2, 24 Aug |
Probabilities
Recommended reading
The following each cover the lecture
|
Week 2, 27 Aug |
Lab: Python and NLTK
Mandatory reading
Natural Language Processing with Python (= NLTK book)
|
Week 3, 1 Sep |
Statistics
Presentation (corrected 18 Sept.)
Recommended reading
The following each cover the lecture
It is also a good idea to review the parts of INF1080 Logic on "Kombinatorikk" (combinatorics) |
Week 3, 3 Sep |
Exercises on whiteboard
Moved to room Java! |
Week 4, 10 Sep |
Working with texts
Moved to room Java!
Mandatory reading
|
Week 5, 14 Sep |
Classification, evaluation and more statistics - mostly statistics
Presentation (corrected 18 Sept.)
Mandatory reading
Recommended reading
Parts of the following each cover (most of) the statistical part
|
Week 5, 17 Sep |
Lab |
Week 6, 21 Sep |
Classification, evaluation and more statistics, contd.
Mandatory reading
Recommended reading
|
Week 6, 24 Sep |
Lab |
Week 7, 28 Sep |
Information extraction
Mandatory reading |
Week 7, 1 Oct |
Lab
|
Week 8, 5 Oct |
Dependency Grammar
Presentation (screen, handout)
Mandatory reading
|
Week 8, 8 Oct |
Reading group |
Week 9, 12 Oct |
Dependency parsing
Presentation (screen, handout)
Mandatory reading
|
Week 9, 15 Oct |
Lab
Mini-lecture on experimental methodology
Oblig 3a |
Week 10, 19 Oct |
Semantic roles
Presentation (screen, handout)
Mandatory reading
|
Week 10, 22 Oct |
No group |
Week 11, 26 Oct |
Semantic Role Labeling
Presentation (screen, handout)
Mandatory reading
|
Week 11, 29 Oct |
Lab |
Week 12, 2 Nov |
Machine Learning in NLP, Logistic Regression
Mandatory reading |
Week 12, 5 Nov |
Lab |
Week 13, 9 Nov |
Statistical significance, chi-square, collocations
Mandatory reading
|
Week 13, 12 Nov |
No group |
Week 14, 16 Nov |
More on collocations, feature selection and maximum entropy |
Week 14, 19 Nov |
Lab |
Week 15, 23 Nov |
No class |
Week 16, 26 Nov |
No class |
Week 17, 30 Nov |
Exercises from earlier exams
Room Java!
|