Plans for week 38, September 19-23
Dear all, welcome back to a new week and FYS-STK3155/4155. We hope you've had a relaxing weekend.
Last week we discussed resampling methods like the bootstrap and cross-validation, as well as other statistical properties such as the central limit theorem and expectation values. Data sampling refers to statistical methods for selecting observations from a domain with the objective of estimating a population parameter, whereas data resampling refers to methods for economically reusing a collected data set to improve the estimate of the population parameter and to help quantify the uncertainty of the estimate.
Both data sampling and data resampling are methods that are required in a predictive modeling problem.
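As a small illustrative sketch (not part of the course material), the bootstrap idea above can be shown in a few lines of NumPy: resample a data set with replacement many times, recompute the statistic on each resample, and use the spread of the results to quantify the uncertainty of the estimate. The data set here is made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data set: noisy samples whose population mean we want to estimate.
data = rng.normal(loc=5.0, scale=2.0, size=200)

# Bootstrap: draw many resamples of the same size, with replacement,
# and recompute the statistic (here the mean) on each resample.
n_bootstraps = 1000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(n_bootstraps)
])

# The spread of the bootstrap means estimates the standard error of the mean.
print(f"mean estimate: {data.mean():.3f}")
print(f"bootstrap standard error: {boot_means.std():.3f}")
```

For this sample the bootstrap standard error should come out close to the analytical value for the mean, sigma/sqrt(n) = 2/sqrt(200), roughly 0.14.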
This week we will use the first lecture on Thursday to wrap up our discussion of resampling methods, with a focus on cross-validation. Thereafter, we move on to classification problems. Think of these as data sets with discrete outcomes (yes/no, or several classes of outputs) instead of the continuous functions we have dealt with until now. The first classification method we will encounter is logistic regression. Like linear regression, it is a purely deterministic method, but it also serves to introduce various optimization algorithms, such as gradient descent, stochastic gradient descent and many more. After logistic regression we start with neural networks and deep learning methods (next week).
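To give a small taste of what is coming (a minimal sketch with made-up data, not the implementation we will develop in the lectures), binary logistic regression can be fitted with plain gradient descent on the cross-entropy cost; the learning rate and iteration count below are arbitrary choices for the illustration:

```python
import numpy as np

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical binary data: one feature, class given by a noisy threshold at 0.
n = 500
x = rng.uniform(-3, 3, size=n)
y = (x + rng.normal(scale=0.5, size=n) > 0).astype(float)
X = np.column_stack([np.ones(n), x])   # design matrix with an intercept column

# Plain gradient descent on the cross-entropy cost.
beta = np.zeros(2)
eta = 0.1                              # learning rate (arbitrary choice)
for _ in range(2000):
    p = sigmoid(X @ beta)              # predicted probabilities
    gradient = X.T @ (p - y) / n       # gradient of the averaged cost
    beta -= eta * gradient

accuracy = np.mean((sigmoid(X @ beta) > 0.5) == y)
print(f"beta = {beta}, accuracy = {accuracy:.2f}")
```

Stochastic gradient descent, which we will also discuss, replaces the full-data gradient above with gradients computed on small random minibatches.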
The plan this week is thus:
- Lab Wednesday and Thursday: work on project 1
- Thursday: Summary of regression methods, cross-validation and discussion of project 1. Start logistic regression
- Friday: Classification problems and logistic regression, from binary cases to several categories. Start optimization methods
Reading recommendations:
- See the lecture notes for week 37 on cross-validation and for week 38 at https://compphysics.github.io/MachineLearning/doc/web/course.html.
- Bishop 4.1, 4.2 and 4.3. Not all the material is relevant or will be covered. Section 4.3 is the most relevant, but 4.1 and 4.2 provide interesting background reading for logistic regression.
- Hastie et al 4.1, 4.2 and 4.3 on logistic regression.
- For a good discussion of gradient methods, see Goodfellow et al sections 4.3-4.5 and chapter 8. We will come back to the latter chapter in our discussion of neural networks as well.
Best wishes to you all,
Morten et al