Weekly update for week 39
Hi all,
Here follows our weekly update for FYS-STK3155/4155. Last week
we went through logistic regression (sections 4.4-4.5 of Hastie et al.).
Since our cost function is defined in terms of a logarithm that depends
on the parameters β of the model, we need to find the minima
numerically. We therefore started discussing various gradient descent methods and
will continue this week with stochastic gradient descent. Much of
this material is not well covered by the book of Hastie et al., and thus
most of it will be covered by the lecture slides at
https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html
Also, the text of Murphy at
https://github.com/CompPhysics/MachineLearning/blob/master/doc/Textbooks/MachineLearningMurphy.pdf
contains a better discussion of these optimization methods in section 8.3,
and chapter 8 of Murphy's text also gives a good treatment of logistic
regression. I am not too pleased with chapter 4 of Hastie et al.
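To make the gradient step concrete, here is a minimal sketch of plain gradient descent on the logistic regression cross-entropy cost. The toy data, learning rate and number of iterations are assumed values for illustration only, not a prescription for the projects:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical toy data: n samples, p features, binary targets y in {0, 1}.
    rng = np.random.default_rng(2019)
    n, p = 100, 2
    X = rng.normal(size=(n, p))
    y = (X @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=n) > 0).astype(float)

    # Plain gradient descent on the mean cross-entropy cost
    # C(beta) = -(1/n) sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)],
    # whose gradient is (1/n) X^T (p - y) with p_i = sigmoid(x_i . beta).
    beta = np.zeros(p)
    eta = 0.5            # learning rate, an assumed value
    for _ in range(1000):
        prob = sigmoid(X @ beta)
        beta -= eta * (X.T @ (prob - y)) / n

    print("estimated beta:", beta)

Stochastic gradient descent, which we cover this week, replaces the full-data gradient above with gradients computed on randomly drawn minibatches.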
This week we will wrap up our discussion of gradient methods on Thursday. Before we move on to neural networks,
we will also present another popular and simple-to-implement algorithm for classification (and regression as well),
namely the so-called k-nearest neighbors algorithm.
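For those curious, here is a minimal sketch of the idea; the Euclidean distance and the default k below are assumptions for illustration:

    import numpy as np

    def knn_predict(X_train, y_train, X_test, k=3):
        # For each test point, find the k nearest training points
        # (Euclidean distance) and take a majority vote among their labels.
        predictions = []
        for x in X_test:
            distances = np.linalg.norm(X_train - x, axis=1)
            nearest = np.argsort(distances)[:k]
            predictions.append(np.bincount(y_train[nearest].astype(int)).argmax())
        return np.array(predictions)

In scikit-learn the same algorithm is available as sklearn.neighbors.KNeighborsClassifier (and KNeighborsRegressor for regression).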
Thereafter we start with neural networks. This will also be the main topic for project 2.
For project 1, the textbook of Hastie et al. is not optimal when it
comes to relating the MSE as an assessment method to the bias and the
model variance. If we denote the data by $y_i$ (for a given set
of measurements $i=1,2,\dots,n$) and our model by $f_i$, we have
from our linear regression discussion $y_i = f_i + \epsilon_i$, where
$\epsilon_i$ is assumed to be normally distributed. Often we don't
know what its distribution actually looks like.
The MSE is defined as
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}(f_i-y_i)^2.$$
Adding and subtracting the mean value of the model, $\bar{f}=\frac{1}{n}\sum_{i=1}^{n}f_i$, we have
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}(f_i-y_i+\bar{f}-\bar{f})^2=\frac{1}{n}\sum_i(f_i-\bar{f})^2+\frac{1}{n}\sum_i(y_i-\bar{f})^2+\frac{2}{n}\sum_i(y_i-\bar{f})(\bar{f}-f_i).$$
The first term on the right-hand side is the variance, $\mathrm{Var}(f)=\frac{1}{n}\sum_i(f_i-\bar{f})^2$, and the second term is the bias, $\mathrm{Bias}=\frac{1}{n}\sum_i(y_i-\bar{f})^2$.
The last term is zero only if $\sum_i y_i\epsilon_i$ is zero (see the lecture slides for more info).
This means that, after resampling with either cross-validation or the
bootstrap, you should evaluate the MSE and compare it with the sum of the
variance and the bias; the two should be equal or almost equal. For the
project it may be useful to make a plot similar to figure 2.11 of Hastie
et al., where complexity is the order of the polynomial; a sketch of such a computation follows below.
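As a rough sketch of how such a check could look with the bootstrap. The one-dimensional toy data, the number of resamples and the range of polynomial degrees are all assumed values here; adapt them to your own setup:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.utils import resample

    # Hypothetical one-dimensional toy data; replace with your project data.
    rng = np.random.default_rng(2019)
    n = 100
    x = np.sort(rng.uniform(0, 1, n)).reshape(-1, 1)
    y = np.exp(-x[:, 0]**2) + 1.5 * np.exp(-(x[:, 0] - 2)**2) + 0.1 * rng.normal(size=n)

    n_bootstraps = 200
    for degree in range(1, 10):
        X = PolynomialFeatures(degree).fit_transform(x)
        # Collect model predictions over bootstrap resamples of the data.
        predictions = np.empty((n_bootstraps, n))
        for b in range(n_bootstraps):
            X_, y_ = resample(X, y)
            predictions[b] = LinearRegression(fit_intercept=False).fit(X_, y_).predict(X)
        mse = np.mean((predictions - y)**2)
        bias = np.mean((y - np.mean(predictions, axis=0))**2)
        variance = np.mean(np.var(predictions, axis=0))
        print(f"degree {degree}: MSE={mse:.4f}  bias={bias:.4f}  variance={variance:.4f}")

Note that for brevity this sketch evaluates the decomposition on the same points used for fitting; for the project you should evaluate it on test data after a train-test split.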
For the terrain data, depending on the size of your area, it may be difficult to obtain a good MSE with a polynomial fit if the area
is too large. Try making the area smaller. We will post some code examples for this later; one possible approach is sketched below.
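Until then, here is a minimal sketch of one way to select a smaller patch; the file name, the patch size and the offsets are hypothetical:

    import numpy as np
    from imageio import imread

    # Hypothetical file name; use your own downloaded terrain file.
    terrain = np.asarray(imread('SRTM_data_Norway_1.tif'))

    # Select an assumed 100x100 patch starting at the upper-left corner;
    # adjust the offsets and size to the region you want to fit.
    row0, col0, size = 0, 0, 100
    patch = terrain[row0:row0 + size, col0:col0 + size]
    print(patch.shape)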
In the meantime, best wishes to everybody and see you at the lab tomorrow.
Bendik, Kristine, Morten and Øyvind