Weekly update for week 39
Hi all,
Here follows our weekly update for FYS-STK3155/4155. Last week
we went through logistic regression (sections 4.4-4.5 of Hastie et al.).
Since our cost function is defined in terms of a logarithm that depends
on the parameters β of the model, we need to find the minima
numerically. We therefore started discussing various gradient descent methods and
will continue this week with stochastic gradient descent. Much of
this material is not well covered by the book of Hastie et al., and thus
most of it will be covered by the lecture slides at
https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html
Also, the text of Murphy at
https://github.com/CompPhysics/MachineLearning/blob/master/doc/Textbooks/MachineLearningMurphy.pdf
contains a better discussion of these optimization methods in section 8.3,
and chapter 8 of Murphy's text also gives a good treatment of logistic
regression. I am not too pleased with chapter 4 of Hastie et al.
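To make the gradient step concrete, here is a minimal sketch of plain gradient descent on the logistic regression cross-entropy cost. The toy data, learning rate and number of iterations are assumed values for illustration only, not a prescription for the projects:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical toy data: n samples, p features, binary targets y in {0, 1}.
    rng = np.random.default_rng(2019)
    n, p = 100, 2
    X = rng.normal(size=(n, p))
    y = (X @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=n) > 0).astype(float)

    # Plain gradient descent on the mean cross-entropy cost
    # C(beta) = -(1/n) sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)],
    # whose gradient is (1/n) X^T (p - y) with p_i = sigmoid(x_i . beta).
    beta = np.zeros(p)
    eta = 0.5            # learning rate, an assumed value
    for _ in range(1000):
        prob = sigmoid(X @ beta)
        beta -= eta * (X.T @ (prob - y)) / n

    print("estimated beta:", beta)

Stochastic gradient descent, which we cover this week, replaces the full-data gradient above with gradients computed on randomly drawn minibatches.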
This week we will wrap up our discussion of gradient methods on Thursday. Before we move on to neural networks,
we will also present another popular and simple-to-implement algorithm for classification (and regression as well),
namely the so-called k-nearest neighbors algorithm.
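For those curious, here is a minimal sketch of the idea; the Euclidean distance and the default k below are assumptions for illustration:

    import numpy as np

    def knn_predict(X_train, y_train, X_test, k=3):
        # For each test point, find the k nearest training points
        # (Euclidean distance) and take a majority vote among their labels.
        predictions = []
        for x in X_test:
            distances = np.linalg.norm(X_train - x, axis=1)
            nearest = np.argsort(distances)[:k]
            predictions.append(np.bincount(y_train[nearest].astype(int)).argmax())
        return np.array(predictions)

In scikit-learn the same algorithm is available as sklearn.neighbors.KNeighborsClassifier (and KNeighborsRegressor for regression).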
Thereafter we start with neural networks. This will also be the main topic for project 2.
For project 1, the textbook of Hastie et al. is not optimal when it
comes to relating the MSE as an assessment method to the bias and the
model variance. If we denote the data by $y_i$ (for a given set
of measurements $i=1,2,\dots,n$) and our model by $f_i$, we have
from our linear regression discussion $y_i = f_i + \epsilon_i$, where
$\epsilon_i$ is assumed to be normally distributed. Often we don't
know what its distribution actually looks like.
The MSE is defined as
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}(f_i-y_i)^2.$$
Adding and subtracting the mean value of the model, $\bar{f}=\frac{1}{n}\sum_{i=1}^{n}f_i$, we have
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}(f_i-y_i+\bar{f}-\bar{f})^2=\frac{1}{n}\sum_i(f_i-\bar{f})^2+\frac{1}{n}\sum_i(y_i-\bar{f})^2+\frac{2}{n}\sum_i(y_i-\bar{f})(\bar{f}-f_i).$$
The first term on the right-hand side is the variance, $\mathrm{Var}(f)=\frac{1}{n}\sum_i(f_i-\bar{f})^2$, and the second term is the bias, $\mathrm{Bias}=\frac{1}{n}\sum_i(y_i-\bar{f})^2$.
The last term is zero only if $\sum_i y_i\epsilon_i$ is zero (see the lecture slides for more info).
This means that, after resampling with either cross-validation or the
bootstrap, you should evaluate the MSE and compare it with the sum of the
variance and the bias; the two should be equal or almost equal. For the
project it may be useful to make a plot similar to figure 2.11 of Hastie
et al., where complexity is the order of the polynomial; a sketch of such a computation follows below.
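As a rough sketch of how such a check could look with the bootstrap. The one-dimensional toy data, the number of resamples and the range of polynomial degrees are all assumed values here; adapt them to your own setup:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.utils import resample

    # Hypothetical one-dimensional toy data; replace with your project data.
    rng = np.random.default_rng(2019)
    n = 100
    x = np.sort(rng.uniform(0, 1, n)).reshape(-1, 1)
    y = np.exp(-x[:, 0]**2) + 1.5 * np.exp(-(x[:, 0] - 2)**2) + 0.1 * rng.normal(size=n)

    n_bootstraps = 200
    for degree in range(1, 10):
        X = PolynomialFeatures(degree).fit_transform(x)
        # Collect model predictions over bootstrap resamples of the data.
        predictions = np.empty((n_bootstraps, n))
        for b in range(n_bootstraps):
            X_, y_ = resample(X, y)
            predictions[b] = LinearRegression(fit_intercept=False).fit(X_, y_).predict(X)
        mse = np.mean((predictions - y)**2)
        bias = np.mean((y - np.mean(predictions, axis=0))**2)
        variance = np.mean(np.var(predictions, axis=0))
        print(f"degree {degree}: MSE={mse:.4f}  bias={bias:.4f}  variance={variance:.4f}")

Note that for brevity this sketch evaluates the decomposition on the same points used for fitting; for the project you should evaluate it on test data after a train-test split.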
For the terrain data, depending on the size of your area, it may be difficult to obtain a good MSE with a polynomial fit if the area
is too large. Try making the area smaller. We will post some code examples for this later; one possible approach is sketched below.
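Until then, here is a minimal sketch of one way to select a smaller patch; the file name, the patch size and the offsets are hypothetical:

    import numpy as np
    from imageio import imread

    # Hypothetical file name; use your own downloaded terrain file.
    terrain = np.asarray(imread('SRTM_data_Norway_1.tif'))

    # Select an assumed 100x100 patch starting at the upper-left corner;
    # adjust the offsets and size to the region you want to fit.
    row0, col0, size = 0, 0, 100
    patch = terrain[row0:row0 + size, col0:col0 + size]
    print(patch.shape)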
In the meantime, best wishes to everybody and see you at the lab tomorrow.
Bendik, Kristine, Morten and Øyvind