Messages
Thank you for the submitted project reports (with their many pages!). The oral part of the exam will thus take place Friday June 21, in room B62 in the Mathematics Building (Matematikkbygningen), and the schedule is as follows.
09:00 Kristina Haarr Vidvand
09:35 Geir W?hler Gustavsen
10:10 Torunn Heggland
10:45 Emil Huster
11:20 Emil Mogstad
11:55 Sofie S?bu?deg?rd
13:00 Trond Arild Ydersbond
13:35 Peder Østbye
1. As agreed upon during our last lecture, I'm generously allowing 24 more hours for the exam project; rather than making the exam project available Friday June 7, it will materialise Thursday June 6. The date for delivering your report (in duplicate) is unchanged, i.e. Monday June 17 at 13:59 or earlier. Note that you will also be asked to hand in Special Page A and Special Page B (as explained in the exam set). There will also be certain extra files uploaded to this course site at the time when the exam set is made available; please report to me immediately if anything is not right with any of these.
2. I believe I have recited the curriculum list during the lectures, along with suitable comments on its parts and details, but here is at any rate the written version. The curriculum is based on (i) the Claeskens and Hjort book (Cambridge, 2008); (ii) all exercises solved and discussed during the course (see the Course Not...
Wed May 15 we went through Exam stk 4160 June 2011 Exercises 1 and 2, discussing related issues along the way, e.g. concerning model averaging and risk functions.
I have uploaded com28c, which does the FIC and AFIC for the ldl data. Check that you understand this programme and can modify it on your own for later purposes. In particular it uses certain tricks to go through the full list of 2^q submodels in one go. The programme also fits the data to a certain heteroskedastic model which does quite a bit better than the eight models used in the Exercise 2.
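The following is not taken from com28c. It is only a minimal sketch, in Python rather than the course's R, of one possible way to list all 2^q submodels in one go; the function names are hypothetical. Each submodel is represented by the subset S of extra parameters it includes, with the empty subset as the narrow model and the full subset as the wide model.

```python
from itertools import combinations


def all_submodels(q):
    """Return every subset S of {0, ..., q-1} as a tuple, ordered by size."""
    return [S for size in range(q + 1) for S in combinations(range(q), size)]


def all_submodels_bitmask(q):
    """Same subsets, indexed by the integers 0 .. 2^q - 1 read as bit patterns."""
    return [tuple(j for j in range(q) if (m >> j) & 1) for m in range(2 ** q)]


subsets = all_submodels(3)
print(len(subsets))   # 8
print(subsets[0])     # () is the narrow model
print(subsets[-1])    # (0, 1, 2) is the wide model
```

With q = 3 this gives eight candidate models; a fitting routine can then loop over the list and compute its score for each S.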
Next week is Abelian (six million kroner to Deligne), but we meet Wed May 22, perhaps two hours rather than three, and go through the rest of Exam stk 4160 2011.
Today, liberation & victory day Wed May 8, we went through the most important results and issues of Sections 7.1-7.4. Next week Wed May 15 we first go through the Shakespeare and Undset exercises #1 & #2 from Exam stk 4160 2011, and then devote time to discussing issues of Section 7.4.
From Ch 7, only Sections 7.1-7.4 are active curriculum. Ch 8 is not part of the curriculum either, and Ch 9 is largely a list of worked-out illustrations of our methodology, so we're now on the proverbial home stretch (oppløpsside).
There will be teaching Wed Apr 24, since the lawyers, the prosecution, the defence counsel, the accused and the judges managed to get through their arduous case quickly enough in the District Court (Tingretten) today. We proceed with Ch 7, where Sections 7.1-7.5 are the active curriculum part; the rest of that chapter belongs to the cursory curriculum. A clear curriculum list (pensumliste) will see the light of day soon.
I've uploaded com9c, with AFIC for the pupil attendance data, cf. Exam 2009 Exercise 3. There's a weight(z) function to be played with, meant to reflect which types of pupils are deemed more important when deciding on a good final model. Also uploaded is com17c, with log-expansion models for Davis and Bøkko, cf. Exam 2009 Exercises 1-2. It does pointwise FIC for estimating the log-density.
Wed Apr 17 we discussed AFIC and rounded off Ch 6. We also went through various details related to Exam 2009 Exercise 3, (a)-(c).
Sections 6.8 and 6.10 are not active parts of the curriculum, but rather inside the "cursory curriculum" (where you should know what the sections are about, and why, but where detailed knowledge is not required).
Exercises for Wed Apr 24: First Exam 2009 Exercise 3, where we focus on the FIC point (c) and the AFIC point (e), where you are also invited to play with different importance weight functions w(z1,z2) for the positions (z1,z2) in the covariate space. Then Exercise 1 from the same set, followed by its application to this follow-up question to Exercise 2(c): Use FIC to estimate the log-density log f(y0), for a range of y0 values.
I have been summoned as a lay judge in a criminal case in the District Court (Tingretten) on Tuesday Apr 23, and will only know in the course of that day whether the case also...
1. Wed Apr 10 we discussed various aspects and details pertaining to FIC, and also did the car sales value exercise with four different nonlinear regression models. Next week we discuss AFIC.
2. I've uploaded com22c (four models for used cars) and com24a and com24b (computing tolerance radius for the f1(y,xi,sigma,a1) model w.r.t. a2).
3. Exercises for Wed Apr 10: First Exam stk 4160 2009, Exercise #2, (a)-(c). We've done (a) earlier but the point now is the FIC point (c). Then Exercise #1 from the same exam set.
4. As agreed upon earlier, the exam project is made available Fri June 7, with reports to be handed in Mon June 17. Part II of the exam is a 30 minute oral examination, which will take place Fri June 21 [this is the corrected date; the earlier given date Thu June 20 was incorrect].
1. Wed Apr 3 we discussed the basics of FIC: the master theorem about the behaviour of muhat(S); the consequent expressions for risk functions (mean squared error, i.e. variance plus squared bias); how to estimate these. The model selected is the one with the smallest estimated risk. We also went through different numerical methods for calculating the tolerance level kappa(a1)/sqrt(n) for the log-linear expansion model of Exam stk 4160 2009 Exercises 1-2.
2. Exercise for Wed Apr 10: Access again the car sales data of (xi,yi), of Exam stk 4020 2012 Exercise 2. Consider the wide model where yi = m(xi)*epsi, where m(x) = exp(beta0 + beta1 x + beta2 x^2) and the epsi are iid Gamma(c,c). Take the narrow model to have m(x) = exp(beta1 x). Fit now each of the four models corresponding to pushing beta0 and beta2 in and out. For each, estimate the focus parameters mu1 = m(x0) for x0 = 5 years and mu2 = the half-time of a car, where m(mu2) = 0.5. Comp...
1. Wed Mar 13 we discussed various Ch 5 issues, including tolerance thresholds, squared bias + variance for narrow vs. wide estimation, etc. Next week I intend to finish our discussions of Ch 5 material.
2. Exercises for Wed Mar 20 are as follows. First, for the log-linear expansion model of Exam stk 4160 2009 Exercises 1-2, find the tolerance radius for how much a2 can deviate from zero, as seen from the three-parameter model f(y,xi,sigma,a1). Your answer will depend on a1 and needs to be computed numerically. Display the required threshold(a1) curve. Second, go to the Exam stk 4020 2012 set, Exercise 2, with data (xi, yi) for used cars and their sales value. Fit (i) ordinary linear regression and (ii) the indicated nonlinear regression model to these data, along perhaps with one or two more of your own invention. Compute aic, bic, model-robust aic, and cross validation scores. For each candidate mode...
1. Wed Mar 6 we discussed Section 4.2 (which is also the only section of Ch 4 landing on the list of active curriculum) and started Ch 5 (where all sections are curriculum).
2. I have uploaded com15b, pertaining to confidence intervals for quantiles of the menarche onset distribution etc.
3. Exercises for Wed Mar 13 are as follows. First, invent one or two further models for the 3918 Polish girls, and check with AIC and BIC whether your attempts are successful compared to those already treated in com15b. Second, do Nils Exercise #9, stretching the tail of Gauss. Third, do Exam 2009 Exercise 2. For that exercise, also find tolerance radius (1) for a1, seen as extension parameter for the two-parameter normal, and (2) for a2, seen as extension parameter for the three-parameter (xi,sigma,a1) model.
1. Wed Feb 27 we discussed the basics of BIC, and also started the analysis of the prototype setup of Section 4.2. Wed Mar 6 we finish this analysis and also start Ch 5.
2. From Ch 4, only Section 4.2 is in the "active curriculum", with the other parts being cursory curriculum. For Ch 3, Sections 3.5 and 3.6 are not curriculum, and Section 3.4 is cursory curriculum.
3. I have uploaded com10b, more cross validation and skewed-normal regression for babies, and com13a, some BIC related analysis for blood groups in man.
4. Exercises for Wed Mar 6 are as follows. First, e.g. via com13a, compute exact posterior probabilities for the two models in question, and compare with results from the BIC approximation. Second, go through the material of Section 4.2 and try to reconstruct Figures 4.1 and 4.2. This takes implementation and plotting of a certain probability function and a certain risk function. Play a bit with diffe...
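For the first exercise in point 4, the BIC side of the comparison can be sketched as follows. This is not com13a; it is a minimal illustration in Python rather than R, assuming the "larger is better" convention BIC = 2 logL_max - dim * log(n) and equal prior probabilities for the candidate models, so that Pr(model j | data) is approximated by exp(BIC_j/2) divided by the sum of such terms. The function names and the numbers in the usage lines are hypothetical, not results for the blood group data.

```python
import math


def bic(loglik_max, dim, n):
    """BIC in the 'larger is better' form: 2 * max log-likelihood - dim * log(n)."""
    return 2.0 * loglik_max - dim * math.log(n)


def bic_model_probabilities(logliks, dims, n):
    """Approximate posterior model probabilities, assuming equal prior weights."""
    scores = [bic(ll, p, n) for ll, p in zip(logliks, dims)]
    top = max(scores)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(0.5 * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical maximised log-likelihoods and parameter counts for two models.
probs = bic_model_probabilities([-120.3, -118.9], [2, 3], n=100)
print(probs)
```

The exact posterior probabilities require integrating the likelihood against the prior for each model; the point of the exercise is to see how close the approximation above comes to them.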
Regarding our statistical attempts at understanding factors influencing chances of low birthweight, here's a recent paper discussed in Aftenposten and other media today, adding coffee drinking to the list of such factors: Check "Sammenheng mellom koffein og for lav fødselsvekt" at fhi.no to find: http://www.fhi.no/eway/default.aspx?pid=233&trg=MainLeft5588&MainArea5661=5588:0:15,2659:1:0:0:::0:0&MainLeft_5588=5544:101320::1:5569:1:::0:0
1. Wed Feb 20 we discussed some basics regarding Bayesian formalism for determining Pr(model | data). This leads via further approximations to the BIC. We continue next week with more from Ch 3 and also Section 4.2; the other Ch 4 sections are "cursory curriculum" only.
2. General message A: We've been through a perhaps challenging phase regarding R programming of logL functions, how to work one's way through several competing models, Jhat and Khat matrices, etc. There may be further things to learn in the weeks ahead, but we've more or less reached the "right level of expertise". In yet other words, it should be easier sailing from now on. General message B: By necessity some of our efforts so far in the course have been associated with "getting things done", fitting models to real data with new twists & turns, etc. But this ought not to create an impression that getting one's R programmes to w...
Always exam-relevant: Nils Lid Hjort gives a popular-science ("pop.vit.") talk today, on Mathematics in Duckburg (Matematikk i Andeby): http://foreninger.uio.no/bg/program/dag5/hjort.html But what is "sannynlighetsteori" (probability theory)?
1. Today Wed Feb 13 we discussed cross validation, the connection to model-robust AIC via Jhat and Khat, and a bit of AIC asymptotics, including the "0.157 significance level" finding. Next week we start Ch 3 with BIC etc.
2. Note that I have uploaded com8b and com9a. Go through them and make sure you understand what they do, how they work, how they may be modified and extended.
3. Exercises for Wed Feb 20: First, go once more back to the birds on island with the 8 candidate models. For these 8 models, compute AIC (again), then the TIC, and the cross-validation log-density scores xv. Second, go to the "low birthweight" data at the book's web page, and organise data in your computer so that you have y = weight of baby (in kg), x1 = age of mother, x2 = weight of mother before pregnancy (in kg, please, not pounds), x3 = indicator for smoking. Your task is to go through the 8 models corresponding to using linear normal...
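The "0.157 significance level" in point 1 can be checked numerically. When a wide model has one parameter more than the narrow model, AIC prefers the wide model when twice the log-likelihood difference exceeds 2, and under the narrow model that difference is approximately chi-squared with one degree of freedom. A minimal sketch in Python rather than R (the function name is hypothetical):

```python
import math


def chi2_1_upper_tail(x):
    """P(chi-squared with 1 df > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


p_extra_parameter = chi2_1_upper_tail(2.0)
print(round(p_extra_parameter, 3))  # 0.157
```

So AIC acts, in this one-extra-parameter case, like a test at level roughly 0.157 rather than the customary 0.05.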
1. Today Wed Feb 6 we discussed (a) the chain of arguments leading to AIC and model-robust AIC and (b) the general setup with basic results for regression models. We also went through the two "Extra exercises" and part of the Poisson regression for birds exercise.
2. Note that I have uploaded com2b, com7a, com8a. Go through them & check that you understand (and may be able to modify or extend, as required) their different parts.
3. Tentative dates for our exam: (a) The Exam Project is made available Fri June 7, and reports are to be handed in Mon June 17 (which means that two weekends are included in the project writing period). (b) The oral examinations will then take place Thu June 20 and/or Fri June 21.
4. Next week I intend to round off Ch 2. I will discuss cross validation and some but not all of the applications of Ch 2. Sections 2.6 and 2.10 are "cursory curriculum" (read through it, grasp what goes o...
One more exercise for Wed Feb 6 (which I mentioned during the lecture, but forgot to include when I posted yesterday's message): For the 141 Roman era Egyptian life-times studied in Nils Exercise 5, consider the Gamma(a,b) model, and construct two approximate 90% confidence ellipses for (a,b). The first should use "only the model", involving the J matrix; the second is the model-robust version involving both the J and the K matrix. For practicalities regarding how to squeeze an ellipse out of R, consult p. 141 in the Gerda-Nils book.
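The mechanics of the two ellipses can be sketched as follows, in Python rather than R and not following the recipe on p. 141. The sketch assumes the usual large-sample approximations: the estimator has covariance J^{-1}/n under the model and J^{-1} K J^{-1}/n in the model-robust version, and the 90% ellipse is the set where the quadratic form equals the chi-squared(2) quantile, which is -2 log(0.10). All numbers below (theta_hat, J, K) are hypothetical placeholders, not estimates from the Egyptian life-time data; only n = 141 comes from the exercise.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale2(m, s):
    """Multiply every entry of a 2x2 matrix by the scalar s."""
    return [[s * v for v in row] for row in m]


def ellipse_points(center, cov, level=0.90, num=200):
    """Boundary of {t : (t - center)' cov^{-1} (t - center) = c}."""
    c = -2.0 * math.log(1.0 - level)       # chi-squared(2) quantile, closed form
    l11 = math.sqrt(cov[0][0])             # Cholesky factor of cov, cov = L L'
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    r = math.sqrt(c)
    points = []
    for k in range(num):
        angle = 2.0 * math.pi * k / num
        u1, u2 = math.cos(angle), math.sin(angle)
        points.append((center[0] + r * l11 * u1,
                       center[1] + r * (l21 * u1 + l22 * u2)))
    return points


n = 141
theta_hat = (1.5, 0.05)                    # hypothetical (a, b) estimate
J = [[1.2, -20.0], [-20.0, 600.0]]         # hypothetical Jhat
K = [[1.0, -15.0], [-15.0, 500.0]]         # hypothetical Khat

J_inv = inv2(J)
cov_model = scale2(J_inv, 1.0 / n)                        # "only the model"
cov_robust = scale2(mul2(mul2(J_inv, K), J_inv), 1.0 / n)  # sandwich version

ellipse_model = ellipse_points(theta_hat, cov_model)
ellipse_robust = ellipse_points(theta_hat, cov_robust)
print(len(ellipse_model), len(ellipse_robust))
```

Plotting the two lists of points in the same diagram shows how much the model-robust ellipse differs from the model-based one.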
Wed 30 Jan we went through more basic material of Ch 2 (ML analysis, delta method, standard error estimation, confidence interval constructions, sandwich matrix) and also Nils Exercise #5. Note that I have uploaded com6a which deals with the details of that exercise.
For Wed 6 Feb, access the book webpage and get hold of the "birds on islands" dataset, and do the first parts of Nils Exercise #20, involving eight different Poisson regression models. Find the eight parameter estimates of mu(x0) = E(Y | x0), for the two values of x0 given there, also supplemented with standard errors and 90% confidence intervals for mu(x0). Use both "inverse information matrix" and "sandwich matrix" (cf. page 28).
Additional exercise: Invent and hold on to a certain density g(y) on (0,infty), e.g. of the type p1 dgamma(y,a1,b1) + p2 dgamma(y,a2,b2), to be considered the true density. Then for the five models of Nils Exercise #5, find the best parame...
Wed 23 Jan we went through more basic material from Ch 2, and also spent some time on various practical details related to the use of R. We then went through Nils Exercise #2, along with relevant discussion.
I have uploaded com1a (for Exercise #2) and com2a (for Exercise #4); please inform me if the links do not work (there was apparently an earlier error connected to com1a).
For Wed 30 Jan we do Nils Exercises #4 and 5 and also the book's Exercise 2.1. The plan is otherwise to proceed with Ch 2 material, including the lifting of the i.i.d. theory dealt with up to now to regression models.
Today, Wed 16 Jan, I gave a general introduction to the course, and also started with material related to Ch 2. You are invited to read (and re-read) Ch 1, which is inside the curriculum, but where I will not go through all of its sections. Wed 23 Jan we continue with Ch 2 material, with these key words: Kullback-Leibler distance, ML and its approximate distribution, practical model fitting in R, AIC formula.
Exercises you should go through for Wed 23 Jan: The book's Exercise 2.1. Then #2 and #4 from Nils Exercises 2011 (see the course webpage for 2011). Regarding R programming for the logL function etc., see the book's Exercise 2.3.
Welcome to the Statistical Model Selection course for the spring semester 2013. We're starting up the course Wed January 16, 10:15 to 13:00, Seminar Room 738, 7th floor.
The curriculum will be based on Claeskens & Hjort's Model Selection and Model Averaging (Cambridge University Press, 2008); check the book webpage at http://www.econ.kuleuven.ac.be/public/ndbaf45/modelselection/
You should also go to this course's webpage from the spring semester 2011 and for your convenience print out Exercises and Lecture Notes (Version E).
Nils Lid Hjort