Messages
Oral examinations are organised as follows: Thu June 11 in B91 and Fri June 12 in B81. Each examination is meant to last about 30-35 minutes and will involve questions on the project reports as well as on other themes from the curriculum.
Thursday:
09:00 Harald Eikrem
09:45 Ingrid Hobæk Haff
10:30 Joachim Holth
11:15 Hilde Galleberg Johnsen
13:00 Navreet Kaur
13:45 Bård Øyvind Kvaal
14:30 Rogelio Andrade Mancisidor
15:15 Pål Nordby
Friday:
09:00 Andreas Poole
09:45 Anahita Rahimi
10:30 Dag Sverre Sejebotn
11:15 Eivind Stordal
13:00 Nathalie St?r
13:45 Jiyuan Sun
14:30 Sam-Erik Walker
15:15 Gudmund Hermansen
I could have been clearer in Exercise 2, where you are to analyse the 1500-m data from the Adelskalenderen, but where I do not very explicitly mention the sample size. Please base your analysis on the n = 250 best skaters, as implied by the text accompanying Figure A, and the part of the text where I write "attempt to duplicate Figure A". This is also the very same subset (of the fuller n = 643 data set) we have used earlier on in the course.
There is unfortunately a slight typographical error in the Exam set's Exercise 1 (g) (of no further consequence): it should be a minus sign, not a plus sign, in front of the psi1 and psi2 functions in the formula for omega(y0).
The Exam Project is now ready for you; check under "Eksamen og vurderingsformer". You also need to print out and sign "Page A", the declaration form, and also to write up a "Page B" containing your own summing up of your work, including a brief self-assessment. Finally note that you need to copy the "attendance1-data" file to your computer, in connection with Exercise 3. Data for Exercise 2 are available at the Claeskens & Hjort book site; see the text for details. The deadline for delivering your report (in duplicate) is Mon June 8 at 14:00 at the latest.
You are advised to check the course website in a couple of days to see if there are any particular messages relating to the project. Information pertaining to the oral examinations will be placed here Mon June 8.
I wish everyone good luck & a fruitful struggle with the exam project.
I have now uploaded the Final Version of "Exercises & Course Notes", now comprising 21 exercises on 32 pages; please print out a copy for your convenience. This last version also has minor modifications to some of the details, compared to previous versions.
I have also updated two Nils lectures (by "public demand", as it were): one on FIC in survival analysis models (e.g. Cox and Aalen type models), and one on "Feelings in Research". These lectures are not considered curriculum for the course, though.
As previously agreed, the Exam Project exercises will be made available on this site Tue May 26, and reports need to be handed in Mon June 8, in duplicate.
There were some minor mistakes in files "com17c", "com18c", "com19a" (specifically, regarding the varr term for the narrow model). I have instead uploaded com17d, fic analysis etc. for pblack/pwhite in LogReg models; com18d, fic and afic analysis for onset of menarche; and com19b, fic and afic analysis for birds on islands, via Poisson regression models.
If you need to copy and edit from any of these programmes for the exam project, please check (a) that you have the latest version of the relevant programme; (b) that you actually understand what goes on in each part of the programme -- one often needs to carry out certain modifications and alterations, to make one programme suitable for a similar but different task.
There was a mistake in the file "com18a", which is now deleted. I have instead uploaded com18c (fic and afic analysis for onset of menarche data, using logistic regressions) and com19a (fic and afic analysis for birds on islands data, using Poisson regressions).
We will use the two final Mondays (18th and 25th of May) before the exam project to "look back", discussing central themes of the course, considering some of the "methods in action" illustrations, etc.
A final version of the "Notes & Exercises" file will be finished & uploaded shortly.
Here's a final exercise for us to go through.
F: (a) When there's only one extra parameter, from narrow to wide, show that AIC is large-sample equivalent to including gamma if |D_n/kappa| >= sqrt(2), in the notation of Chs. 5 and 6. (b) Show that Pr(AIC selects wide) converges to pow(delta/kappa), where pow(u) = Pr(|N(u,1)| >= sqrt(2)), and that the 50-50 line, where AIC selects narrow or wide with the same probability 1/2, is at |delta| = kappa c0, with c0 = 1.408. (c) Set up a simple simulation experiment as follows, intended to give about a 50-50 balance between narrow and wide. The narrow model is y = beta0 + beta1 x...
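The constant c0 in part (b) of exercise F can be checked numerically. The following is a minimal sketch, not part of the course's "com" files: it solves pow(u) = 1/2 by bisection, using only the definition of pow(u) given in the exercise. The function names norm_cdf, pow_aic and find_c0 are made up for this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def norm_cdf(x):
    """Standard normal cumulative distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def pow_aic(u):
    """pow(u) = Pr(|N(u, 1)| >= sqrt(2)), as defined in exercise F (b)."""
    return 1.0 - norm_cdf(SQRT2 - u) + norm_cdf(-SQRT2 - u)

def find_c0(lo=0.0, hi=5.0, tol=1e-10):
    """Bisection for the u > 0 with pow(u) = 1/2.

    pow(u) is increasing for u >= 0, with pow(0) < 1/2 < pow(5),
    so the root is bracketed by the default interval.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pow_aic(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c0 = find_c0()
print(round(c0, 3))
```

The printed value should agree with the c0 = 1.408 quoted in the exercise, so that the 50-50 line sits at |delta| = kappa c0.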
1. We're on the "home stretch" of the course, and Mon 11 May I'll round off the material on "the Quiet Scandal of Statistics", from Ch 7.
2. Exercises to go through:
D: Consider the model F(y) = 1 - [1 - Phi((y - xi)/sigma)]^gamma for a cumulative distribution, where gamma = 1 corresponds to the classic normal. Compute the 3 x 3 Jwide matrix at the null model, and find the tolerance radius around the normal model.
E: Find the "birds on islands" data set from the book's web page, and let y = number of bird species, x = distance from Ecuador (in km), z1 = area [in thousands of sq km], z2 = elevation [in thousands of m], z3 = distance to nearest island [in km]. Consider the eight Poisson regression models for y in terms of keeping x protected and z1, z2, z3 open covariates. (a) Construct a table with AIC and BIC values, and comment. (b) Carry out FIC analysis for mu = E(Y|x0) for xlow = 100 and xhigh = 1300, with z1, z2, z3 kept at their average values. (...
The curriculum in our course is as follows.
1. From the Claeskens and Hjort (2008) book:
Ch 1: all. Ch 2: all, but 2.6 and 2.7 are "cursory". Ch 3: 3.1, 3.2, 3.3. Ch 4: 4.2. Ch 5: all. Ch 6: all, apart from 6.8 and 6.10, and 6.7 is cursory. Ch 7: 7.1, 7.2, 7.3, 7.4. Ch 9: all.
2. In addition, all exercises we have been through during the course are defined as inside the curriculum; cf. "Exercises & Course Notes".
For Mon 4 May, I intend to sum up Ch 6 and to go through the final technical part of our curriculum, from Sections 7.1-7.4, also about "the Quiet Scandal of Statistics".
For the exercise part, work through the following, related to the "Onset of menarche" data set from the book's webpage (see Example 6.2). The candidate models we shall consider are logistic regression models of polynomial order 1, 2, 3, 4, i.e. from the narrow model's p(x) = H(beta0 + beta1 z) to the wide model's p(x) = H(beta0 + beta1 z + beta2 z^2 + beta3 z^3 + beta4 z^4), with H the logistic transform. Also, we transform from x to z = x - 13.0 for numerical stability, as in the book. (a) Compute AIC and BIC scores. (b) Carry out FIC analysis for focus parameter mu = p(x0), with x0 = 11 yr and x0 = 15 yr. (c) Carry out AFIC analysis for mu = p(x0), with x0 taking on 25 values uniformly spread from 11 yr 0 months to 13 yr 0 months, and with each such x0 hav...
1. For Mon 27 April, we continue our FICology studies, i.e. from Ch 6. We shall also work through the following exercises:
B: The one listed below, in the 31.03.2009 message.
C: Going back to Exercise 2, with n = 250 simulated points from a certain nonlinear regression structure, let the list of candidate models be those corresponding to polynomial regressions of order 2, 3, 4, 5, 6 (with narrow = order 2 and wide = order 6). Carry out FIC analysis for two estimands, (a) mu = E(Y|x0) for x0 = 0.75; (b) mu = prob(Y > y0 | x0) for x0 = 0.75 and y0 = 3.0.
2. There will rather soon be a detailed curriculum list and a further updated version of my "Course Notes & Exercises".
3. I have uploaded two more R script files: com13a (computing J and kappa for the model that stretches Gauss's tail, Exercise 9) and com17c (FIC analysis for eight logistic regression models, with omega calculation for mu = p(small|black) / p(small|white)).
1. Happy Easter (God påske) to everyone! So there's no teaching Mon 6 or Mon 13 April.
2. For Mon 20 April, we continue our Ch 6 efforts. Work through the following exercises:
A: For the 189 babies & mothers, let as before x1 = 1, x2 = mother's weight (in kg) before pregnancy, z1 = age, z2 = 1(race==2), z3 = 1(race==3), with race being 1 (white), or 2 (black), or 3 (other). Keep x1, x2 protected and z1, z2, z3 open, with 2^3 = 8 submodels. For each focus parameter mu (given in a minute): compute all eight estimates; estimate the mse; compute the FIC score; give a FIC plot (with FIC or estimated mse on x axis and estimates on y axis). For mu, take (i) probability pwhite of low birthweight, for white mother, age 33, weight 55; (ii) same probability pblack, but for black mother; (iii) the ratio pblack/pwhite.
B: For the n = 250 best speedskaters on the Adelskalenderen, with results y1, y2, y3, y4, we study linear regressions...
1. There is now a further extended Course Notes and Exercises, version D, comprising 17 pages; please print out for your convenience.
2. For Mon 30 March, work through Nils Exercises 10, 9, 8 (in approximately that order of priority).
3. I have uploaded pdf versions of Magne Aldrin's two lectures.
4. For Mon 30 March I sum up Ch 5 and start on Ch 6.
For Mon 23 March, work through Exercises 5.1, 5.7(c), 5.8(a). For the Poisson stretching exercise, compute the required kappa and tolerance radius kappa/rootn for different situations corresponding to theta being equal to 3, 10, 25, 100.
We continue working with Ch 5, in particular Sections 5.4 and 5.7.
For Mon 16 March, please work through Exercises 5.7 (a), (c), 5.8 (a), (b).
We also continue working with Ch 5.
1. I have uploaded an extended version C of "Exercises & Course Notes" (now ten pages); please print out for your convenience. I have also uploaded R script files com3b (five models for nerve impulse data, cf. Nils Exercise 5) and com7a (seven models for 1500-m prediction via the Adelskalenderen data, cf. Nils Exercise 6).
2. For Mon 9 March, work through Nils Exercise 6 (with regression models admitting heterogeneous variances), and Example 3.11, where you instead of the DIC used there are to use BIC-exact and BIC. Compare these scores, and compute posterior model probabilities for the three models treated there.
3. Mon 9 March we start attacking Ch 5.
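For item 2 above, posterior model probabilities can be computed from BIC-type scores once these are in hand. The sketch below assumes the book's sign convention (BIC = 2 max log-likelihood minus dim times log n, so larger is better) and equal prior model probabilities unless others are supplied; the function name posterior_model_probs and the example scores are hypothetical and are not taken from Example 3.11.

```python
import math

def posterior_model_probs(bic_scores, prior=None):
    """Approximate P(M_j | data), proportional to prior_j * exp(BIC_j / 2).

    Assumes the 'larger is better' convention for BIC. Subtracting the
    largest log-weight before exponentiating avoids numerical overflow.
    """
    k = len(bic_scores)
    if prior is None:
        prior = [1.0 / k] * k  # equal prior probabilities (an assumption)
    log_w = [math.log(p) + 0.5 * b for p, b in zip(prior, bic_scores)]
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

# Hypothetical BIC scores for three models, for illustration only
example_bic = [-1205.3, -1201.8, -1203.1]
for j, p in enumerate(posterior_model_probs(example_bic), start=1):
    print(f"model {j}: approximate posterior probability {p:.3f}")
```

The same function can be applied to BIC-exact scores, which makes it easy to compare the two sets of probabilities side by side.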
1. For Mon 2 March, we finish Exercise 3.1 (including the Beta envelope model), and also the following exercise: Download and organise the "Adelskalenderen men 2006" speedskating data set from the book's website, creating data vectors y1, y2, y3, y4, the personal best times, in seconds, for the best n = 250 speedskaters of the world, over the four classic distances 500 m, 1500 m, 5,000 m, 10,000 m. The task is to predict y2 from information about the skater's personal bests on the other three distances, i.e. y1, y3, y4. Go through the seven linear regression models 1, 3, 4, 13, 14, 34, 134, where e.g. 14 means the model that uses y1 and y4 as covariates. Compute AIC and BIC scores. Also produce a convenient table, summarising the most important aspects of each of the seven models. What is your best model, for the present purpose?
2. We have started Ch 3, which I plan to finish by Mon 2 March, including a brief going through of Section 4.2. The DIC is in th...
The exam for this course is, as announced, a combination of (1) an exam project, leading to a written report for each candidate, and (2) a thirty-minute oral examination. We need to finalise exam dates, and the first iteration in this calendaric exercise is as follows:
Exam project is made available Tue May 26, with deadline for exam reports Mon June 8. Then oral examinations are held Thu and Fri June 11 and 12.
Please report to Nils immediately if these dates involve practical difficulties of any kind.
1. For Mon 23 Feb, work through (moderately extended versions of) Exercises 2.4 and 3.1.
For 2.4, re-do the analysis of Example 2.4, also supplementing the AIC with TIC numbers, involving explicit computation of Jn and Kn matrices, for each of the 2^3 = 8 models. Use glm(y ~ X, family = binomial) for the logistic regressions. Finally re-do all of this with focus on the bigger babies rather than the smaller babies, using y = 1 if birthweight is 3600 g or more.
For 3.1, include the "Beta envelope model" that has density f(x) = be(exp(-x), a, b) exp(-x). For each model, compute estimates of p = Pr(X > 0.333), with standard deviation. Here be(x, a, b) is the Beta(a, b) density at x.
2. For Mon 23 Feb I plan to use one hour to round off Ch 2, after which we start on Ch 3.
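For the Beta envelope model of Exercise 3.1 above, the probability p can be expressed through the Beta distribution function, which is convenient when computing the estimate. A short derivation, using only the density stated above (the symbol U is introduced here for illustration):

```latex
\begin{align*}
f(x) = \mathrm{be}(e^{-x}, a, b)\, e^{-x} \ \text{for } x > 0
  \quad &\Longleftrightarrow \quad U = e^{-X} \sim \mathrm{Beta}(a, b), \\
p = \Pr(X > 0.333) = \Pr(U < e^{-0.333})
  &= \int_0^{e^{-0.333}} \mathrm{be}(u, a, b)\, \mathrm{d}u .
\end{align*}
```

With b = 1 the density reduces to f(x) = a exp(-a x), so the exponential model is a submodel, and p = exp(-0.333 a) in that case.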
1. For Mon 16 Feb, work through Exercises 2.1, 2.4, 3.1. The nerve impulse data set (of size n = 799 measurements) for the latter exercise may be found here: http://www.stat.ncsu.edu/sas/sicl/data/nerve.dat . Use both AIC and BIC to assess the exponential, the Gamma and the Weibull models, perhaps supplemented by one of your own choice or construction. Also, present point estimates and 90% confidence intervals for the probability that a random nerve impulse needs more than 0.333 seconds.
2. Sections 2.6, half of 2.7, and 2.10 are defined as cursory curriculum, but the "bootstrap-AIC" of Section 2.7 is considered as belonging to the ordinary curriculum. Section 2.8 is also inside, but I choose to postpone going through that material until we start on Ch 5 (also to get us to Ch 3 more quickly).
For Mon 16 Feb I plan to go through Section 2.9 plus relevant parts of 2.7 and 2.10.
3. I have uploaded the R script...
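For item 1 above, the simplest of the candidate models, the exponential, can be handled in closed form. The sketch below is a hedged illustration rather than the course's own script: it assumes the book's convention AIC = 2 loglik - 2 p and BIC = 2 loglik - p log n (larger is better), uses a delta-method normal approximation for the 90% interval, and runs on a few made-up placeholder values, not on the nerve impulse data. The function name exponential_fit is hypothetical.

```python
import math

def exponential_fit(x, t=0.333):
    """ML fit of the exponential model f(x) = theta * exp(-theta * x).

    Returns the rate estimate, AIC and BIC (p = 1 parameter), and a
    point estimate with approximate 90% interval for Pr(X > t).
    """
    n = len(x)
    total = sum(x)
    theta = n / total                          # ML estimate of the rate
    loglik = n * math.log(theta) - theta * total
    aic = 2.0 * loglik - 2.0 * 1
    bic = 2.0 * loglik - 1 * math.log(n)
    p_hat = math.exp(-theta * t)               # estimated Pr(X > t)
    # Delta method: sd(theta_hat) is about theta / sqrt(n),
    # and |d/dtheta exp(-theta t)| = t * exp(-theta t).
    se = t * p_hat * theta / math.sqrt(n)
    z = 1.645                                  # normal quantile for 90% coverage
    return {"theta": theta, "aic": aic, "bic": bic,
            "p_hat": p_hat, "ci90": (p_hat - z * se, p_hat + z * se)}

# Placeholder values only; replace with the n = 799 nerve impulse measurements
toy_data = [0.21, 0.03, 0.05, 0.34, 0.36, 0.14, 0.15, 0.68, 0.09, 0.18]
print(exponential_fit(toy_data))
```

The Gamma and Weibull models need numerical maximisation of the log-likelihood, after which the same two formulas apply with p = 2.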
Nota bene:
1. Unfortunately the Mon 2 Feb teaching for our course is cancelled (as I am at a funeral). Please do not stop your individual momentum; use the time for working through Exercise 4 and the book's Exercise 2.1 carefully, along with what remains of Exercise 1.
2. Also read ahead in the book's Sections 2.2, 2.3, 2.4, enough for you to either complete or at least know what to do in Exercise 2.4, which will be worked through for Mon 16 Feb.
3. Mon 9 Feb we are back on track, with lectures from Sections 2.2--2.4 and us working through Exercises 4 and 2.1.
4. Pdf slides from Magne Aldrin's two lectures will soon be uploaded to the course web site.
Nota bene:
Due to unfortunate circumstances I cannot lecture Mon 26 Jan. In order for us and the course not to lose time & momentum I have arranged for Magne Aldrin (senior research statistician at NR) to give two special lectures on this Monday, 12:15 - 14:00. These will be considered an integral part of the course. The two topics are, briefly, as follows; the first pertains to model building and the second to cross validation, both themes we shall be returning to later in the course.
Theme 1: Modelling of fish disease spread between aquaculture plants [oppdrettsanlegg].
Theme 2: Cross validation. (See also Section 2.9.)
1. Please print out a copy of the "Course Notes & Exercises" (where there is an updated Version B of 19 Jan 09) for your convenience & use. For Mon 26 Jan, work through Exercise 4; we will also spend a little time completing Exercise 1.
2. I will upload various "com" files, of R script programme type, to the website; check "com1c" and "com1d" that pertain to Exercise 2, the running of ten regression models etc.
3. The lectures today were in correspondence with Sections 2.1-2.3, involving ML theory and the AIC formula. We go further with this next week, including a derivation of the AIC.
1. Please print out a copy of "Course Notes & Exercises" for your own convenience & use. The current version is a preliminary one -- later and extended versions will be uploaded as the course progresses.
For Mon 19 Jan, please work through as much as you manage of Exercises 1 and 2.
2. The lectures today were devoted to a broad, general introduction to the course and its themes, including a quick going-through of some of the material of Ch 1. Next week we start attacking Ch 2.
3. The curriculum list is not yet finalised, but to a first order of approximation it will consist of Chapters 1, 2, 3, 5, 6, 9 of the Claeskens & Hjort book.