Report writing and new deadline for project 2
Hi all,
This is just a quick reminder that the final deadline for project 2 is now set to Friday the 18th (midnight). We pushed it from last Friday to Wednesday and then finally to Friday this week.
Also, feel free to come up with suggestions for project 3 topics, to be presented on Friday this week.
We will also discuss this during the lecture on Thursday.
Finally, here are some general observations from us about project 1. Hopefully these remarks can be of use when you wrap up the report for project 2 and work on project 3 as well.
Best wishes to you all,
Morten et al.
Comments about project 1
Summary after corrections:
* Many of you have written very nice codes! Thanks, this part is very good, and there were many excellent results.
* However, many of you are not used to writing scientific reports. Here are some pointers which may help with project 2 and project 3.
* Given the current grading form: focus on the abstract, introduction, conclusion, and references. You potentially lose more points on these than if you don’t have time to finish the analysis or code. Please read again the structure we wish you to implement at https://github.com/CompPhysics/MachineLearning/blob/master/doc/Projects/EvaluationGrading/EvaluationForm.md
* If you go through each exercise and answer them individually, your report will not be a good read. Try to find an overarching question: what am I going to answer in the project? For instance: can linear regression models be used to describe a topographic surface? Keeping this question in mind would give you a better abstract, introduction, and conclusion.
* As a continuation of this: what you want to answer guides how you treat your data, e.g. scaling, downsampling etc. If you want to see if a linear regression model can describe a surface, then it doesn’t make sense to downsample to a degree where you no longer see the structures of the landscape (see the short downsampling sketch after this list).
* During the course you will learn that it is not always easy to answer which model is best. Should you use the MSE, the AUC or other metrics? And a separate question is: is the best model a good model? You will have to keep the original research question in mind, and try to answer whether the best model does a good job. We always recommend plotting the data and the model side by side, or with a ratio plot (see the plotting sketch after this list). This communicates the goodness of fit much better than the MSE alone.
* Introduction: think in terms of three paragraphs/sections: 1) motivation/something general on machine learning and linear regression, 2) what you are investigating, and 3) a summary of the report.
* Abstract: it should contain the results and the goodness of the results, i.e. which model gave the best results, quantified with the MSE or the relative error etc., and a conclusion on whether it is a good model.
* When you compare various score metrics and use cross-validation, it is important to keep in mind that the results may vary as a function of the number of folds, and you should comment upon that. Cross-validation gives you a way to estimate a given quantity and is meant to narrow down the spread in values; the same holds for the bootstrap. Compare the MSE you get from your cross-validation code with the one you got from your bootstrap code, comment your results, and try 5 to 10 folds (a small cross-validation/bootstrap sketch is included after this list).
* Many of you have spent a lot of time on your codes, and most codes are very good. To ease the reading, however, try to include a good README that helps the reader navigate through the code or rerun it. Suggestion: document your code while writing it, so it is easier to produce a readable and understandable README.
* You should comment on all results you present. If you include a plot or a table, you should explain what it shows; if not, you may consider leaving it out of the report. If you add additional results in an appendix, include figure/table captions and add some explanatory text. A raw dump of figures is not very meaningful.
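
On the downsampling point: below is a minimal sketch of how one can eyeball what different downsampling steps do to the terrain. The filename SRTM_data.tif is a hypothetical placeholder for your own terrain file, the step sizes are arbitrary, and imageio and matplotlib are assumed to be available.

```python
# Minimal sketch (assumptions: a terrain file "SRTM_data.tif" exists and
# imageio + matplotlib are installed; file name and step sizes are placeholders).
import imageio
import matplotlib.pyplot as plt

terrain = imageio.imread("SRTM_data.tif")   # 2D array of heights

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, step in zip(axes, (1, 10, 100)):
    ax.imshow(terrain[::step, ::step], cmap="terrain")
    ax.set_title(f"keeping every {step}th point")
plt.tight_layout()
plt.show()
```

If the most downsampled panel no longer shows the valleys and ridges you set out to model, the downsampling is too aggressive for the question you are asking.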
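On the side-by-side and ratio plots: here is a minimal sketch using synthetic stand-in arrays; replace x, y and y_pred with your own grid, data and model predictions.

```python
# Minimal sketch: data and model side by side, plus a ratio plot.
# x, y and y_pred below are synthetic stand-ins for your own quantities.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 100)
y = 2.0 + 3.0 * x**2 + 0.1 * np.random.randn(100)   # stand-in "data"
y_pred = 2.0 + 3.0 * x**2                            # stand-in "model"

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x, y, ".", label="data")
ax1.plot(x, y_pred, "-", label="model")
ax1.set_title("data and model")
ax1.legend()

ax2.plot(x, y / y_pred, ".")
ax2.axhline(1.0, color="k", linestyle="--")
ax2.set_title("ratio data/model")

plt.tight_layout()
plt.show()
```

A reader can see immediately where the model over- or under-shoots, which a single MSE number cannot convey.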
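On cross-validation versus bootstrap: the sketch below uses scikit-learn on synthetic data to show how the cross-validated MSE can be tabulated for 5 to 10 folds and compared with a simple bootstrap estimate. The data, the polynomial design matrix and the plain OLS model are placeholders for whatever you use in your own project.

```python
# Minimal sketch: MSE from k-fold cross-validation for several k, compared
# with a simple bootstrap estimate. Synthetic data and an ordinary least
# squares fit are placeholders for your own data and model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(2021)
x = np.linspace(0, 1, 200)
y = 2.0 + 3.0 * x**2 + 0.1 * rng.standard_normal(200)
X = np.column_stack((x, x**2))            # simple polynomial design matrix
model = LinearRegression()

# Cross-validation: how does the estimated MSE vary with the number of folds?
for k in range(5, 11):
    cv = KFold(n_splits=k, shuffle=True, random_state=2021)
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"{k:2d} folds: MSE = {-scores.mean():.4f}")

# Bootstrap: resample the training data, evaluate on a fixed test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2021)
n_boot = 100
mse_boot = np.empty(n_boot)
for i in range(n_boot):
    X_b, y_b = resample(X_train, y_train, random_state=i)
    mse_boot[i] = mean_squared_error(y_test, model.fit(X_b, y_b).predict(X_test))
print(f"bootstrap:   MSE = {mse_boot.mean():.4f} +/- {mse_boot.std():.4f}")
```

The point is not the exact numbers, but that both resampling methods give you a spread of MSE values that you should report and comment on.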