Applied data analysis and statistical modeling for a Kaggle-like competition or dataset
In this project, the student will carry out applied data analysis and predictive modeling. The student may choose a competition or dataset of interest from one of the popular data science platforms*: Kaggle, Topcoder, or UCI. A preliminary data analysis should be performed first, followed by careful statistical modeling, inference, and finally evaluation of the predictions and explanation of the results. The final report should be written in LaTeX and should include a description of the data and the problem, the choice and specification of an appropriate statistical model, an evaluation of the model's predictions, and (if applicable) explanations of the model.
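The workflow described above (preliminary analysis, modeling, evaluation of predictions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than taken from any platform, the model is a simple linear regression fitted by ordinary least squares, and the helper names (`make_synthetic_data`, `summarize`, `fit_ols`, `rmse`) are hypothetical. A real project would use the chosen dataset and a model suited to it.

```python
import math
import random
import statistics


def make_synthetic_data(n=200, seed=0):
    """Hypothetical stand-in for a real dataset: y = 2x + 1 + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def summarize(values):
    """Preliminary analysis: basic descriptive statistics of one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(y_true, y_pred):
    """Root mean squared error of predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(statistics.mean(squared_errors))


# 1. Load the data (synthetic here) and inspect it.
xs, ys = make_synthetic_data()
x_summary = summarize(xs)
y_summary = summarize(ys)

# 2. Hold out the last 20% of observations for evaluation.
split = int(0.8 * len(xs))
x_train, x_test = xs[:split], xs[split:]
y_train, y_test = ys[:split], ys[split:]

# 3. Fit the model on the training part only.
slope, intercept = fit_ols(x_train, y_train)

# 4. Evaluate predictions against a naive baseline (the training mean).
predictions = [slope * x + intercept for x in x_test]
test_rmse = rmse(y_test, predictions)
baseline_rmse = rmse(y_test, [statistics.mean(y_train)] * len(y_test))

print("x summary:", x_summary)
print("y summary:", y_summary)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"test RMSE={test_rmse:.3f}, baseline RMSE={baseline_rmse:.3f}")
```

Comparing the hold-out error with a naive baseline is one simple way to judge whether the model adds predictive value. The fitted slope and intercept are also directly interpretable, which is the kind of model explanation the report could discuss.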
*Note that the data must be free to use for research and educational purposes (this is stated in the description of the dataset). This can be checked with your supervisor. For example, the Titanic competition explicitly states:
DATA ACCESS AND USE: Competition Use and Academic, Non-Commercial Use Only
Published Jan. 31, 2022 5:41 PM
- Last modified Sep. 4, 2023 11:29 AM