Aim
Learn how to build machine learning models in R (using tidymodels), how to interpret them, and how to make model evaluation more reliable using cross-validation.
Content
The two algorithms used as examples are linear discriminant analysis (LDA) and XGBoost. Important: the focus is on building and evaluating machine learning models in R rather than on an in-depth breakdown of specific algorithms. We will build models that distinguish between different categories of text based on linguistic features (such as the number of nouns, adjectives, etc.). Topics covered:
- Exploratory data analysis
- Binary classification
- Feature importance
- Multiclass classification
- Cross-validation
- *Extra (if enough time)*
  - Hyperparameter tuning
  - PCA
  - Cluster analysis
Target audience
This "workshop" is for UiO-affiliated students and researchers who are comfortable using R and would like to learn more about machine learning (classification) and how it can be used in research, but who do not have a strong mathematical or data science background. Basic knowledge of descriptive statistics is a plus, and some familiarity with the tidyverse is preferable, as the main package used in this course (tidymodels) is based on tidyverse principles.
Duration
2 x 3 hours
Signing up
The course is full. To be put on the waiting list, sign up using this form.
Important: Participants must use their own PC or Mac (laptop) with both R and RStudio installed. Both R (≥ 3.3.0) and RStudio are free and do not require a license. R can be installed from https://cran.r-project.org and RStudio from https://www.rstudio.com/products/rstudio/download/.
Contact IT support at your faculty or department if you need help with the installation. If it is not possible to install R or RStudio on your own computer, you can use UiO Programkiosk ("Statistikk fullskjerm").
Install the following packages in R/RStudio before the start of the course:
tidyverse, tidymodels, discrim, mda, xgboost, vip, patchwork
*Extra packages:* doParallel, factoextra
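One way to install the packages listed above is sketched below. It assumes an R session with internet access and uses the CRAN URL given earlier as the repository; it only installs packages that are not already present. The variable names (`course_pkgs`, `extra_pkgs`, `to_install`) are illustrative, not part of the course material.

```r
# Packages required for the course, as listed above
course_pkgs <- c("tidyverse", "tidymodels", "discrim", "mda",
                 "xgboost", "vip", "patchwork")
# Extra packages (only needed for the extra topics)
extra_pkgs <- c("doParallel", "factoextra")

# Warn if the installed R version is below the stated minimum
if (getRversion() < "3.3.0") {
  warning("R >= 3.3.0 is required; please update R.")
}

# Keep only the packages that are not already installed
all_pkgs <- c(course_pkgs, extra_pkgs)
to_install <- setdiff(all_pkgs, rownames(installed.packages()))

# Install any missing packages from CRAN
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cran.r-project.org")
}
```

Running `library(tidymodels)` afterwards without an error is a quick way to check that the main package was installed correctly.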
Number of participants
30
Language
The course will be held in English.
Instructor
Luigi Maglanoc PhD
Contact information
If you have any questions about the course, send us an email: statistikk@usit.uio.no
Links to course material
- Dataset
- R code (to be added)