Time and place:
The course consists of two sessions:
Tuesday 5th November, 12:15-15:00, in seminar room Prolog, Ole-Johan Dahls hus
Thursday 7th November, 12:15-15:00, in seminar room Postscript, Ole-Johan Dahls hus
Language:
English
Target audience:
UiO researchers and students who want to get started with machine learning in Python.
A video (approximately 25 minutes) has been prepared for those who are completely new to machine learning; it includes example use cases from research.
Prerequisites:
Some familiarity with Python is required (i.e. you can run Python scripts from the REPL or an IDE). Basic knowledge of descriptive statistics and pandas is a plus.
Contents:
- Exploratory data analysis
- Binary classification
- Feature importance
- Multiclass classification
- Cross-validation
- Additional topics
  - Preprocessing and pipelines
  - Statistically comparing models
  - Hyperparameter tuning
  - Predicting a continuous variable
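Cross-validation, one of the topics above, estimates how well a model generalizes by repeatedly holding out a different part of the data for evaluation. The minimal sketch below uses only the standard library to show how k-fold train/test indices can be generated; the function name `k_fold_indices` is hypothetical and not part of the course code. In practice, scikit-learn provides this through `sklearn.model_selection.KFold`.

```python
import random
from typing import Iterator


def k_fold_indices(
    n_samples: int, k: int = 5, shuffle: bool = True, seed: int = 0
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Every sample lands in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        # A seeded generator makes the split reproducible
        random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds (scikit-learn's KFold does the same)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Usage: 10 samples split into 3 folds with test sizes 4, 3 and 3
for train, test in k_fold_indices(10, k=3):
    print(len(train), len(test))
```

Each model is trained on `train` and scored on `test` once per fold, and the k scores are then averaged.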
Briefly about the course:
The focus will be on building and evaluating machine learning models in Python using scikit-learn, rather than on an in-depth breakdown of specific algorithms. Using XGBoost, we will build models that distinguish between different categories of text based on linguistic features (including the number of nouns, adjectives, etc.).
Note: this course is the Python equivalent of the R course using tidymodels.
Important:
Participants must use their own PC or Mac (laptop) with Python (v >= 3.9).
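A quick way to confirm that your interpreter meets this requirement is to run the snippet below from your IDE or terminal. It relies only on the standard `sys` module; the helper name `meets_minimum` is illustrative. Printing `sys.executable` also shows which installation (or virtual environment) is actually active.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 9)) -> bool:
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    if meets_minimum():
        print("OK: Python 3.9 or later")
    else:
        print("Too old: please install Python 3.9 or later")
```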
Software requirements:
Download requirements.txt, which lists all the required libraries, and install them with the command below, preferably from within a virtual environment (e.g. conda, pyenv, poetry). Note: run the command from the directory (folder) that contains requirements.txt.
pip install -r requirements.txt
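After installation, the sketch below can help confirm that every package listed in requirements.txt is present in the active environment. It uses `importlib.metadata` from the standard library and a deliberately simple parser that only strips comments, extras and version specifiers such as `==1.2` or `>=1.0`; it does not handle every syntax pip accepts (e.g. URLs or `-r` includes), and the function names are hypothetical, not part of the course material.

```python
import re
from importlib import metadata
from pathlib import Path


def required_names(path: str = "requirements.txt") -> list[str]:
    """Extract bare distribution names from a requirements file (simplified parser)."""
    names = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options like -r
            continue
        # The name ends at the first extras bracket, version operator, marker or space
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(path: str = "requirements.txt") -> list[str]:
    """Return the names from the requirements file that are not installed."""
    missing = []
    for name in required_names(path):
        try:
            metadata.version(name)  # raises PackageNotFoundError if not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not Path("requirements.txt").exists():
        print("Run this from the folder that contains requirements.txt")
    else:
        missing = missing_packages()
        print("All requirements installed" if not missing else f"Missing: {missing}")
```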
See this guide for a complete explanation of what is required, as well as a suggested (optional) IDE setup (VS Code running Jupyter interactively) with a conda virtual environment.
Note: the Python code can be run from any IDE (Spyder, PyCharm, etc.).
Course material:
Click here for the whole course material (dataset, code, guide, requirements.txt)