Access to Saga
Our reference computing environment will be the Norwegian national Saga supercluster. All course participants will be granted access to Saga, which offers some 10,000 CPUs, compute nodes with up to one terabyte of main memory, and a (small) rack of massively parallel GPUs. To register your UiO account for Saga usage, please submit an online form and request membership in Notur project NN9851K (Language Technology Group, LTG) until June 30, 2022. Account activation usually takes a day or two, and you will receive status updates by email and text message.
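Once you have received confirmation that your Saga account is active, connect using ssh from the Linux command line or any other suitable secure shell client. If your Saga user name differs from your local one, write it before the host name (ssh <username>@saga.sigma2.no); otherwise simply run: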
ssh saga.sigma2.no
This establishes an interactive session on one of the Saga login nodes, which we can use for development, debugging, and testing. In a nutshell, moderate computation is fair game on the interactive login nodes, where we interpret ‘moderate’ as, say, at most a handful of cores and up to 16 gigabytes of main memory, with run-times best measured in minutes.
If you need to familiarize yourself with working in the Linux command-line environment, we recommend reading this tutorial and/or taking an online course on Linux basics. The Missing Semester of Your CS Education by MIT is another good course on mastering the command line and other essentials.
Python Modules
Python 3 is the main programming language for this course. On Saga, there is an NLP-specific repository of the relevant Python 3 add-on modules, most of which are "branded" with NLPL, short for "Nordic Language Processing Laboratory". To activate this environment, run the following commands:
module purge
module use -a /cluster/shared/nlpl/software/eb/etc/all/
module load nlpl-nlptools/2021.01-gomkl-2019b-Python-3.7.4
module load nlpl-gensim/3.8.3-gomkl-2019b-Python-3.7.4
module load nlpl-pytorch/1.7.1-gomkl-2019b-cuda-10.1.243-Python-3.7.4
We recommend adding the above lines to the personal .bashrc configuration file in your home directory. You can check that you have a sane working environment by issuing the commands above and then running our sanity test script, which tries to import all the necessary Python packages. If the test produces any warnings or errors, send a question to our collective mailbox, or raise an issue in our UiO GitHub repository.
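Our sanity test script remains the reference check. For a quick ad-hoc test of individual packages, a hypothetical shell helper like the one below (our own sketch, not part of the course material) tries to import each named module with whichever python3 is first on your PATH, i.e. the one provided by the loaded NLPL modules.

```shell
#!/bin/bash
# Hypothetical helper, not the course's sanity test script:
# try to import each named Python module and report the outcome.
check_imports() {
    local status=0
    local mod
    for mod in "$@"; do
        # Discard Python's own output; only success or failure matters here
        if python3 -c "import ${mod}" >/dev/null 2>&1; then
            echo "ok: ${mod}"
        else
            echo "FAILED: ${mod}"
            status=1
        fi
    done
    # Non-zero exit status if at least one import failed
    return "${status}"
}
```

After running the module load commands, check_imports gensim torch should report "ok" for both packages; a "FAILED" line usually means a module was not loaded.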
List of all modules available in the NLPL Virtual Laboratory
Please do not try to install any Python modules locally into your user directory on Saga. This may conflict (in subtle, opaque ways) with the environment we have prepared for the course. In other words, make sure that the ~/.local/ directory in your Saga home directory is empty (unless you are absolutely sure you know what you are doing).
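One way to check this is a small shell function. The sketch below is our own illustration (the function name is hypothetical). It treats a missing directory as empty and counts hidden files too, because pip's user installs end up in subdirectories such as ~/.local/lib.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given directory (default: ~/.local)
# does not exist or contains no entries at all, hidden files included.
local_dir_is_empty() {
    local dir="${1:-$HOME/.local}"
    if [ ! -d "${dir}" ]; then
        return 0
    fi
    # ls -A lists everything except . and .., so empty output means empty dir
    [ -z "$(ls -A "${dir}")" ]
}

if local_dir_is_empty; then
    echo "~/.local is empty (or absent)"
else
    echo "~/.local is not empty:"
    ls -A "$HOME/.local"
fi
```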
Saga Foundations
Computations that run for more than a few minutes, or that require multiple CPUs or very large amounts of memory, must be submitted through the Saga queue management system (SLURM), using job scripts (SLURM files). To learn how this system works, read the Saga job system overview and the Saga Getting Started Guide. We will hold a quick workshop on Saga technicalities during one of the first group sessions. We also provide an example SLURM file, which you can use as a template.
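Our example SLURM file remains the reference template. For orientation, a minimal Saga job script might look roughly like the sketch below. The job name, resource values, and my_script.py are hypothetical placeholders. The account is the course project NN9851K written in lowercase, the usual form for Saga account names. The memory line reflects our understanding that Saga expects an explicit memory request for every job.

```shell
#!/bin/bash
# Minimal sketch of a Saga job script; all values are placeholders to adapt.
#SBATCH --job-name=my-job
#SBATCH --account=nn9851k
# Wall-clock limit (hh:mm:ss); the job is killed when it is exceeded
#SBATCH --time=00:30:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Explicit memory request per allocated CPU
#SBATCH --mem-per-cpu=4G

# Stop at the first failing command and on use of unset variables
set -o errexit
set -o nounset

# Load the same environment as for interactive work
module --quiet purge
module use -a /cluster/shared/nlpl/software/eb/etc/all/
module load nlpl-nlptools/2021.01-gomkl-2019b-Python-3.7.4
module load nlpl-gensim/3.8.3-gomkl-2019b-Python-3.7.4
module load nlpl-pytorch/1.7.1-gomkl-2019b-cuda-10.1.243-Python-3.7.4

# my_script.py is a hypothetical placeholder for your own program
python3 my_script.py
```

Saved as, say, job.slurm, this would be submitted with sbatch job.slurm. The command squeue -u $USER then shows the job's state in the queue.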
Recommended Editors
You can either develop your code on your own computer and then copy it to Saga for larger runs, or work with the code on Saga itself. There is no shortage of text editors with various levels of Python support, and we will respect whatever choice you make. Most of the course teachers are fond of the Vim editor and will be happy to help you get the most out of it.
If you need help working with Vim, run the vimtutor command. An example Vim configuration for convenient work with Python code can be found in the repository. Simply copy this file to your home directory on Saga if you do not have a file by that name yet, or merge its contents into your existing one.