Access to Fox
Our reference computing environment will be the Fox high-performance computing cluster at the University of Oslo. All course participants will be granted access to Fox, which offers about 3,000 CPU cores, compute nodes with up to 512 GB of main memory, and a (small) rack of massively parallel GPUs. To register for Fox usage, follow the steps on this page. You will apply for membership in project ec403.
If you have ID-Porten (Norwegian BankID or similar), you can register on your own. If you don't, we have to sign you up manually. In this case, please write an email to yves.scherrer@ifi.uio.no with the following information:
- Full name
- Birth date
- Passport or ID number
- Country issuing the passport/ID
For the end date of your membership to the project, you should indicate the following:
- If you are a master's student in language technology, you can set the end date to June 2026 (as you might use Fox for your master's thesis work).
- If you are not a language technology master's student, but are taking the IN5550 course next semester, you can set the end date to June 2025 (as IN5550 will also use Fox).
- Otherwise, you can set the end date to December 2024.
It will usually take a day or two before account activation is complete, and you will receive status updates by email and text messages.
Running jobs in your web browser
Once you have received confirmation of account activation on Fox, you will be able to run (small) jobs on Fox as Jupyter notebooks in your web browser. For this, you should use the EduCloud OnDemand web service:
- Log in to EduCloud OnDemand using your Fox credentials.
- In the app dashboard, choose "Jupyter".
- On the next page, make sure your project is "ec403". For the resources, select "GPU" with 64 GB RAM if you plan to run an LLM; otherwise, for smaller processing tasks that do not require a GPU, a "small" or "medium" allocation should be sufficient.
- In the runtime field, indicate the maximum duration of your allocation. Note that you will be automatically disconnected once the end time of your allocation is reached. It is therefore better to request a somewhat longer duration than you expect to need, so that you do not lose the allocation in the middle of your work (you can always release it if you finish early).
- Select the Jupyter module "JupyterLab/4.0.3-GCCcore-12.2.0" and include the following additional module path: "/fp/projects01/ec30/software/easybuild/modules/all/" (this is required to be able to load the NLPL modules).
- In the "additional modules" field, you should specify the Fox modules you want to load on the node. In your case, the following should be what you need:
nlpl-pytorch/2.1.2-foss-2022b-cuda-12.0.0-Python-3.10.8 nlpl-nlptools/02-foss-2022b-Python-3.10.8 nlpl-sentence-transformers/3.1.1-foss-2022b-Python-3.10.8 nlpl-accelerate/0.27.2-foss-2022b-Python-3.10.8 nlpl-bm25s/0.2.1-foss-2022b-Python-3.10.8
- Press "Launch" and wait until the job status changes to "running" (might take a minute or less) and press "Connect to Jupyter".
- Et voilà! You are now in a JupyterLab session with easy access to your home directory on Fox, and you can run any Jupyter notebook, whether it is located on Fox or uploaded via the browser.
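Once connected, you can quickly verify that the session sees the GPU and the software environment, for example from a terminal inside JupyterLab (or by prefixing the commands with ! in a notebook cell). This is only a rough sketch and assumes that you requested a GPU allocation and selected the modules listed above:
nvidia-smi    # should list the allocated GPU
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"    # should print the PyTorch version and True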
Alternative: Using SSH
You can also connect to Fox using SSH, e.g. from the Linux command line or any other suitable secure shell client:
ssh fox.educloud.no
This will establish an interactive session on one of the Fox login nodes, which you can use for development, debugging, and testing. In a nutshell, moderate computation is fair game on the interactive login nodes, where we interpret ‘moderate’ as using at most a handful of cores, up to 16 gigabytes of main memory, and run-times best measured in minutes.
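Note that if the username on your local machine differs from your Educloud username (which it usually will), you need to specify it explicitly; your-ec-username below is only a placeholder for your own username:
ssh your-ec-username@fox.educloud.no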
Once you are logged into Fox, an NLP-specific repository of the relevant Python 3 add-on modules is available. Most of them are "branded" with NLPL, that is, the "Nordic Language Processing Laboratory". To activate this environment, execute the following commands:
module purge
module use -a /fp/projects01/ec30/software/easybuild/modules/all/
module load nlpl-pytorch/2.1.2-foss-2022b-cuda-12.0.0-Python-3.10.8 nlpl-nlptools/02-foss-2022b-Python-3.10.8 nlpl-sentence-transformers/3.1.1-foss-2022b-Python-3.10.8 nlpl-accelerate/0.27.2-foss-2022b-Python-3.10.8 nlpl-bm25s/0.2.1-foss-2022b-Python-3.10.8
We recommend adding the above lines to your personal .bashrc configuration file in your home directory. You can check that you have a sane working environment by issuing the commands above and then running our sanity test script, which will try to import all the necessary Python packages. If the test script produces any warnings or errors, contact us.
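If you want a quick manual check in addition to the sanity test script, you can try importing a few of the packages provided by the modules listed above (a minimal sketch only, not a replacement for the test script):
python3 -c "import torch; print(torch.__version__)"
python3 -c "import sentence_transformers, accelerate, bm25s; print('imports OK')"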
Please do not try to install any Python modules locally in your user directory on Fox. This may conflict (in subtle, non-transparent ways) with the environment we have prepared for the course. In other words, make sure that the ~/.local/ directory inside your home directory on Fox is empty (unless you are absolutely sure you know what you are doing).
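A quick way to verify this from the command line on Fox:
ls -A ~/.local    # ideally prints nothing (or reports that the directory does not exist)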
Computations that run for more than a few minutes, or that require a GPU, multiple CPUs, or very large amounts of memory, must be submitted through the Fox queue management system using so-called SLURM scripts. Read the Fox job system overview to learn how to deal with this system. We will have a quick workshop on Fox technicalities during one of the first group sessions. We also provide an example SLURM script, which you can use as a template; a rough sketch is shown below.
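For orientation only, here is a sketch of what such a SLURM script could look like. The partition name, resource amounts, and the script name my_script.py are illustrative placeholders, so defer to the Fox documentation and our example script for the authoritative settings:
#!/bin/bash
#SBATCH --job-name=my-job            # any descriptive job name
#SBATCH --account=ec403              # the course project
#SBATCH --time=01:00:00              # maximum run time (hh:mm:ss)
#SBATCH --partition=accel            # GPU partition on Fox (assumed name; omit for CPU-only jobs)
#SBATCH --gpus=1                     # number of GPUs
#SBATCH --mem=16G                    # main memory
#SBATCH --cpus-per-task=4            # CPU cores
# load the same course environment as above
module purge
module use -a /fp/projects01/ec30/software/easybuild/modules/all/
module load nlpl-pytorch/2.1.2-foss-2022b-cuda-12.0.0-Python-3.10.8 nlpl-nlptools/02-foss-2022b-Python-3.10.8 nlpl-sentence-transformers/3.1.1-foss-2022b-Python-3.10.8 nlpl-accelerate/0.27.2-foss-2022b-Python-3.10.8 nlpl-bm25s/0.2.1-foss-2022b-Python-3.10.8
# run your own code (my_script.py is a placeholder)
python3 my_script.py
You would save this as, e.g., my_job.slurm, submit it with sbatch my_job.slurm, and monitor its status with squeue -u $USER.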