Jupyter Notebooks for Machine Learning

Guide on using Jupyter Lab on AI Hub infrastructure for interactive development and testing.

This guide assumes that you have access to the AI Hub machine learning infrastructure. In this guide we will use ML3 as an example, but everything in this guide should work on every machine in the AI Hub.

The Jupyter Lab environment is meant for development and easy visualization. For longer-running tasks, we recommend running scripts in batch mode, which makes it easier to share GPU resources.

How to start a Jupyter Notebook session on ML3.

Windows users

Some users have reported problems with port forwarding when using the Windows command line, so we recommend using Git for Windows (https://gitforwindows.org/) instead.

1. Log in to ml3.hpc.uio.no
- ```
 ssh -J <USER_NAME>@gothmog.uio.no <USER_NAME>@ml3.hpc.uio.no
```
- Note that any ML node could be used for this. To check that you understand the guide, try following it on another ML node.

2. Load the modules

module purge
module load JupyterLab/3.2.8-GCCcore-10.3.0

# Optional: if you want to use TensorFlow 
module load TensorFlow/2.6.0-foss-2021a-CUDA-11.3.1
# Optional: if you want to use PyTorch
module load PyTorch/1.11.0-foss-2021a-CUDA-11.3.1

3. Start Jupyter
- ```
jupyter-lab --no-browser
```
- This will print something similar to the output below.
- Note the URL containing the token. You will need it to open the notebook.

[W 08:36:08.506 LabApp] JupyterLab server extension not enabled, manually loading...
[I 08:36:08.516 LabApp] JupyterLab extension loaded from /storage/software/JupyterLab/2.2.8-GCCcore-8.3.0/lib/python3.7/site-packages/jupyterlab
[I 08:36:08.517 LabApp] JupyterLab application directory is /storage/software/JupyterLab/2.2.8-GCCcore-8.3.0/share/jupyter/lab
[I 08:36:08.519 LabApp] Serving notebooks from local directory: /itf-fi-ml/home/jorgehn
[I 08:36:08.519 LabApp] The Jupyter Notebook is running at:
[I 08:36:08.519 LabApp] http://localhost:8888/?token=79197a78ada474fb34680a8beaebb55207b01b92b9ed4c02
[I 08:36:08.519 LabApp]  or http://127.0.0.1:8888/?token=79197a78ada474fb34680a8beaebb55207b01b92b9ed4c02
[I 08:36:08.519 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 08:36:08.583 LabApp] 
    
    To access the notebook, open this file in a browser:
        file:///itf-fi-ml/home/jorgehn/.local/share/jupyter/runtime/nbserver-918461-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=79197a78ada474fb34680a8beaebb55207b01b92b9ed4c02
     or http://127.0.0.1:8888/?token=79197a78ada474fb34680a8beaebb55207b01b92b9ed4c02

Do not close the terminal or press CTRL-C in the jupyter-lab instance until you are done using jupyter-lab.

4. Next, we need to SSH into ML3 again, this time forwarding the port used by Jupyter.

Note the port used by Jupyter in the URL above. In this case the URL was
```
http://localhost:8888/?token=79197a78ada474fb34680a8beaebb55207b01b92b9ed4c02
```
and the port is 8888.
Without closing the terminal where jupyter-lab is running, open a new terminal on your local machine and carry out the following steps.

Next, we will SSH into ML3 with this port exposed to our local machine. This is done with the following template:

ssh -L <port>:localhost:<port> -J <username>@gothmog.uio.no <username>@ml3.hpc.uio.no

Or to use the actual port we found above:

ssh -L 8888:localhost:8888 -J <username>@gothmog.uio.no <username>@ml3.hpc.uio.no
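
The step from the Jupyter URL to the forwarding command can also be scripted. The following is a minimal sketch, not part of the AI Hub tooling: the function names jupyter_port and forward_command are hypothetical, and it assumes the URL contains an explicit port, as in the output shown earlier.

```shell
# Hypothetical helper: extract the port from a Jupyter URL such as
# http://localhost:8888/?token=...
# Assumes the URL contains an explicit port.
jupyter_port() {
    url="$1"
    hostport="${url#*://}"           # drop the scheme: localhost:8888/?token=...
    hostport="${hostport%%/*}"       # drop path and query: localhost:8888
    printf '%s\n' "${hostport##*:}"  # keep what follows the last colon
}

# Hypothetical helper: print (but do not run) the port-forwarding command
# for a given Jupyter URL and username.
forward_command() {
    port="$(jupyter_port "$1")"
    printf 'ssh -L %s:localhost:%s -J %s@gothmog.uio.no %s@ml3.hpc.uio.no\n' \
        "$port" "$port" "$2" "$2"
}

# Usage: forward_command '<URL printed by Jupyter>' <username>
```

The command is only printed so that you can inspect it before running it yourself.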

5. Then we are finally ready to open the URL printed by Jupyter on our local machine. Simply open the URL in a browser on your local machine.

[Image: Jupyter_example]

6. When you are finished, please remember to shut down Jupyter and log out from the machine. This will release the port for use by others.

Using a specific GPU

Since the AI Hub machines are shared resources with a first-come, first-served setup, it can sometimes be useful to tell Jupyter to use only one GPU, or a specific GPU that is less heavily loaded.

To accomplish this we can set the environment variable CUDA_VISIBLE_DEVICES.

First we will use nvidia-smi to show the available GPUs in the system and their current load. Simply run nvidia-smi after logging in. The output should look something like the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:18:00.0 Off |                  N/A |
| 34%   48C    P2    54W / 250W |  10096MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:3B:00.0 Off |                  N/A |
| 32%   45C    P2    84W / 250W |   7422MiB / 11019MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  On   | 00000000:86:00.0 Off |                  N/A |
| 28%   27C    P8    11W / 250W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  On   | 00000000:AF:00.0 Off |                  N/A |
| 28%   29C    P8    34W / 250W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

From the above we can see that there are four GPUs connected to the machine. GPUs 2 and 3 are idle, while GPUs 0 and 1 already have most of their memory in use. To show the concept, we will restrict ourselves to GPU 3. GPU 0 is usually the default, so picking a different GPU is a good way to avoid competing with other users for the same resources.
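
Reading the table by eye can be replaced by a small script. The sketch below is one possible approach, not an official AI Hub tool: the function name least_used_gpu is hypothetical, and it assumes input in the form produced by nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits, that is, one "index, MiB" pair per line.

```shell
# Hypothetical helper: read "index, memory.used" CSV lines on stdin and
# print the index of the GPU with the least memory in use.
# On a tie, the GPU listed first wins.
least_used_gpu() {
    awk -F', *' '
        NR == 1 || $2 + 0 < min { min = $2 + 0; idx = $1 }
        END { print idx }
    '
}

# Usage on a GPU node (shown as a comment so nothing is launched here):
#   gpu="$(nvidia-smi --query-gpu=index,memory.used \
#            --format=csv,noheader,nounits | least_used_gpu)"
#   CUDA_VISIBLE_DEVICES="$gpu" jupyter-lab --no-browser
```

For the table above this would select GPU 2 rather than GPU 3, since both use 3 MiB and GPU 2 is listed first. Memory in use is only one measure of load, so it is still worth checking the nvidia-smi output yourself.
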
Next we will load Jupyter and TensorFlow as described above, but when launching jupyter-lab we will add the following to restrict our usage to only one GPU:
```
CUDA_VISIBLE_DEVICES=3 jupyter-lab --no-browser
```
The result is that only a single GPU is visible to TensorFlow in our Jupyter instance. Setting the variable to 3 means that only the GPU with ID 3 shows up in TensorFlow. We could also have used CUDA_VISIBLE_DEVICES=1,3 to expose both GPU 1 and GPU 3. By default all devices are visible, and a program that needs only one GPU usually ends up on GPU 0.
The image below shows the expected behavior in TensorFlow. Note that only one GPU is shown. It is given ID 0 even though we requested ID 3, because the visible GPUs are renumbered starting from 0.
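
To make the renumbering concrete, the following sketch maps a GPU ID as seen inside the notebook back to the ID reported by nvidia-smi. The function name physical_gpu is hypothetical. The sketch assumes that CUDA_VISIBLE_DEVICES holds a comma-separated list of numeric IDs, as in the examples above, and that CUDA and nvidia-smi enumerate the GPUs in the same order.

```shell
# Hypothetical helper: given a GPU ID as seen by the framework and the value
# of CUDA_VISIBLE_DEVICES, print the corresponding ID from nvidia-smi.
# Visible devices are numbered from 0 in the order they are listed.
physical_gpu() {
    logical_id="$1"
    visible="$2"
    printf '%s\n' "$visible" | cut -d, -f"$(( logical_id + 1 ))"
}

# Usage: physical_gpu <ID seen in the notebook> "$CUDA_VISIBLE_DEVICES"
```

With CUDA_VISIBLE_DEVICES=3, device 0 in the notebook corresponds to GPU 3. With CUDA_VISIBLE_DEVICES=1,3, devices 0 and 1 correspond to GPUs 1 and 3.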

Debugging issues

The following error appears when trying to log in:

ssh -L 8880:localhost:8880 -J <USER_NAME>@gothmog.uio.no <USER_NAME>@ml3.hpc.uio.no

bind [127.0.0.1]:8880: Address already in use
channel_setup_fwd_listener_tcpip: cannot listen to port: 8880

Reason: You already have a port forwarding set up on this port.
Solution 1: Log out from all other sessions.
Solution 2 (if you have lost the terminal, e.g. you forgot to log off before the summer holidays and are trying to use it when you return):
- See if there are already running sessions and kill them
- ps aux ..
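
One way to look for such leftover sessions is to filter the output of ps aux for ssh processes that forward the port in question. The sketch below is an assumption about how this could be done, not the procedure the guide's authors had in mind: the function name forward_pids is hypothetical, and it assumes the usual ps aux layout in which the PID is the second column and the command starts in the eleventh.

```shell
# Hypothetical helper: read `ps aux` output on stdin and print the PIDs of
# ssh processes whose command line contains "-L <port>:".
# Forwards written in another form (for example -L<port>: without a space)
# are not detected.
forward_pids() {
    awk -v port="$1" '
        $11 ~ /(^|\/)ssh$/ && index($0, "-L " port ":") { print $2 }
    '
}

# Usage, on the machine that reports the error:
#   ps aux | forward_pids 8880
```

Check what each reported PID is before stopping it with kill, since the filter matches any ssh process with that forwarding option.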

Published Mar. 30, 2021 3:59 PM - Last modified Dec. 14, 2023 9:35 AM