Job Files, Quotas and Working Directories in Fox

A job typically uses several types of files: the job script itself, the Slurm output file, input files, temporary files and output files.

There are several possible locations for these files:

| Name | Path | Size | Description |
| ---- | ---- | ---- | ----------- |
| Project area | /fp/projects01/<project-name> | quota per project | Main project area, for permanent files |
| Project work area | /cluster/work/projects/<project-name> | no quota | Temporary project files on Fox |
| Job scratch area ($SCRATCH) | /localscratch/<job-ID> | 3.5 TiB per node | Fast local disk on the node where the job runs |

Each location has its advantages and disadvantages, depending on usage. The parallel file system (project area and project work area) is by nature slow for random read and write operations and for metadata operations (handling large numbers of files). The local file system ($SCRATCH) is far better suited for this. In addition, the parallel file system has to serve all users, so placing a very high metadata load on it makes the file system slow for everyone. On the other hand, the local file system is local to each compute node and cannot easily be shared between nodes (but see below).

Checking Quotas

Project quotas start at 1 TiB, while storage in $HOME is capped at 50 GiB. To check your quota usage, you may use df -h . in your home and project directories.
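For example, to check both areas (with ec<N> standing in for your actual project name):

## Check the project quota
cd /fp/projects01/ec<N>
df -h .

## Check the home directory quota
cd $HOME
df -h .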

Recommendations

We recommend keeping the job script itself and the Slurm output file (slurm-<jobid>.out) in the project area. By default, the Slurm output file is written to the directory where you run sbatch. You can also keep both of these files in your home directory, but be aware that the disk quota for home directories is quite small.
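For instance, submitting from a directory in the project area keeps both the script and the Slurm output file there (the directory and script names below are only placeholders):

## Submit from a directory in the project area; the Slurm output file
## (slurm-<jobid>.out) will then be written to this directory
cd /fp/projects01/ec<N>/myrun
sbatch run_job.sh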

Input files

Where to keep input files depends on how they are used.

If an input file is read sequentially (i.e., from start to end), it is best to keep it in the project area.

If an input file is read in a random-access pattern, it is best to let the job script copy the file to $SCRATCH first, as sketched below.
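A minimal sketch of this pattern in a job script (the data path, file name and program name are placeholders):

## Copy the randomly-accessed input file to the fast local scratch disk
cp /fp/projects01/ec<N>/data/input.dat $SCRATCH/

## Run the program against the local copy
MyProgram $SCRATCH/input.dat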

Temporary files

By temporary files we mean files created by the job, and that are not needed after the job has finished.

Temporary files should normally be created in $SCRATCH, since this is the fastest disk. This is especially important if the files are read and/or written in a random-access pattern.
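Many programs place temporary files wherever the TMPDIR environment variable points. Assuming the program you run honours TMPDIR (not all programs do; MyProgram and its input file are placeholders), a job script can redirect temporary files to the local scratch disk like this:

## Let programs that honour TMPDIR write their temporary files to the
## fast local scratch area instead of the parallel file system
export TMPDIR=$SCRATCH
MyProgram input.dat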

If other users need access to files while a job runs, or if you would like to keep the files after the job has finished, you should create files in the project work area. Files here can be made available to users in the same project.

Note!
Files in the project work area are deleted after 30 days.

Output files

By output files we mean files created by the job, and that are needed after the job has finished.

As with input files, if an output file is written sequentially (i.e., from start to end), it is best to create it in the project area.

If there are a lot of random writes (or reads) of an output file, it is best to create it in $SCRATCH and let the job script copy it to the project area when the job finishes. This can be done with the savefile command (see below).

Files in $SCRATCH

The $SCRATCH area (/localscratch/<job-ID>) for each job is created automatically when the job starts, and deleted afterwards. It is located on solid state storage (NVMe over PCIe) on the compute nodes. Such storage is orders of magnitude faster than ordinary disk storage for random access operations. For streaming operations, such as reading or writing large sequential amounts of data, the parallel file system is comparable; even tape drives are comparable for sequential access.

A potential limitation of the scratch area is its size. As solid state storage costs more than spinning disks, the scratch area is limited to 3.5 TiB on batch compute nodes and 7 TiB on the interactive nodes. This space is shared between all jobs running on the node.

Files placed in $SCRATCH will automatically be deleted after the job finishes.

Output files

Output files can also be placed in $SCRATCH for increased speed (see above). In order to ensure that they are saved when the job finishes, you can use the command savefile filename in the job script, where filename is the name of the file, relative to the $SCRATCH area. The command should be placed before the main computational commands in the script.

For example:

savefile MyOutputFile
MyProgram > MyOutputFile

This ensures that the file /localscratch/<jobid>/MyOutputFile is copied back to the submit directory (the directory you were in when you ran the sbatch command). The file will be copied back even if the job crashes (however, if the compute node itself crashes, the file will not be copied back).

If you want more flexibility, it is possible to register a command to be run to copy the file where you want it by using cleanup <commandline> instead of the savefile command. It should also be placed before the main computational commands. For example:

cleanup cp MyOutputFile /fp/projects01/ec<N>/mydir
MyProgram > MyOutputFile

Both savefile and cleanup must appear in the job script before the main computation starts. If the command arguments contain special characters such as *, these should be quoted.
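For instance, a hypothetical cleanup line that copies all .out files to a results directory (the destination directory is a placeholder, and the wildcard is quoted so that it is not expanded at the point where the command is registered):

cleanup cp '*.out' /fp/projects01/ec<N>/results
MyProgram > MyOutput.out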

Jobs using more than one node

As the $SCRATCH area is local to each node, files cannot be shared between nodes using $SCRATCH. A job running on several nodes will get one $SCRATCH area on each node.

Slurm provides utilities (sbcast and sgather) for distributing files to the local scratch areas on several nodes and for gathering files back again. Here is an example to illustrate how this might look:

#!/bin/bash
#SBATCH --account=ec11
#SBATCH --ntasks-per-node=2
#SBATCH --nodes=2
#SBATCH --mem-per-cpu=500M
#SBATCH --time=00:02:00

## Print the hostnames where each task is running:
srun hostname

## This copies "hello.c" fro your submit dir to $SCRATCH on each node:
sbcast hello.c ${SCRATCH}/hello.c

## Simulate output files created on the $SCRATCH areas on each node
## by copying $SCRATCH/hello.c to $SCRATCH/bye.c once on each node:
srun --ntasks-per-node=1 --ntasks=$SLURM_NNODES cp  ${SCRATCH}/hello.c ${SCRATCH}/bye.c

## This copies the "bye.c" files back to the submit dir:
sgather ${SCRATCH}/bye.c  bye.c

Slurm's sgather will append $HOSTNAME to each of the gathered files to avoid overwriting anything. Note that you have to set up ssh keys with an empty passphrase on Fox for sgather to work, because under the hood it uses scp to transfer the files.
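One way to set this up, assuming home directories are shared between the login and compute nodes so that adding the key to your own authorized_keys file is sufficient, is:

## Generate a key with an empty passphrase and authorize it for yourself
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys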

Files in project work directory

Each project has a project work directory, where one can store files temporarily. The directory is /cluster/work/projects/ec<N>. The area is open for all users in the project, so it is possible to share files within the project here.

It is recommended to create a subdirectory with your username in the work area and keep your files there. That reduces the likelihood of file name conflicts. It is also a good idea to set the permissions of this subdirectory to something restrictive unless you want to share your files with the rest of the project.
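A minimal sketch (replace ec<N> with your project name; mode 0700 keeps the directory private to you):

## Create a private per-user subdirectory in the project work area
mkdir -p /cluster/work/projects/ec<N>/$USER
chmod 0700 /cluster/work/projects/ec<N>/$USER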

Old files are automatically deleted in the work area, so this is not a place for permanent storage. Use the ordinary project area for that (/fp/projects01/ec<N>).


CC Attribution: This page is maintained by the University of Oslo IT FFU-BT group. It has either been modified from, or is a derivative of, "Job work directory" by NRIS under CC-BY-4.0. Changes: Major rewording and additions to all sections.