
Managing data on Colossus

Available file systems on Colossus

Project directory
    Path: /tsd/pXX/data/durable
    Recommended use: Software, job configurations, input files, processing data in a job. Use it for long-term data storage. Backup enabled.

Cluster directory
    Path: /tsd/pXX/cluster (identical to /cluster/projects/pXX)
    Recommended use: Software, job configurations, input files, processing data in a job. Legacy directory that used to be on a separate file system.

Home directory
    Path: /tsd/pXX/home/<pXX-user> or $HOME
    Recommended use: Software, job configurations. DO NOT use it for processing data in a job.

Scratch
    Path: /cluster/work/jobs/jobid or $SCRATCH
    Recommended use: Processing data in a job; use chkfile to retain output data.

Local disk space
    Path: $LOCALTMP
    Recommended use: Processing data in a very high I/O job. 100-200 GiB disk quota (*).

(*) Additional disk space can be requested.
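Inside a running job you can quickly check where these locations point. A minimal check (output will differ per user and job) could be:

echo "Home:     $HOME"
echo "Scratch:  $SCRATCH"      # set by the queue system within the job
echo "Localtmp: $LOCALTMP"     # may only be set if local disk space was requested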

Colossus

Project data is stored on the IBM Storage Scale file system. On Colossus this file system is concurrently available on all compute nodes through the General Parallel File System (GPFS) over 56 Gbps InfiniBand. On the compute nodes it is mounted under /gpfs, with symlinks providing the legacy paths /cluster/projects/pXX and /ess/pXX.

Submit host

On the submit hosts, the project data on the IBM Storage Scale file system is available over NFSv4 (with Kerberos authentication, see below) over 1 Gbps Ethernet. It is mounted under /ess/pXX, with symlinks providing several legacy paths. We advise using references to /ess/pXX in your job scripts.
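To see where a given path actually resolves on the machine you are logged into, you can follow the symlinks, for example (with pXX replaced by your project number):

readlink -f /cluster/projects/pXX    # resolves under /ess on the submit hosts, under /gpfs on the compute nodes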

Project directory

Colossus has access to the project directory via a high-performance parallel file system: /tsd/pXX, where pXX is your project number. A single disk quota applies to all of its subdirectories (data, home, cluster). The cluster directory no longer resides on a separate file system, so data does not have to be copied there for processing on Colossus.

By default, the entire project directory is backed up. However, there is no backup of the data stored in directories which include no-backup in their paths (e.g. /tsd/pXX/data/no-backup), but daily snapshots are available for the last 7 days in the /tsd/pXX/.snapshots subdirectory.

If you plan to work on TiBs of data that will change frequently as a result of processing on Colossus, you may copy the data to a no-backup directory for the duration of the analysis. This excludes the temporary file changes from the daily backups and reduces the impact on the backup system.
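For example (pXX is your project number and the directory names are only illustrative), such a data set could be staged like this:

# stage frequently changing data outside the backup
mkdir -p /tsd/pXX/data/durable/no-backup/myanalysis
cp -r /tsd/pXX/data/durable/rawdata /tsd/pXX/data/durable/no-backup/myanalysis/

# when the analysis is finished, copy the final results back to a backed-up location
cp -r /tsd/pXX/data/durable/no-backup/myanalysis/results /tsd/pXX/data/durable/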

Serving and keeping track of a parallel cluster file system is a complicated task. While the hardware is high-end, the GPFS software will occasionally run into trouble. When this happens, users might experience delays when running simple commands such as "ls", or even hangs. Usually these problems last for a very short time, but if there is a serious problem there will be an announcement on the TSD Operational Log and on the Colossus Users email list.


Kerberos authentication

Access to the /ess/pXX file system over NFSv4 requires a valid Kerberos ticket. A valid ticket grants you access; an expired or invalid ticket means access is denied.

If you connect to the submit host (via ssh or PuTTY), you are automatically granted a ticket for a 10 hour session, which is automatically renewed for up to a week. If the ticket expires after a week, you have to log out and back in to restore access. This is the preferred method of obtaining a ticket.

You can also obtain a ticket manually using the kinit command. However, this ticket will not be renewed and expires after 10 hours. We advise against using this command: holding an automatic and a manual ticket at the same time may result in permission denied errors if one of the tickets expires while the other is still valid.

Kerberos authentication requires password authentication, so you will not be given a ticket if you connect using ssh keys. Please disable ssh keys on the submit host and use password authentication instead.

You can list your current ticket status using:

klist

Initially it may only list the entry for the Ticket Granting Ticket (TGT), indicating a successful password verification:

-bash-4.2$ klist
Ticket cache: FILE:/tmp/krb5cc_7927_Vx2FH
Default principal: p11-bartt@TSD.USIT.NO

Valid starting       Expires              Service principal
07/06/2020 14:50:43  07/07/2020 00:50:43  krbtgt/TSD.USIT.NO@TSD.USIT.NO
        renew until 07/13/2020 14:50:21

Once you access /cluster/projects/pXX, an entry for nfs/ess01.tsd.usit.no will be added, indicating successful authorization to the NFS mount:

-bash-4.2$ klist
Ticket cache: FILE:/tmp/krb5cc_7927_Vx2FH
Default principal: p11-bartt@TSD.USIT.NO

Valid starting       Expires              Service principal
07/06/2020 14:50:43  07/07/2020 00:50:43  krbtgt/TSD.USIT.NO@TSD.USIT.NO
        renew until 07/13/2020 14:50:21
07/06/2020 14:52:44  07/07/2020 00:52:44  nfs/ess01.tsd.usit.no@TSD.USIT.NO
        renew until 07/13/2020 14:50:21

If your ticket expires, log out and back in to obtain a new one. If you get a permission denied error, or cannot list the contents of a directory you are accessing for the first time even though you have a valid ticket, the Kerberos authentication may have been delayed; it will usually succeed if you try again.

Home directory

Each user has a home directory ($HOME) on ESS.  By default, the disk quota for the home directory is 100 GiB (see below).

The home directory is backed up regularly (see below), but anything inside directories named no-backup is skipped.  Backup is slow and expensive, so please put temporary files, files that can be downloaded again, installed software and other files that can easily be recreated or do not need to be backed up inside a no-backup directory.
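For example, software installations in your home directory could be placed like this (the directory name is just an example):

mkdir -p $HOME/no-backup/software    # anything below a no-backup directory is skipped by the backup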

Also note that the home directory is not meant to be used as a read/write area for jobs, especially not I/O intensive jobs. Use the scratch area for that (see below).

Scratch disk space

While a job runs, it has access to a temporary scratch directory on /cluster/work/jobs/jobid which resides on the high-performance GPFS filesystem. The directory is individual for each job, is automatically created when the job starts, and is deleted when the job finishes (or gets requeued). There is no backup of this directory. The name of the directory is stored in the environment variable $SCRATCH, which is set within the job script.

In general, jobs should copy their work files to $SCRATCH and run there, since the scratch directory is automatically cleaned up when the job finishes.
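As an illustration, a minimal job script along these lines could look as follows. This is only a sketch: the account, time, memory, file names and the program myprogram are placeholders, and it assumes the chkfile mechanism mentioned in the file system table for copying output back.

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --account=pXX          # replace pXX with your project number
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=4G

# copy input data from the project area to the job's scratch directory
cp /ess/pXX/data/durable/input.dta $SCRATCH/

# mark output files to be copied back when the job finishes (chkfile, see the table above)
cd $SCRATCH
chkfile output.dta

# run the analysis in the scratch directory
myprogram input.dta output.dta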

If you need to access the scratch directory from outside the job (for instance for monitoring a running job), the directory is /cluster/work/jobs/jobid, where jobid is the job id of the job in question.

Local disk space

For very I/O intensive work, it can be useful to use the local drives on the compute nodes. The path to the directory is stored in the environment variable $LOCALTMP. The compute and GPU nodes have 100 GiB and 200 GiB of local storage, respectively. Add the following to your batch script to request local temporary storage on the node (e.g. 20 GiB):

#SBATCH --gres=localtmp:20

# register a command that copies the output back to the submit directory when the job finishes
cleanup cp $LOCALTMP/outputfile $SLURM_SUBMIT_DIR
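To actually process data on the local disk, the job would typically also stage its input there and work in that directory. A minimal sketch (file and program names are placeholders) might be:

# stage the input onto the node-local disk and work there;
# the cleanup command above copies the output back afterwards
cp $SLURM_SUBMIT_DIR/input.dta $LOCALTMP/
cd $LOCALTMP
myprogram input.dta outputfile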

Disk quota

Each project has a single disk quota for /tsd/pXX/. This includes the data, home and cluster subdirectories. Projects in need of large volumes of storage may apply for extra disk space from Sigma2.

On Colossus and the submit hosts, the UNIX df utility can be used to query disk usage on the GPFS file system. A disk will be full if either the space or the number of inodes (files) runs out. To query disk space, use:

$ df -h /ess/p1337
Filesystem        Size  Used Avail Use% Mounted on
10.3.2.31:/p1337  2.0T  322G  1.7T  16% /ess/p1337

To query the number of inodes, use:

$ df -ih /ess/p1337
Filesystem       Inodes IUsed IFree IUse% Mounted on
10.3.2.31:/p1337   2.0M  516K  1.5M   26% /ess/p1337

Data compression

Millions of small files pose a challenge for GPFS and should be avoided. If possible, pack small files into archives so that operations on them become easy. One approach is to copy the archive to $SCRATCH or $LOCALTMP, unpack it there, and work on the local file tree.
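A sketch of this workflow with tar and gzip (directory and file names are placeholders, pXX is your project number) could be:

# pack the many small files into a single compressed archive in the project area
tar czf /ess/pXX/data/durable/no-backup/smallfiles.tar.gz smallfiles/

# inside a job: unpack onto the scratch area and work on the local file tree
tar xzf /ess/pXX/data/durable/no-backup/smallfiles.tar.gz -C $SCRATCH
cd $SCRATCH/smallfiles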

A standard unix/linux utility is gzip. Read the man pages for more information.

gzip file.dta

This produces a file file.dta.gz, hopefully a much smaller file. Not all types of data compress equally well: text compresses well, JPEG pictures hardly at all. For files that are to be unpacked on Windows machines the zip utility can also be used. One limitation of zip is that neither the input files nor the resulting archive can be larger than 4 GB; for files larger than 4 GB use gzip. Giving gzip a numeric argument like -9 forces higher compression at the expense of longer compression time. A more efficient alternative is bzip2.
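For example, to pack files for use on a Windows machine (file names are placeholders):

zip archive.zip file1.dta file2.dta    # creates archive.zip; note the 4 GB limit mentioned above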

To unpack the gzipped file:

gunzip file.dta.gz

This will result in the original file in its uncompressed form.

Backup and restore

See the TSD documentation on backup and restore for more information on backup and restore options.

 