
TSD Operational Log

Published Feb. 17, 2025 9:55 AM

The NVivo license for TSD has expired. We are working on renewing it.

More information to follow.

Published Feb. 12, 2025 4:09 PM

We are currently experiencing issues with the TSD Self Service system for creating foreign users.

After following the creation link received by email, invited foreign users are currently asked to sign in before their accounts have been fully created.

We are working on a solution.

[14.02.25] The issue has now been resolved. Foreign users can again be invited successfully through the TSD Self Service system.

Published Feb. 7, 2025 9:55 AM

On Monday, February 24, 2025, from 08:00, we will perform maintenance on the Colossus cluster. During the downtime, we will upgrade Slurm and update the OS software on the compute nodes. You will therefore not be able to submit jobs during this downtime window.

As usual, a maintenance reservation has been set on the cluster. Any jobs that cannot complete before the downtime starts will remain pending in the queue until after the downtime completes.

During the downtime, please follow this operational log message for updates.
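The effect of the maintenance reservation on pending jobs can be sketched as a simple time check: a job is only started if its requested time limit ends before the downtime begins. The snippet below is a minimal illustration under that assumption, not part of any TSD or Slurm tooling. The function names and the example submission time are hypothetical; only the downtime start is taken from this notice.

```python
from datetime import datetime, timedelta

# Start of the maintenance window, taken from the notice above.
DOWNTIME_START = datetime(2025, 2, 24, 8, 0)


def fits_before_downtime(start: datetime, time_limit: timedelta,
                         downtime_start: datetime = DOWNTIME_START) -> bool:
    """Return True if a job starting at `start` would reach its requested
    time limit no later than the start of the downtime."""
    return start + time_limit <= downtime_start


def longest_time_limit(start: datetime,
                       downtime_start: datetime = DOWNTIME_START) -> timedelta:
    """Longest time limit that still ends before the downtime (zero if none)."""
    return max(downtime_start - start, timedelta(0))


# Hypothetical example: a job that could start on Friday, February 21 at 12:00.
friday_noon = datetime(2025, 2, 21, 12, 0)
print(fits_before_downtime(friday_noon, timedelta(hours=48)))  # ends Sunday 12:00
print(fits_before_downtime(friday_noon, timedelta(hours=72)))  # would end Monday 12:00
print(longest_time_limit(friday_noon))
```

In practice this means that requesting a shorter time limit can let a job run before the downtime instead of waiting in the queue until the maintenance is over.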

Published Jan. 6, 2025 9:17 AM

ID-porten is currently experiencing issues.

This means that you will not be able to log in using ID-porten. Digitaliseringsdirektoratet is working to resolve the issue.

09:28: Digitaliseringsdirektoratet has confirmed that the issue should now be fixed. Please try again.

Published Jan. 3, 2025 9:01 AM

There were previously issues with accepting new users into a project: when admins tried to confirm a user's application, nothing happened. This issue has now been resolved, and we ask that you try again.

Published Dec. 2, 2024 1:11 PM

We are currently experiencing issues with data import/export.

We are working to resolve the issue.

Published Nov. 19, 2024 11:12 AM

We are currently having some trouble with the dataloader, so files in nettskjema-data are not being updated.

We are working to resolve the issue, and the answers will be updated once it is resolved.

While the problems are ongoing, you can access the Nettskjema answers using the internal nettskjema portal.

Published Nov. 12, 2024 4:24 PM

We will apply another patch to TSD storage, which should resolve our sporadic issues. The upgrade will be performed node by node, which may temporarily interrupt virtual machines' access to data.

The patching starts at 07:00 on Wednesday, November 13, and will take about 10 minutes.

Update: The patching finished at 07:13, and operations were back to normal at about 07:15.

Published Nov. 12, 2024 7:55 AM

This affects NFS on TSD clients as well as central services. SMB (Windows) is not affected. We are restarting services to restore connections and are actively working with the third-party vendor to resolve the underlying cause of the recent issues.

Read and write access was confirmed to be working from clients at 08:22.

Published Nov. 8, 2024 3:51 PM

Storage is still having problems, and there were several NFS hangs in the afternoon:

14:45 We are notified of NFS hangs and notice that one of the two protocol nodes has a downed NFS service. We begin restarting it.

14:59 The node is back up, and mounts work.

15:36 We discover that the other protocol node has problems with writes on its NFS exports. We proceed to restart this one as well.

15:46 Node 2 is back up, and production is back to normal.

15:57 NFS goes down again on node 1. Restarting services.

16:07 The node is back up, and production is back.

16:15 We again discover that the other protocol node hangs on NFS writes, and restart it.

16:20 Both nodes are operational, and neither hangs on write anymore.

Published Nov. 7, 2024 8:00 AM

After storage problems during the night, we discovered this morning that all clients still lacked write access. This affects NFS on TSD clients as well as central services. SMB (Windows) is not affected.

Measures are currently being taken to restore connections, and the third-party vendor has been notified in order to resolve the instability as soon as possible.

07:59 - One of two connections is back up.
08:06 - Both connections are back up, and storage is fully operational for now. We will need to follow up further with the vendor.

Published Nov. 5, 2024 1:46 PM

The general storage issues have been resolved; this procedure simply applies finishing touches to the network interfaces. It is expected to improve NFS performance.

The tuning will be applied node by node from 22:00 and will take about 10 minutes. This may cause temporary issues with virtual machines' access to data. We will follow up on any such issues.

Maintenance was completed at 22:15.

Published Nov. 4, 2024 3:16 PM

In response to the ongoing storage issues, an upgrade to the NFS implementation will be installed on central storage tomorrow morning from 07:00.

The maintenance requires a full shutdown for the duration of the upgrade, approximately 10 minutes.
TSD will therefore have complete downtime for both Linux and Windows clients.

We apologize for the exceptionally short notice; unfortunately, this measure is required to get us out of a critical situation.

07:30: Due to initial issues with the installation, the downtime is extended to 07:40. We apologize for the inconvenience.
07:38: Storage has been upgraded, and production has resumed.

Published Nov. 1, 2024 9:16 AM

This affects NFS on TSD clients as well as central services. SMB (Windows) is not affected. We are actively working with the third-party vendor to resolve the issue.

Update 04.11.2024

File import and export functionality is restored.

Published Oct. 29, 2024 2:32 PM

TSD will apply a security patch from IBM to the storage system, which may cause temporary issues with virtual machines' access to data. We will follow up on any such issues.

Published Oct. 21, 2024 2:01 PM

There are currently issues with storage in TSD, causing some users to experience login problems, a blank screen, or sessions that disconnect abruptly. We are actively working to resolve the issue.

Update 23-10-2024: Most issues have been resolved, but we are still looking at individual cases and will continue to work with the third-party vendor.

Published Oct. 17, 2024 10:46 AM

We are currently experiencing technical difficulties with the storage services, and are working to resolve the issue.

Update 14:15: The issue has been resolved.

Published Sep. 30, 2024 2:18 PM

We are performing storage system maintenance on Wednesday 9 October from 16:00 CET to apply security updates recommended by IBM. During this time period there will be service interruptions to virtual machines.

A maintenance reservation for the downtime has been set on Colossus starting at 15:00. Note that this is one hour earlier than the storage maintenance: we will use the opportunity to upgrade Slurm on Colossus instead of having a separate downtime for that. Jobs that cannot complete before the downtime will not be scheduled until after the downtime. Any running jobs and processes will be killed. To be on the safe side, save your work, cancel running jobs and log off prior to the downtime. If you had running jobs or processes at the start of the downtime, please check your job output for errors.

During the downtime, follow this opslog for updates.

Published Sep. 30, 2024 1:30 PM

There were issues with project storage in TSD on Monday, September 30, between approximately 12:45 PM and 1:30 PM. TSD was partially inaccessible during this period, and users who were logged in found the project storage unavailable. The issue was quickly resolved and should not have any further consequences. If you experience any issues, please contact us.

Published Sep. 26, 2024 9:49 AM

GPU-3 in the UiO allocation of Colossus is down and will need replacement parts. This means there is currently only one GPU node available. Until it is restored, expect longer queue times in the UiO allocation when requesting GPUs.

Update 2024-11-04: The node is back in production.

Published Sep. 9, 2024 8:14 AM

TSD is unavailable for all users due to planned maintenance.

It will be unavailable for most of the day. This notice will be updated once maintenance is complete.

Published Sep. 3, 2024 9:58 AM

After a brief hiccup in the system yesterday, many projects are receiving the error "This desktop currently has no desktop sources available. Please try connecting to this desktop again later, or contact your system administrator." when trying to log in.

[RESOLVED]: A fix was issued just after 10:00 on September 3rd. If you still receive the same error, please send an email to tsd-drift@usit.uio.no, and we can reboot your machine manually.

Published Aug. 31, 2024 12:26 PM

The certificate for the public TSD consent portal has expired, causing https://consent-portal.tsd.usit.no/ to be unavailable for TSD users. We are actively working to resolve the issue.

Published Aug. 28, 2024 2:04 PM

The EL7 submit VMs have been powered off. Please use the new EL9 pXX-hpc-01 submit hosts instead for submitting jobs to Colossus.

Published Aug. 16, 2024 11:03 AM

Colossus is currently not able to contact the TSD license server, which means that it is not possible to use licensed software on Colossus.

We are working on solving the issue.