TSD Operational Log
Due to an ESS upgrade at 13:30, we're experiencing some storage instability. This also affects the internal mirrors (CRAN). Some VMs will be rebooted in the process.
TSD services may be unstable at the moment; we are working to fix this.
Dear TSD-users,
At 07:00 on the upcoming Tuesday we will be upgrading the databases of our core services, as well as the databases in the following projects:
p11
p14
p23
p47
p57
p58
p96
p110
p166
p174
p189
p206
p302
p588
p594
p827
p874
p969
p1075
p1859
p2184
If all goes according to plan, we should be done by 11:00 at the latest.
During this time our services will be partially or fully unavailable.
--
The TSD team
The backend of several of our services is down.
This affects file import and export, the publication portal, Nettskjema delivery, and more.
--
TSD
We're experiencing instability in TSD, affecting the timeliness of Nettskjema attachment delivery as well as file import and export. We're working on resolving it.
TSD Self Service is currently unreachable.
The bigmem nodes on Colossus have been reserved for a single project until the end of September, which means that no other projects can use the bigmem partition until then. This has been done at the request of Sigma2, which owns the bigmem nodes.
Starting at 09:30 on 2023.07.10 we will be upgrading the databases for our core services.
Due to this, our services will at times be partially or fully unavailable during this upgrade.
We will update this message as we go along, and notify you when it's done.
--
On behalf of TSD
We are currently experiencing instability in access to storage for multiple projects, affecting all services.
We are still investigating the problems.
-----
Update 09:05:
We have remounted the storage for the affected machines, and they seem to work now.
The instability affected around 30 projects from approximately 04:00 this morning.
-----
Update 10:00:
There are still reports of instability, and we will investigate further.
-----
Update 2023-07-07:
The reason for the instability was found and addressed yesterday. All systems should have worked normally since about 11:00 yesterday.
Maintenance is being performed on our storage systems. We expect minimal issues. Some Linux hosts may need to be rebooted.
Any paths under /cluster (e.g. software and projects) are unavailable. This affects software modules and project areas on Linux submit hosts (and other hosts with a /cluster directory). The cluster directory can still be reached via /tsd/pxx/cluster instead.
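For example (illustrative path only; substitute your own project number): a project area under /cluster, such as /cluster/projects/p11, should remain reachable as /tsd/p11/cluster/projects/p11 until /cluster is available again.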
The SCCM group will be upgrading the internal SCCM site database in TSD on Thursday 2023-06-08. Software Center on all Windows VMs in TSD will be unavailable between 12:00 and 16:00.
We are currently experiencing instability in access to storage for multiple projects, affecting all services.
We are still investigating the problems.
-----
Update: The reason for the instability has been identified and resolved.
We are working to fix the issue.
The TSD Identity Management System will undergo maintenance for a short time today, 24.05.2023, between 17:00 and 18:00.
We're experiencing login issues through VMware on Linux and slow storage over NFS.
The Consent System will be temporarily unavailable for upgrade.
https://data.tsd.usit.no was not accessible between approximately 12:51 and 14:45 today due to an expired security key. This affected both login and data import/export. The issue has been resolved since then.
Since around 09:45 we're experiencing NFS storage issues.
We're working to fix it and hosts will be rebooted in the process.
From 10:00 the NFS servers in TSD will be upgraded.
We expect this to take approximately 1 hour and may cause network/NFS interruptions.
We advise you to keep an eye on this operational log for any updates.
The Consent System will be temporarily unavailable for an upgrade. The upgrade will last until the end of the day. After the upgrade, only consents from the last 2 months will be available initially. Over the next day, the older data will reappear (no data has been lost).
The availability of nodes (compute, bigmem, accel, dragen) will be reduced on Colossus over the next several days for maintenance.
A handful of nodes will be taken down for maintenance at a time. We apologize for any inconvenience this may cause.
Like yesterday, we are experiencing storage problems due to our storage provider, IBM.
We are currently experiencing technical difficulties with core services like file import/export, and are working to restore operations. Sorry for the inconvenience caused by this.