TSD Operational Log - Page 10
There are a few remaining projects which need help with printing and GPUs, but the rest of the work is completed.
We are busy with Windows maintenance, which will cause interruptions to login sessions throughout the day.
We are experiencing some technical issues with tsd-fx03, and the service is currently unavailable as a result.
We are investigating the cause and working on fixing the issue as soon as possible.
Difi has notified us that they are experiencing issues with BankID mobile for Telenor customers, which may affect selfservice login for some TSD users.
Dear TSD User
We will perform some internal network maintenance at 10:00. We do not expect any interruptions to services, but please let us know if you experience any issues.
UPDATE:
The problem should be resolved.
UPDATE:
We are experiencing problems with access to view.tsd.usit.no. We are working on resolving this.
UPDATES ON ONGOING MAINTENANCE:
09:30: Unmounting /cluster on all machines.
11:00: New machine is up, currently running tests.
12:15: Starting up services on project machines to allow access to /cluster again.
12:50: Colossus services and /cluster exports running as normal, now with 10 times the bandwidth.
The NFS-exporter for Colossus crashed again, just ahead of our planned maintenance and switch to the new machine tomorrow morning.
We have now restarted the services, and will promptly restart the machines that are hanging as a result.
Our apologies for the inconvenience.
--
Best regards,
The TSD Team
UPDATE, 09:30: Starting to unmount NFS shares from /cluster on all machines.
We have now solved the problems we encountered on Monday, and are now ready to replace the NFS-exporter.
The work will start on Thursday 3rd October at 09:00 CET. We expect to be finished by the end of the day, possibly earlier.
During the maintenance, we have to unmount /cluster on all virtual machines (VMs) that mount it. This means that the /cluster/projects/pXX areas will be unavailable on the VMs, and it will not be possible to use the module load system for software on the VMs. Some VMs might also require a reboot.
Jobs on Colossus will continue to run as normal, but it will not be possible to submit new jobs during the stop.
Do not run jobs on VMs that need data from /cluster or software modules. If you do so, we will have to kill them to unmount the /cluster area. Also, if the VM needs to be rebooted, all ru...
We are currently performing maintenance on the self service and data portal.
UPDATE: Unfortunately, we encountered some unforeseen problems, and were not able to switch to the new NFS-exporter today. The system is now back in normal production using the old exporter, and you can continue to work as normal. We hope to solve the problems quickly, and will announce a new date for replacing the NFS-exporter soon.
We are sorry for the inconvenience.
We will replace the existing NFS-exporter on Colossus starting on Monday, 30th September 09:00 CET, and continue working throughout the day.
We will stop the NFS-export by unmounting it on all virtual machines (VMs), some of which may also require a reboot.
You will not be able to run jobs on VMs that need data from /cluster or software modules. If we have to reboot the VM to unmount /cluster, the running jobs will also be killed.
Please save your data before the maintenance window, and follow our Operational Log for updates.
The...
We are experiencing issues with some services, which may lead to users being unable to log in to TSD through VMware Horizon Client, with the error "all available desktop sources are currently busy". We are investigating the cause and working on a fix.
Dear TSD User
Due to issues with part of the login infrastructure that are preventing some projects from logging in, we need to perform unplanned maintenance on the view-ous login gateway. This means that login sessions for p22, p149, p191, p192, p321, and p410 will be suspended while we reboot. Apologies for the inconvenience.
We are experiencing issues with some services, which may lead to users being unable to log in to TSD through VMware Horizon Client, with the error "all available desktop sources are currently busy". We are investigating the cause and working on a fix.
Dear TSD User
We are experiencing issues with Windows login and are working to fix it.
Dear TSD User
We are experiencing issues with Colossus, which is delaying jobs from being run. We are working to fix the problem.
As previously announced, we are starting the maintenance today at 09:00, and will continue working throughout the day. Colossus will not be available during this period. The maintenance will include an upgrade of both network and NFS-export.
Please note that this means the /cluster file system will be unavailable during the maintenance stop, and some of the VMs mounting /cluster might need to be rebooted.
No currently running jobs will be canceled due to the stop, but jobs that will not be able to finish before 09:00 on Monday will be held in the queue until after the maintenance.
Update:
10:56
We have partially completed the upgrade, and Colossus is ready to use again. Due to a hardware error, we were unable to replace the NFS-export machine. We will address this issue later. We also managed to run a command that should prevent crashes similar to the one that happened yesterday.
We are having issues with Windows and Linux login, and are working to fix them.
Dear TSD User
The NFS export of /cluster to project VMs is currently down. We are diagnosing the issue and working to fix it.
Dear TSD users,
The /cluster file system was down between 10:15 and 11:10 due to a crash of one of the file system daemons. The file system is now up again, but many jobs on Colossus have likely crashed in the meantime, so please check your jobs. The VMs mounting /cluster will also have experienced problems.
Things should be back to normal again now, but please don't hesitate to contact us if you're still experiencing problems.
Our apologies for the inconvenience.
--
The TSD team.
We are experiencing issues with some services, which may lead to users being unable to log in to TSD through VMware Horizon Client. We are investigating the cause and working on a fix.
Update:
- https://view.tsd.usit.no/ is up again.
Dear TSD User
We discovered that due to infrastructure issues, the selfservice portal's QR code generation did not work as intended from Monday up until today at 12:00. If you tried to reset your QR code during this period, we kindly ask you to do so again.
We are experiencing issues with some services, which may lead to some users being unable to log in to TSD through ThinLinc. We are investigating the cause and working on a fix.
TSD's self service portal will be unavailable for a short period at 09:15 on 2019-07-04. We will update this notice with more information and more precise time frames shortly.
Our apologies for any inconvenience this might cause.
TSD's self service portal will be unavailable for a short period at 10:00 on 2019-06-28. We will update this notice with more information and more precise time frames shortly.
Our apologies for any inconvenience this might cause.
--
Best regards,
TSD
The self service portal will be unavailable for a short period while the database group performs an upgrade.
Dear TSD User
As planned and announced, we have shut down SFTP data transfers to and from TSD. For data import and export, please use https://data.tsd.usit.no - the new data transfer service works in all major browsers as long as JavaScript and cookies are enabled. If you prefer to use the command line, or need further assistance, please contact our user support.