TSD Operational Log
Update: This work is finished and we are back in production (08.50, 6/5-15).
Because USIT is starting to use a new certificate when communicating with MinID (ID-porten/Difi), Nettskjema will be restarted on 6/5-15 at 08:30. Estimated downtime is about 20 minutes.
Nettskjema will not work during this downtime. We will post an update here when it is back online.
Best regards,
TSD@USIT
The SPICE proxy was accidentally rebooted today at 15:40. This caused a 5-minute downtime for remote connections to Linux machines in TSD using SPICE. We are sorry for any inconvenience this may have caused.
The SPICE proxy is now up and running again.
Dear TSD users
As usual, things did not go as planned. The machine has been moved, but we cannot get the SPICE connection working. We will get back to you on this shortly. In the meantime, you can log in to a Windows machine first and then use PuTTY to connect to your Linux VM.
We are moving the SPICE proxy machine to VMware today at 14.00, so there will be a short downtime; it will be back up no later than 14.30. All SPICE connections will be lost. If you log in on Windows and then use SSH to reach your Linux machine, you will not be affected.
We will update this log post when the move is done.
Dear TSD users
We have fixed the LDAP issue in TSD. Everything should work, except for a known problem with p21 in the file-sluice.
We are very sorry for the downtime. If there are any more issues, please report them to tsd-drift@usit.uio.no.
Best regards
Gard@TSD
Dear TSD-users
NB: The amount of data inside the file-sluice was so large that we must extend the file-sluice downtime until 17.00 today.
We are moving one of the file-sluice machines to VMware tomorrow morning, so no files can be imported or exported from about 09.00 to 12.00. No data will be lost, but all jobs and connections will be cut off at 09.00. Nettskjema answers submitted during this period will appear inside TSD once the server is back online.
Nettskjema will be stopped on 19/3-15 from 08.30 to 10.30. No answers can be submitted during this time, and it will not be possible to log in to create or edit forms at nettskjema.uio.no. The downtime is due to a major upgrade to Nettskjema version 15.
Best regards
Gard
We had a DNS issue yesterday at about 15.30. It was temporarily fixed yesterday afternoon, and a permanent fix was in place today at 10.15.
The cause was that after machines were migrated to VMware, the original copies were left powered off in RHEV. Some eager users restarted these machines (which is entirely understandable, as you believed they were down), and this created duplicate machines with the same names and IP addresses. This in turn caused DNS to malfunction.
Sorry for the inconvenience.
Gard
We have solved the issue with import/export.
Sorry for the delay with the fix.
Gard
Dear TSD-users
You can log in to TSD now. The problem has been solved.
TSD-team
Dear TSD-users
You may experience problems logging in to TSD. We are on the case and working to solve it as soon as possible.
Sorry for the inconvenience.
TSD-Team
A disk filled up on one of our login machines late last night, which caused logins to fail. The issue has been solved, and login is enabled again.
Gard
Dear TSD users
TSD is now back up. All Windows machines have been or will be restarted. We are restarting a few Linux machines now, on the assumption that most users can start their own.
Best regards
Gard@TSD
We have detected a problem with copying data in and out of TSD. We know what is wrong, but not why. We are working on it and will update here once it is solved.
The problem is now solved. If you still see errors, please report them to tsd-drift@usit.uio.no with the filename and project number.
The HPC resource Colossus is still down due to a security update.
We hope it will be back up on Monday 2/2-15.
Update: Our work on Colossus yesterday unfortunately had to give way to an emergency maintenance stop of Abel. The new ETA is late today, Tuesday 3/2.
Update: Colossus came back up yesterday at about 16.00. Unfortunately, queued jobs did not start and have to be resubmitted.
As the glibc bug surfaced worldwide yesterday, we also had to fix it during our downtime.
Things have not gone as smoothly as planned, so we must extend our downtime until further notice.
We have several people working on the case right now, so we hope to see a solution fairly soon.
We will post updates on the mailing list and the website as soon as we know more.
We are sorry for the inconvenience.
We need a short maintenance window on the file-sluice on Thursday 15/1-15, from 12.00 to 13.00. The import/export service may go down during this period.
We apologize for any problems this may cause.
We are currently experiencing a disk hang on Colossus. This makes jobs hang when they read from or write to the disk.
We are working on fixing the problem. When the issue is resolved, jobs will hopefully continue; however, some jobs might crash.
Update: The issue has been resolved, and jobs should be running again.
Dear User,
the problem with the file lock in TSD has been solved. Import/export is available again.
Regards,
TSD@USIT
Dear TSD-user,
we are experiencing problems in the TSD system and it is currently not possible to log in to any TSD project. We are working to solve the problem as soon as possible. You will receive a message from us when the system is back up.
Sorry for the inconvenience.
Best regards,
TSD@USIT