TSD Operational Log - Page 18
Dear TSD-users,
we have finished the maintenance of the TSD infrastructure. The TSD machines (Linux and Windows) are available for login. Jobs queued on Colossus will start soon, as cluster capacity allows. TSD is back in production.
Enjoy TSD: it is safe!
Regards,
Francesca@TSD
Dear TSD-users,
as announced, today we are performing maintenance on the TSD infrastructure. The work is organized so that the outage will never be complete, but we cannot exclude possible failures. Job submission to the cluster remains possible, but jobs will queue until the end of the maintenance. We expect to finish in a couple of hours. Please check the operational log for up-to-date information during the morning.
We apologize for the inconvenience,
Regards,
Francesca@TSD
Dear TSD-users,
the ThinLinc login is available again.
Sorry for the inconvenience,
Regards,
Francesca
Dear TSD-users,
we are having a problem with the ThinLinc server, and login via ThinLinc is not available at the moment. We will try to solve the problem as soon as possible. In the meantime you can log into your Windows VM (pXX-win01) and from there connect to the Linux VM (either pXX-tl01-l or pXX-<username>-l) with PuTTY. More info here:
/english/services/it/research/hpc/colossus/windows-user.html
We apologize for the inconvenience.
Francesca@TSD
Dear TSD-user,
The ThinLinc problem has been SOLVED.
the login to the ThinLinc server is not available at the moment. We are working to solve the problem. More info will come as soon as possible.
We apologize for the inconvenience.
Regards,
Francesca@TSD
Due to maintenance of the "Nettskjema" service, users may experience delays in receiving answers from "Nettskjema" to TSD.
The maintenance will be finished by Thursday noon.
We apologize for the inconvenience.
Nihal@TSD
Dear TSD-users,
The Windows issue was fixed 03.06 yesterday after tweaking some DNS settings.
Thanks to Erik, Dag-Erling, Francesca and the TSD-team.
...since today, 02/02-16, at 16:00 we have been having a network issue affecting many Windows project servers and Windows admin servers inside TSD. The cause of the problem was identified this evening and will be solved as soon as possible tomorrow morning (03/02-16). However, at the moment the project Windows VMs are not reachable either via Horizon View or via the old ssh+RDP method. Users can instead log into the Linux servers using the ThinLinc protocol. At the time of writing we do not know whether the problem has affected the Colossus infrastructure. More info will come during the day tomorrow (03/02-16).
We apologize for the inconvenience.
Francesca@TSD
Dear TSD-user,
due to a Windows license issue, login to TSD via Horizon View is not possible. We are working hard to find a solution.
We apologize for the inconvenience.
Francesca@TSD
Dear TSD-user,
the fx service for import/export is not working at the moment. We are on the case and we will solve the problem as soon as possible.
The problem has been SOLVED!
Sorry for the inconvenience,
Francesca@TSD
Dear TSD users,
we have found and solved the network problem that occurred at 20:00 today. TSD is back to normal status.
Sorry for the inconvenience.
Regards,
Francesca@TSD
Dear TSD users
We are sorry to announce that we have a network issue in TSD that is preventing Windows computers from seeing storage and ThinLinc from working.
We are on the case, updates will come.
Best
Gard@TSD
Hi,
the NFS problem that was causing open sessions in TSD to freeze has been identified and possibly solved. The TSD service should be stable and open for production again today. More news soon.
Sorry for the inconvenience
Regards,
TSD@USIT
Dear TSD-user
the problem that occurred today was solved by replacing the failing switches with two new ones. Almost all the components have been moved to the new switches. The move will continue tomorrow, as some reconfiguration of the internal network may be needed. At the moment the Windows machines should be up and running, and users can log into their VMs via PCoIP and ssh+RDP. However, some instability can be expected, since we are still reconfiguring the internal network. Linux VMs might still hang, and we are working to resolve this problem too.
Please follow the updates in the operational log on the TSD webpage.
Sorry for the inconvenience
TSD@USIT
Dear TSD-users,
unfortunately we are having a network failure, which has caused severe instability since yesterday, 16/12, at 18:00. We need to switch to new routers, and this requires an unscheduled downtime now. We do not know at the moment how long the outage will last, but we are working very hard to get the problem solved as soon as possible during the day.
Sorry for the inconvenience,
Regards,
Francesca
Hi
We still see glitches in the network, due either to some failing infrastructure or to unstable power.
We are working on this, sorry for the inconvenience.
TSD@USIT
Hi
we have had a power or network failure that caused an unplanned reboot of many components in TSD, including several project VMs. The situation has stabilized and the machines are running normally now. We are investigating the causes of this major failure.
We apologize for the inconvenience.
TSD@USIT
Hi
It seems that we have had a power or network failure inside or outside TSD, which caused 100+ machines to reboot, and some of the services have stalled. We are working on it.
TSD@USIT
TSD: the VMware service is up and running now.
Sorry for the inconvenience.
TSD@USIT
Hi
Some patch or unknown factor took down our TSD View server yesterday. We are working full time on fixing it. We hope to have the service back very soon.
Best
TSD@USIT
Dear TSD-user,
the update of the VMware security server has been successfully completed. All the Windows VMs are accessible again with the PCoIP protocol.
Enjoy TSD!
Regards,
Francesca
Dear TSD-user,
today, 30/11-2015, between 13:00 and 15:00 CET we will upgrade the VMware View security server. During the upgrade, login to the Windows machines in TSD via the PCoIP protocol will not be available. Login to the Windows servers will therefore only be possible via an ssh+RDP connection (http://www.uio.no/tjenester/it/forskning/sensitiv/hjelp/brukermanual/ssh-og-rdp/index.html). However, be aware that the ssh+RDP connection will only work if you "Log off" from your last session opened with PCoIP.
The Windows and Linux VMs will not be affected by the upgrade, and the processes running on the machines will keep running. Jobs on Colossus will not be affected.
Regards,
Francesca
Dear Colossus User
the maintenance has been successfully completed and the cluster is up and running. The hugemem nodes still need to come up, and probably will not be available until Monday next week. However all the jobs that were queuing during the downtime are already running.
Happy computing!
Francesca
Today (19/11) from 8:00 am Colossus will be stopped for maintenance. The outage is expected to last two days.
Francesca
Dear TSD-user,
the maintenance stop of Colossus was successfully completed and the cluster is back in production.
As previously announced, there will be one more downtime on 19 Nov 2015 from 8:00 am. The downtime will last at most two days. This second downtime is needed to complete the work started now, namely setting up a new configuration that will significantly improve I/O in the cluster.
Please note that if you schedule a job with a running time longer than 14 days, the job will not start before the end of the next downtime.
Happy computing!
Francesca@TSD