TSD-Colossus unplanned outage: SOLVED on 02/06 at 15:00

 

Dear Colossus-users,

We have worked very hard over the last week and have now solved the problem. The service is back in production.

We apologise for the inconvenience.

--------------------------

We have been working very hard to solve the security issue reported below, and we are now close to deploying the solution. Before doing so, we need to do the final testing by removing the export of the partitions /cluster/etc/modulefiles, /cluster/software/VERSIONS and /cluster/var/accounting, and setting up the new export rules.
We will remove the exports tomorrow, Wednesday 01/06, from 9:00 am. After that you will not be able to use software from the Colossus software portfolio, the modulefiles, or the tools for inspecting the quota and the queue. If you have jobs running on your local machine that use the software on Colossus, please be aware that they will fail. You might wish to terminate these processes yourself before they are killed.

If the solution is successful, the Colossus service will be back in production as soon as possible during the day. Please note that we might need to reboot the machine before going back into production; you will receive more information during the day.

We apologize for the inconvenience.
Regards,
Francesca

Published May 31, 2016 4:13 PM - Last modified June 2, 2016 3:27 PM