Educloud Research Operational Log
Due to an infrastructure issue, some Educloud web services are experiencing instability. We're working to diagnose and fix the problem.
On Thursday 2023-09-07, 15:45-17:00, there will be downtime due to maintenance on the storage system.
During this period, the home and project directories on Educloud VDIs will be inaccessible. Running processes may hang and resume afterwards, or terminate unexpectedly. To be on the safe side, save your work and log off before the downtime.
A maintenance reservation for the downtime has been set on Fox. Jobs that cannot complete before the downtime will not be scheduled until after the downtime. Any running jobs and processes will be killed and all nodes will be rebooted, including the login and interactive nodes. If you had running jobs or processes at the start of the downtime, please check your job output for errors. To be on the safe side, save your work, cancel running jobs and log off prior to the downtime.
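The scheduling rule above can be illustrated with a simplified sketch. It assumes the scheduler compares a job's requested time limit (not its actual run time) against the start of the maintenance reservation. The function name and the example start times are hypothetical and only serve to show the check.

```python
from datetime import datetime, timedelta

# Start of the maintenance window from the notice above (2023-09-07 15:45).
MAINTENANCE_START = datetime(2023, 9, 7, 15, 45)


def can_start_before_maintenance(start: datetime,
                                 time_limit: timedelta,
                                 maintenance_start: datetime = MAINTENANCE_START) -> bool:
    """Hypothetical helper: return True if a job starting at `start` with the
    requested `time_limit` would finish no later than the maintenance start.
    A job failing this check stays pending until after the downtime."""
    return start + time_limit <= maintenance_start


if __name__ == "__main__":
    noon = datetime(2023, 9, 7, 12, 0)
    # 12:00 + 2 h ends at 14:00, before 15:45, so the job can be scheduled.
    print(can_start_before_maintenance(noon, timedelta(hours=2)))
    # 12:00 + 4 h ends at 16:00, inside the downtime, so the job must wait.
    print(can_start_before_maintenance(noon, timedelta(hours=4)))
```

Because the check uses the requested time limit, a job that would actually finish quickly can still be held back if it asks for a long walltime; requesting a realistic limit (for instance with `sbatch --time`) may let it run before the downtime.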
Update 17:50: There have been delays and the downtime is expected to last several hours longer.
The operating system on Fox will be upgraded starting September 25. This is a major upgrade and is expected to take a week. During the upgrade, Fox will be unavailable.
Fox currently runs Rocky 8, which will be upgraded to Rocky 9, the latest version. (Rocky is a RedHat clone.) We will also upgrade the queue system (Slurm) to its latest version.
Jobs that cannot complete before the downtime will not be scheduled. Because there is a high risk that jobs submitted before the upgrade will need modification to work after the upgrade (for instance, module versions will change), all pending jobs will be cancelled when the maintenance starts.
The software available via "module load" will be reinstalled. We will install the 2021a toolchain and newer. This rebuilt software will be available on VDIs running RedHat 9 during the Fox maintenance.
Your home directories and project areas will be ava...
Fox compute nodes went offline for a short period today, causing running jobs to crash.
The root cause has been found, and a fix has been put in place.
Please check any jobs that were running and re-submit if necessary.
We apologize for the inconvenience this has caused.
There is a login problem in all portals due to an expired certificate. We are working as fast as we can to solve the issue.
There is a possibility of downtime in the Educloud cluster this Wednesday from 17:00 onwards, which may disrupt Aspasia, the file server, and Fox nodes accessing Educloud storage.
UPDATE: Maintenance has been completed. No services appear to have been affected.
Educloud Share (https://share.educloud.no) will be unavailable on March 23 from 13:00. The downtime is necessary due to the migration of the database to UiO's infrastructure.
The downtime is not expected to last more than 30 minutes.
The availability of some GPUs will be reduced on Fox over the next four days for maintenance. We apologize for any inconvenience this may cause.
The OTP login will be temporarily unavailable due to a database migration.
The database transition will be performed tomorrow, 01.03.2023, from 08:30 to 09:30. The affected services will be:
* Selfservice
* data.educloud.no
* Nettskjema
The server instances hosting the share.educloud.no frontend will be upgraded Thursday February 23rd at 09:00.
The service will be unavailable during the upgrade which is expected to last until 10:00.
This morning, the shared file system (project areas, home dirs, etc.) went missing on most login and interactive nodes. This has been fixed now, and we are looking into why it happened. It did not happen on compute nodes, so jobs should not have been affected.
Update: the root cause has been found, and steps are being taken to prevent it from happening again.
2023-01-12 18:10 view.educloud.no is up and running with new certificates.
The certificates will be updated Thursday January 12 at 18:00
2023-01-11 23:15 view.educloud.no is up and running.
2023-01-11 21:30 view.educloud.no is currently inaccessible. We are working to fix the issue.
The entry point for the VDIs in Educloud (view.educloud.no) will get new certificates on Wednesday, January 11 at 20:00 (GMT+1).
- This will require a server restart, and it will not be possible to log in for 15 minutes.
- No running VDI sessions will be terminated.
There is an issue with the login function. We are working hard to resolve it.
New Educloud users cannot fetch OTP correctly. We are working to solve the problem.
We are currently having problems with ID-porten and Feide login due to the patching of security vulnerabilities.
Self Service will not be available for the next couple of hours while we test the migration to MFA.