Educloud Research Operational Log
The Galaxy platform on Fox was down due to some issues in the slurm-drmaa library. These have now been fixed, and Galaxy is working properly.
We are going to upgrade Educloud Share on Wednesday, October 9th 2024 at 10:30. The service will be unavailable for about 15-30 minutes.
On 2024-10-09 from 16:00 we'll perform maintenance on the storage system.
During this period the home and project directories on Educloud VDIs will be inaccessible. Running processes may hang and resume afterwards, or terminate unexpectedly. To be on the safe side, save your work and log off prior to the downtime.
A maintenance reservation for the downtime has been set on Fox starting at 15:00. Jobs that cannot complete before the downtime will not be scheduled until after the downtime. Any running jobs and processes will be killed. (Note that this is one hour before the storage maintenance starts; we will use the opportunity to adjust the Slurm setup instead of having a separate downtime for it.) Users will be kicked out of the login and interactive nodes at 16:00. If you had running jobs or processes at the start of the downtime, please check your job output for errors. To be on the safe side, save your work, cancel running jobs and...
Update, 2024-09-25 09:17: The upgrade has been completed.
The queue system (Slurm) on Fox will be upgraded on Wednesday, September 25 at 09:00.
The upgrade is expected to last only five to ten minutes. Running jobs will be suspended during the upgrade, and the Slurm commands (squeue, sbatch, etc.) will be unavailable during parts of the upgrade. Apart from that, users should not be affected by the upgrade.
The upgrade will bring a few new user-visible features, such as `squeue --notme`.
On 3 September, from 08:00 to 12:00 CET, share.educloud.no (Nextcloud) will be down for maintenance while its storage (S3) is upgraded.
This message will be updated once the downtime is completed.
Issues occurred while listing users on research.educloud.no. We are working to resolve the issue as quickly as possible.
We are going to upgrade Educloud Share on Friday, May 24th 2024 at 15:00. The service will be unavailable for about 15-30 minutes.
An issue with the storage is being investigated. Fox and VDI (view.educloud.no) are currently unavailable. More details to follow.
On 2024-05-08 from 16:00 there will be a downtime due to maintenance on the storage system.
During this period the home and project directories on Educloud VDIs will be inaccessible. Running processes may hang and resume afterwards, or terminate unexpectedly. To be on the safe side, save your work and log off prior to the downtime.
A maintenance reservation for the downtime has been set on Fox starting at 16:00. Jobs that cannot complete before the downtime will not be scheduled until after the downtime. Any running jobs and processes will be killed. Users will be kicked out of the login and interactive nodes at 16:00. If you had running jobs or processes at the start of the downtime, please check your job output for errors. To be on the safe side, save your work, cancel running jobs and log off prior to the downtime.
During the downtime, follow this opslog for updates.
We are going to upgrade Educloud Share on Tuesday, March 19th 2024 at 10:00. The service will be unavailable for about 10 minutes.
There is a problem with the Educloud Ondemand service.
Investigations are ongoing.
The issue has been resolved.
The scheduled downtime that was supposed to take place today between 14:00 and 16:00 CET has been postponed to another day. Users will be notified of the new date when we know more.
The service was not affected and is running as normal.
Educloud Ondemand has been upgraded to the latest version. In addition, a new version of the Jupyter app has been set as the default, which makes it easier for you to customize your environment.
Should you experience problems logging in, please try clearing your browser cache and cookies.
As always, please get in touch if you have any issues.
Happy computing from the Fox team!
[2024-02-09 09:20: update] The problem has been fixed now, and the nodes are back in production.
Currently the interactive nodes (int-[1-4]) and three of the GPU nodes (gpu-[4-6]) are unavailable due to network problems.
We are investigating.
On Thursday, 2024-02-08, starting at 15:30, we will upgrade Nextcloud to the latest version, which will require a downtime of approximately 30-45 minutes.
It's been roughly a year since the first "pilot" users were allowed onto Educloud OnDemand, and we're very happy to see that it has been so well received.
Enabling HPC for both new uses and new users fits right into the core of the UiO IT department's mission, and while we are happy with what we have done so far, we are not done.
We have been working behind the scenes to improve our setup and get ready for another year, and while most of this should be invisible to users, we believe it will make the service better and more powerful than ever.
A short summary of changes follows:
Improvements:
- Multiple backend improvements to workflow, configuration management, app development, etc
- GPU direct rendering support for desktop applications such as Matlab, Paraview and 3D Slicer
- Ability to choose between differen...
[2024-01-31 13:45: UPDATE] The problem has been fixed now.
Since the upgrade of Slurm on Fox earlier today, there have been difficulties submitting jobs to the accel partitions; they were denied with a message like `GPU specification required, but not provided`.
We have now implemented a temporary workaround, so submitting GPU jobs should work again.
We hope to fix the problem properly tomorrow or Wednesday.
[2024-01-29 10:20: update] The upgrade is now finished.
[2024-01-29 10:00: update] The upgrade has now started.
The queue system on Fox will be upgraded on Monday (January 29) at 10:00. During the upgrade, running jobs will be suspended, and Slurm commands (squeue, sbatch, etc.) will not work. We expect the upgrade to take no more than 15 minutes.
On Friday, 2024-01-19, starting at 14:30, we will upgrade Nextcloud to the latest version, which will require a downtime of approximately 30 minutes.
On 2023-12-13 from 16:00 there will be a downtime due to maintenance on the storage system.
During this period the home and project directories on Educloud VDIs will be inaccessible. Running processes may hang and resume afterwards, or terminate unexpectedly. To be on the safe side, save your work and log off prior to the downtime.
A maintenance reservation for the downtime has been set on Fox starting at 15:45. Jobs that cannot complete before the downtime will not be scheduled until after the downtime. Any running jobs and processes will be killed. Users will be kicked out of the login and interactive nodes at 16:00. If you had running jobs or processes at the start of the downtime, please check your job output for errors. To be on the safe side, save your work, cancel running jobs and log off prior to the downtime.
During the downtime, follow this opslog for updates.
Educloud login is currently partially down due to network issues. We are working to resolve the problem and restore normal operation.
VMware Horizon and Fox login are working as normal, but unfortunately user management, file import/export, and Nettskjema functionality are currently unavailable.
Update: The upgrade is now done, and we have opened up Fox again. Note that some things are still not working fully. We are working on fixing those.
The operating system on Fox will be upgraded starting September 25 from 08:00. This is a major upgrade and is expected to take a week. During the upgrade, Fox will be unavailable.
Fox currently runs Rocky 8, which will be upgraded to Rocky 9, the latest version. (Rocky is a Red Hat Enterprise Linux clone.) We will also upgrade the queue system (Slurm) to its latest version.
Jobs that cannot complete before the downtime will not be scheduled. Because there is a high risk that jobs submitted before the upgrade will need modification to work after the upgrade (for instance, module versions will change), all pending jobs will be cancelled when the maintenance starts.
The software available via "mod...
Due to an infrastructure issue, some Educloud web services are experiencing instability. We're working to diagnose and fix the problem.