NOMAD for Data Curation

This project focused on designing and deploying a digital solution for the curation, analysis and publication of data produced in the Temporal Analysis of Products (TAP) project, by exploring NOMAD and deploying it within EduCloud. Key outcomes included a successful private deployment of NOMAD, a shift towards a modular code design for improved extensibility, and deployment instructions and example scripts that make the solution easier to adopt.

Abstract

This project was initiated by Dr. Evgeniy Redekop to design and deploy a digital solution for curating and analyzing experimental and simulated data from the Temporal Analysis of Products (TAP) project. The research group belongs to the Catalysis and Organic Chemistry Section at the Department of Chemistry, University of Oslo. The primary objectives were to explore NOMAD as a potential solution, assess its deployment on EduCloud, and improve the design of the existing codebase for long-term extensibility. After evaluating various options, we successfully deployed and tested a private NOMAD instance on EduCloud, with access restricted to selected collaborators. In parallel, our review of the original XML etree-based code structure showed that it would be difficult to extend. We therefore recommended adopting a class-based modular design, which allows modules to be developed independently and reduces maintenance effort. To further support development, we provided reusable instructions for deploying NOMAD on EduCloud, along with example scripts for programmatic access through NOMAD's API. Once the ongoing testing phase is complete, the NOMAD solution has high potential to be extended to other data curation needs in the Catalysis section and beyond (e.g. gas chromatography, mass spectrometry, and operando spectroscopy data), improving digital workflows and supporting research across the Section.

Background

The NOMAD (Novel Materials Discovery) platform is widely used for storing and analyzing computational materials data. While the public NOMAD repository offers a powerful framework, it does not support setting up a private space where a research group can collaborate. EduCloud, with its secure infrastructure, provides an environment where NOMAD can be deployed privately and restricted to specific collaborators. Beyond the choice of platform, the existing code for handling datasets was built on the ElementTree XML API. Although functional, this approach was rigid and difficult to extend, limiting the ability to incorporate new workflows or adapt to changing requirements. Achieving both secure data management and sustainable software design therefore required a more flexible system. More background information on the Temporal Analysis of Products (TAP) project is available at the following link.
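To illustrate what a class-based modular design can look like compared with a single ElementTree script, here is a minimal sketch that registers one parser class per XML section. Supporting a new type of data then means adding a class rather than editing a central parsing routine. Low-level parsing still uses Python's `xml.etree.ElementTree`. The tag names (`reactor`, `pulse`, `length`, `intensity`), the class names, and the registry pattern are all hypothetical, because neither the actual TAP file format nor the project's code is described here.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class SectionParser(ABC):
    """Base class for one independent parsing module (hypothetical design)."""

    tag: str  # XML tag that this module handles

    @abstractmethod
    def parse(self, element: ET.Element) -> dict:
        """Convert one XML element into a plain dictionary."""


# Registry mapping XML tags to parser classes. Supporting a new section means
# writing and registering a new class; existing modules stay untouched.
PARSERS: dict[str, type[SectionParser]] = {}


def register(cls: type[SectionParser]) -> type[SectionParser]:
    PARSERS[cls.tag] = cls
    return cls


@register
class ReactorParser(SectionParser):
    tag = "reactor"  # hypothetical tag name

    def parse(self, element: ET.Element) -> dict:
        return {"length": float(element.findtext("length", "nan"))}


@register
class PulseParser(SectionParser):
    tag = "pulse"  # hypothetical tag name

    def parse(self, element: ET.Element) -> dict:
        values = element.findtext("intensity", "").split()
        return {"gas": element.get("gas"), "intensity": [float(v) for v in values]}


def parse_document(xml_text: str) -> dict:
    """Dispatch each top-level element to its registered parser."""
    root = ET.fromstring(xml_text)
    result: dict[str, list[dict]] = {}
    for child in root:
        parser_cls = PARSERS.get(child.tag)
        if parser_cls is None:
            continue  # sections without a module are skipped instead of failing
        result.setdefault(child.tag, []).append(parser_cls().parse(child))
    return result


# Hypothetical input; the real TAP file layout is not specified in this report.
SAMPLE_XML = """<tap>
  <reactor><length>40</length></reactor>
  <pulse gas="Ar"><intensity>0.0 1.5 0.7</intensity></pulse>
  <notes>not parsed</notes>
</tap>"""

if __name__ == "__main__":
    print(parse_document(SAMPLE_XML))
```

Running it prints `{'reactor': [{'length': 40.0}], 'pulse': [{'gas': 'Ar', 'intensity': [0.0, 1.5, 0.7]}]}`. The `notes` section is ignored because no module is registered for it. Each parser class can be written and tested on its own, which is the property that makes independent module development possible.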

Methodology

The project started by evaluating whether NOMAD was suitable for deployment on EduCloud. A private instance was set up, adapted to EduCloud's networking requirements, and tested. Hosting the instance privately on EduCloud provides secure access, backups, and usage restricted to collaborators. Next, the existing codebase was reviewed to compare two possible design directions: continuing with the XML etree approach or refactoring into a modular class-based design. The etree solution proved hard to extend, whereas the class-based design allows modules to be developed independently, saving time and improving maintainability. Finally, reusable instructions for setting up NOMAD on EduCloud were written so that future teams can reproduce the deployment. To support automation, example scripts were also developed for accessing NOMAD programmatically through its API, allowing integration into other workflows.
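The project's own example scripts are in its GitHub repository and are not reproduced here. As a rough sketch of what programmatic access involves, the code below builds (but does not send) requests against NOMAD's v1 REST API: requesting an access token from `/auth/token`, uploading a file to `/uploads`, and searching with `/entries/query`. `BASE_URL` is a placeholder for the private EduCloud instance, and the `upload_id` query field in the test values is only an illustrative assumption. Check the endpoints against the API documentation served by the instance itself. Only the Python standard library is used, so the sketch has no extra dependencies.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace with the API base URL of the private EduCloud instance.
BASE_URL = "https://nomad.example.org/nomad-oasis/api/v1"


def token_request(username: str, password: str) -> urllib.request.Request:
    """GET /auth/token; the JSON response is expected to contain 'access_token'."""
    query = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(f"{BASE_URL}/auth/token?{query}", method="GET")


def _auth_headers(token: str) -> dict:
    # Bearer-token header used by the authenticated endpoints.
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}


def upload_request(token: str, payload: bytes) -> urllib.request.Request:
    """POST the raw bytes of a file (e.g. a .zip of raw data) to /uploads."""
    return urllib.request.Request(
        f"{BASE_URL}/uploads", data=payload, headers=_auth_headers(token), method="POST"
    )


def query_request(token: str, query: dict, page_size: int = 10) -> urllib.request.Request:
    """POST a search to /entries/query with a JSON body."""
    body = {"query": query, "pagination": {"page_size": page_size}}
    headers = {**_auth_headers(token), "Content-Type": "application/json"}
    return urllib.request.Request(
        f"{BASE_URL}/entries/query",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In a real script, the token would be obtained with `send(token_request(...))["access_token"]` and passed to the other helpers. Separating request construction from `send` keeps the helpers testable without a running instance.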


All scripts and documentation can be found on our GitHub page.

If you would like to contribute to or use the project, please contact Evgeniy for access.

Published Aug. 29, 2025 2:53 PM - Last modified Aug. 29, 2025 2:53 PM