ICHEC, CeADAR to standardise Earth observation datasets for AI applications
The Irish Centre for High-End Computing (ICHEC) at NUI Galway, the national centre for High-Performance Computing (HPC), in collaboration with Ireland’s Centre for Applied AI (CeADAR), has announced the first results of a project funded by the European Space Agency (ESA) to address the lack of standardisation in Earth Observation (EO) datasets for machine learning.
“The value of satellite data to projects which inform environmental policies, climate knowledge and mitigation strategies is unique,” said Dr Jenny Hanafin, Earth observation programme manager at ICHEC. “The quality and quantity of Earth Observation data have increased drastically over the past decade, and the combined fleet of Sentinel-1, Sentinel-2 and Sentinel-3 currently produces an estimated data volume of ∼20 TB per day. With these volumes, EO is a prime candidate for the use of AI to assist in analysis. However, until now there have been bottlenecks holding back the use of this data in such applications.”
The AIREO project addresses some of these issues by introducing new specifications and best-practice guidelines for creating datasets. This work will vastly improve the ability to share training data across scientific research and the commercial and technical AI communities, and will lower the cost of sharing this data. Common specifications that ensure training datasets follow the FAIR principles (Findable, Accessible, Interoperable and Reusable) will make data produced for one application available to other users and for other uses.
“This project set out to produce resources to support the training and development of machine learning models on EO data,” said Alastair McKinstry, environmental programme manager at ICHEC. “The aim is to move towards implementing FAIR data principles for training data in EO, ensuring that datasets are properly documented and available to other users. Each dataset is a valuable resource, costing many man-hours, and facilitating the understanding and sharing of these data resources is the main goal.
“An additional goal is to make EO training data sets self-explanatory (‘AI-ready’) in order to expose challenging problems to a wider audience that does not have expert geospatial knowledge. Key elements that are addressed in the AIREO specification are granular and interoperable metadata, innovative Quality Assurance metrics, data provenance and processing history, as well as integrated feature engineering recipes that optimise platform independence. Several pilot datasets have been developed following the AIREO data specifications, including global forest biomass, sea ice detection and crop type classification. A Python library for the easy exploitation of these datasets has also been developed to allow the Training Datasets (TDS) to work against EO catalogs.”
The research was funded by the European Space Agency (ESA). Ireland has been a member of ESA since 1975. Membership of ESA is a key element of the National Space Strategy for Enterprise, enabling Irish companies and research institutes to bid on ESA tenders to develop technologies, services and business applications.
ICHEC supports novel scientific research that uses extreme datasets on the national supercomputer ‘Kay’ by enabling scalable AI and Big Data Analytics, and by leveraging the SPÉir online platform to give national users access to the ESA Sentinel data archive.