CROSS Harmonization & HPC modelization of FOREST Datasets

Ontologies

Semantic Web technologies have emerged in recent decades as a way to publish heterogeneous data in a standard and interoperable way. They make it possible to publish self-describing data that can be connected with data provided by the same or different entities. In the forestry domain, they make it possible to open data to the general public and to link it with related ontologies covering geographical and political territories or species descriptions.

In the Cross-Forest project, a set of seven ontologies was created within this task, totalling 295 classes, 91 object properties, 35 datatype properties and 20,932 named individuals. These ontologies follow the Spatial Data on the Web Best Practices, as well as design standards, which guarantee their correctness and usability. The ontologies represent and publish forest inventory and cartographic data, and can be reused by any country to publish its forest data in an open, standard format. Data published this way are self-describing and interoperable, so they can be used by the general public and connected with other data, whether forest data from different locations or any other type of data, which widens their possible uses. Two of the core ontologies represent the data of the Spanish National Forest Inventory and the Spanish Forest Map, while the other three core ontologies represent positions and measures, which the first two require. These ontologies are interrelated and enriched with links to external ontologies.

The goal of these ontologies is to serve as the schema for the data that will be generated in the next steps (activity 2). The ontologies and data, in addition to being published for public use, will be used as input data for the pilots generated as a result of activity 3: FRAME (Forest fiRes Advanced ModElization) and CAMBrIc (CAlidad de la Madera en Bosques mIxtos).

The ontologies have a modular design. They are divided into three main groups:

  • 5 core modules, used to describe the data of forest inventories and maps. Two of them (ifn-core and mfe-core) are devoted to forestry data, while the other three (position-core, measures-core, and epsg-core) are higher-level ontologies that describe the required concepts about measures and positions, and can be of interest outside the forestry domain.
  • 2 modules that we call “raw”. The goal of these ontologies is to provide provenance data indicating where in the original tables each piece of data was located. This helps to ensure the correctness and reproducibility of the task.
  • A number of alignment modules that link data of the core modules to external ontologies or datasets. These modules make use of the Subproperty of an external Property and Subclass of an external Class design patterns. For each pair (core module, external ontology) there are three alignment modules:

– Tbox links: These modules link the schema of the module with the schema of an external ontology, making use of rdfs:subPropertyOf, rdfs:subClassOf, owl:equivalentProperty, and/or owl:equivalentClass properties.

– sameAs links: These modules link the individuals of a module with the individuals of an external dataset, using the owl:sameAs property.

– Abox links: These modules link the individuals of a module with the individuals of an external dataset, using the schema:sameAs property (or a specialization of it).
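As a rough illustration of the three kinds of alignment module, the sketch below builds one small set of triples per kind and prints them in an N-Triples-like form. The rdfs:, owl: and schema: namespaces are the standard ones. The `CORE` and `EXT` namespaces and every term name under them (`Tree`, `Plant`, `species21`, and so on) are hypothetical placeholders, not terms taken from the Cross-Forest ontologies.

```python
# Standard vocabulary namespaces.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SCHEMA = "https://schema.org/"

# Hypothetical namespaces and term names, for illustration only.
CORE = "http://example.org/core-module#"
EXT = "http://example.org/external-ontology#"


def tbox_links():
    """Schema-level links: local classes and properties specialize external ones."""
    return [
        (CORE + "Tree", RDFS + "subClassOf", EXT + "Plant"),
        (CORE + "hasSpecies", RDFS + "subPropertyOf", EXT + "hasTaxon"),
    ]


def same_as_links():
    """Individual-level links stating identity with owl:sameAs."""
    return [(CORE + "species21", OWL + "sameAs", EXT + "PinusSylvestris")]


def abox_links():
    """Individual-level links using schema:sameAs."""
    return [(CORE + "province28", SCHEMA + "sameAs", EXT + "Madrid")]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    modules = [
        ("Tbox links", tbox_links()),
        ("sameAs links", same_as_links()),
        ("Abox links", abox_links()),
    ]
    for name, triples in modules:
        print(f"# {name}")
        print(to_ntriples(triples))
```

Keeping each kind of link in its own module means a user who wants only the schema-level alignment can load the Tbox module alone, without committing to any identity statements between individuals.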

This design has two main goals:

  1. Increase the reuse by external agents: Each of the ontologies is easier to understand separately, and the separation makes it possible for any user to load the data they need, disregarding the rest. Additionally, since several ontologies and datasets exist for the domains we use, by creating our own terms and linking them to several external ontologies we don’t limit the user to one of them: they can choose the set of alignment modules of their preference.
  2. Make the reuse of external ontologies and data safe. While reuse of data is a well-known principle in the Semantic Web, explicit alignment (that is, linking the terms of the ontology with terms of the external ontology by means of the properties mentioned before) is considered a better practice than direct reuse of the terms. Direct reuse carries the risk of what is known as ontology hijacking, that is, giving additional semantics to external terms in a way that can negatively impact the semantics of the data (e.g., generating undesirable inferences).
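The difference between explicit alignment and ontology hijacking can be sketched with a toy reasoner that applies only the RDFS subclass rule. This is a simplified illustration, not the project's tooling: the prefixed names (`local:Tree`, `ext:Plant`, `ext:rose1`) are hypothetical, and a real setup would use an RDF library and a full reasoner.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer_types(triples):
    """Apply the RDFS subclass rule until no new rdf:type triple appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for c, q, d in list(inferred):
                # If s is a c, and c is a subclass of d, then s is also a d.
                if q == SUBCLASS and c == o and (s, TYPE, d) not in inferred:
                    inferred.add((s, TYPE, d))
                    changed = True
    return inferred


def types_of(individual, triples):
    """Return every class the individual belongs to after inference."""
    return {o for s, p, o in infer_types(triples) if s == individual and p == TYPE}


# One local individual and one individual from an external dataset (hypothetical).
data = {
    ("local:tree1", TYPE, "local:Tree"),
    ("ext:rose1", TYPE, "ext:Plant"),
}

# Explicit alignment: only a statement about the local term is added.
alignment = {("local:Tree", SUBCLASS, "ext:Plant")}

# Hijacking: new semantics are attached to the external term itself.
hijacking = {("ext:Plant", SUBCLASS, "local:Tree")}

if __name__ == "__main__":
    print(types_of("ext:rose1", data | alignment))
    print(types_of("ext:rose1", data | hijacking))
```

With the alignment triples, the local tree is additionally inferred to be an `ext:Plant` and nothing changes for the external individual. With the hijacking triple, the external individual `ext:rose1` is inferred to be a `local:Tree`, which is the kind of undesirable inference the alignment modules are meant to avoid.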

Details of the development of the ontologies can be found in deliverables D1.1 and D3.2, available below.