CROSS Harmonization & HPC modelization of FOREST Datasets

Ontologies

Semantic Web technologies have emerged over recent decades as a way to publish heterogeneous data in a standard, interoperable way. They allow data to be published in a self-describing form that can be connected with data provided by the same or by different entities. In the forestry domain, this makes it possible to open data to the general public and to link it with related ontologies about geographical and political territories or species descriptions.

In the Cross-Forest project, a set of eleven ontologies was created within this task. These ontologies follow the Spatial Data on the Web Best Practices as well as ontology design standards, which support their correctness and usability. Any country can reuse these ontologies for forest inventory and cartographic data to publish its own forest data in an open, standard format. The resulting data are self-describing and interoperable, so they can be used by the general public and connected with other data, whether forest data from other locations or any other type of data, which broadens their potential uses.

The main ontologies published in Cross-Forest cover the following datasets:

  • Third Spanish Forest Inventory (Tercer Inventario Forestal Nacional de España – IFN3 for Spain)
  • Spanish Land Cover Map 1:50.000 (Mapa Forestal Nacional de España 1:50.000 – MFE50 for Spain)
  • Spanish Soil Erosion Inventory (Inventario Nacional Erosión de Suelos de España – INES for Spain)
  • Sixth Portuguese Forest Inventory (Inventário Florestal Nacional – IFN6 for Portugal)
  • Portuguese Land Cover Map 2018 (Carta de Uso e Ocupação do Solo de Portugal Continental – COS18 for Portugal)
  • Iberian Forest Fires Statistics

These ontologies are interrelated and are enriched with links to external ontologies (see the figures below).

Figure 1: Ontologies in Cross-Forest
Figure 2: Established links to other ontologies

The goal of these ontologies is to serve as the schema for the data generated in the subsequent step (Activity 2). In addition to being published for public use, the ontologies and data have been used as input for the pilots developed in Activity 3: FRAME (Forest fiRes Advanced ModElization) and CAMBrIc (CAlidad de la Madera en Bosques mIxtos, i.e. Wood Quality in Mixed Forests).

The ontologies have a modular design. They are divided into six main groups:

  • Three high-level ontologies (position-core, measures-core, and epsg-core) that describe the measure and position concepts needed across all datasets.
  • Six forestry modules, used to describe the data of forest inventories and land cover maps: two general ontologies that provide broad concepts usable by any country (ifi-core and ilu-core), two ontologies that describe Spanish data (ifn-core and mfe-core), and two ontologies that describe Portuguese data (ifn-pt-core and cos-core).
  • An ontology for describing data about types of soil erosion (ines).
  • An ontology for the publication of data about Iberian Forest Fires Statistics (incendios-forestales).
  • Several alignment modules that link the data of the core modules to external ontologies or datasets. These modules apply the "Subproperty of an external property" and "Subclass of an external class" design patterns. For each (core module, external ontology) pair, the links are split into three modules:
    • Tbox links: these modules link the schema of the module with the schema of an external ontology, using rdfs:subPropertyOf, rdfs:subClassOf, owl:equivalentProperty, and/or owl:equivalentClass.
    • sameAs links: these modules link the individuals of a module with the individuals of an external dataset, using the owl:sameAs property.
    • Abox links: these modules link the individuals of a module with the individuals of an external dataset, using the schema:sameAs property (or a specialization of it).
  • An ontology for the publication of raster data on three different grids of cells, with cell lengths of 25 meters, 1 kilometer, and 10 kilometers.

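To make the three kinds of alignment modules concrete, here is a minimal Python sketch that prints, as N-Triples, the kind of statements each module could hold for a single (core module, external ontology) pair. Only the rdfs:, owl: and schema: namespaces are real; the example.org namespaces, the class, property and individual names, and the helper functions are hypothetical placeholders, not terms from the published Cross-Forest ontologies.

```python
# Sketch of the three alignment modules for ONE (core module, external ontology) pair.
# Real vocabularies:
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SCHEMA = "https://schema.org/"

# Hypothetical namespaces (placeholders, not the published Cross-Forest IRIs):
CORE = "https://example.org/ifi-core#"      # core module schema
CORE_DATA = "https://example.org/data/"     # core module individuals
EXT = "https://example.org/external#"       # external ontology schema
EXT_DATA = "https://example.org/ext-data/"  # external dataset individuals


def triple(s: str, p: str, o: str) -> str:
    """Serialize one triple whose terms are all IRIs as an N-Triples line."""
    return f"<{s}> <{p}> <{o}> ."


def tbox_links() -> list[str]:
    # Schema-level links: our own terms specialize external terms.
    return [
        triple(CORE + "Tree", RDFS + "subClassOf", EXT + "Plant"),
        triple(CORE + "hasHeight", RDFS + "subPropertyOf", EXT + "height"),
    ]


def same_as_links() -> list[str]:
    # Individual-level identity links with owl:sameAs.
    return [triple(CORE_DATA + "species/PinusPinaster", OWL + "sameAs",
                   EXT_DATA + "PinusPinaster")]


def abox_links() -> list[str]:
    # Individual-level links with the weaker schema:sameAs property.
    return [triple(CORE_DATA + "species/PinusPinaster", SCHEMA + "sameAs",
                   EXT_DATA + "PinusPinaster")]


# One entry per module, so each can be published and loaded separately.
modules = {
    "tbox": tbox_links(),
    "sameas": same_as_links(),
    "abox": abox_links(),
}

if __name__ == "__main__":
    for name, lines in modules.items():
        print(f"# {name} module")
        print("\n".join(lines))
```

Because each kind of link lives in its own module, a user could, for example, load the Tbox and schema:sameAs links while leaving out the owl:sameAs module, whose semantics are stronger: with owl:sameAs, every statement about one individual also holds for the other.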
 
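To illustrate how the three raster grids relate, the Python sketch below finds the cell that contains a projected coordinate in each grid. It assumes square cells aligned to the origin (0, 0) of a projected CRS measured in meters; the actual origin and CRS used by the Cross-Forest raster ontology are not stated here, and the function names and sample coordinate are hypothetical.

```python
import math

# Cell side lengths in meters for the three grids named in the text.
CELL_SIZES_M = (25, 1_000, 10_000)


def cell_index(x: float, y: float, size: int) -> tuple[int, int]:
    """Return the (column, row) of the cell of side `size` that contains (x, y).

    Assumption: cells are square, aligned to the CRS origin (0, 0), and x, y
    are projected coordinates in meters.
    """
    return math.floor(x / size), math.floor(y / size)


def cells_for_point(x: float, y: float) -> dict[int, tuple[int, int]]:
    """Cell index of the point in each of the three grids, keyed by cell size."""
    return {size: cell_index(x, y, size) for size in CELL_SIZES_M}


def parent_cell(index: tuple[int, int], ratio: int) -> tuple[int, int]:
    """Index of the coarser cell (side `ratio` times larger) containing a finer cell."""
    return index[0] // ratio, index[1] // ratio


if __name__ == "__main__":
    cells = cells_for_point(431_237.5, 4_512_980.0)  # hypothetical easting/northing
    print(cells)  # {25: (17249, 180519), 1000: (431, 4512), 10000: (43, 451)}
    # A 1 km cell holds 40 x 40 = 1600 cells of 25 m; a 10 km cell holds 10 x 10 = 100 cells of 1 km.
    assert parent_cell(cells[25], 40) == cells[1000]
    assert parent_cell(cells[1000], 10) == cells[10000]
```

Under this assumption the grids nest exactly, so values published on the 25 m grid can be aggregated to the 1 km and 10 km grids by integer division of the cell indices.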

This design has two main goals:

  • Increase reuse by external agents: each ontology is easier to understand on its own, and the separation lets users load only the data they need and ignore the rest. Additionally, because several ontologies and datasets already exist for the domains involved, creating our own terms and linking them to several external ontologies avoids tying users to any single one: they can choose whichever set of alignment modules they prefer.
  • Reuse external ontologies and data safely: while reuse is a well-known principle in the Semantic Web, explicit alignment (that is, linking the terms of the ontology to the terms of the external ontology through the properties mentioned above) is considered a better practice than direct reuse of the terms. Direct reuse carries the risk of what is known as ontology hijacking: giving additional semantics to external terms, which can negatively affect the semantics of the data (e.g., by generating undesirable inferences).
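The difference between explicit alignment and ontology hijacking can be seen with just two RDFS entailment rules: rdfs11 (subClassOf is transitive) and rdfs9 (if x has type C and C is a subclass of D, then x has type D). The Python sketch below computes that closure over a few triples; all ex:, ext: and other: names are hypothetical. The alignment statement ex:Tree rdfs:subClassOf ext:Plant only adds types to our own individuals, whereas a statement about the external term, ext:Plant rdfs:subClassOf ex:ForestResource, changes the meaning of every third-party dataset that uses ext:Plant.

```python
# Hypothetical prefixed names; only rdf:type and rdfs:subClassOf are real vocabulary.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

Triple = tuple[str, str, str]


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply RDFS rules rdfs11 and rdfs9 until no new triple is produced."""
    closed = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new: set[Triple] = set()
        for c, d in sub:                      # rdfs11: C sub D, D sub E => C sub E
            for d2, e in sub:
                if d == d2:
                    new.add((c, SUBCLASS, e))
        for x, p, c in closed:                # rdfs9: x type C, C sub D => x type D
            if p == TYPE:
                for c2, d in sub:
                    if c == c2:
                        new.add((x, TYPE, d))
        if new <= closed:                     # fixpoint reached
            return closed
        closed |= new


our_data = {("ex:tree42", TYPE, "ex:Tree")}              # our forest inventory
third_party = {("other:rose7", TYPE, "ext:Plant")}       # someone else's garden data
alignment = {("ex:Tree", SUBCLASS, "ext:Plant")}         # statement about OUR term
hijack = {("ext:Plant", SUBCLASS, "ex:ForestResource")}  # statement about THEIR term

safe = rdfs_closure(our_data | third_party | alignment)
hijacked = rdfs_closure(our_data | third_party | alignment | hijack)

if __name__ == "__main__":
    print(("ex:tree42", TYPE, "ext:Plant") in safe)                # True
    print(("other:rose7", TYPE, "ex:ForestResource") in safe)      # False
    print(("other:rose7", TYPE, "ex:ForestResource") in hijacked)  # True
```

In the hijacked case, a garden rose from an unrelated dataset is inferred to be a forest resource: exactly the kind of undesirable inference that keeping external terms untouched avoids.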


Details of the development of the ontologies can be found in deliverables D1.1 and D2.3, available below.

The ontologies are published as open access under the CC BY 4.0 license and can currently be accessed at https://github.com/Cross-Forest/Ontologies.