CROSS Harmonization & HPC modelization of FOREST Datasets

High Performance Computing (HPC)

Supercomputers have a large number of processors that programs can use simultaneously to complete complex tasks in much shorter times. For a program to use several processors during its execution, it must be designed for some form of concurrent or parallel execution. Several technologies serve this goal: MPI, OpenMP, hybrid MPI+OpenMP, and the simultaneous execution of multiple copies of the program through job arrays managed by the HPC infrastructure's job manager. From the beginning, migration to open technology environments and standards has been a guiding principle.
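As a small illustration of the MPI approach (a sketch, not project code), work can be split by process rank; the snippet below uses the mpi4py Python bindings, assuming an MPI implementation and mpi4py are available on the system:

# Sketch: rank-based work splitting with MPI via the mpi4py bindings.
# Run with, e.g.: mpirun -n 4 python mpi_split.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # index of this process
size = comm.Get_size()   # total number of processes

# Each process handles a disjoint slice of the task list.
tasks = list(range(100))
local_sum = sum(tasks[rank::size])

# Combine the partial results on rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"total computed by {size} processes: {total}")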

SCAYLE's own tools for managing users/clients and the jobs submitted to the Calendula supercomputer are used by the project pilots, and are also used by the partnership members to submit their simulations and feed new data into the platform. These tools, the job manager and the application manager, allow strict control over the traceability of the tasks and jobs requested by customers, both internal and external.

Calendula

The models for simulating fire propagation and its effects, together with fire suppression techniques (use case/demonstrator FRAME), and for forecasting wood quality in mixed forests over large areas (use case/demonstrator CAMBrIc), are also considered. The performance of the models and simulations has been evaluated to identify how best to adapt the algorithms to the computing environment.

HPC Infrastructure

In the case of the CAMBrIc pilot, the software was originally designed for personal computers and workstations running Microsoft Windows, so prior adaptation work to the HPC environment was essential. A study phase began on porting the code to the Python programming language, which is fully supported by the supercomputer's operating system, using the Dask framework. Dask emerged as the most suitable option for the migration: it makes it possible to parallelize the parts of the code that lend themselves to it, while still producing a version that runs on personal computers and workstations. As an important added advantage, Dask provides libraries that integrate with the job manager installed in the HPC infrastructure.

The SIMANFOR simulator is used to carry out two of the main tasks of the CAMBrIc pilot: calculating stocks and simulating their evolution. SIMANFOR is a web application for simulating sustainable forest management alternatives. It integrates modules to manage forest inventories, simulate and project different stand conditions (through prediction and projection algorithms and formulas), query systems, simulation outputs and a security system.
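As a minimal sketch of this integration (illustrative only, not the pilot's actual code), the dask-jobqueue library provides a SLURMCluster class that submits each Dask worker as a SLURM batch job; the queue name, resource figures and the simulate_stand function below are assumptions:

# Sketch: a Dask computation whose workers run as SLURM jobs.
# Requires dask, dask.distributed and dask-jobqueue; resource values are assumed.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

def simulate_stand(stand_id):
    # Placeholder for one independent stand simulation (hypothetical).
    return stand_id ** 2

if __name__ == "__main__":
    cluster = SLURMCluster(
        queue="compute",      # assumed SLURM partition name
        cores=8,              # cores per worker job
        memory="16GB",        # memory per worker job
        walltime="01:00:00",
    )
    cluster.scale(jobs=4)     # ask SLURM for four worker jobs

    client = Client(cluster)
    futures = client.map(simulate_stand, range(100))
    results = client.gather(futures)
    print(sum(results))

The same script runs unchanged on a workstation by replacing SLURMCluster with dask.distributed.LocalCluster, which is precisely the dual-target property that motivated the choice of Dask.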

For the HPC integration of the FRAME software, two approaches have been pursued in parallel. On the one hand, the calculation routines, excluding the graphic visualization routines, which make no sense in a supercomputing environment, were adapted to run inside SLURM job arrays. In parallel, work was done to study the migration of the code from C# to C/C++, which would increase cross-platform compatibility and ease the development of parallel code using MPI libraries.
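To illustrate the job-array approach (a sketch under assumed file names, not FRAME's code), each array element can read the SLURM_ARRAY_TASK_ID environment variable, which SLURM sets per task, to select its own input; the array itself would be launched with a command such as sbatch --array=0-99 wrapping this script:

# Sketch: one element of a SLURM job array selecting its own work item.
# SLURM sets SLURM_ARRAY_TASK_ID for each array task; the file layout is assumed.
import os
import sys

def run_case(case_file):
    # Placeholder for one independent propagation run (hypothetical).
    print(f"processing {case_file}")

if __name__ == "__main__":
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    case_file = f"cases/case_{task_id:04d}.json"   # map array index to input file
    if not os.path.exists(case_file):
        sys.exit(f"missing input: {case_file}")
    run_case(case_file)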

Initially, an attempt was made to accomplish this task by installing MONO (an open-source implementation of Microsoft's .NET Framework based on the ECMA standards for C#), but this proved unsatisfactory; the approach was discarded and the software uninstalled from the computers on which it had been installed.

Subsequently, the code was ported to .NET Core, which gave very satisfactory results and achieved the required stability in testing.

.NET Core is a general-purpose open-source development platform maintained by Microsoft and the .NET community on GitHub. It is cross-platform, supporting Windows, macOS and Linux, and can be used to build device, cloud and IoT applications; these capabilities were used by TRAGSATEC to achieve its objectives.

The operational needs of the pilots were assessed and the appropriate HPC capacity was sized. In hardware terms, both pilots share the same structure and characteristics for the resources assigned to each of them. In software terms, it was necessary to provision them differently, since the two pilots came from different development environments and, moreover, neither had been designed with future parallelization of its code or concurrent simulations in mind.

An easily scalable storage infrastructure, including backup and simulation-history restoration procedures, has been designed and implemented to store completed simulations for future study.

Moreover, virtualization resources were used to host an endpoint and also to serve the forest viewer/scouting device (Forest Explorer).


The HPC infrastructures required for the execution of the project have been installed and have been in operation for several months. The partners involved in the different tasks have access accounts on the SCAYLE systems, and the tests and applications are working properly.

 

More information: 
Operational analysis and specifications.