How we used Steep to improve our training pipeline for AI-based forest type classification

Steep can help train a neural network in an efficient, comprehensible, and flexible way. Our AI model can be used to respond to climate change.

Climate change is no longer something we can address in the future. It has arrived in the present. Its effects can be felt in many places: sea levels are rising, extreme weather events happen more often, and entire ecosystems are changing. Comprehensive monitoring is essential for responding to these transformations. We need to know where and how severely things change. This enables us to identify trends early and respond appropriately.

One tool for monitoring is artificial intelligence (AI). It can recognize patterns in large amounts of data and derive knowledge from them. The problem is that training an AI is very complex. Large amounts of data have to be analyzed, a lot of compute time has to be spent, and in the end, training processes have to be well documented so they remain comprehensible. This is where Steep can help.

In this article, we demonstrate how we used Steep to train a neural network for forest type classification on satellite imagery. Our training data were pairs of known forest areas and matching satellite images. Our aim was for the final model to learn what forests look like, so it would later be able to perform semantic segmentation on new images. For training, we wrote a workflow in Steep.

We started by searching for relevant Sentinel-2 satellite imagery in the Copernicus Open Access Hub. The search criteria were cloud coverage and a bounding box of the relevant area. These parameters were passed to the workflow, so we could easily change them to train on another region. The identified satellite images were downloaded and processed in parallel. By replacing the download service, other data sources could be used instead of Sentinel-2, such as Landsat data or high-resolution images from Planet. For each image, its extent was extracted, and the corresponding ground truth data was fetched from an external API and rasterized. The subsequent Split service created tiles from the generated ground truth image, ensuring the resolution required by the AI model.
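The selection step can be illustrated with a minimal Python sketch. It is not the code we ran against the Copernicus Open Access Hub: the `Scene` class, its fields, and the sample values are hypothetical, and footprints are simplified to axis-aligned boxes. It only shows how the two criteria, cloud coverage and bounding box, might be applied to a list of candidate scenes.

```python
from dataclasses import dataclass

# (min_lon, min_lat, max_lon, max_lat) in degrees
BBox = tuple[float, float, float, float]


@dataclass(frozen=True)
class Scene:
    scene_id: str        # hypothetical identifier
    cloud_cover: float   # percent, 0..100
    footprint: BBox      # simplified to an axis-aligned box (assumption)


def intersects(a: BBox, b: BBox) -> bool:
    """Return True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_scenes(scenes, area: BBox, max_cloud_cover: float):
    """Keep scenes that cover part of `area` and are clear enough."""
    return [
        s for s in scenes
        if s.cloud_cover <= max_cloud_cover and intersects(s.footprint, area)
    ]


# Made-up sample scenes for illustration only.
scenes = [
    Scene("a", 5.0, (8.0, 49.0, 9.0, 50.0)),    # clear, inside the area
    Scene("b", 60.0, (8.0, 49.0, 9.0, 50.0)),   # too cloudy
    Scene("c", 2.0, (20.0, 10.0, 21.0, 11.0)),  # clear, but elsewhere
]
area = (8.5, 49.5, 8.8, 49.9)
selected = select_scenes(scenes, area, max_cloud_cover=20.0)
print([s.scene_id for s in selected])
```

Because the area and the cloud coverage threshold are plain parameters, changing the region only means passing a different bounding box.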

Additionally, every downloaded image was filtered to remove unneeded frequency bands, converted to PNG, and cropped into tiles in the same way as the ground truth image. The Combine service copied the pairs of ground truth and satellite images into a common directory. The final TensorFlow service took this directory together with a model definition and learned how different forest types look in satellite images.
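The tiling and pairing idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the Split or Combine service itself: rasters are plain nested lists, incomplete tiles at the edges are dropped, and the function names `split_into_tiles` and `pair_tiles` are hypothetical.

```python
def split_into_tiles(grid, tile_size):
    """Split a 2D grid (list of rows) into square tiles keyed by (row, col).

    Edge regions that do not fill a whole tile are dropped (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    tiles = {}
    for r in range(0, rows - tile_size + 1, tile_size):
        for c in range(0, cols - tile_size + 1, tile_size):
            tiles[(r // tile_size, c // tile_size)] = [
                row[c:c + tile_size] for row in grid[r:r + tile_size]
            ]
    return tiles


def pair_tiles(image_tiles, truth_tiles):
    """Pair image and ground truth tiles that share the same tile index."""
    common = sorted(image_tiles.keys() & truth_tiles.keys())
    return [(image_tiles[k], truth_tiles[k]) for k in common]


# Toy 4x6 "image" and "ground truth" rasters; the values are placeholders.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
truth = [[(r + c) % 2 for c in range(6)] for r in range(4)]

pairs = pair_tiles(split_into_tiles(image, 2), split_into_tiles(truth, 2))
print(len(pairs))
```

Keying both tile sets by the same index is what keeps each satellite tile aligned with its ground truth tile when the pairs are collected for training.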

“Steep automatically parallelized the calculations, which saved us valuable time.” — Hendrik M. Würz

During the development of our AI model, we executed the workflow several times with slight modifications. For example, we tested which frequency bands were important, what influence seasons had on the classification, and which AI model performed best. All of these tests were managed by Steep, and we could directly compare the accuracies of the trained networks with each other.

Another benefit was that Steep automatically allocated all required resources for us. For preprocessing, no expensive GPU is needed, so Steep scheduled these steps on low-cost CPU machines. The training, on the other hand, was executed on graphics cards. We often started the workflow several times to try out different configurations simultaneously. Steep automatically parallelized the calculations, which saved us valuable time.
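The principle behind this kind of allocation can be shown with a toy Python sketch. It is not Steep's scheduler or configuration format: the machine names, capability labels, and costs are invented for illustration. Each step declares the capabilities it needs, and the sketch picks the cheapest machine that offers all of them.

```python
def assign_steps(steps, machines):
    """Assign each step to the cheapest machine offering all required capabilities."""
    plan = {}
    for name, required in steps.items():
        # A machine qualifies if the required set is a subset of what it offers.
        candidates = [m for m in machines if required <= m["capabilities"]]
        if not candidates:
            raise ValueError(f"no machine can run step {name!r}")
        plan[name] = min(candidates, key=lambda m: m["cost_per_hour"])["name"]
    return plan


# Hypothetical machine types and costs.
machines = [
    {"name": "cpu-small", "capabilities": {"cpu"}, "cost_per_hour": 0.1},
    {"name": "gpu-large", "capabilities": {"cpu", "gpu"}, "cost_per_hour": 2.0},
]

# Hypothetical steps with the capabilities they require.
steps = {
    "download": {"cpu"},
    "split": {"cpu"},
    "train": {"gpu"},
}

plan = assign_steps(steps, machines)
print(plan)
```

In this sketch, preprocessing steps end up on the inexpensive CPU machine even though the GPU machine could also run them, and only the training step occupies the GPU.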

We published our results in two papers, where you can find more background information regarding the infrastructure, our findings on the different AI models, and the resulting quality of the AI:

Kocon, K., Krämer, M., and Würz, H. M.: Comparison of CNN-based segmentation models for forest type classification, AGILE GIScience Ser., 3, 42, https://doi.org/10.5194/agile-giss-3-42-2022, 2022.

Würz, H. M., Kocon, K., Pedretscher, B., Klien, E., and Eggeling, E.: A Scalable AI Training Platform for Remote Sensing Data, AGILE GIScience Ser., 4, 53, https://doi.org/10.5194/agile-giss-4-53-2023, 2023.

(Parts of this article were taken from the second paper.)

