PureForest: A Large-Scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific Forests

Charles Gaydon, Floryne Roche; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 5895-5904

Abstract


Sustainable forest management is a cornerstone of climate and environmental action. Responsible management relies on forest models, such as biomass or fire vulnerability estimates, which depend on mapping the spatial distribution of tree species. Because forest mapping relies heavily on visual identification, it is a time-consuming endeavor that would benefit from more automation. To develop scalable automated methods, researchers need suitable benchmarks, but tree species classification datasets are scarce. Most of them contain only imagery, which captures signal at the canopy level and ignores the three-dimensional structure of the forest, making it very difficult to distinguish between species. Lidar, which penetrates the canopy and captures the geometry of trees, provides a rich signal for distinguishing tree species. As an increasingly available and cost-effective remote sensing modality, it could become the new standard for large-scale tree species mapping. However, current Lidar benchmarks for tree species classification are limited in both size and diversity: with fewer than 1,500 annotated trees from at most a dozen different sites, they fall short of the requirements of deep learning development. To fill this data gap, we release PureForest, a large-scale, open, multimodal dataset designed for tree species classification from both Aerial Lidar Scanning point clouds and Very High Resolution aerial images. PureForest covers 339 km² in 449 different monospecific forests, with verified labels for 18 tree species grouped into 13 semantic classes. It is the largest and most comprehensive Lidar dataset for tree species identification. By making PureForest publicly available, we hope to provide a challenging benchmark to support the development of deep learning approaches for tree species identification from Lidar and aerial imagery.
In this data paper, we describe the annotation workflow and the dataset, and we establish a baseline performance from both 2D and 3D modalities.

Related Material


@InProceedings{Gaydon_2025_WACV,
    author    = {Gaydon, Charles and Roche, Floryne},
    title     = {PureForest: A Large-Scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific Forests},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {5895-5904}
}