Addressing Data Scarcity in Materials Science Research with Deep Generative Models

Korkmaz, Yilmaz; Hegde, Deepti; Chintersingh, Kerri-lee; Alemohammad, Milad; Kilic, Velat; Flickinger, Michael R; Polk, Amee L.; Bokhoor, Megan; Peters, Cole; Knio, Rami O.; Hufnagel, Todd C; Foster, Mark A; Weihs, Timothy P; Patel, Vishal M.

Yilmaz Korkmaz, Deepti Hegde, Kerri-lee Chintersingh, Milad Alemohammad, Velat Kilic, Michael R Flickinger, Amee L. Polk, Megan Bokhoor, Cole Peters, Rami O. Knio, Todd C Hufnagel, Mark A Foster, Timothy P Weihs, Vishal M. Patel; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2026, pp. 1556-1564

Abstract

Developments in deep learning have facilitated the automatic visual analysis of scientific data, driving forward exploratory research. However, these approaches depend on large amounts of expert-annotated data for effective training, which is difficult to come by in narrow application domains. In this work, we address the challenges of performing visual analysis of high-speed x-ray phase contrast images of the combustion of molten metal particles. In this setting, manual annotation of thousands of complex frames is highly impractical. We therefore propose a synthetic data generation framework that eliminates the need for large-scale manual labelling by generating image-annotation pairs for the task of image segmentation. We first train a denoising diffusion model with a small number of annotated samples to generate image-binary mask pairs. We then use the predictions of a fine-tuned segmentation foundation model to create multi-class semantic annotations for the synthetic dataset. We apply our framework to x-ray phase contrast videos of particle combustion. From 200 manually annotated frames, we generate 10,000 synthetic image-annotation pairs. We demonstrate that training semantic segmentation models with our generated synthetic data yields a significant boost in performance.
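To make the idea of generating image-binary mask pairs with a denoising diffusion model more concrete, the following is a minimal sketch of the standard forward (noising) step applied to an image and its mask together. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the pair is represented, so stacking image and mask as two channels, the linear beta schedule, the thresholding rule, and all function names (`linear_alpha_bar`, `stack_pair`, `forward_noise`, `split_pair`) are hypothetical. The trained denoising network and the fine-tuned segmentation foundation model are omitted.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def stack_pair(image, mask):
    """Flattened image in [0, 1] and binary mask -> two channels scaled to [-1, 1].

    Treating the pair as one two-channel sample is an assumption made here.
    """
    return [[2.0 * p - 1.0 for p in image], [2.0 * m - 1.0 for m in mask]]


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with eps ~ N(0, 1)."""
    a = math.sqrt(alpha_bar[t])
    s = math.sqrt(1.0 - alpha_bar[t])
    eps = [[rng.gauss(0.0, 1.0) for _ in channel] for channel in x0]
    xt = [[a * v + s * e for v, e in zip(channel, noise)]
          for channel, noise in zip(x0, eps)]
    return xt, eps  # eps is the regression target for a noise-prediction network


def split_pair(x):
    """Map a two-channel sample back to an image in [0, 1] and a binary mask."""
    image = [min(1.0, max(0.0, (v + 1.0) / 2.0)) for v in x[0]]
    mask = [1 if v > 0.0 else 0 for v in x[1]]  # threshold at 0 (assumed rule)
    return image, mask


rng = random.Random(0)
image = [rng.random() for _ in range(16)]    # toy 4x4 frame, flattened
mask = [1 if p > 0.5 else 0 for p in image]  # toy binary annotation
alpha_bar = linear_alpha_bar(1000)

x0 = stack_pair(image, mask)
xt, eps = forward_noise(x0, 500, alpha_bar, rng)
rec_image, rec_mask = split_pair(x0)
```

In a full pipeline, a network trained to predict `eps` from `xt` would be sampled repeatedly to produce new two-channel outputs, and `split_pair` would turn each one into a synthetic frame and its binary mask.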

Related Material

[bibtex]

@InProceedings{Korkmaz_2026_WACV,
    author    = {Korkmaz, Yilmaz and Hegde, Deepti and Chintersingh, Kerri-lee and Alemohammad, Milad and Kilic, Velat and Flickinger, Michael R and Polk, Amee L. and Bokhoor, Megan and Peters, Cole and Knio, Rami O. and Hufnagel, Todd C and Foster, Mark A and Weihs, Timothy P and Patel, Vishal M.},
    title     = {Addressing Data Scarcity in Materials Science Research with Deep Generative Models},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {March},
    year      = {2026},
    pages     = {1556-1564}
}