Read and Tell - Speech Dataset for Scripted and Spontaneous Scenarios

Ewelina Bartuzi-Trokielewicz, Alicja Martinek, Joanna Gajewska, Michał J. Ołowski, Donat Stankiewicz, Adam Baran, Adrian Kordas, Michał Koźbiał, Elżbieta Gomulska, Jarosław Wójtowicz; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2026, pp. 241-250

Abstract


Robust and ethically balanced speech datasets are increasingly critical to the development of voice-based AI systems and biometric authentication. In this paper, we present a novel speech corpus, Read and Tell (RaT), comprising recordings from 120 speakers, balanced by gender and diverse in age and language. Each participant contributed approximately five minutes of read speech and five minutes of spontaneous speech (ten minutes in total per speaker), recorded using various mobile phones in comparable acoustic environments. The dataset enables a systematic comparison of speaking styles and their influence on model performance. We evaluate the corpus on three core tasks: speaker verification, gender classification, and age group classification. Our experiments reveal significant differences in model behavior depending on speech style. The proposed dataset is a valuable resource for researchers aiming to evaluate and benchmark speech technologies that are robust, fair, and representative of real-world usage conditions. By combining demographic balance, multilingual coverage, and two distinct speaking styles within a compact, well-documented design, RaT fills a critical gap as an evaluation set for testing speech technologies under realistic, style-mismatched, and demographically diverse conditions. The dataset, available for research under license, offers a challenging benchmark to drive progress in robust, fair, and inclusive speech processing.

Related Material


[bibtex]
@InProceedings{Bartuzi-Trokielewicz_2026_WACV,
    author    = {Bartuzi-Trokielewicz, Ewelina and Martinek, Alicja and Gajewska, Joanna and O{\l}owski, Micha{\l} J. and Stankiewicz, Donat and Baran, Adam and Kordas, Adrian and Ko\'zbia{\l}, Micha{\l} and Gomulska, El\.zbieta and W\'ojtowicz, Jaros{\l}aw},
    title     = {Read and Tell - Speech Dataset for Scripted and Spontaneous Scenarios},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {March},
    year      = {2026},
    pages     = {241-250}
}