Speech data for Senegalese languages

Published on: 2 April 2024

Primary Category: Computation and Language

Paper Authors: Elodie Gauthier, Aminata Ndiaye, Abdoulaye Guissé

Key Details

125 hours of transcribed speech data on agriculture topics

55 hours in Wolof, 32 in Pulaar, 38 in Sereer

Text corpora also provided for Wolof and Pulaar

49K-word Wolof pronunciation lexicon released

Resources enable speech tech for under-resourced languages

AI-generated summary

This paper introduces speech and text datasets for three major Senegalese languages (Wolof, Pulaar, and Sereer) to support speech technology development, especially for agriculture. It provides 125 hours of transcribed speech on agricultural topics: 55 hours in Wolof, 32 in Pulaar, and 38 in Sereer. Text corpora for Wolof and Pulaar and a 49K-word Wolof pronunciation lexicon are also released.
