8 February 2024
Computation and Language
William Alberto Cruz Castañeda,
Proposes a method to build a phonetic corpus for speech technologies in Brazilian Portuguese
Collects text data and selects sentences by triphone distribution
New classification by acoustic-articulatory features
Algorithm yields 55.8% more distinct triphones
Significantly outperforms prior phonetic-rich corpora
Constructing a corpus for speech synthesis
This paper proposes a methodology for building a corpus with broad phonetic coverage for Brazilian Portuguese, a low-resource language, in order to improve speech technologies. The method collects diverse text datasets and applies a sentence selection algorithm based on triphone distribution, outperforming prior phonetic-rich corpora. Because the number of distinct triphones alone does not guarantee adequate combinatorial representation, the paper also introduces a new triphone classification based on acoustic-articulatory features.
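
The summary does not spell out the selection algorithm, so the following is only a minimal sketch of one common way to select sentences by triphone coverage: a greedy loop that repeatedly picks the sentence contributing the most triphones not yet covered. It is not the authors' method. The function names (`to_phones`, `triphones`, `select_sentences`) and the `budget` parameter are hypothetical, and letters stand in for phones purely for illustration; a real pipeline would use a grapheme-to-phoneme converter for Brazilian Portuguese.

```python
def to_phones(sentence: str) -> list[str]:
    # ASSUMPTION: each letter is treated as one phone. This is a stand-in
    # for a real grapheme-to-phoneme step, which the summary does not describe.
    return [ch for ch in sentence.lower() if ch.isalpha()]


def triphones(phones: list[str]) -> set[tuple[str, ...]]:
    """Return the distinct triphones (consecutive phone triples) in a sequence."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def select_sentences(sentences: list[str], budget: int) -> list[str]:
    """Greedily pick up to `budget` sentences that maximize new triphone coverage.

    Hypothetical illustration only, not the algorithm from the paper.
    """
    covered: set[tuple[str, ...]] = set()
    selected: list[str] = []
    candidates = [(s, triphones(to_phones(s))) for s in sentences]

    while candidates and len(selected) < budget:
        # Pick the candidate that adds the most triphones not yet covered.
        best = max(candidates, key=lambda c: len(c[1] - covered))
        if not best[1] - covered:
            break  # no remaining sentence adds anything new
        selected.append(best[0])
        covered |= best[1]
        candidates.remove(best)

    return selected


if __name__ == "__main__":
    corpus = ["abc", "abcd", "abc", "xyz"]
    print(select_sentences(corpus, budget=3))
```

A greedy criterion like this rewards raw counts of distinct triphones only. As the abstract notes, that count alone does not guarantee adequate combinatorial representation, which is the motivation for the paper's additional classification by acoustic-articulatory features.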