Constructing a corpus for speech synthesis

Published on: 8 February 2024

Primary Category: Computation and Language

Paper Authors: Marcellus Amadeus, William Alberto Cruz Castañeda, Wilmer Lobato, Niasche Aquino

Key Details

Proposes a method for building a phonetically broad corpus for speech technology in Brazilian Portuguese

Collects text data and selects sentences by triphone distribution

Introduces a new triphone classification based on acoustic-articulatory features

Algorithm yields 55.8% more distinct triphones

Significantly outperforms prior phonetic-rich corpora

AI-generated summary

This paper proposes a methodology for creating a corpus with broad phonetic coverage for Brazilian Portuguese, a low-resource language, in order to improve speech technologies. It collects diverse text datasets and applies a sentence selection algorithm based on triphone distribution; the resulting corpus outperforms prior phonetic-rich corpora. Because the number of distinct triphones alone does not guarantee adequate combinatorial representation, the paper also introduces a new classification of triphones by acoustic-articulatory features.
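
The summary does not spell out the selection algorithm, so the following is only a minimal sketch of one common way to select sentences by triphone coverage: a greedy pass that repeatedly picks the sentence contributing the most not-yet-covered triphones. It is not the authors' method. The function names (`triphones`, `select_sentences`), the `budget` parameter, and the toy phone sequences are all hypothetical, and the sketch assumes each sentence has already been converted to a list of phones by some external grapheme-to-phoneme step.

```python
from typing import Dict, List, Sequence, Set, Tuple

Triphone = Tuple[str, ...]


def triphones(phones: Sequence[str]) -> List[Triphone]:
    """Return every window of three consecutive phones (empty if fewer than three)."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def select_sentences(corpus: Dict[str, Sequence[str]], budget: int) -> List[str]:
    """Greedily choose up to `budget` sentence ids.

    At each step the sentence adding the most unseen triphones is chosen.
    Ties go to the sentence that appears first in `corpus`. Selection stops
    early once no remaining sentence adds a new triphone.
    """
    remaining = {sid: set(triphones(p)) for sid, p in corpus.items()}
    covered: Set[Triphone] = set()
    chosen: List[str] = []
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda sid: len(remaining[sid] - covered))
        if not remaining[best] - covered:
            break  # nothing left that improves coverage
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen


# Hypothetical toy input: sentence id -> phone sequence (assumed pre-phonemized).
example = {
    "s1": ["k", "a", "z", "a"],
    "s2": ["k", "a", "z", "u"],
    "s3": ["m", "a", "l", "a", "s"],
}
selected = select_sentences(example, budget=2)
```

A greedy pass like this maximizes only the count of distinct triphones. As the summary notes, that count alone does not guarantee adequate combinatorial representation, which is the motivation for the paper's additional classification by acoustic-articulatory features.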
