NEURAL TEXT-TO-SPEECH FOR UZBEK WITH PROSODY TRANSFER AND SPEAKER ADAPTATION

Sukhrob Avezov Sobirovich

Files

sobirovich_2025_neural_text-to-speech_for_uzbek_with_pro.pdf (312.21 KB)

Date

2025-09-26

Authors

Sukhrob Avezov Sobirovich

Publisher

Bright Mind Publishing

Abstract

This article presents an open, data-efficient Uzbek text-to-speech (TTS) system that combines a non-autoregressive acoustic model with a prosody encoder and few-shot speaker adaptation. Rule-based text normalization and grapheme-to-phoneme conversion address the main challenges of Uzbek text: dual Latin/Cyrillic orthography, agglutinative morphology, and interrogative clitics. Trained on 55 hours of speech, the proposed model improves the mean opinion score (MOS), reduces the ASR-based character error rate (CER), and transfers reference prosody across voices with minimal adaptation data. The authors also release recipes, tokenizers, and evaluation metrics to support reproducible benchmarking and rapid local adaptation.

Subject

Uzbek, text-to-speech, prosody transfer, speaker adaptation, FastSpeech-style model, HiFi-GAN, low-resource, evaluation.

URI

https://brightmindpublishing.com/index.php/EI/article/view/1426
https://asianeducationindex.com/handle/123456789/6161

Collections

Published Articles
