NEURAL TEXT-TO-SPEECH FOR UZBEK WITH PROSODY TRANSFER AND SPEAKER ADAPTATION

dc.contributor.author: Sukhrob Avezov Sobirovich
dc.date.accessioned: 2025-12-28T13:15:24Z
dc.date.issued: 2025-09-26
dc.description.abstract: In this article we present an open, data-efficient Uzbek TTS system that integrates a non-autoregressive acoustic model with a prosody encoder and few-shot speaker adaptation. Rule-based text normalization and grapheme-to-phoneme conversion handle challenges of Uzbek orthography (Latin/Cyrillic), agglutinative morphology, and interrogative clitics. On 55 hours of speech, the proposed model improves MOS, reduces ASR-based CER, and successfully transfers reference prosody across voices with minimal data. We also release recipes, tokenizers, and evaluation metrics to support reproducible benchmarking and rapid local adaptation.
dc.format: application/pdf
dc.identifier.uri: https://brightmindpublishing.com/index.php/EI/article/view/1426
dc.identifier.uri: https://asianeducationindex.com/handle/123456789/6161
dc.language.iso: eng
dc.publisher: Bright Mind Publishing
dc.relation: https://brightmindpublishing.com/index.php/EI/article/view/1426/1454
dc.rights: https://creativecommons.org/licenses/by/4.0
dc.source: Educator Insights: Journal of Teaching Theory and Practice; Vol. 1 No. 9 (2025); 121-125
dc.source: 3061-6964
dc.subject: Uzbek, text-to-speech, prosody transfer, speaker adaptation, FastSpeech-style model, HiFi-GAN, low-resource, evaluation.
dc.title: NEURAL TEXT-TO-SPEECH FOR UZBEK WITH PROSODY TRANSFER AND SPEAKER ADAPTATION
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion
dc.type: Peer-reviewed Article

File name: sobirovich_2025_neural_text-to-speech_for_uzbek_with_pro.pdf
File size: 312.21 KB
File format: Adobe Portable Document Format
