END-TO-END UZBEK-RUSSIAN SPEECH TRANSLATION WITH SELF-SUPERVISED PRETRAINING
Web of Journals Publishing
Abstract
In this article we study end-to-end Uzbek→Russian speech translation under realistic low-resource and code-switching conditions. We couple a wav2vec-style encoder pre-trained on unlabeled audio with a Transformer decoder, add multi-task ASR/CTC objectives, and distill from a strong cascade teacher. Script-aware tokenization and data augmentation reduce sparsity. On conversational and broadcast test sets, the model improves BLEU and chrF at fixed latency and produces fewer morphology and named-entity errors.
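The multi-task objective described above (a translation cross-entropy term plus an auxiliary ASR/CTC term on the encoder) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the interpolation weight, and all tensor shapes are assumptions for the sake of the example.

```python
import torch
import torch.nn as nn

class JointSTLoss(nn.Module):
    """Illustrative joint objective: ST cross-entropy + auxiliary ASR CTC.

    The 0.3 CTC weight and the blank id are hypothetical defaults,
    not values taken from the article.
    """

    def __init__(self, ctc_weight: float = 0.3, blank_id: int = 0):
        super().__init__()
        self.ctc_weight = ctc_weight
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, enc_log_probs, asr_targets, enc_lens, asr_lens,
                dec_logits, st_targets):
        # enc_log_probs: (T, B, V_src) log-softmax over the source (ASR) vocab
        # dec_logits:    (B, L, V_tgt) decoder logits over the target vocab
        loss_ctc = self.ctc(enc_log_probs, asr_targets, enc_lens, asr_lens)
        loss_st = self.ce(dec_logits.reshape(-1, dec_logits.size(-1)),
                          st_targets.reshape(-1))
        # Interpolate the translation loss with the auxiliary ASR term.
        return (1 - self.ctc_weight) * loss_st + self.ctc_weight * loss_ctc
```

In this formulation the CTC branch regularizes the pre-trained encoder toward monotonic source transcription while the decoder learns the (possibly reordering) translation mapping.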