CLUSTERING OF SMALL-SCALE UZBEK TEXTS USING TF-IDF AND KMEANS: AN EMPIRICAL EVALUATION OF VECTORIZATION PARAMETERS

Elyor Hayitmamatovich Egamberdiyev

Date

2025-07-26

Authors

Elyor Hayitmamatovich Egamberdiyev

Publisher

Modern American Journals

Abstract

In this study, we systematically evaluate TF-IDF vectorization parameters for clustering small-scale Uzbek-language text with the KMeans algorithm. TF-IDF is a widely used and computationally efficient text representation, but it does not capture semantic meaning, a limitation that matters most in low-resource languages such as Uzbek, where pretrained semantic models are limited or unavailable. The primary goal of this research is to assess how TF-IDF configuration parameters (n-gram range, maximum and minimum document frequency thresholds, normalization techniques, and custom stopword filtering) affect the quality of clustering short, domain-specific Uzbek texts. We designed a dataset of seven manually curated sentences grouped into three distinct semantic categories: tourism and relaxation, artificial intelligence, and aquatic life.
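The parameters named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the author's implementation: the abstract does not name a library or list the seven Uzbek sentences, so the English sentences, the `STOP_WORDS` set, the helper names (`ngrams`, `tfidf`, `kmeans`), the smoothed IDF formula, and the fixed seeding of cluster centers are all assumptions made for illustration. The argument names `ngram_range`, `min_df`, `max_df`, and `stop_words` mirror those of scikit-learn's `TfidfVectorizer`, which offers equivalent options.

```python
import math
from collections import Counter

def ngrams(tokens, ngram_range):
    """Build all word n-grams with n in [lo, hi]."""
    lo, hi = ngram_range
    return [" ".join(tokens[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(docs, ngram_range=(1, 1), min_df=1, max_df=1.0,
          stop_words=(), l2_norm=True):
    """Return (vocabulary, dense TF-IDF vectors).

    min_df is an absolute document count; max_df is a proportion of documents.
    """
    stop = set(stop_words)
    tokenized = [ngrams([t for t in d.lower().split() if t not in stop],
                        ngram_range) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(t for t, c in df.items()
                   if c >= min_df and c / n_docs <= max_df)
    # A common smoothed IDF variant (assumption, not taken from the paper).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        if l2_norm:
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard empty docs
            vec = [v / norm for v in vec]
        vectors.append(vec)
    return vocab, vectors

def kmeans(vectors, init_indices, n_iter=20):
    """Plain Lloyd iterations, seeded from the given document indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda k: sum((a - b) ** 2
                                        for a, b in zip(v, centroids[k])))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical English stand-ins for the seven Uzbek sentences (three topics).
docs = [
    "tourists relax on the beach during summer holiday",
    "the beach resort offers holiday relaxation for tourists",
    "artificial intelligence models learn from data",
    "machine learning is part of artificial intelligence",
    "neural networks are artificial intelligence models",
    "fish swim in the deep ocean water",
    "dolphins and fish live in ocean water",
]
# Hypothetical custom stopword list.
STOP_WORDS = {"the", "on", "in", "is", "of", "are", "and", "for", "from", "during"}

# Compare a few vectorizer configurations on the same seven sentences.
for params in ({}, {"ngram_range": (1, 2)}, {"min_df": 2}):
    vocab, vectors = tfidf(docs, stop_words=STOP_WORDS, **params)
    labels = kmeans(vectors, init_indices=(0, 2, 5))
    print(params, "vocabulary size:", len(vocab), "labels:", labels)
```

Seeding the centers from one sentence per topic keeps the run deterministic. A library KMeans would normally use randomized or k-means++ initialization, so on a corpus this small its results can vary between runs unless a seed is fixed.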

Subject

TF-IDF vectorization, text clustering, Uzbek NLP, KMeans algorithm, short-text analysis, parameter tuning, semantic coherence, low-resource language processing.

URI

https://usajournals.org/index.php/2/article/view/742
https://asianeducationindex.com/handle/123456789/4297

Collections

Modern American Journal of Engineering, Technology, and Innovation
