CLUSTERING OF SMALL-SCALE UZBEK TEXTS USING TF-IDF AND KMEANS: AN EMPIRICAL EVALUATION OF VECTORIZATION PARAMETERS
| dc.contributor.author | Elyor Hayitmamatovich Egamberdiyev | |
| dc.date.accessioned | 2025-12-28T10:50:19Z | |
| dc.date.issued | 2025-07-26 | |
| dc.description.abstract | In this study, we systematically evaluate TF-IDF vectorization parameters for clustering small-scale Uzbek-language text with the KMeans algorithm. TF-IDF is a widely used and computationally efficient text representation, but it does not capture semantic meaning, a limitation that is especially relevant for low-resource languages such as Uzbek, where pretrained semantic models are limited or unavailable. The primary goal of this research is to assess how TF-IDF configuration parameters (n-gram range, maximum and minimum document frequency thresholds, normalization technique, and custom stopword filtering) affect the quality of clustering short, domain-specific Uzbek texts. We designed a dataset of seven manually curated sentences grouped into three distinct semantic categories: tourism and relaxation, artificial intelligence, and aquatic life. | |
| dc.format | application/pdf | |
| dc.identifier.uri | https://usajournals.org/index.php/2/article/view/742 | |
| dc.identifier.uri | https://asianeducationindex.com/handle/123456789/4297 | |
| dc.language.iso | eng | |
| dc.publisher | Modern American Journals | |
| dc.relation | https://usajournals.org/index.php/2/article/view/742/815 | |
| dc.rights | https://creativecommons.org/licenses/by/4.0 | |
| dc.source | Modern American Journal of Engineering, Technology, and Innovation; Vol. 1 No. 4 (2025); 58-67 | |
| dc.source | 3067-7939 | |
| dc.subject | TF-IDF vectorization, text clustering, Uzbek NLP, KMeans algorithm, short-text analysis, parameter tuning, semantic coherence, low-resource language processing. | |
| dc.title | CLUSTERING OF SMALL-SCALE UZBEK TEXTS USING TF-IDF AND KMEANS: AN EMPIRICAL EVALUATION OF VECTORIZATION PARAMETERS | |
| dc.type | info:eu-repo/semantics/article | |
| dc.type | info:eu-repo/semantics/publishedVersion | |
| dc.type | Peer-reviewed Article |
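
The parameter space named in the abstract can be illustrated with a minimal sketch, assuming scikit-learn's `TfidfVectorizer` and `KMeans` (the record does not state which library the article uses). The sentences are English placeholders written for this example, not the study's seven Uzbek sentences, and the stopword list and parameter values are hypothetical choices rather than the settings evaluated in the article.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder sentences (NOT the study's Uzbek data): seven short texts
# loosely covering tourism, artificial intelligence, and aquatic life.
docs = [
    "we spent the holiday relaxing on the beach",
    "tourists enjoy relaxing holiday trips to the mountains",
    "artificial intelligence models learn from data",
    "neural networks are a branch of artificial intelligence",
    "machine learning models need training data",
    "fish and dolphins live in the ocean",
    "coral reefs shelter many ocean fish",
]

# Hypothetical custom stopword list, for illustration only.
stopwords = ["the", "a", "of", "on", "to", "in", "and", "are", "we", "from"]

# The knobs listed in the abstract: n-gram range, document frequency
# thresholds, normalization, and custom stopword filtering.
vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),  # unigrams and bigrams
    min_df=1,            # keep terms that occur in at least one document
    max_df=0.9,          # drop terms that occur in more than 90% of documents
    norm="l2",           # L2-normalize each document vector
    stop_words=stopwords,
)
X = vectorizer.fit_transform(docs)

# Three clusters, matching the three semantic categories in the abstract.
# A fixed random_state makes repeated runs comparable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

for label, doc in zip(labels, docs):
    print(label, doc)
```

With so few documents, small changes to `ngram_range`, `min_df`, or `max_df` can alter the vocabulary enough to move sentences between clusters, which is the kind of sensitivity the abstract sets out to measure.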
Files
- Name: egamberdiyev_2025_clustering_of_small-scale_uzbek_texts_us.pdf
- Size: 530.96 KB
- Format: Adobe Portable Document Format