From classification to taxonomy: Automated structuring of vehicle repair names in multilingual corpora


Sergii V. Mashtalir
Oleksandr V. Nikolenko

Abstract

This study introduces and rigorously validates a hybrid, five-stage Natural Language Processing pipeline that transforms unstructured, bilingual repair-order text into a fully navigable, hierarchical action taxonomy, bridging the gap between flat keyword classification and business-grade knowledge organization. To address the limitations of both traditional and modern Natural Language Processing methods on technical, noisy, and domain-specific datasets, the proposed methodology integrates advanced Ukrainian lemmatization, manual creation of a core dictionary, dynamic semantic filtering, transformer-based classification, and density clustering over multilingual sentence embeddings. Together, these stages systematically overcome the noise, code-switching, and "long-tail" rarity that typify real-world automotive datasets. Tested on a corpus of over 4.3 million service records, the approach achieves over 92% cluster coherence with minimal manual annotation. The resulting taxonomy unlocks four immediate industrial benefits: enterprise-wide repair analytics and benchmarking across branches and brands; intent-aware chatbots capable of precise service triage and automated quotation; inventory and workforce optimization through fine-grained job statistics; and a practical blueprint for industry-level standardization of repair nomenclature and data exchange. In sum, the work demonstrates that combining minimal expert input with modern embedding techniques and density clustering can automate taxonomy induction at industrial scale, setting a new benchmark for digital transformation initiatives that depend on accurate structuring of noisy technical language.
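
The final stage described above, density clustering over embeddings of repair names, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the embedding model, so DBSCAN is assumed here as a representative density method, and character trigram counts stand in for multilingual sentence embeddings. The function names (`embed`, `cosine_distance`, `dbscan`), the sample repair names, and the `eps` and `min_pts` values are all hypothetical.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a sentence embedding: character n-gram counts (assumption)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def dbscan(vectors, eps, min_pts):
    """Plain DBSCAN: returns one label per vector, with -1 marking noise."""
    labels = [None] * len(vectors)

    def neighbours(i):
        # All points within eps of point i (the point itself included).
        return [j for j in range(len(vectors))
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical bilingual repair names; eps and min_pts are illustrative only.
names = ["заміна масла", "заміна масла двигуна", "замена масла", "балансування коліс"]
labels = dbscan([embed(name) for name in names], eps=0.6, min_pts=2)
for name, label in zip(names, labels):
    print(label, name)
```

A density method suits the "long-tail" setting because rare or malformed names that sit far from every dense region are labelled as noise rather than being forced into a cluster. At the scale reported in the abstract, the quadratic neighbour search used in this sketch would need to be replaced with an indexed or approximate search.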


Article Details

Section

Theoretical aspects of computer science, programming and data analysis

Author Biographies

Sergii V. Mashtalir, Kharkiv National University of Radio Electronics, 14, Nauky Ave. Kharkiv, 61166, Ukraine

Doctor of Engineering Sciences, Professor, Informatics Department
Scopus Author ID: 36183980100

Oleksandr V. Nikolenko, Uzhhorod National University, 14, University Str. Uzhhorod, 88000, Ukraine

Specialist in Applied Mathematics, PhD student
Scopus Author ID: 59739709200
