From classification to taxonomy: Automated structuring of vehicle repair names in multilingual corpora

Sergii V. Mashtalir; Oleksandr V. Nikolenko

doi:10.15276/hait.08.2025.9

Published: 2025-06-27

Keywords:

Natural Language Processing, taxonomy induction, semantic clustering, machine learning, data analysis, applied intelligent systems, data-driven automation, knowledge organization, business process automation

How to cite

Mashtalir, S. V.; Nikolenko, O. V. From Classification to Taxonomy: Automated Structuring of Vehicle Repair Names in Multilingual Corpora. Herald of Advanced Information Technology 2025, 8 (2), 151-163. https://doi.org/10.15276/hait.08.2025.9.

This article was updated to correct the Conflict of Interest statement (11.02.2026).

Sergii V. Mashtalir

Kharkiv National University of Radio Electronics, 14, Nauky Ave. Kharkiv, 61166, Ukraine

https://orcid.org/0000-0002-0917-6622

Oleksandr V. Nikolenko

Uzhhorod National University, 14, University Str. Uzhhorod, 88000, Ukraine

https://orcid.org/0000-0002-6422-7824

Abstract

This study introduces and rigorously validates a hybrid, five-stage Natural Language Processing (NLP) pipeline. The pipeline transforms unstructured, bilingual repair-order text into a fully navigable, hierarchical action taxonomy, bridging the gap between flat keyword classification and business-grade knowledge organization. Traditional and modern NLP methods both have limitations on technical, noisy, domain-specific data. To address these limitations, the pipeline integrates advanced Ukrainian lemmatization, manual creation of a core dictionary, dynamic semantic filtering, transformer-based classification, and density clustering of multilingual sentence embeddings. Together, these stages systematically overcome the noise, code-switching, and “long-tail” rarity typical of real-world automotive datasets. Tested on a corpus of more than 4.3 million service records, the approach achieves over 92 % cluster coherence with minimal manual annotation. The resulting taxonomy offers four immediate industrial benefits:
- enterprise-wide repair analytics and benchmarking across branches and brands;
- intent-aware chatbots capable of precise service triage and automated quotation;
- inventory and workforce optimization through fine-grained job statistics;
- a practical blueprint for industry-level standardization of repair nomenclature and data exchange.

Overall, the work demonstrates that combining minimal expert input with modern embedding techniques and density clustering can automate taxonomy induction at industrial scale. It sets a new benchmark for digital transformation initiatives that depend on accurately structuring noisy technical language.
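The embedding-and-density-clustering stage mentioned in the abstract can be illustrated with a toy sketch. It is not the authors' implementation and rests on several assumptions. Character-trigram count vectors stand in for the multilingual sentence embeddings, whose model this page does not name. Lowercasing and whitespace collapsing stand in for lemmatization. A plain DBSCAN with cosine distance stands in for the unspecified density clustering algorithm. The helper names (`normalize`, `char_trigrams`, `cosine_distance`, `dbscan`) and the parameter values `eps=0.5`, `min_pts=2` are hypothetical.

```python
import math
from collections import Counter


def normalize(text: str) -> str:
    # Crude stand-in for lemmatization (assumption): lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def char_trigrams(text: str) -> Counter:
    # Hypothetical stand-in for a sentence embedding: character-trigram counts
    # over the normalized text, padded with one space on each side.
    padded = f" {normalize(text)} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_distance(a: Counter, b: Counter) -> float:
    # 1 - cosine similarity between two sparse count vectors.
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors, eps, min_pts):
    """Plain DBSCAN; returns one label per vector, with -1 marking noise."""
    n = len(vectors)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # The eps-neighborhood includes the point itself, as in standard DBSCAN.
        return [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seed_set = neighbors(i)
        if len(seed_set) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seed_set if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical repair names with the kind of noise the abstract describes.
repair_names = [
    "Oil change",
    "oil  chnage",      # typo and double space
    "Wheel alignment",
    "wheel alignmnet",  # transposed letters
    "Battery check",
]
vectors = [char_trigrams(name) for name in repair_names]
labels = dbscan(vectors, eps=0.5, min_pts=2)
print(labels)  # prints [0, 0, 1, 1, -1]
```

In this toy run, each pair of spelling variants collapses into one cluster and the singleton is left as noise. Character trigrams cannot match the same repair written in Ukrainian and in English, which is one reason the described pipeline relies on multilingual sentence embeddings instead. Replacing `char_trigrams` with such an embedding would leave the clustering loop unchanged.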

Issue

Vol. 8 No. 2 (2025): Herald of Advanced Information Technology

Section

Theoretical aspects of computer science, programming and data analysis

Author Biographies

Sergii V. Mashtalir, Kharkiv National University of Radio Electronics, 14, Nauky Ave. Kharkiv, 61166, Ukraine

Doctor of Engineering Science, Professor, Informatics Department
Scopus Author ID: 36183980100

Oleksandr V. Nikolenko, Uzhhorod National University, 14, University Str. Uzhhorod, 88000, Ukraine

Specialist in Applied Mathematics, PhD student
Scopus Author ID: 59739709200