Optimizing hierarchical classifiers with parameter tuning and confidence scoring

Authors

  • Sergii V. Mashtalir Kharkiv National University of Radio Electronics, 14, Nauky Ave. Kharkiv, 61166, Ukraine
  • Oleksandr V. Nikolenko Uzhhorod National University, 14, University Str. Uzhhorod, 88000, Ukraine

DOI:

https://doi.org/10.15276/hait.07.2024.15

Keywords:

Natural language processing, tree-based classification, machine learning, data analysis, applied intelligent systems

Abstract

Hierarchical classifiers play a crucial role in addressing complex classification tasks by breaking them down into smaller, more
manageable sub-tasks. This paper continues a series of works focused on hierarchical classification of technical Ukrainian texts,
specifically the classification of repair works and spare parts used in automobile maintenance and servicing. We tackle the challenges
posed by multilingual data inputs – specifically Ukrainian, Russian, and their hybrid – and the lack of standard data cleaning models
for the Ukrainian language. We developed a novel classification algorithm that employs TF-IDF vectorization with unigrams and
bigrams, keyword selection, and cosine similarity for classification. This paper describes a method for training and evaluating a
hierarchical classification model using parameter tuning for each node in a tree structure. The training process involves initializing
weights for tokens in the class tree nodes and input strings, followed by iterative parameter tuning to optimize classification
accuracy. Initial weights are assigned based on predefined rules, and the iterative process adjusts these weights to achieve optimal
performance. The paper also addresses the challenge of interpreting multiple confidence scores from the classification process,
proposing a machine learning approach using Scikit-learn's GradientBoostingClassifier to calculate a unified confidence score. This
score helps assess the classification reliability, particularly for unlabeled data, by transforming input values, generating polynomial
parameters, and applying logarithmic transformations and scaling. The classifier is fine-tuned using hyperparameter optimization
techniques, and the final model provides a robust confidence score for classification tasks, enabling verification and optimization of
classification results across large datasets. Our experimental results demonstrate significant improvements in classification
performance. Overall classification accuracy nearly doubled after training, reaching 92.38 %. This research not only advances the
theoretical framework of hierarchical classifiers but also provides practical solutions for processing large-scale, unlabeled datasets in
the automotive industry. The developed methodology can enhance various applications, including automated customer support
systems, predictive maintenance, and decision-making processes for stakeholders like insurance companies and service centers.
Future work will extend this approach to more complex tasks, such as extracting and classifying information from extensive text
sources like telephone call transcriptions.
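
The following minimal sketches are illustrative only and are not the authors' implementation; the class descriptions, input string, feature transformations, and parameter grid are assumptions made for the example. The first sketch shows TF-IDF vectorization with unigrams and bigrams and cosine-similarity matching of an input string against the keyword descriptions of a tree node's children, the core mechanism named in the abstract.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical keyword descriptions of the child nodes of one tree node.
    node_descriptions = [
        "заміна масла фільтр двигун",        # oil change, engine filter
        "гальмівні колодки диски заміна",    # brake pads, discs, replacement
        "ремонт підвіски амортизатор",       # suspension repair, shock absorber
    ]
    query = "заміна масляного фільтра"       # input string to classify

    vectorizer = TfidfVectorizer(ngram_range=(1, 2))            # unigrams and bigrams
    node_matrix = vectorizer.fit_transform(node_descriptions)   # one row per child node
    query_vector = vectorizer.transform([query])

    scores = cosine_similarity(query_vector, node_matrix)[0]    # similarity to each node
    best = scores.argmax()
    print(f"best child node: {best}, score: {scores[best]:.3f}")

The second sketch aggregates several per-level confidence scores into a unified reliability estimate with Scikit-learn's GradientBoostingClassifier, assuming a labelled sample of verified classifications is available; the synthetic data, the logarithmic/polynomial/scaling steps, and the small hyperparameter grid are placeholders rather than the paper's exact pipeline.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures, StandardScaler

    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(200, 3))   # per-level cosine-similarity scores (synthetic)
    y = (X.mean(axis=1) > 0.5).astype(int)     # 1 = classification verified as correct

    model = make_pipeline(
        FunctionTransformer(np.log1p),         # logarithmic transformation
        PolynomialFeatures(degree=2),          # polynomial feature generation
        StandardScaler(),                      # scaling
        GradientBoostingClassifier(random_state=0),
    )

    # Illustrative hyperparameter optimization over a small grid.
    grid = GridSearchCV(
        model,
        {"gradientboostingclassifier__n_estimators": [100, 200],
         "gradientboostingclassifier__max_depth": [2, 3]},
        cv=3,
    )
    grid.fit(X, y)

    unified_confidence = grid.predict_proba(X[:1])[0, 1]   # probability the result is reliable
    print(f"unified confidence: {unified_confidence:.3f}")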


Author Biographies

Sergii V. Mashtalir, Kharkiv National University of Radio Electronics, 14, Nauky Ave. Kharkiv, 61166, Ukraine

Doctor of Engineering Science. Professor, Informatics Department

Scopus Author ID: 36183980100

Oleksandr V. Nikolenko, Uzhhorod National University, 14, University Str. Uzhhorod, 88000, Ukraine

PhD student


Published

2024-09-20

How to Cite

Mashtalir, S. V., & Nikolenko, O. V. (2024). Optimizing hierarchical classifiers with parameter tuning and confidence scoring. Herald of Advanced Information Technology, 7(3), 231-. https://doi.org/10.15276/hait.07.2024.15