Method for building ensemble classifiers of structured and unstructured data based on a unified approach
Main Article Content
Abstract
Effectively classifying heterogeneous data, including structured and unstructured data, is essential in diverse fields such as healthcare, finance, information security, and audio content analysis. This study aims to develop a unified approach for constructing ensemble classifiers capable of handling diverse data formats within a single framework, enhancing classification accuracy and robustness. The methodology integrates feature extraction and data preprocessing techniques, transforming heterogeneous datasets to a standardized numerical format suitable for ensemble learning. Eight base classifiers including K-nearest neighbors, support vector machines, random forest, extreme gradient boosting, logistic regression, multilayer perception, convolution neural networks and long short-term memory networks–were trained with optimized hyperparameters. The ensemble classification uses stacking with various aggregation types such as hard voting, soft voting, and soft voting using Gompertz fuzzy ranking to effectively combine model predictions while accounting for uncertainty and noise. Experimental evaluation across five datasets, covering medical diagnosis, credit risk, emotion recognition, music genres and deepfake detection–demonstrates consistent improvement in accuracy and F1-score metrics, with gains up to 8 percent compared to the best individual classifiers. The approach proves particularly effective for unstructured audio data, where temporal and spectral dependencies pose significant challenges. The results underscore the versatility the proposed unified ensemble methodology in addressing class imbalance and noise offering a scalable solution adaptable to various domains. This work contributes a comprehensive framework facilitating the development of robust classifiers for complex real-world data and paves the way for future research integrating heterogeneous data sources within cohesive predictive models.