Heterogeneous ensemble classifier in computer systems for medical diagnostics
DOI:
https://doi.org/10.15276/hait.07.2024.26
Keywords:
Medical diagnostics, ensemble classifier, base model, probabilistic classifier, symptom complex, expert information, model aggregations, decision support system
Abstract
This work addresses an important scientific and technical problem: building a diagnostic decision-support system for medicine. The system is built around a heterogeneous ensemble classifier whose base models implement two main approaches to forming a diagnostic conclusion. The first approach is probabilistic: it analyzes a training sample of patients with confirmed diagnoses to estimate the probability that a particular disease is present given the available data. The second approach is expert-based: it relies on expert information about the structure of the symptom complexes that characterize each disease. The two approaches address the same problem from different perspectives, so combining them is promising for building effective diagnostic systems. The purpose of this study is to synthesize a heterogeneous ensemble classifier that integrates both expert and probabilistic components into the diagnostic process. As part of the study, the diagnostic methods that doctors use under the current requirements of evidence-based medicine were analyzed, and methods for constructing diagnostic decision rules in medical decision-support systems were reviewed. Based on this analysis, a mathematical model of a heterogeneous ensemble classifier was developed, and the choice of its constituent parts was justified. Widely used classification methods were selected as the probabilistic component, in particular the standard comparison method, the k-nearest neighbors method, and the potential functions method. Expert knowledge about the structure of symptom complexes is formalized by expressing the symptom complexes of each disease as numerical intervals.
In this framework, linguistic variables are used, each taking the value “below the norm”, “norm”, or “above the norm”. Various strategies for aggregating the different types of base models within the heterogeneous ensemble classifier are reviewed. Aggregation preserves the advantages of each method and improves overall classification accuracy. Requirements for the system's functionality were formulated, and the design tools, the main development platform (Java), and the database management system (MySQL) were defined. The decision-support system was designed and comprehensively evaluated on real medical data. The results of these tests confirmed the effectiveness of the system.
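The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the two features, the training sample, the interval bounds, the equal weighting, and all function names below are hypothetical, and Python is used only for brevity (the abstract names Java as the development platform). The sketch pairs a k-nearest neighbors vote (probabilistic component) with an interval-based symptom-complex match (expert component) and aggregates them by a weighted average, which is only one of many possible aggregation strategies. Features are left unscaled to keep the example short.

```python
import math
from collections import Counter

# Hypothetical training sample: (body temperature, leukocyte count) -> confirmed diagnosis.
TRAIN = [
    ((36.6, 5.0), "healthy"),
    ((36.8, 5.5), "healthy"),
    ((36.5, 4.8), "healthy"),
    ((38.5, 11.0), "infection"),
    ((39.0, 12.5), "infection"),
    ((38.2, 10.5), "infection"),
]

# Hypothetical expert symptom complexes: one (low, high) interval per feature and disease.
EXPERT = {
    "healthy": [(36.0, 37.0), (4.0, 9.0)],
    "infection": [(37.5, 41.0), (9.0, 20.0)],
}


def linguistic_value(value, low, high):
    """Map a measurement to a linguistic term relative to a reference interval."""
    if value < low:
        return "below the norm"
    if value > high:
        return "above the norm"
    return "norm"


def knn_scores(x, train, k=3):
    """Probabilistic component: share of the k nearest patients per diagnosis."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    labels = {label for _, label in train}
    return {label: votes.get(label, 0) / k for label in labels}


def expert_scores(x, complexes):
    """Expert component: fraction of features inside each disease's intervals."""
    return {
        disease: sum(lo <= v <= hi for v, (lo, hi) in zip(x, intervals)) / len(intervals)
        for disease, intervals in complexes.items()
    }


def ensemble_scores(x, train=TRAIN, complexes=EXPERT, w_prob=0.5, k=3):
    """Aggregate both components with a weighted average (an assumed strategy)."""
    prob = knn_scores(x, train, k)
    expert = expert_scores(x, complexes)
    return {
        d: w_prob * prob.get(d, 0.0) + (1.0 - w_prob) * expert.get(d, 0.0)
        for d in complexes
    }


def diagnose(x, **kwargs):
    """Return the diagnosis with the highest aggregated score."""
    scores = ensemble_scores(x, **kwargs)
    return max(scores, key=scores.get)
```

Calling `diagnose((38.7, 11.5))` on this toy data selects the diagnosis on which both components agree; changing `w_prob` shifts the balance between the training sample and the expert intervals.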