Current state of methods and algorithms for gene expression data clustering and biclustering: A survey

Authors

  • Oleg R. Yarema, Ivan Franko National University of Lviv, 1, Universytetska St., Lviv 79000, Ukraine
  • Sergii A. Babichev, Kherson State University, 14b Shevchenko Street, 77311, Ukraine

DOI:

https://doi.org/10.15276/hait.07.2024.24

Keywords:

Data mining, gene expression data, clustering, biclustering, decision-making system, ensemble-based methods, alternative voting

Abstract

The analysis of gene expression data has grown increasingly complex with the spread of high-throughput techniques such as bulk RNA-seq and single-cell RNA-seq (scRNA-seq). These datasets challenge traditional clustering methods, which often struggle with the high dimensionality, noise, and variability of biological data. Consequently, biclustering methods, which group genes and conditions simultaneously, have gained popularity in bioinformatics. Biclustering is valuable for identifying subsets of genes that are co-regulated under specific conditions, which aids the exploration of transcriptional modules and gene-disease links. This review examines both traditional clustering and biclustering methods for gene expression analysis, covering applications such as patient stratification, gene network identification, and drug-gene interaction studies. Key biclustering algorithms are discussed, with a focus on their strengths and challenges in handling complex expression profiles. The article highlights significant issues such as hyperparameter optimization, scalability, and the need for biologically interpretable results. Emerging trends are also reviewed, including consensus clustering and distance metrics for high-dimensional data, with attention to the limitations of evaluation metrics. The potential of these methods for diagnostic systems targeting diseases such as cancer and neurodegenerative disorders is also considered. Finally, we outline future directions for improving clustering and biclustering algorithms toward a personalized medicine system based on gene expression data.
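As a minimal illustration of what "grouping genes and conditions simultaneously" means in practice, the sketch below scores a candidate bicluster with the mean squared residue of Cheng and Church, one widely used bicluster quality measure: H(I, J) = 1/(|I||J|) · Σ (a_ij − a_iJ − a_Ij + a_IJ)², where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. The abstract does not name this measure; it is chosen here only as an example. The helper name mean_squared_residue and the toy expression matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def mean_squared_residue(data, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of the submatrix data[I, J].

    Lower values mean the selected genes (rows) vary more coherently
    across the selected conditions (columns). Hypothetical helper name.
    """
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)  # a_iJ: mean of each gene over J
    col_means = sub.mean(axis=0, keepdims=True)  # a_Ij: mean of each condition over I
    overall = sub.mean()                         # a_IJ: mean of the whole submatrix
    residue = sub - row_means - col_means + overall
    return float(np.mean(residue ** 2))


# Synthetic 4-gene x 4-condition matrix (illustrative values, not real data).
# Genes 0-2 under conditions 0-2 follow an additive pattern: each row is the
# previous one shifted by a constant.
expr = np.array([
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.5],
    [5.0, 6.0, 7.0, 4.0],
    [8.0, 1.0, 6.0, 2.0],
])

print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2]))        # ~0: coherent bicluster
print(mean_squared_residue(expr, [0, 1, 2, 3], [0, 1, 2, 3]))  # larger: whole matrix
```

A score of zero means the selected genes follow a purely additive pattern across the selected conditions. Algorithms in the Cheng and Church family search for large submatrices whose score stays below a user-chosen threshold δ, which is one example of the hyperparameter-selection issue the abstract mentions.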

Author Biographies

Oleg R. Yarema, Ivan Franko National University of Lviv, 1, Universytetska St., Lviv 79000, Ukraine

Candidate of Engineering Sciences, Associate Professor, Department of Digital Economics and Business Analytics

Scopus Author ID: 59250847800

Sergii A. Babichev, Kherson State University, 14b Shevchenko Street, 77311, Ukraine

Doctor of Engineering Science, Professor, Department of Informatics, Jan Evangelista Purkyně University in Ústí nad Labem, Pasteurova 3632/15, 400 96 Ústí nad Labem, Czech Republic
Professor of the Department of Physics, Kherson State University, 14b Shevchenko Street, Sivka-Voynylivska, Ivano-Frankivsk Oblast, 77311, Ukraine

Scopus Author ID: 57189091127

Published

2024-11-21

How to Cite

Yarema, O. R., & Babichev, S. A. (2024). Current state of methods and algorithms for gene expression data clustering and biclustering: A survey. Herald of Advanced Information Technology, 7(4), 347–360. https://doi.org/10.15276/hait.07.2024.24