Current state of methods and algorithms for gene expression data clustering and biclustering: A survey
DOI: https://doi.org/10.15276/hait.07.2024.24
Keywords: Data mining, gene expression data, clustering, biclustering, decision-making system, ensemble-based methods, alternative voting
Abstract
The analysis of gene expression data has grown increasingly complex with the expansion of high-throughput techniques such as bulk RNA-seq and single-cell RNA-seq (scRNA-seq). These datasets challenge traditional clustering methods, which often struggle with the high dimensionality, noise, and variability of biological data. Consequently, biclustering methods, which group genes and conditions simultaneously, have gained popularity in bioinformatics. Biclustering is valuable for identifying subsets of co-regulated genes under specific conditions, which aids the exploration of transcriptional modules and gene-disease associations. This review examines both traditional clustering and biclustering methods for gene expression analysis, covering applications such as patient stratification, gene network identification, and drug-gene interaction studies. Key biclustering algorithms are discussed, with a focus on their strengths and limitations in handling complex expression profiles. The article highlights significant issues such as hyperparameter optimization, scalability, and the need for biologically interpretable results. Emerging trends are also reviewed, including consensus clustering and distance metrics for high-dimensional data, with attention to the limitations of evaluation metrics. The potential of these methods for diagnostic systems targeting diseases such as cancer and neurodegenerative disorders is also considered. Finally, we outline future directions for enhancing clustering and biclustering algorithms toward a personalized medicine system based on gene expression data.
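To make the idea of "grouping genes and conditions simultaneously" concrete, the sketch below scores a gene-by-condition submatrix with the mean squared residue of Cheng and Church, one widely used coherence criterion for biclusters. It is an illustration only, not a method proposed in this survey: the function name `mean_squared_residue`, the toy matrix, and the planted bicluster are hypothetical. A score near zero means every entry in the submatrix is explained by a gene effect plus a condition effect, i.e. the selected genes shift in parallel across the selected conditions, even though they may not correlate across all conditions.

```python
import numpy as np


def mean_squared_residue(data, rows, cols):
    """Cheng-Church mean squared residue of the submatrix data[rows, cols].

    0 means each entry equals (row effect + column effect), i.e. a perfectly
    additive, co-regulated pattern; larger values mean less coherence.
    """
    sub = data[np.ix_(rows, cols)]              # select genes x conditions
    row_means = sub.mean(axis=1, keepdims=True)  # mean of each gene over the selected conditions
    col_means = sub.mean(axis=0, keepdims=True)  # mean of each condition over the selected genes
    overall = sub.mean()                         # mean of the whole submatrix
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())


# Hypothetical toy data: 6 genes x 5 conditions of random noise.
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 5))

# Plant an additive bicluster on genes 0-2 and conditions 1-3:
# each gene has its own baseline, and all three move together across conditions.
gene_shift = np.array([[0.0], [2.0], [5.0]])
cond_shift = np.array([[1.0, -1.0, 3.0]])
expr[np.ix_([0, 1, 2], [1, 2, 3])] = gene_shift + cond_shift

planted = mean_squared_residue(expr, [0, 1, 2], [1, 2, 3])
whole = mean_squared_residue(expr, list(range(6)), list(range(5)))
print(planted)  # essentially zero (floating-point rounding only)
print(whole)    # clearly positive: the full matrix is not a coherent bicluster
```

A biclustering algorithm built on this criterion searches for row and column subsets whose residue stays below a chosen threshold; the threshold is itself a hyperparameter of the kind the abstract flags as hard to tune.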