Partitioning the data space before applying hashing using clustering algorithms
DOI:
https://doi.org/10.15276/hait.8.2025.2
Keywords:
adaptive encoding tree, bidirectional encoder representations from transformers clusterization, dimensionality reduction, approximate nearest neighbor, multimodal data, root node
Abstract
This research presents a locality-sensitive hashing (LSH) framework that enhances approximate nearest neighbor search efficiency by integrating adaptive encoding trees with clusterization based on bidirectional encoder representations from transformers (BERT). The proposed method optimizes data space partitioning before hashing is applied, improving retrieval accuracy while reducing computational complexity. First, multimodal data, such as images and textual descriptions, are transformed into a unified semantic space using pre-trained BERT embeddings. This ensures cross-modal consistency and facilitates high-dimensional similarity comparisons. Second, dimensionality reduction techniques such as Uniform Manifold Approximation and Projection (UMAP) or t-distributed stochastic neighbor embedding (t-SNE) are applied to mitigate the curse of dimensionality while preserving key relationships between data points. Third, an adaptive encoding tree for locality-sensitive hashing is constructed, dynamically segmenting the data space based on its statistical distribution and thereby enabling efficient hierarchical clustering. Each data point is converted into a symbolic representation, allowing fast retrieval using structured hashing. Fourth, locality-sensitive hashing is applied to the encoded dataset, leveraging p-stable distributions to maintain high search precision while reducing index size. The combination of encoding trees and locality-sensitive hashing enables efficient candidate selection while minimizing search overhead. Experimental evaluations on the CarDD dataset, which includes car damage images and annotations, demonstrate that the proposed method outperforms state-of-the-art approximate nearest neighbor techniques in both indexing efficiency and retrieval accuracy. The results highlight its adaptability to large-scale, high-dimensional, and multimodal datasets, making it suitable for diagnostic models and real-time retrieval tasks.
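The fourth step, hashing with p-stable distributions, can be illustrated with a minimal sketch. It assumes the standard Gaussian (2-stable) hash family h(v) = floor((a · v + b) / w) for Euclidean distance; the abstract does not state the hash parameters used, so the class name `PStableLSH`, the bucket width, and the number of hash functions are hypothetical choices for illustration. The sketch covers only the hashing stage, not the BERT embedding, dimensionality reduction, or encoding tree stages.

```python
import math
import random
from collections import defaultdict


class PStableLSH:
    """Illustrative hash family: h(v) = floor((a . v + b) / w).

    Hypothetical helper, not the paper's implementation.
    """

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Gaussian entries are 2-stable: a . (u - v) is distributed as
        # ||u - v||_2 times a standard normal variable.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        # Random offsets drawn from [0, w) shift the bucket boundaries.
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def key(self, v):
        """Concatenate the individual hash values into one bucket key."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def add(self, item_id, v):
        """Store an item identifier in the bucket its vector hashes to."""
        self.buckets[self.key(v)].append(item_id)

    def candidates(self, q):
        """Return identifiers that share the query's bucket (may be empty)."""
        return list(self.buckets.get(self.key(q), []))
```

In a pipeline like the one described, the vectors passed to `add` would be the reduced embeddings, and the candidates returned for a query would then be re-ranked by exact distance. A wider bucket or fewer hash functions returns more candidates at the cost of more distance computations.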