Unsupervised Re-identification architecture based on segmented tracklets for animal behavior analysis
Abstract
In this paper, we present Mask-TAUDL, an unsupervised re-identification architecture that integrates instance segmentation, unsupervised deep learning, and tracklet association for detailed analysis of object behavior in long-term recordings. It couples the Mask R-CNN dual-stream detector/segmenter with dual ResNet-18 backbones to the Tracklet Association Unsupervised Deep Learning (TAUDL) module. Mask R-CNN provides accurate object localization and binary masks, from which we construct segmentation-refined tracklets. The two ResNet-18 streams use these masks to extract appearance-sensitive and motion-sensitive features at the tracklet level, which are fused into a common feature descriptor. The TAUDL module operates directly on the masked tracklet features and jointly learns discriminative embeddings and cross-session associations without manual labeling. The resulting model keeps the features of a single individual close in embedding space over time while clearly separating the features of different individuals. Combining clean masked regions with temporally aggregated features helps suppress spurious variations caused by shadows, reflections, or overlapping objects. Long-term animal re-identification is challenging because of frequent overlaps, appearance drift, and subtle visual differences between individuals, and most existing solutions rely on large annotated datasets, which limits their applicability in real-world laboratory settings. Mask-TAUDL addresses these limitations by explicitly modeling temporally consistent, mask-refined tracklets and by training identity-preserving embeddings in a fully unsupervised manner.
Mask-TAUDL is designed for animal behavior studies, in particular for small laboratory species such as mice and fish observed in closed or semi-structured arenas, where reliable long-term identity tracking is essential for quantitative behavioral analysis, longitudinal experiments, and high-throughput screening.
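
The idea of combining masked regions with temporally aggregated features can be illustrated with a minimal sketch. It is not the Mask-TAUDL implementation: the function names (`masked_mean`, `tracklet_descriptor`, `make_frame`) are hypothetical, hand-made two-dimensional pixel features stand in for ResNet-18 activations, the masks are given rather than predicted by Mask R-CNN, and no training or tracklet association takes place. The sketch only shows that pooling features inside the mask and averaging them over a tracklet yields a descriptor that ignores background changes such as shadows.

```python
import math

def masked_mean(pixels, mask):
    """Average the feature vectors of the pixels whose mask value is 1."""
    kept = [p for p, m in zip(pixels, mask) if m]
    if not kept:
        raise ValueError("mask selects no pixels")
    dim = len(kept[0])
    return [sum(p[d] for p in kept) / len(kept) for d in range(dim)]

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def tracklet_descriptor(frames):
    """frames: list of (pixels, mask) pairs. Returns a unit-length descriptor
    obtained by masked pooling per frame followed by a temporal average."""
    per_frame = [masked_mean(pixels, mask) for pixels, mask in frames]
    dim = len(per_frame[0])
    mean = [sum(f[d] for f in per_frame) / len(per_frame) for d in range(dim)]
    return l2_normalize(mean)

def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy data (assumption): 4 pixels per frame, the first two belong to the animal.
ANIMAL_MASK = [1, 1, 0, 0]
FULL_FRAME = [1, 1, 1, 1]          # no segmentation: background is included
animal_a = [1.0, 0.1]
animal_b = [0.1, 1.0]

def make_frame(animal, background, mask):
    return ([animal, animal, background, background], mask)

def make_tracklet(animal, backgrounds, mask):
    return [make_frame(animal, bg, mask) for bg in backgrounds]

session_1 = [[0.0, 5.0], [0.0, 4.0]]   # background features, first recording
session_2 = [[5.0, 5.0], [3.0, 0.0]]   # different lighting, second recording

# Same individual in two sessions, and a different individual in session 1.
a1 = tracklet_descriptor(make_tracklet(animal_a, session_1, ANIMAL_MASK))
a2 = tracklet_descriptor(make_tracklet(animal_a, session_2, ANIMAL_MASK))
b1 = tracklet_descriptor(make_tracklet(animal_b, session_1, ANIMAL_MASK))

same_masked = cosine(a1, a2)
diff_masked = cosine(a1, b1)

# Without masks, the changing background leaks into the descriptor.
u1 = tracklet_descriptor(make_tracklet(animal_a, session_1, FULL_FRAME))
u2 = tracklet_descriptor(make_tracklet(animal_a, session_2, FULL_FRAME))
same_unmasked = cosine(u1, u2)

print(f"same individual, masked:   {same_masked:.3f}")
print(f"same individual, unmasked: {same_unmasked:.3f}")
print(f"different individuals:     {diff_masked:.3f}")
```

In this toy setting the masked descriptors of the same individual coincide across sessions, whereas the unmasked descriptors drift apart as the background changes. A learned embedding, as described in the abstract, would additionally be trained to increase the separation between individuals.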

