Accurate crowd counting for intelligent video surveillance systems
DOI:
https://doi.org/10.15276/hait.07.2024.17
Keywords:
Crowd counting, intelligent video surveillance, deep learning, encoder-decoder architecture, density map estimation, hierarchical feature extraction, convolutional neural networks, public safety monitoring
Abstract
The paper presents a novel deep learning approach for crowd counting in intelligent video surveillance systems, addressing the
growing need for accurate monitoring of public spaces in urban environments. The demand for precise crowd estimation arises from
challenges related to security, public safety, and efficiency in urban areas, particularly during large public events. Existing crowd
counting techniques, including feature-based object detection and regression-based methods, face limitations in high-density
environments due to occlusions, lighting variations, and varied human appearances. To overcome these challenges, the authors propose a
new deep encoder-decoder architecture based on VGG16, which incorporates hierarchical feature extraction with spatial and channel
attention mechanisms. This architecture enhances the model’s ability to manage variations in crowd density, leveraging adaptive
pooling and dilated convolutions to extract meaningful features from dense crowds. The model’s decoder is further refined to handle
sparse and crowded scenes through separate density maps, improving its adaptability and accuracy. Evaluations of the proposed
model on benchmark datasets, including ShanghaiTech and UCF_CC_50, demonstrate superior performance over state-of-the-art
methods, with significant improvements in mean absolute error and mean squared error metrics. The paper emphasizes the
importance of addressing environmental variability and scale differences in crowded environments and shows that the proposed
model is effective in both sparse and dense crowd conditions. This research contributes to the advancement of intelligent video
surveillance systems by providing a more accurate and efficient method for crowd counting, with potential applications in public
safety, transportation management, and urban planning.
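
As an illustration of the evaluation described above, the following minimal sketch shows how a count is typically read off a density map (by summing its values) and how mean absolute error and mean squared error are computed over per-image counts. The function names and the toy density maps are hypothetical and are not taken from the paper. Many crowd counting papers report the square root of the mean squared error under the name MSE; the abstract does not say which convention is used here, so the sketch exposes both through an assumed `root` flag.

```python
import math
from typing import Sequence


def count_from_density(density_map: Sequence[Sequence[float]]) -> float:
    """Estimated number of people: the sum of all density map values."""
    return sum(sum(row) for row in density_map)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mean_squared_error(
    predicted: Sequence[float], actual: Sequence[float], root: bool = False
) -> float:
    """Average squared difference; root=True returns its square root."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    return math.sqrt(mse) if root else mse


# Toy 2x2 density maps (hypothetical values, not data from the paper).
toy_maps = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[1.0, 2.0], [1.0, 1.0]],
]
predicted_counts = [count_from_density(m) for m in toy_maps]
actual_counts = [3.0, 5.0]  # assumed ground-truth head counts

print("MAE:", mean_absolute_error(predicted_counts, actual_counts))
print("MSE:", mean_squared_error(predicted_counts, actual_counts))
print("Root MSE:", mean_squared_error(predicted_counts, actual_counts, root=True))
```

Because the count is the sum over the whole map, a model can be off in where it places density and still score well on these metrics; they measure counting accuracy only, not localization.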