Paper Title
Adaptive Multisensory Perception For Human Behaviour Detection

Abstract
Multimodal human activity recognition has attracted wide attention in human-computer interaction and contributes to abnormal activity detection and security applications. However, not all of the collected modal signals carry useful feature information; irrelevant and redundant information can degrade the model's performance and reduce recognition accuracy. This paper designs a hybrid of hand-crafted and deep learning-based feature extraction techniques in which spatial and temporal features, including motion vectors, object detection outcomes, audio characteristics, and facial expressions, are extracted from video frames and the accompanying audio. The models are then trained on these features, which enables them to learn patterns indicative of abnormal behavior and improves accuracy in subsequent detection. The proposed network dynamically learns a weight for each modality, assigning higher weights to the more informative modalities, and uses these weights to fuse the features extracted from the individual modalities into a more comprehensive feature set. The trained models are integrated into existing surveillance infrastructure to support real-time human activity detection. Through extensive experiments, we evaluate the performance of the proposed multimodal human activity recognition network from several perspectives.

Keywords - accuracy, streaming media, feature extraction, spatiotemporal phenomena, human activity recognition, multimedia communication.
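The abstract does not state how the modality weights are learned. The following is a minimal sketch, assuming softmax attention over per-modality relevance scores (a common choice, not necessarily the authors' method) and assuming that every modality's features have already been projected to a common dimension. The function names `softmax` and `fuse_modalities`, the fixed `score_vector`, and the toy feature values are all hypothetical. In a trained network, `score_vector` would be a learned parameter.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(
    features: Dict[str, List[float]], score_vector: List[float]
) -> Tuple[List[float], Dict[str, float]]:
    """Hypothetical attention-style fusion.

    Scores each modality by the dot product of its feature vector with
    `score_vector`, turns the scores into weights with a softmax, and returns
    the weighted sum of the feature vectors together with the per-modality weights.
    """
    names = list(features)
    dim = len(score_vector)
    for name in names:
        if len(features[name]) != dim:
            raise ValueError(
                f"modality {name!r} has dimension {len(features[name])}, expected {dim}"
            )
    # Relevance score per modality; in a trained network score_vector is learned.
    scores = [sum(w * x for w, x in zip(score_vector, features[n])) for n in names]
    weights = softmax(scores)
    # Fused feature: softmax-weighted sum of the modality vectors, element by element.
    fused = [sum(a * features[n][i] for a, n in zip(weights, names)) for i in range(dim)]
    return fused, dict(zip(names, weights))


# Toy example with hand-picked (not real) feature vectors for three modalities.
features = {
    "video": [1.0, 0.5, 0.0],
    "audio": [0.1, 0.0, 0.2],
    "face": [0.4, 0.4, 0.4],
}
score_vector = [1.0, 1.0, 1.0]  # hypothetical stand-in for a learned parameter
fused, weights = fuse_modalities(features, score_vector)
print(weights)  # "video" scores highest here, so it receives the largest weight
print(fused)
```

Because the softmax weights are positive and sum to one, the fused vector is a convex combination of the modality features. A modality with a low relevance score therefore contributes little rather than being discarded outright. Concatenating the weighted modality features instead of summing them is an equally plausible variant of the same idea.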