<strong>Paper Title</strong><br>

EMOTRACE: A Multimodal Micro-Expression and Voice Analysis System for Early Mental Health and Deception Detection<br>

<br>


<strong>Abstract</strong><br>

Early detection of emotional distress and deceptive behavior is vital in psychological assessment, clinical support, and security contexts. Unimodal systems that rely solely on facial or vocal cues often miss subtle or conflicting emotional states. This work presents EMOTRACE, a multimodal micro-expression and voice analysis system that integrates GPT-4o-based diarized speech analysis with facial-emotion recognition using the Residual Masking Network (RMN). A temporal alignment mechanism synchronizes the audio and video cues so that emotional inconsistencies between the two modalities and fleeting affective shifts can be detected. The system consolidates clinician-style summaries, timestamped notes, and emotion timelines into a single PDF report. Experimental results show that multimodal fusion significantly improves emotion-consistency detection, temporal sensitivity, and robustness relative to audio-only and video-only baselines, demonstrating the feasibility of lightweight, real-time multimodal tools for mental-health monitoring and credibility assessment.

<strong>Keywords</strong>: Multimodal Systems, Affective Computing, Computer Vision, Speech Processing, Deep Learning.
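
To make the temporal alignment idea in the abstract concrete, the following is a minimal sketch, not the EMOTRACE implementation. It assumes that the speech pipeline yields diarized segments with start and end times and a vocal emotion label, and that the facial model yields one emotion label per timestamped frame. The names `SpeechSegment`, `FaceFrame`, and `align_and_flag` are hypothetical, and the majority-vote rule is an illustrative choice rather than the method used in this work.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    """Hypothetical diarized speech segment (times in seconds)."""
    start: float
    end: float
    speaker: str
    emotion: str  # emotion label inferred from the audio side


@dataclass
class FaceFrame:
    """Hypothetical per-frame facial-emotion prediction."""
    time: float  # timestamp in seconds
    emotion: str


def align_and_flag(segments, frames):
    """Pair each speech segment with the dominant facial emotion of the
    frames whose timestamps fall inside [start, end], and flag segments
    where the two modalities disagree.

    Returns a list of (segment, dominant_face_emotion, mismatch) tuples.
    Segments with no overlapping frames get (segment, None, False).
    """
    results = []
    for seg in segments:
        labels = [f.emotion for f in frames if seg.start <= f.time <= seg.end]
        if not labels:
            # No visual evidence for this segment, so nothing to compare.
            results.append((seg, None, False))
            continue
        # Majority vote over the frames inside the segment window.
        dominant = Counter(labels).most_common(1)[0][0]
        results.append((seg, dominant, dominant != seg.emotion))
    return results
```

Under these assumptions, a segment labeled "neutral" by the audio side whose overlapping frames are mostly "sad" would be flagged as an audio-visual inconsistency, which is the kind of cue a timestamped note in the report could point to.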