Paper Title
A REAL-TIME, MULTI-MODAL DEEP LEARNING FRAMEWORK FOR DRIVER DROWSINESS AND INATTENTION DETECTION: AN ATTENTION-BASED SPATIO-TEMPORAL NETWORK (ASTN) APPROACH

Abstract
This paper presents a real-time Driver Drowsiness and Inattention Detection System that combines Computer Vision and Deep Learning to enhance road safety. The system monitors the driver through a live video feed and analyses physical fatigue and cognitive distraction simultaneously. The architecture integrates three processing layers: a YOLOv8 Nano layer for driver localisation, a MediaPipe Face Mesh layer for Eye Aspect Ratio (EAR) computation, and a custom Attention-Based Spatio-Temporal Network (ASTN) for cognitive state classification. The ASTN employs a dual-branch architecture that combines a CNN-based spatial feature extractor with Multi-Head Self-Attention and a parallel 1D-CNN temporal branch. In benchmark tests against baseline models (SVM, KNN, CNN-LSTM, etc.), the proposed ASTN outperforms them with a classification accuracy of 90.54%. The pipeline operates at 30+ FPS on standard consumer hardware, providing a low-latency, software-driven solution for real-world driver monitoring without requiring specialised physiological sensors.

Keywords - Driver Monitoring Systems (DMS), Eye Aspect Ratio (EAR), Attention Mechanism, Deep Learning, Inattention Detection
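
For readers unfamiliar with the EAR measure named in the abstract, the following is a minimal sketch of the commonly used six-landmark formulation, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|). It is not taken from this paper's implementation: the function name `eye_aspect_ratio`, the landmark ordering, and the sample coordinates are illustrative assumptions, and how the six points are selected from the face mesh is left unspecified here.

```python
import math

def eye_aspect_ratio(eye):
    """Return the Eye Aspect Ratio for six (x, y) eye landmarks.

    Assumed ordering (hypothetical, for illustration only):
    p1 and p4 are the horizontal eye corners, p2 and p3 lie on the
    upper eyelid, and p6 and p5 lie on the lower eyelid beneath them.
    """
    p1, p2, p3, p4, p5, p6 = eye
    # Sum of the two vertical eyelid distances.
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    # Horizontal distance between the eye corners.
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

# Made-up coordinates: a wide-open eye and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]

ear_open = eye_aspect_ratio(open_eye)
ear_closed = eye_aspect_ratio(closed_eye)
```

The ratio falls as the eyelids close while the eye width stays roughly constant, which is why a sustained low EAR is typically read as a sign of eye closure.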