Paper Title
Comparative Analysis of Speech Foundational Models
Abstract
Emotion detection in crowd scenarios is a challenging yet crucial task with applications in fields such as marketing, entertainment, and security. We present a comparative analysis of machine learning models for crowd emotion detection, focusing on accuracy and F1-score. The evaluated models include WavLM, wav2vec, HuBERT, x-vectors (both with and without MFCC features), KNN, Random Forest, XGBClassifier, a neural network, and ResNet. Each model is trained and tested on a dataset of crowd audio recordings annotated with emotions. The primary objective of this study is to identify the most effective model for crowd emotion detection. To this end, we analyze the performance of each model with particular attention to accuracy and F1-score, which are commonly used to evaluate classification tasks. Our experimental results show that the x-vector model with MFCC features outperforms the other models on all evaluated metrics. These findings have implications for both research on and practical applications of emotion recognition. By highlighting the effectiveness of x-vectors, this study offers guidance for researchers and practitioners seeking to develop more accurate and reliable crowd emotion detection systems. Our study also identifies feature engineering and model selection as important factors in improving emotion recognition models, which may inform future work.