AUTOMATED SPEECH RECOGNITION MODEL FOR TRANSCRIPTION GENERATION BASED ON HUGGING FACE TRANSFORMERS

Abstract
Audio transcription bridges the gap between verbal communication and text-based accessibility, and it has become crucial in today's digital environment. It guarantees accurate documentation of spoken information, facilitates searchability and analysis of audio content, and improves accessibility for people with hearing impairments. Transcription also plays an important role in SEO, allowing content producers to reach larger audiences. In this paper, we propose a fine-tuned Wav2Vec2-based Automatic Speech Recognition (ASR) system, trained with the Hugging Face Transformers library, for accurate speech-to-text transcription on low-end devices. The system preprocesses audio by resampling, normalizing, and tokenizing it into feature vectors, which are then passed through a transformer encoder to model long-range dependencies. Text sequences are aligned with audio features using the Connectionist Temporal Classification (CTC) loss, which eliminates the need for frame-level annotations. The model performed well when evaluated with Word Error Rate (WER), making it suitable for web-based automatic subtitle generation.
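As a rough illustration of two components mentioned in the abstract, the sketch below shows (a) the greedy collapse step used when decoding CTC output (merge repeated tokens, drop blanks) and (b) WER computed as the word-level edit distance between a reference and a hypothesis transcript, normalized by reference length. This is a minimal plain-Python sketch for exposition only; the function names are ours, not from the paper, and a real system would use a trained Wav2Vec2 model and a tested metric library.

```python
def ctc_greedy_collapse(token_ids, blank=0):
    """CTC-style greedy decoding: collapse consecutive repeats, drop blanks."""
    out, prev = [], None
    for t in token_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

In practice the collapse step is applied to the argmax of the encoder's per-frame logits, and WER is averaged over a held-out test set.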