Paper Title
DOMAIN-TUNED REAL-TIME SPEECH INTERFACE FOR INDIAN BANKING SUPPORT USING CONTEXT-AWARE NEURAL STT AND RAG-BASED QUERY RESOLUTION
Abstract
Banking call centers manage millions of customer interactions, yet traditional systems struggle with high query volumes, especially at peak times, because of high operational costs, inconsistent service quality, and limited scalability. As voice-based automation becomes central to customer service, the banking sector needs intelligent, real-time conversational agents that handle spoken queries with precision and speed. To bridge this gap, this study proposes an AI-powered voice agent designed to automate most of the repetitive queries commonly addressed on banking websites. Our system integrates a real-time, domain-specific Speech-to-Text (STT) engine, a LangChain-powered Retrieval-Augmented Generation (RAG) module, and a high-quality Text-to-Speech (TTS) component to provide efficient, 24/7 support. The STT module uses a hybrid RNN-CNN architecture, fine-tuned for Indian English, to ensure accurate transcription across diverse accents. Unlike generic STT systems, our model maintains session context, which enables accurate follow-up handling and reduces query misinterpretation. The RAG framework leverages transformer-based embeddings to resolve complex banking queries and delivers structured, context-aware responses using a fine-tuned LLaMA model. Evaluated on 2500+ concurrent sessions, the system achieves a Word Error Rate (WER) of 12.5%, an Intent Recognition Accuracy of 94.1%, and an average response time of 1.78 s. The system outperforms earlier conversational frameworks in contextual robustness, real-time inference, and deployment scalability. This work thereby establishes a foundation for multilingual, privacy-aligned, production-ready AI assistants in banking and other regulated domains, reducing operational costs while improving customer experience through scalable, high-quality handling of routine inquiries.
Keywords - Speech-to-Text, RNN-CNN, Indian Banking, Retrieval-Augmented Generation, LangChain, TTS, Domain-specific AI, Conversational AI
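For readers unfamiliar with the Word Error Rate reported in the abstract, the metric is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized transcript and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in this work. The function name word_error_rate, the lowercase and whitespace tokenization, and the example sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: transcripts are lowercased and split on whitespace; a real
    evaluation would normally apply a more careful text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Invented example (not taken from the paper's data): one substituted word
# in an eight-word reference gives a WER of 1/8.
example_wer = word_error_rate(
    "what is the balance in my savings account",
    "what is the balance in my saving account",
)
print(f"WER = {example_wer:.3f}")
```

Under this definition, a WER of 12.5% corresponds on average to one word-level error per eight reference words, as in the invented example above.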