A Comparative Study towards Implementation of an Effective Text-To-Speech System
Synthesizing speech by machine has been a long-term research goal for decades. With the recent boom in artificial neural networks, Text-To-Speech synthesis has reached a level of quality that was previously unimaginable. However, harnessing this state-of-the-art quality of synthesized speech comes at a massive computational cost that cannot be ignored. This paper presents a twofold approach. In the first part, speech is synthesized using MATLAB together with the required Microsoft SAPI dependencies. The framework developed recognizes character text and stores it in a text file, which later serves as input to a MATLAB program that uses the Microsoft SAPI libraries to convert the text into speech utterances. The latter part of the paper is concerned with a neural network framework that yields more intelligible speech, namely Google’s Tacotron, which has been re-implemented using TensorFlow to understand the margin of difference in the output. To compare the outputs, the Mean Opinion Scores (MOS) of the two methodologies are used to reach our conclusion.
Keywords - Text-To-Speech, State-of-the-art, MATLAB, neural network, Tacotron, TensorFlow, MOS.