OCR on Sanskrit Language using Tesseract
Throughout India's history, many kinds of information have been written on diverse materials such as stone, tree bark, and leaves. These materials have a finite lifespan, and as they degrade over time the important knowledge they contain is being lost forever. Scanning the manuscripts makes it possible to pass this information on to future generations. With the help of Optical Character Recognition (OCR) technology, the content of these digitized manuscripts can be retrieved by recognizing the printed characters. This article uses Tesseract to develop an OCR model for identifying printed Sanskrit text in Devanagari script; the resulting model is referred to as sag_iitb. The results show that the proposed system achieves a high recognition rate on the input sequences. When metrics such as accuracy and the handling of ASCII special symbols are considered, sag_iitb proves significantly more efficient. The results generated by sag_iitb are good: it detects the correct Sanskrit term from an image file the vast majority of the time.
Keywords - Tesseract, Text Recognition, Optical Character Recognition, Sanskrit
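
The abstract names accuracy as one of the evaluation metrics but does not say how it is computed. As a minimal sketch, assuming a character-level accuracy based on edit distance between the ground-truth text and the OCR output, the metric could look like the following. The function names `edit_distance` and `char_accuracy` are hypothetical and are not taken from the article.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Assumed metric: 1 - (edit distance / reference length), floored at 0."""
    # Normalize so that equivalent Unicode sequences compare as equal.
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


if __name__ == "__main__":
    # Ground truth versus an OCR output that dropped one vowel sign.
    print(char_accuracy("संस्कृत", "संस्कत"))
```

Note that this sketch counts Unicode code points, so a Devanagari vowel sign or virama counts as a separate character; an evaluation based on grapheme clusters or whole words would give different numbers.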