International Journal of Innovative Research in Computer and Communication Engineering
ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines
| TITLE | AI-Based Book Translation for Indian Languages Using Transfer Learning |
|---|---|
| ABSTRACT | India is home to extraordinary linguistic diversity, including 22 officially recognized languages and numerous regional dialects. While major languages such as Hindi, Marathi, and English have moderate digital resources, regional dialects like Khandeshi, Varhadi, and Gondi remain severely underrepresented in Natural Language Processing (NLP). This paper presents an AI-based neural architecture for automated book-length document translation across Indian languages, with special emphasis on Marathi and its regional dialects (Khandeshi and Varhadi) and the tribal language Gondi. The proposed framework leverages adversarial transfer learning and hierarchical document-level attention mechanisms to enable effective translation in low-resource scenarios. Knowledge is transferred from high-resource language pairs (English-Hindi, English-Marathi) to dialectal and tribal languages using cross-lingual representation learning. The system integrates a transformer-based encoder-decoder architecture with language-agnostic intermediate embeddings and coherence modeling for long-form translation. Experimental results demonstrate significant performance improvements in low-resource and dialectal translation tasks, particularly for Marathi-Khandeshi, Marathi-Varhadi, and Hindi-Gondi pairs. The proposed model improves BLEU scores by an average of 8-10% over baseline systems and introduces a novel Book Translation Quality Score (BTQS) for document-level evaluation. This research contributes toward digital preservation and accessibility of regional and tribal languages in India. |
| AUTHOR | ARCHANA V. BENDALE (Asst. Prof.), SANIKA BABASAHEB DUKRE, MOEZ NISAR SHAIKH, SHAHDAAB MUSHTAQ SHEIKH, KASHISH PIYUSH PAREKH, Department of Information Technology, Sandip Institute of Technology and Research Centre, Nashik, India |
| VOLUME | 183 |
| DOI | 10.15680/IJIRCCE.2026.1404020 |
| KEYWORDS | |
| References | [1] Sutskever, I., Vinyals, O., and Le, Q. V., "Sequence to sequence learning with neural networks," in Proc. NeurIPS, 2014, pp. 3104-3112. [2] Bahdanau, D., Cho, K., and Bengio, Y., "Neural machine translation by jointly learning to align and translate," in Proc. ICLR, 2015. [3] Vaswani, A., et al., "Attention is all you need," in Proc. NeurIPS, 2017, pp. 5998-6008. [4] Choudhary, N., and Jha, G. N., "Creating English-Hindi parallel corpus using Hindi Treebank," in Proc. LREC, 2018, pp. 1201-1206. [5] Parida, S., et al., "BPC: Bharat Parallel Corpus for Indian languages," in Proc. ACL, 2021, pp. 4567-4580. [6] Ramesh, G., et al., "Samanantar: The largest publicly available parallel corpora collection for 11 Indic languages," in Proc. ACL-IJCNLP, 2022. [7] Dabre, R., Kumar, A., and Bhattacharyya, P., "IndicTrans: A multilingual transformer for Indian languages," in Proc. EMNLP, 2021, pp. 2345-2356. [8] Devlin, J., Chang, M. W., Lee, K., and Toutanova, K., "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. NAACL-HLT, 2019, pp. 4171-4186. [9] Zoph, B., et al., "Transfer learning for low-resource neural machine translation," in Proc. EMNLP, 2016, pp. 1568-1575. [10] Ganin, Y., et al., "Domain-adversarial training of neural networks," Journal of Machine Learning Research, vol. 17, no. 1, pp. 2096-2030, 2016. [11] Aharoni, R., Johnson, M., and Firat, O., "Massively multilingual neural machine translation," in Proc. NAACL-HLT, 2019, pp. 3874-3884. [12] Miculicich, L., et al., "Document-level neural machine translation with hierarchical attention networks," in Proc. EMNLP, 2018, pp. 2947-2954. [13] Junczys-Dowmunt, M., "Microsoft's submission to the WMT2018 news translation task," in Proc. WMT, 2018, pp. 451-458. [14] Voita, E., et al., "Context-aware monolingual repair for neural machine translation," in Proc. EMNLP, 2019, pp. 8771-8784.
[15] Kunchukuttan, A., et al., "AI4Bharat-IndicNLP: A framework for Indian language NLP," in Proc. EMNLP, 2020, pp. 4477-4482. [16] Haddow, B., et al., "Survey of low-resource machine translation," Computational Linguistics, vol. 48, no. 3, pp. 673-732, 2022. [17] Papineni, K., et al., "BLEU: A method for automatic evaluation of machine translation," in Proc. ACL, 2002, pp. 311-318. [18] Popović, M., "chrF++: Words helping character n-grams," in Proc. WMT, 2017, pp. 612-618. |
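The abstract reports BLEU gains of 8-10% over baseline systems, citing Papineni et al. [17]. As background on that metric: BLEU is the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The sketch below is an illustrative, unsmoothed sentence-level version in plain Python; it is not the paper's evaluation pipeline, and a standard toolkit should be used for reported scores.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU (Papineni et al., 2002):
    geometric mean of clipped n-gram precisions times a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        ref_ngrams = ngram_counts(ref, n)
        total = sum(hyp_ngrams.values())
        # Clipped overlap: each hypothesis n-gram is credited at most
        # as many times as it occurs in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if total == 0 or overlap == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages overly short hypotheses.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match of at least four tokens scores 1.0, a hypothesis sharing no words with the reference scores 0.0, and partial matches fall in between.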
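The cross-lingual transfer the abstract describes, from Marathi to closely related dialects such as Khandeshi and Varhadi, generally depends on a shared subword vocabulary so that cognate word forms decompose into overlapping units. The abstract does not specify a tokenizer, so purely as an illustration of that idea, here is a minimal byte-pair-encoding (BPE) merge learner in plain Python (the function name and data format are ours, not the paper's):

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from a {word: frequency} dict.

    Each word starts as a sequence of characters plus an end-of-word
    marker; each step merges the most frequent adjacent symbol pair.
    """
    vocab = Counter()
    for word, freq in word_freqs.items():
        vocab[tuple(word) + ("</w>",)] += freq
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges
```

Because dialectal and standard forms often share stems, a vocabulary learned jointly over both produces many common subwords, which is what lets embeddings transfer across the language pair.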