Natural Sciences Publishing

Information Sciences Letters

An International Journal


	Vol. 13, No. 5


	Urdu-to-English Neural Machine Translation using Transformer with Subword Tokenization

	PP: 1-18

	doi:10.18576/isl/130501

	Author(s)

	Huma Israr, Novera Parvaz, Safdar Abbas Khan

	Abstract

	Neural machine translation (NMT) models use deep learning algorithms to translate text from one language to another. With continuous advances in the field, numerous state-of-the-art techniques have been developed to make translation faster and more accurate. However, the development of Urdu-to-English (UR-EN) machine translation (MT) systems has remained limited compared with other language pairs. The complexity of the Urdu language, characterized by its unique writing system and intricate morphology, contributes to this limitation. Furthermore, the lack of large, standardized datasets and linguistic resources for Urdu makes it hard to build effective UR-EN translation models. This research introduces a specialized NMT model for translating Urdu text into English. It uses a Transformer-based method with subword tokenization to improve on the accuracy of previous Urdu-to-English translation models. The study achieved a BLEU score of 45.58, showing that the Transformer with subword tokenization performs well for UR-EN translation. The trained model outperformed the classical Transformer with word-level tokenization and the Transformer with an attention-based dropout layer by 43.48 BLEU points. This result underscores the effectiveness of the proposed approach and demonstrates its potential for practical deployment in UR-EN translation tasks.
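	To illustrate what subword tokenization means in contrast to word-level tokenization, the following is a minimal sketch of byte-pair encoding (BPE). The abstract does not say which subword algorithm the authors used, so BPE is an assumption made here for illustration only; the function names (`learn_bpe`, `merge_pair`, `segment`) and the toy Latin-script corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter

END = "</w>"  # end-of-word marker, a common BPE convention (assumed here)


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy corpus)."""
    # Start from characters: each word is a tuple of single-character symbols.
    vocab = Counter(tuple(w) + (END,) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into subword units by replaying the merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["lower", "lowest", "newer", "newest", "low", "low"]
    rules = learn_bpe(corpus, num_merges=10)
    # An unseen word is still representable as a sequence of known subword units.
    print(segment("newlow", rules))
```

	The point of the sketch is that a word absent from the training data can still be split into smaller known units, whereas a word-level tokenizer would map it to a single unknown token. This is one plausible reason subword tokenization helps for a morphologically rich language such as Urdu, although the paper's actual pipeline may differ.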

