Urdu-to-English Neural Machine Translation using Transformer with Subword Tokenization
|
pp. 1-18
|
doi:10.18576/isl/130501
|
|
Author(s)
|
Huma Israr,
Novera Parvaz,
Safdar Abbas Khan
|
|
Abstract
|
Neural machine translation (NMT) models use deep learning algorithms to translate text from one language to another. With continuous advancements in this field, numerous state-of-the-art techniques have been developed to make translations more accurate and faster. However, the development of Urdu-to-English (UR-EN) machine translation (MT) systems has remained limited compared to other language pairs. The complexity of the Urdu language, characterized by its unique writing system and intricate morphology, contributes to this limitation. Furthermore, the lack of large, standardized datasets and linguistic resources for Urdu makes it hard to create effective UR-EN translation models. This research introduces a specialized NMT model for translating Urdu text to English. It uses a transformer-based method with subword tokenization to improve the accuracy of previous Urdu-to-English translation models. This study achieved a BLEU score of 45.58, showing that the transformer with subword tokenization performs well for UR-EN translation. The trained model outperformed the classical Transformer with word-level tokenization and the Transformer with an attention-based dropout layer by up to +43.48 BLEU points. This noteworthy achievement underscores the effectiveness of the proposed approach and demonstrates its potential for practical deployment in UR-EN translation tasks.
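The abstract attributes the gain over word-level tokenization to subword tokenization. As an illustration only (the abstract does not specify which subword scheme the authors used), the sketch below implements byte-pair encoding (BPE), a widely used subword method: it repeatedly merges the most frequent adjacent symbol pair in a word-frequency vocabulary, so frequent words stay whole while rare or morphologically complex words split into reusable subword units, which keeps the vocabulary small and avoids out-of-vocabulary tokens. The toy corpus and all function names are hypothetical.

```python
# Minimal BPE sketch (illustrative; the paper's actual tokenizer is not
# specified in this excerpt and the toy corpus below is made up).
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every whole-symbol occurrence of `pair` with its concatenation."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    new_sym = "".join(pair)
    return {pattern.sub(new_sym, word): freq for word, freq in vocab.items()}

def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merge operations from a word-frequency dict."""
    # Start from characters, with an end-of-word marker so suffixes are distinct.
    vocab = {" ".join(w) + " </w>": f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges

def encode(word, merges):
    """Segment an arbitrary word by replaying the learned merges in order."""
    syms = list(word) + ["</w>"]
    for a, b in merges:
        i = 0
        while i < len(syms) - 1:
            if syms[i] == a and syms[i + 1] == b:
                syms[i:i + 2] = [a + b]
            else:
                i += 1
    return syms

# Toy corpus: the unseen word "lowest" is segmented into learned subwords.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(corpus, num_merges=10)
print(encode("lowest", merges))  # ['low', 'est</w>']
```

In an NMT pipeline, both the Urdu source and English target sides would typically be segmented this way before training, so the transformer operates on subword IDs rather than full words.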