Natural Sciences Publishing

Information Sciences Letters

An International Journal


	Vol. 12, No. 3


	Performance Analysis of Machine Learning Approaches in Automatic Classification of Arabic Language

	PP: 1563-1578

	doi:10.18576/isl/120342

	Author(s)

	Fahd S. Alharithi

	Abstract

	Text classification (TC) assigns texts to a set of predetermined categories, a task that matters more as the number of digital files available on the internet grows. Significantly more TC studies have been conducted on English data than on Arabic data. This research therefore analyzes the performance of automatic TC of the Arabic language using Machine Learning (ML) approaches. Single-label Arabic News Articles Datasets (SANAD) are introduced, comprising three datasets: Akhbarona, Khaleej, and Arabiya. The collected texts are first pre-processed by tokenization and stemming. Three stemming settings are compared, namely light stemming, Khoja stemming, and no stemming, to evaluate the effect of the pre-processing technique on Arabic TC performance. Feature extraction and feature weighting follow; in feature weighting, terms are weighted with the term frequency-inverse document frequency (tf-idf) method. C4.5, Support Vector Machine (SVM), and Naïve Bayes (NB) are selected as the classification algorithms. The results indicate that SVM and NB attained higher accuracy than C4.5, with NB achieving the maximum accuracy of 99.9%.
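
	As an illustration of the tf-idf weighting step named in the abstract, the following is a minimal sketch and not the author's implementation. The abstract does not say which tf-idf variant was used, so the textbook form (relative term frequency times log(N / document frequency)) is assumed here. The function names `tokenize` and `tf_idf` and the three sample sentences are hypothetical, and the stemming step is not reproduced.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization. The light and Khoja
    # stemmers mentioned in the abstract are not reproduced here.
    return text.split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / doc_length) * log(N / df),
    where N is the number of documents and df is the number of
    documents containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented sample sentences, not taken from SANAD.
docs = [
    "الاقتصاد ينمو بسرعة",
    "الفريق يفوز بالمباراة",
    "الاقتصاد يتراجع",
]
weights = tf_idf(docs)
```

	Under this variant, a term that occurs in every document receives a weight of zero, while a term confined to a single document is weighted most heavily. The resulting weight dictionaries would then be converted to feature vectors for a classifier such as SVM or NB.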
