Information Sciences Letters
An International Journal
Vol. 12, No. 3

 
   

Performance Analysis of Machine Learning Approaches in Automatic Classification of Arabic Language

PP: 1563-1578
doi:10.18576/isl/120342
Author(s)
Fahd S. Alharithi
Abstract
Text classification (TC) is a crucial task because the number of digital documents available on the internet is enormous. The goal of TC is to assign texts to a set of predetermined categories. Far more studies have been conducted on English datasets than on Arabic datasets. Therefore, this research analyzes the performance of automatic TC of the Arabic language using Machine Learning (ML) approaches. Further, the Single-label Arabic News Articles Datasets (SANAD) are introduced, which contain three different datasets, namely Akhbarona, Khaleej, and Arabiya. Initially, the collected texts are pre-processed through tokenization and stemming. Three stemming settings are employed, namely light stemming, Khoja stemming, and no stemming, to evaluate the effect of the pre-processing technique on Arabic TC performance. Feature extraction and feature weighting are then performed; in feature weighting, terms are weighted by the term frequency-inverse document frequency (tf-idf) method. C4.5, Support Vector Machine (SVM), and Naïve Bayes (NB) are selected as the classification algorithms. The results indicate that the SVM and NB methods attained higher accuracy than the C4.5 method. NB achieved the maximum accuracy, 99.9%.
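The tf-idf weighting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which tf-idf variant the paper uses, so the formula here is an assumption: term frequency is the relative count of a term within a document, and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` and the toy documents are hypothetical; in the paper's setting the inputs would be Arabic tokens after tokenization and stemming.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    docs: list of token lists (already tokenized and, optionally, stemmed).
    Returns one {term: weight} dict per document.
    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)

    # Document frequency: number of documents in which each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical placeholder tokens, not taken from SANAD).
docs = [
    ["economy", "market", "market"],
    ["sport", "match", "market"],
    ["sport", "team"],
]
weights = tf_idf(docs)
```

With this variant, a term that occurs in every document receives weight 0, while a term concentrated in a few documents receives a higher weight. The resulting per-document weight vectors are the kind of features a classifier such as SVM, NB, or C4.5 would then be trained on.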
