Information Sciences Letters
An International Journal
               
 
 
 
 
 
 
 
 
 
 
 
 

 

Vol. 13, No. 3

 
   

A Malicious Website Detection Approach Using Random Forest and Pearson Correlation-based Feature Selection Method

PP: 593-608
Author(s)
Abdu H. Gumaei, Amr F. Shawish, Ahmed Emam
Abstract
With the advancement of the Internet of Things (IoT), smart cities have evolved from traditional urbanization to technology-driven urbanization. IoT networks enable scattered smart devices to gather and analyze data over an open channel, the Internet. As a result, security, privacy, centralization, scalability, and transparency become key concerns in developing smart cities. Detecting malicious Uniform Resource Locators (URLs) in an IoT context is critical for protecting the network and its devices from security risks. Malicious URL identification is an essential aspect of cybersecurity for the interconnected devices of smart cities, and it employs various techniques and technologies to identify potentially harmful links. Recently, a number of machine learning methods have been used in several studies to classify URLs as malicious or benign based on their statistical characteristics and features. However, selecting the most significant features in the preprocessing stage plays a key role in improving the detection accuracy of the trained machine learning classifiers. In this paper, we propose an effective approach for detecting malicious websites, with a particular emphasis on attack payloads and a broader feature space. The importance of the URL features is first obtained and ranked using a random forest-based comprehensive analysis. Then, Pearson's correlation analysis is used to select the most important features, namely those that have a strong correlation with the class labels. The proposed approach is evaluated using four machine learning algorithms: k-nearest neighbor (k-NN), random forest (RF), support vector machine (SVM), and logistic regression (LR). The experimental results demonstrate the effectiveness of our approach, which achieves 96% accuracy with the RF algorithm.
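The Pearson-based selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the toy values, the 0.5 threshold, and the helper names `pearson` and `select_features` are all assumptions made for illustration, and the random forest ranking step that precedes it in the paper is omitted here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, labels, threshold=0.5):
    """Keep features whose |r| with the labels meets the threshold,
    sorted from strongest to weakest correlation."""
    scored = [(name, pearson(values, labels)) for name, values in columns.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data: names and values are illustrative only,
# not taken from the paper's dataset.
labels = [1, 1, 1, 0, 0, 0]  # 1 = malicious, 0 = benign
columns = {
    "url_length": [90, 85, 70, 30, 25, 40],
    "num_dots": [5, 4, 5, 2, 1, 2],
    "has_https": [0, 0, 1, 1, 1, 1],
    "digit_count": [3, 1, 2, 3, 1, 2],
}

selected = select_features(columns, labels, threshold=0.5)
for name, r in selected:
    print(f"{name}: r = {r:+.3f}")
```

In this toy data, `digit_count` is uncorrelated with the labels and is dropped, while the negatively correlated `has_https` is kept because the selection uses the absolute value of the coefficient. The threshold is a tunable assumption; the abstract does not state the cutoff used in the paper.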
