A Malicious Website Detection Approach Using Random Forest and Pearson Correlation-based Feature Selection Method

PP: 593-608

doi:10.18576/isl/130312

Author(s)

Abdu H. Gumaei,
Amr F. Shawish,
Ahmed Emam

Abstract

With the advancement of the Internet of Things (IoT), smart cities have evolved from traditional urbanization to
contemporary, technology-driven urbanization. IoT networks enable scattered smart devices to gather and analyze data through
an open channel, the Internet. As a result, security, privacy, centralization, scalability, and transparency are key concerns in
developing smart cities. Detecting malicious Uniform Resource Locators (URLs) in an IoT context is critical for
protecting the network and devices from security risks. Malicious URL identification is an essential aspect of
cybersecurity for interconnected devices of smart cities, employing various techniques and technologies to identify
potentially harmful links. Recently, a number of machine learning methods have been used in several studies to classify
URLs into malicious or benign classes based on their statistical characteristics and features. However, selecting the most
significant features in the preprocessing stage plays a key role in improving the detection accuracy of trained machine
learning classifiers. In this paper, we propose an effective approach for detecting malicious websites with a particular
emphasis on attack payloads and a broader feature space. The importance of URL features is first obtained and ranked
using a random forest-based comprehensive analysis. Then, Pearson’s correlation analysis is used to select the most
important features that have a strong correlation with the class labels. The proposed approach is evaluated using four
machine learning algorithms: k-nearest neighbor (k-NN), random forest (RF), support vector machine (SVM), and logistic
regression (LR). The experimental results show the efficiency of our approach, achieving 96% accuracy using an RF
algorithm.
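
The correlation-based selection step described in the abstract can be illustrated with a minimal sketch. It covers only the Pearson stage, not the random forest ranking or the classifiers, and it is not the authors' implementation. The feature names, the toy values, the helper names `pearson` and `select_features`, and the 0.5 cut-off are all assumptions made for illustration; the abstract does not state which features or threshold were used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to the labels
    return cov / sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.5):
    """Keep features whose |r| with the labels meets the threshold, strongest first.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    scored = [(name, pearson(values, labels)) for name, values in columns.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: one list per URL feature, one entry per URL.
columns = {
    "url_length": [20, 25, 90, 110, 30, 95],
    "num_dots": [1, 2, 5, 6, 2, 5],
    "has_https": [1, 1, 0, 0, 1, 1],
    "path_depth": [2, 3, 2, 3, 2, 3],
}
labels = [0, 0, 1, 1, 0, 1]  # 1 = malicious, 0 = benign

selected = select_features(columns, labels, threshold=0.5)
for name, r in selected:
    print(f"{name}: r = {r:+.3f}")
```

The absolute value of the coefficient is used so that a strongly negative relationship (here, the hypothetical `has_https` flag) counts as informative as a strongly positive one. In a fuller pipeline, the retained columns would then be passed to the classifiers named in the abstract.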