Natural Sciences Publishing


Information Sciences Letters

An International Journal



	Vol. 13, No. 3


	A Malicious Website Detection Approach Using Random Forest and Pearson Correlation-based Feature Selection Method

	pp. 593-608

	doi:10.18576/isl/130312

	Author(s)

	Abdu H. Gumaei, Amr F. Shawish, Ahmed Emam

	Abstract

	With the advancement of the Internet of Things (IoT), smart cities have evolved from traditional urbanization to technology-driven urbanization. IoT networks enable distributed smart devices to collect and analyze data over an open channel, the Internet. As a result, security, privacy, centralization, scalability, and transparency have become central concerns in the development of smart cities. Detecting malicious Uniform Resource Locators (URLs) in an IoT context is critical for protecting networks and devices from security risks. Malicious URL identification is an essential aspect of cybersecurity for the interconnected devices of smart cities, and it employs various techniques and technologies to identify potentially harmful links. Recently, several studies have used machine learning methods to classify URLs as malicious or benign based on their statistical characteristics and features. However, selecting the most significant features in the preprocessing stage plays a key role in improving the detection accuracy of trained machine learning classifiers. In this paper, we propose an effective approach for detecting malicious websites, with particular emphasis on attack payloads and a broader feature space. The importance of URL features is first computed and ranked using a comprehensive random forest-based analysis. Then, Pearson’s correlation analysis is used to select the most important features, namely those that are strongly correlated with the class labels. The proposed approach is evaluated using four machine learning algorithms: k-nearest neighbor (k-NN), random forest (RF), support vector machine (SVM), and logistic regression (LR). The experimental results show the effectiveness of our approach, with the RF algorithm achieving 96% accuracy.
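	The two-stage selection described in the abstract (rank features by random forest importance, then keep those correlated with the class label) can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the lexical features in url_features, the SUSPICIOUS_TOKENS payload markers, and the top_k and min_abs_r cut-offs are all hypothetical, since the abstract does not list the actual features or thresholds. The importances argument is expected to come from a fitted random forest, for example the feature_importances_ attribute of scikit-learn's RandomForestClassifier. Because the labels are 0/1, the Pearson r computed here is the point-biserial correlation.

```python
from __future__ import annotations

import math
import re
from typing import Sequence
from urllib.parse import urlparse

# Hypothetical payload markers: the abstract does not list the paper's actual features.
SUSPICIOUS_TOKENS = ("<script", "javascript:", "eval(", "union select", "../", "%3c")
IPV4_HOST = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def url_features(url: str) -> dict[str, int]:
    """Extract a small, illustrative set of lexical features from one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_ip_host": int(bool(IPV4_HOST.match(host))),
        "uses_https": int(parsed.scheme == "https"),
        # Counts occurrences of attack-payload markers (case-insensitive).
        "payload_hits": sum(lowered.count(tok) for tok in SUSPICIOUS_TOKENS),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: r is undefined, so treat it as uninformative
    return cov / (sx * sy)


def select_features(
    columns: dict[str, list[float]],
    labels: list[int],
    importances: dict[str, float],
    top_k: int = 10,
    min_abs_r: float = 0.1,
) -> list[str]:
    """Stage 1: keep the top_k features by random forest importance.
    Stage 2: drop any whose |Pearson r| with the 0/1 label is below min_abs_r."""
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)
    return [name for name in ranked[:top_k]
            if abs(pearson(columns[name], labels)) >= min_abs_r]


if __name__ == "__main__":
    urls = ["https://example.com/index.html",
            "http://192.168.0.1/login.php?q=<script>alert(1)</script>"]
    for u in urls:
        print(u, url_features(u))
```

	The selected column names could then be used to train the four classifiers the abstract evaluates (k-NN, RF, SVM, LR); ranking first and filtering second keeps the correlation check limited to features the forest already considers useful.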
