Large-Scale Statistical Modelling via Machine Learning Classifiers

PP: 203-222

Author(s)

Christina Parpoula,
Krystallenia Drosou,
Christos Koukouvinos

Abstract

Statistical modelling and the identification of significant variables in large data sets are now common problems. This
paper presents the statistical analysis of two large-dimensional data sets: we first conduct a seismic hazard sensitivity analysis
using seismic data from Greece acquired during the years 1962–2003, and then analyze trauma data collected in an annual registry
conducted during 2005 by the Hellenic Trauma and Emergency Surgery Society, involving 30 General Hospitals in Greece.
The main purpose of both analyses is to extract high-level knowledge for the domain user or decision-maker. Eight nonparametric
classifiers derived from data mining methods (Multilayer Perceptron (MLP) Neural Networks, Radial Basis Function Neural Networks
(RBFN), Bayesian Networks, Support Vector Machines (SVMs), Classification and Regression Tree (C&RT), Chi-square Automatic
Interaction Detection (CHAID), the C5.0 algorithm, and Quick, Unbiased, Efficient Statistical Tree (QUEST)) are employed in this work
and are compared to Logistic Regression and the ℓ1-norm SVM in terms of overall classification accuracy, sensitivity, specificity, and area
under the ROC curve (AUROC). The goal of this paper is twofold: to assess the importance of several input variables in order to detect
the possible risk factors of large earthquakes or to prevent trauma deaths, and to examine which classifiers are best suited to
large-dimensional data analysis, effectively detecting complex nonlinear relationships and potentially leading to more accurate predictions.