Using Correlation Coefficient to Solve Outliers Problem in Regression Analysis, with Practical Application
PP: 347-361
doi:10.18576/jsap/100206
Author(s)
Afrah Yahya AL-Rezami
Abstract
A new algorithm based on the partial and multiple correlation coefficients is presented for estimating multiple outliers in the multiple linear regression model. A precondition for estimating multiple outliers is that they are genuinely present in the data and cannot be attributed to errors. Regression analysis was applied to a phenomenon whose results are known in advance (the relationship between semester GPA and cumulative GPA), yet the results were misleading. After the Ordinary Least Squares (OLS) assumptions were checked, outliers were identified using scatter plots of the standardized predicted values against the standardized residuals, the studentized deleted residuals, Cook’s D, and the hat matrix. Influential cases were identified using box plots of the overall influence measures (DFFITS, COVRATIO, and Cook’s D). The outliers were then estimated using the proposed algorithm, which was compared with OLS before outlier detection, the trimmed mean, and weighted least squares (WLS). The methods were compared on the basis of the P-value for βi, the adjusted R², and the OLS assumptions. The results showed that the proposed method is a robust solution for outlier estimation. It is therefore recommended for estimating multiple outliers in other, similar phenomena; for example, it could be applied to a credit card transaction control system in a bank.
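The detection step described in the abstract relies on standard OLS diagnostics. As a rough illustration only, the sketch below computes leverage (the diagonal of the hat matrix), standardized residuals, studentized deleted residuals, Cook's D, and DFFITS with NumPy. It is not the proposed correlation-based algorithm, which the abstract does not specify; the function name `ols_influence`, the synthetic data, and the planted outlier at index 15 are assumptions made for the example.

```python
import numpy as np

def ols_influence(X, y):
    """Standard OLS outlier/influence diagnostics (hypothetical helper).

    X must already contain an intercept column; y is the response vector.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                      # OLS coefficient estimates
    resid = y - X @ beta                          # raw residuals
    # Leverage: diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    sse = resid @ resid
    mse = sse / (n - p)
    # Standardized (internally studentized) residuals
    r_std = resid / np.sqrt(mse * (1.0 - h))
    # Error variance re-estimated with case i deleted
    s2_del = (sse - resid**2 / (1.0 - h)) / (n - p - 1)
    # Studentized deleted (externally studentized) residuals
    t_del = resid / np.sqrt(s2_del * (1.0 - h))
    cooks_d = r_std**2 * h / (p * (1.0 - h))
    dffits = t_del * np.sqrt(h / (1.0 - h))
    return {"leverage": h, "r_standardized": r_std,
            "t_deleted": t_del, "cooks_d": cooks_d, "dffits": dffits}

# Synthetic data (assumed for illustration): a linear trend plus noise,
# with one planted outlier near the centre of the x range.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)
y[15] += 10.0

X = np.column_stack([np.ones_like(x), x])
diag = ols_influence(X, y)
flagged = int(np.argmax(np.abs(diag["t_deleted"])))
print("case with largest |studentized deleted residual|:", flagged)
```

In practice, these quantities would be plotted (scatter plots and box plots, as in the abstract) rather than reduced to a single flagged case; COVRATIO is omitted here to keep the sketch short.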