The Expert's Column: Investigating collinearity in scoring models

There is often confusion around correlation and collinearity and how they should be treated within model development.  The two concepts are closely related; collinearity refers to a linear or near-linear relationship between a set of predictor variables, whereas correlation signifies a (possibly more general) correspondence between just two characteristics. 

Large correlations between variables can often signify the existence of high collinearity in the data.  This, in turn, can be an issue because some modelling techniques perform sub-optimally in the presence of collinearity.
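The distinction can be illustrated with a minimal sketch on simulated data (not taken from any real scoring sample). It assumes three hypothetical predictors, the third being almost exactly the sum of the first two, and uses the variance inflation factor (VIF), a common diagnostic that this article does not itself name, to measure collinearity. The variable names and the noise level are illustrative assumptions.

```python
import numpy as np

# Simulated data: x3 is almost an exact linear combination of x1 and x2.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations between the three predictors.
corr = np.corrcoef(X, rowvar=False)

# Largest absolute off-diagonal correlation.
max_pairwise = np.max(np.abs(corr - np.eye(3)))

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element
# of the inverse of the correlation matrix.
vif = np.diag(np.linalg.inv(corr))

print("largest pairwise correlation:", round(float(max_pairwise), 3))
print("variance inflation factors:", np.round(vif, 1))
```

In this construction no single pair of predictors is very highly correlated, yet the VIFs are very large, because the near-linear relationship involves all three variables together. Inspecting a correlation matrix alone may therefore understate the collinearity in a sample.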

Recent trends within the scoring industry have seen an increasing number of independent variables being included within the statistical process.  The main drivers for the increase in model complexity include:

  • Improvements in data capture leading to more available data;
  • The expanding array of rich credit bureau information;
  • Increases in computer processing power allowing the use of larger samples;
  • The Basel II stipulation to consider all available data sources.

This means that it becomes prudent to consider the effects introduced by collinearity and correlations between the scorecard predictors.

Dealing with Collinearity

There are numerous ‘traditional’ methods for avoiding collinearity, but care must be taken when using them.  It is possible to produce orthogonal principal components or eigenvectors for the data, but the resulting variables and models are no longer easy to interpret or monitor.  The common practice of dropping correlated or collinear variables must also be applied with caution, as it can introduce omitted variable bias, making the final score prediction both biased and sub-optimal.  Such ‘cures’ for collinearity can do much more harm than good!
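The omitted variable effect can be sketched on simulated data. The example below is an illustration only: it uses ordinary least squares rather than a scorecard modelling technique, and the two predictors, their correlation of 0.9 and the true coefficients of 1.0 are all assumptions chosen for the demonstration.

```python
import numpy as np

# Simulated data: two correlated predictors that both drive the outcome.
rng = np.random.default_rng(42)
n = 20000
rho = 0.9  # assumed correlation between x1 and x2
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)


def ols(X, y):
    """Fit OLS with an intercept; return coefficients and in-sample MSE."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    mse = np.mean((y - Xd @ beta) ** 2)
    return beta, mse


# Full model keeps both predictors; the reduced model drops x2.
beta_full, mse_full = ols(np.column_stack([x1, x2]), y)
beta_dropped, mse_dropped = ols(x1.reshape(-1, 1), y)

print("x1 coefficient, both variables kept:", round(float(beta_full[1]), 3))
print("x1 coefficient, x2 dropped:", round(float(beta_dropped[1]), 3))
print("MSE full vs dropped:", round(float(mse_full), 3), round(float(mse_dropped), 3))
```

Under these assumptions the coefficient on x1 should move from about 1.0 to about 1.9 (that is, 1 + rho) once x2 is dropped, because x1 absorbs the effect of the omitted variable, and the fit error rises. The retained coefficient no longer describes the effect of x1 alone.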

Our Solution

Experian-Scorex is committed to using the optimum scoring methodology for our clients, and, as such, has a full understanding of these collinearity effects and how to mitigate them where necessary.  Some degree of collinearity is always present in real world credit scoring samples but Experian-Scorex’s best practice modelling methodology ensures that the model prediction remains accurate and unbiased, even in the presence of such effects.

Where strong collinearity exists, confidence intervals on individual parameter estimates widen, although the overall model output remains accurate.  Because wider intervals mean that individual estimates are less stable, Experian-Scorex uses a number of logic checks, together with expert knowledge and variable selection methods, to ensure variable consistency in any final scorecard.
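A small simulation can illustrate this widening of intervals alongside stable overall accuracy. It is a sketch under stated assumptions, not a description of Experian-Scorex's methodology: it uses ordinary least squares, two simulated predictors, and the hypothetical helper functions `fit_ols` and `simulate`, with correlations of 0.0 and 0.99 chosen purely for contrast.

```python
import numpy as np


def fit_ols(X, y):
    """Fit OLS with an intercept; return coefficients, standard errors, residual RMSE."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    # Standard errors from the usual OLS covariance sigma^2 (X'X)^-1.
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return beta, se, np.sqrt(sigma2)


def simulate(rho, n=5000, seed=1):
    """Simulate two predictors with correlation rho and fit the model."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = x1 + x2 + rng.normal(size=n)
    return fit_ols(np.column_stack([x1, x2]), y)


_, se_low, rmse_low = simulate(rho=0.0)
_, se_high, rmse_high = simulate(rho=0.99)

print("std. error of x1 coefficient, rho=0.00:", round(float(se_low[1]), 4))
print("std. error of x1 coefficient, rho=0.99:", round(float(se_high[1]), 4))
print("residual RMSE, rho=0.00 vs rho=0.99:", round(float(rmse_low), 3), round(float(rmse_high), 3))
```

In this setting the standard error on each coefficient should grow by a factor of roughly 1/sqrt(1 - rho^2), about seven at rho = 0.99, while the residual error of the fitted model is essentially unchanged. The uncertainty lies in how the effect is shared between the two predictors, not in the combined prediction.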

By using techniques which remain accurate and highly predictive even in the presence of collinearity, Experian-Scorex models allow all data sources to be included within the modelling process and thus enable our clients to leverage all information available within their scoring solutions.

Dr Paul Matthews - Business Consultant, Experian-Scorex

For further discussions about Experian-Scorex’s analytical methodologies please contact us and we will put you in contact with our Analytics Centre of Excellence.
