2. Models

Credit scoring is a high-stakes use of machine learning in the financial services industry. It has transformed consumer lending and is used for tasks like evaluating loan applications, assessing loan performance, segmenting risk, and determining pricing. A credit scoring system can be understood as the totality of tools, processes, and systems used to estimate credit risk.


A credit score measures the likelihood of default in the lending book. It’s often expressed as a probability, converted into a scorecard or mapped to a rating, or “blended” with other scores in a modular scoring system. In most cases, a higher score (e.g., 700) means a lower probability of default, and a lower score (e.g., 500) means a higher one.

A practitioner’s goal is to contextualize the model with domain knowledge in the area of credit risk and direct it towards the outcomes they aim for, and there are multiple ways to do so.

With growing amounts of financial transaction data, it is easy to run into underfitting and overfitting, the two sides of the bias-variance trade-off. A model that underfits has high bias and fails to capture the relationships in the data. Bias can be reduced by using more complex, non-linear models or more informative features; adding training data mainly reduces variance rather than bias. Conversely, a model that overfits has high variance and does not generalize well beyond the training data. Regularization techniques are often employed to mitigate overfitting by constraining model complexity.


Starting from the low-complexity, high-bias end, the most common method for building a scoring model is logistic regression, often enhanced with a supervised binning technique called Weight-of-Evidence (WOE). WOE represents how strongly the evidence weakens or reinforces a hypothesis about the probability of default for a given group versus the average. A “gold standard” scoring model produced with linear techniques is an additive model with a limited number of main effects.

Such models are relatively easy to develop and validate. However, they do not capture interactions in the data well, which is where other types of machine learning models for credit risk have proven useful.
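To make the WOE idea concrete, here is a minimal sketch that computes WOE for one already-binned feature. It uses the convention WOE = ln(share of goods / share of bads), so a positive value means the bin is less risky than average; some references flip the sign. The function name, bin labels, and loan counts are made up for illustration.

```python
import math
from collections import Counter

def weight_of_evidence(bins, defaults):
    """Return {bin_label: WOE} for one binned feature.

    bins     -- bin label of each loan
    defaults -- 1 if the loan defaulted ("bad"), 0 otherwise ("good")
    """
    bad = Counter(b for b, d in zip(bins, defaults) if d == 1)
    good = Counter(b for b, d in zip(bins, defaults) if d == 0)
    total_bad, total_good = sum(bad.values()), sum(good.values())
    # Assumes every bin contains at least one good and one bad loan;
    # real binning code needs smoothing or bin merging to avoid log(0).
    return {
        b: math.log((good[b] / total_good) / (bad[b] / total_bad))
        for b in set(bins)
    }

# Toy data: 100 loans per income bin, 20 vs. 5 defaults (made-up numbers).
bins = ["low_income"] * 100 + ["high_income"] * 100
defaults = [1] * 20 + [0] * 80 + [1] * 5 + [0] * 95
woe = weight_of_evidence(bins, defaults)
```

In this toy data the low-income bin holds a larger share of the defaults than of the good loans, so its WOE is negative, while the high-income bin gets a positive WOE.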

[Figure: simplified scorecard example with two features]

This simplified example shows how we can use probabilities to explain the logic of a scorecard with two features using the WOE technique (actual score estimation is performed with log odds values).
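The log-odds arithmetic behind such a scorecard can be sketched as follows. This is an illustration under stated assumptions, not a fitted model: the coefficients, WOE values, and scaling constants (600 points at 50:1 good:bad odds, 20 points to double the odds) are hypothetical, and the model is assumed to predict the log odds of being a good borrower.

```python
import math

# Scaling constants (assumed for illustration).
PDO = 20          # points needed to double the good:bad odds
BASE_SCORE = 600  # score assigned at BASE_ODDS
BASE_ODDS = 50    # good:bad odds at BASE_SCORE

factor = PDO / math.log(2)
offset = BASE_SCORE - factor * math.log(BASE_ODDS)

# Hypothetical model: log-odds(good) = intercept + sum(beta * WOE).
intercept = 1.9
betas = {"income": 0.8, "utilization": 1.1}
woe = {
    "income": {"low": -0.56, "high": 1.00},
    "utilization": {"high": -0.70, "low": 0.45},
}

def points(feature, bin_label):
    """Points for one feature bin; intercept and offset are spread evenly."""
    n = len(betas)
    log_odds_part = betas[feature] * woe[feature][bin_label] + intercept / n
    return factor * log_odds_part + offset / n

def score(applicant):
    """Total score is the sum of the points of the applicant's bins."""
    return sum(points(f, b) for f, b in applicant.items())

def probability_of_default(total_score):
    """Invert the scaling: score -> log-odds(good) -> P(default)."""
    log_odds_good = (total_score - offset) / factor
    return 1 / (1 + math.exp(log_odds_good))

safe = score({"income": "high", "utilization": "low"})
risky = score({"income": "low", "utilization": "high"})
```

Because the model is additive in log odds, each bin contributes a fixed number of points, and the total maps back to a probability of default through the same scaling.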

XGBoost, a powerful gradient boosting algorithm, excels at handling complex tabular data with many unique interactions. Its strong performance on tabular datasets has made it a popular choice for a large set of problems. Trees are built sequentially, forming an ensemble of weak learners (sometimes as small as decision stumps) with the objective of minimizing the loss. For binary classification, unless specified otherwise, XGBoost uses logistic loss (Log Loss) as its objective function, which allows its leaf weights to be interpreted as log odds, similar to WOE bins.
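The log-odds reading of leaf weights can be illustrated without the library. Under logistic loss, the per-loan gradient is p - y and the Hessian is p(1 - p), and the leaf weight from the XGBoost paper is -G / (H + lambda). The sketch below applies that update repeatedly to a single fixed group of loans; this is a simplification, since real boosting rounds regroup loans with each new tree. The learning rate, number of rounds, and loan counts are arbitrary.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def leaf_weight(labels, margins, reg_lambda=1.0):
    """Newton-step leaf weight for logistic loss: -G / (H + lambda)."""
    probs = [sigmoid(m) for m in margins]
    grad = sum(p - y for p, y in zip(probs, labels))  # sum of gradients
    hess = sum(p * (1 - p) for p in probs)            # sum of Hessians
    return -grad / (hess + reg_lambda)

# Toy leaf: 40 loans, 10 of which defaulted (label 1 = default).
labels = [1] * 10 + [0] * 30

# Start from margin 0 (probability 0.5) and add shrunken leaf weights,
# as successive boosting rounds would for loans landing in this group.
learning_rate = 0.3
margin = 0.0
for _ in range(50):
    step = leaf_weight(labels, [margin] * len(labels), reg_lambda=0.0)
    margin += learning_rate * step
```

With regularization switched off, the accumulated margin approaches ln(10/30), the log odds of default in this group, which is the same kind of quantity a WOE bin encodes.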


In contrast, emerging neural networks for structured data, such as TabNet and DeepTables, aim to bring deep learning capabilities to the finance domain. Among the interesting benefits of these models are their ability to be trained on unlabeled data and their built-in interpretability methods, which are geared towards industry requirements. These models strive to capture complex relationships within structured datasets and offer an alternative to methods like XGBoost or Microsoft’s LightGBM.


Graph neural networks (GNNs) stand out as promising candidates for modeling relationships between customers who share similar risk characteristics or are connected through a payment network or platform. Graph data has the potential to enhance the accuracy of risk assessments by incorporating structures that are not visible in tabular datasets.
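As a rough illustration of structure that a flat table does not show, the sketch below derives one hand-crafted graph feature: the share of a customer's direct neighbors who have defaulted. This is not a GNN; it is only the kind of neighborhood signal that a GNN could learn through message passing. The function name, customer IDs, and edges are hypothetical.

```python
from collections import defaultdict

def defaulted_neighbor_share(edges, defaulted):
    """Share of each customer's direct neighbors that have defaulted.

    edges     -- undirected (customer, customer) pairs, e.g. payments
    defaulted -- set of customers known to have defaulted
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {
        node: sum(n in defaulted for n in nbrs) / len(nbrs)
        for node, nbrs in neighbors.items()
    }

# Toy payment network (made-up customers).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
shares = defaulted_neighbor_share(edges, defaulted={"C"})
```

Customer D, whose only connection is the defaulted customer C, gets the highest share, even though nothing in D's own tabular record would reveal that link.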

This chapter looks at industry best practices and relevant research on credit scoring. The notebooks provide starting points for scoring model development using Weight-of-Evidence (WOE) logistic regression, XGBoost models, and neural networks for tabular data.

...

2.1. Linear Models

Building Credit Scorecards using SAS and Python

The SAS Data Science Blog

  • Read Here

    • This blog post provides a step-by-step guide to building credit scorecards using the Weight-of-Evidence (WOE) logistic regression approach with SAS and Python.

Automating Interpretable Machine Learning Scorecards

Moody's Analytics

  • Read Here

    • This paper compares logistic regression with supervised binning to other popular machine learning methods. It shows that the modified logistic regression offers similar performance to challenger ML models while maintaining interpretability.

More resources to read
Explore additional resources and references for in-depth understanding of the topics covered in this section.

A Necessary Condition for a Good Binning Algorithm in Credit Scoring

OptBinning: The Python Optimal Binning library

Applied Logistic Regression

Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis

Log Loss Function - Dasha.AI

Decoding Logistic Regression using MLE - Analytics Vidhya

Logistic Regression - Data Automaton

Logistic Regression from Scratch - Alpha Quantum


2.2. Tree-Based Models

Machine Learning in Credit Risk Modeling: Efficiency without Compromising Explainability

James - Credit Risk AI

  • Read Here

    • This forward-looking white paper provides an overview of machine learning applications in credit risk modeling. It covers linear models, decision trees, and ensemble methods (Random Forest and Gradient Boosting) in the context of credit risk applications.

Machine Learning in Retail Credit Risk: Algorithms, Infrastructure, and Alternative Data

NVIDIA

  • Watch Here

    • This lecture explores how machine learning is reshaping credit risk models, including new methods for transparent ML in highly regulated environments, the impact of deep learning on alternative financial data, and the acceleration of model development using on-premises GPU computing.

Machine Learning Approach for Credit Scoring

Illimity

  • Read Here

    • This paper proposes an end-to-end corporate rating model development methodology using machine learning. It features a core model architecture with a LightGBM classifier, a probability calibrator, and a rating attribution system. It’s a must-read for understanding the potential of ML in credit risk modeling. You can also find an example of the calibration method here, based on Facebook’s seminal paper Practical Lessons from Predicting Clicks on Ads at Facebook.

More resources to read
Explore additional resources and references for in-depth understanding of the topics covered in this section.

Statistical Modeling: The Two Cultures

Additive Logistic Regression: a Statistical View of Boosting

XGBoost: A Scalable Tree Boosting System

Demystify Modern Gradient Boosting Trees: From Theory to Hands-On Examples

Gradient Boosting and XGBoost

Appendix to Gradient Boosting and XGBoost

How Does Extreme Gradient Boosting (XGBoost) Work?

Understanding Gradient Boosting Tree for Binary Classification

Information Gain

Guide to Credit Scoring in R

2.3. Deep Learning Models

Predicting Consumer Default: A Deep Learning Approach

NBER

  • Read Here

    • This NBER paper introduces a hybrid approach to probability of default (PD) modeling based on a deep neural network and gradient boosted trees. It utilizes a dataset from Experian and introduces the concept of Value Added (VA) for lenders and borrowers to measure the economic benefits of model adoption.

Deep Neural Networks for Behavioral Credit Rating

  • Read Here

    • This paper presents a deep neural network model for behavioral credit risk assessment. It advocates for reconsidering regulatory requirements for model explainability to allow the usage of non-linear models for credit risk assessment. The paper also quantifies the difference in calibration accuracy for each class via the Brier score.

An End-to-End Deep Learning Approach to Credit Scoring using CNN + XGBoost on Transaction Data

  • Read Here

    • This paper argues that more detailed transactional information fed into models can enhance discriminatory power. Using machine learning algorithms to engineer features based on raw transactional data can reduce the application-behavioral performance gap in scoring models.

More resources to read
Explore additional resources and references for in-depth understanding of the topics covered in this section.

Modern Deep Learning for Tabular Data: Novel Approaches to Common Modeling Problems

A Robust Machine Learning Approach for Credit Risk Analysis of Large Loan Level Datasets Using Deep Learning and Extreme Gradient Boosting

2.4. Graph-Based Models

Network Based Credit Risk Models

  • Read Here

    • This paper proposes an augmented logistic regression model that incorporates centrality measures derived from similarity networks among borrowers, deduced from their financial ratios. Including topological variables that describe institutions’ centrality in similarity networks increases the predictive performance of the credit rating model.

Temporal-Aware Graph Neural Network for Credit Risk Prediction

  • Read Here

    • The authors build dynamic graphs to predict defaults by collecting multiple lending events per user and ordering the events by lending time. The proposed model incorporates static, temporal, and structural features within a dynamic graph to predict the user’s credit risk profile.

Loan Default Analysis with Multiplex Graph Learning

  • Read Here

    • In this paper, the authors analyze transfers and social relations between users to define the number of defaulted neighbors for each user and then split users into three distinct groups. Both social and transaction relations achieve good performance since people with similar credit risk tend to gather together, and such a pattern can be naturally modeled via a graph model.

Every Corporation Owns Its Structure: Corporate Credit Ratings via Graph Neural Networks

  • Read Here

    • This paper offers a new method named corporation-to-graph to explore the relations between features for corporate rating models. In this model, each corporation is represented as an individual graph from which feature-level interactions can be learned.

More resources to read
Explore additional resources and references for in-depth understanding of the topics covered in this section.

Graph Neural Networks for Credit Modeling - Katana Graph

Graph Neural Networks: Theory, Problem, and Approaches

Build a corporate Credit Ratings Classifier Using Graph Machine Learning in Amazon SageMaker JumpStart