Colella, Sofia, and Harrison Jones. 2023. “Machine Learning and Ratemaking: Assessing Performance of Four Popular Algorithms for Modeling Auto Insurance Pure Premium.” CAS E-Forum Spring (March).
  • Figure 1. Plot of actual claim amounts vs predictions
  • Figure 2. Plot of actual claim amounts vs predictions
  • Figure 3. Plot of actual claim amounts vs predictions
  • Figure 4. Plot of actual claim amounts vs predictions
  • Figure 5. Decile charts for all models
  • Figure 6. “Actuarial” Lorenz Curves


Machine learning applications in actuarial science are an increasingly popular subject. Notably, in actuarial pricing, machine learning has offered a route to higher predictive power when anticipating future claims. Insurers are now experimenting with these algorithms but are running into issues of model explainability and implementation cost.

The existing literature has begun to scratch the surface of this deep field of research. Fujita et al. (2020) published an experiment comparing the performance of predictive frequency models using AGLM, GLM, GAM, and GBM. König and Loser (2020) performed a similar exercise, comparing the performance of GLMs, neural networks, and XGBoost in predicting frequency. We sought to add to the existing research by predicting pure premium rather than frequency or severity alone, comparing four different approaches, and assessing and presenting a range of quantitative and qualitative performance evaluation metrics.

This paper compares four different models for prediction of pure premium: generalized linear models (GLM), accurate GLM (AGLM), eXtreme gradient boosting (XGBoost), and neural networks. The research explores the quantitative and qualitative performance of the models on a test set of data as well as the pros and cons of each model in a P&C insurance pricing context.
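
To make the modeling target concrete, the following minimal sketch shows how pure premium relates to frequency and severity under the standard definitions (losses per unit of exposure, claims per unit of exposure, and losses per claim). The policy records, field names, and the helper `portfolio_summary` are hypothetical and made up for illustration; they are not taken from the paper's data or code.

```python
# Illustrative only: made-up policy records, not data from the paper.
policies = [
    {"exposure": 1.0, "claim_count": 0, "claim_amount": 0.0},
    {"exposure": 0.5, "claim_count": 1, "claim_amount": 1200.0},
    {"exposure": 1.0, "claim_count": 2, "claim_amount": 800.0},
]


def portfolio_summary(records):
    """Return (frequency, severity, pure_premium) for a list of policy records.

    frequency    = total claim count / total exposure
    severity     = total claim amount / total claim count
    pure premium = total claim amount / total exposure
    """
    exposure = sum(r["exposure"] for r in records)
    claims = sum(r["claim_count"] for r in records)
    losses = sum(r["claim_amount"] for r in records)

    frequency = claims / exposure
    # Severity is undefined with zero claims; return 0.0 in that case (an assumption).
    severity = losses / claims if claims else 0.0
    pure_premium = losses / exposure
    return frequency, severity, pure_premium


frequency, severity, pure_premium = portfolio_summary(policies)
print(f"frequency={frequency:.3f}, severity={severity:.2f}, pure premium={pure_premium:.2f}")
```

At the portfolio level, pure premium equals frequency times severity, which is why a model that targets pure premium directly can stand in for separate frequency and severity models.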

Accepted: March 07, 2023 EDT