Predicting Quote Conversion in Insurance
An AI-powered tool that helps insurance underwriters focus on the quotes most likely to convert.
Overview
This project began after a series of brainstorming sessions with an insurance client, who wanted to explore new ways to capitalise on the data captured by their insurance marketplace platform.
One idea stood out: a tool that would let users upload their current pipeline of quotes and instantly see a prediction of how likely each quote is to bind.
At the time, only around 10% of submitted quotes were converting, meaning large amounts of time were being spent chasing low-probability deals.
We set out to build an AI model, API, and simple web interface that could quickly process quote data and present ranked predictions.
The result was a solution that prioritised each quote by its likelihood to convert, enabling brokers and underwriters to focus on the opportunities that matter most.
Data Extraction & Assembly
Before any analysis could take place, we first needed to collect and prepare the data.
Quote information was spread across several SQL tables, with some important fields, such as limits and premiums, stored as embedded JSON arrays within columns. To create a unified dataset, we used SQL to combine and “explode” these nested structures, transforming them into a consistent, tabular format suitable for analysis.
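For illustration, the same “explode” step can be sketched in pandas (the production work was done in SQL, and the column names below are hypothetical, not the client's actual schema):

```python
import json
import pandas as pd

# Hypothetical extract: each quote row carries an embedded JSON array of
# coverage records, mirroring the nested columns in the source SQL tables.
quotes = pd.DataFrame({
    "quote_id": [101, 102],
    "product": ["property", "liability"],
    "coverages": [
        '[{"limit": 1000000, "premium": 5200}, {"limit": 500000, "premium": 2100}]',
        '[{"limit": 2000000, "premium": 9800}]',
    ],
})

# Parse the JSON strings, explode to one coverage per row, then flatten the
# dicts into ordinary columns so the result is a plain tabular dataset.
quotes["coverages"] = quotes["coverages"].apply(json.loads)
exploded = quotes.explode("coverages", ignore_index=True)
flat = pd.concat(
    [exploded.drop(columns="coverages"),
     pd.json_normalize(exploded["coverages"].tolist())],
    axis=1,
)
print(flat)
```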
Once assembled, the combined dataset captured several years of quote history across multiple products and countries, providing a solid foundation for exploratory analysis and model development.
Exploratory Data Analysis
We explored the dataset in a Jupyter notebook using Python, combining visual inspection with statistical methods to understand relationships between features and the target variable.
Our first step was to evaluate basic distributions, missing values, and outliers. Summary statistics and boxplots quickly highlighted inconsistencies in numeric fields, including several quotes with unusually large premium values that appeared to be data entry errors.
The boxplots also revealed heavy right skew in premiums and revenue, which is to be expected for financial data; many values fall outside the whiskers of the plots below.
*Boxplots: Premium, Revenue, and Limit*
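These checks are quick to run; a minimal sketch of the inspection, assuming the assembled dataset sits in a DataFrame `df` with these illustrative column names:

```python
import matplotlib.pyplot as plt

# Visual check of the heavy right skew in the financial fields.
fields = ["premium", "revenue", "limit"]
fig, axes = plt.subplots(1, len(fields), figsize=(12, 4))
for ax, col in zip(axes, fields):
    ax.boxplot(df[col].dropna())
    ax.set_title(f"{col} (skew={df[col].skew():.1f})")
plt.tight_layout()
plt.show()
```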
To understand the predictive power of individual features, we calculated univariate separability scores using the Area Under the ROC Curve (AUC). This revealed that one field had an unusually high AUC against the outcome, indicating potential data leakage. On further review, we confirmed that the field was only populated after a quote was bound, so it had to be removed.
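A screen along these lines takes only a few lines with scikit-learn; the column names here are illustrative:

```python
from sklearn.metrics import roc_auc_score

# Score each numeric feature on its own against the bound/not-bound target.
# An AUC near 1.0 (or 0.0) flags a field that may leak the outcome.
target = df["is_bound"]  # assumed target column
for col in df.select_dtypes("number").columns.drop("is_bound"):
    mask = df[col].notna()
    auc = roc_auc_score(target[mask], df.loc[mask, col])
    print(f"{col:<30} AUC={auc:.3f}")
```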
We discovered that one product type behaved differently from all others. All quotes for this product were consistently binding, and including them would have dominated the model, causing it to focus on this single, perfectly predictable pattern instead of learning the more subtle relationships across other product types.
Finally, we analysed the industry variable, which was encoded using NAICS industry classification codes. These codes proved too granular, producing hundreds of rarely occurring categories.
Through this analysis, we developed a deeper understanding of the data and identified the transformations required to make it reliable and predictive. The EDA process also validated our feature assumptions and helped guide the next phase of model development.
Data Preparation & Feature Engineering
Following the exploratory phase, we prepared the data for modelling by applying the insights gathered during analysis. The goal was to create a clean, consistent, and informative dataset that captured meaningful business patterns without introducing data leakage.
We began by removing problematic and irrelevant rows identified during EDA, including all quotes for the product type that bound consistently, along with several entries whose implausibly high premium values pointed to data entry errors. Columns that risked leaking the outcome, such as post-bind status indicators, were excluded from the training set.
Categorical variables, including product type and country, were standardised and encoded using one-hot encoding so they were compatible with machine learning algorithms. The industry NAICS codes were too granular for the available data, so we grouped them into broader industry categories before one-hot encoding them.
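A sketch of the encoding step with scikit-learn, assuming hypothetical column names and pre-made training/validation frames; `handle_unknown="ignore"` keeps inference safe when an unseen category appears:

```python
from sklearn.preprocessing import OneHotEncoder

# Fit on training data only; categories first seen at inference time then
# encode as all zeros rather than raising an error.
cats = ["product_type", "country", "industry_sector"]  # illustrative names
encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
X_cat_train = encoder.fit_transform(train_df[cats])
X_cat_val = encoder.transform(val_df[cats])
```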
Feature Engineering
To improve model performance and interpretability, a shared feature engineering pipeline was built to convert raw quote data into structured, model-ready features. The same transformations are applied consistently at both training and inference time, ensuring alignment between model development and production scoring.
Temporal Features
Quote creation timestamps were used to derive several time-based indicators capturing seasonality and operational behaviour.
| Feature | Description |
|---|---|
| created_dow | Day of week the quote was created (0–6). |
| created_hour | Hour of day of creation, capturing working-hour effects. |
| created_month | Month of year to identify seasonal trends. |
| is_weekend / is_business_hours / is_morning / is_end_of_month | Binary indicators for operational timing patterns. |
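A sketch of how these indicators can be derived with pandas, assuming the raw timestamp lives in a `created_at` column and using illustrative thresholds for the binary flags:

```python
import pandas as pd

# `df` is the assembled quote dataset; `created_at` is the assumed name
# of the raw quote-creation timestamp column.
ts = pd.to_datetime(df["created_at"])
df["created_dow"] = ts.dt.dayofweek                     # 0 = Monday ... 6 = Sunday
df["created_hour"] = ts.dt.hour
df["created_month"] = ts.dt.month
df["is_weekend"] = (df["created_dow"] >= 5).astype(int)
df["is_business_hours"] = ts.dt.hour.between(9, 17).astype(int)  # illustrative window
df["is_morning"] = (ts.dt.hour < 12).astype(int)
df["is_end_of_month"] = (ts.dt.day >= 25).astype(int)            # illustrative threshold
```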
Financial Normalisation and Scaling
Financial magnitudes were highly skewed, so log transformations were introduced to stabilise the data and improve comparability.
| Feature | Transformation | Purpose |
|---|---|---|
| revenue_log1p | log(1 + revenue) | Compress extreme values and improve scale. |
| premium_log1p | log(1 + premium) | Compress extreme values and improve scale. |
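In code, each transformation is a single numpy call (column names assumed):

```python
import numpy as np

# np.log1p computes log(1 + x), which is stable at zero and compresses
# the long right tail seen in the raw boxplots.
df["premium_log1p"] = np.log1p(df["premium"])
df["revenue_log1p"] = np.log1p(df["revenue"])
```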
After applying the log transformations to premium and revenue, the boxplots show a far healthier distribution for modelling.
*Boxplots: Premium (log1p) and Revenue (log1p)*
Ratio and Relational Features
Ratios between premiums and limits were added to provide context on pricing and risk appetite.
| Feature | Description |
|---|---|
| premium_to_limit | Ratio of premium to insured limit. |
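A guarded sketch of the ratio, assuming these column names, which avoids infinities when a limit is zero or missing:

```python
import numpy as np

# Premium relative to insured limit gives pricing context; fall back to
# NaN rather than dividing by zero.
df["premium_to_limit"] = np.where(
    df["limit"] > 0, df["premium"] / df["limit"], np.nan
)
```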
Industry Grouping
Industry codes were mapped to higher-level sectors to reduce sparsity and improve generalisation across unseen data.
| Feature | Description |
|---|---|
| industry_sector_name | Mapped from NAICS-style codes to broad sectors such as Manufacturing or Finance & Insurance. |
| industry_subsector_grouped | Rare subsectors grouped under “Other” based on minimum sample thresholds. |
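The rare-subsector grouping can be sketched as follows, with an illustrative threshold and hypothetical column names; the counts should be taken from the training split only:

```python
# Collapse rare subsectors into "Other" so one-hot encoding does not
# produce hundreds of near-empty columns.
MIN_SAMPLES = 50  # illustrative minimum sample threshold
counts = train_df["industry_subsector"].value_counts()
keep = counts[counts >= MIN_SAMPLES].index
train_df["industry_subsector_grouped"] = train_df["industry_subsector"].where(
    train_df["industry_subsector"].isin(keep), other="Other"
)
```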
Before training, the data was split into training and validation sets using stratified sampling to maintain the class balance of bound and non-bound quotes. This ensured the evaluation results reflected real-world behaviour and not random distribution effects.
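With scikit-learn this is a single call; the 80/20 split and prepared `X`/`y` below are illustrative:

```python
from sklearn.model_selection import train_test_split

# Stratify on the target so the ~10% bind rate is preserved in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```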
The resulting dataset was clean, structured, and model-ready, providing a reliable foundation for the development and evaluation of predictive models.
Model Development & Evaluation
With the dataset prepared and validated, the next step was to identify which modelling approach would best capture the patterns that influence whether a quote binds. We evaluated a range of supervised classification algorithms, balancing model complexity, interpretability, and performance.
Model Selection
Because the dataset combined numeric, categorical, and engineered ratio features, we focused primarily on tree-based ensemble methods, known for their ability to model non-linear relationships and handle mixed feature types with minimal preprocessing. For benchmarking, we also included baseline linear and kernel-based models to illustrate the performance gap between traditional techniques and gradient boosting.
| Model | Type |
|---|---|
| LightGBM | Gradient Boosting (Histogram-based) |
| XGBoost | Gradient Boosting (Tree-based) |
| Gradient Boosting (sklearn) | Baseline Boosting |
| Random Forest | Bagging Ensemble |
| Logistic Regression | Linear Model |
| Support Vector Machine | Kernel Method |
Model Comparison Routine
To ensure fairness across experiments, all models were trained using the same stratified train/validation split and evaluated on identical data. A custom model comparison script automated this process, standardising metrics, reproducibility settings, and evaluation outputs.
Each model was trained with a fixed random seed to guarantee repeatable results and tuned using a consistent set of hyperparameters optimised for generalisation rather than overfitting. The evaluation script computed three key metrics: ROC-AUC, Precision-Recall AUC (PR-AUC), and Lift@50 (the bind rate among the top 50 ranked quotes relative to the overall rate), providing a balanced view of ranking quality on the imbalanced target.
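For reference, the core of such a routine can be sketched as below; the function names are ours, but Lift@50 follows the definition above:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def lift_at_k(y_true, y_score, k=50):
    """Bind rate among the top-k ranked quotes divided by the overall bind rate."""
    y_true = np.asarray(y_true)
    top_k = np.argsort(y_score)[::-1][:k]
    return y_true[top_k].mean() / y_true.mean()

def evaluate(name, model, X_val, y_val):
    """Score a fitted classifier on the shared validation split."""
    scores = model.predict_proba(X_val)[:, 1]
    print(f"{name:<20} ROC-AUC={roc_auc_score(y_val, scores):.3f}  "
          f"PR-AUC={average_precision_score(y_val, scores):.3f}  "
          f"Lift@50={lift_at_k(y_val, scores):.2f}")
```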
The table below summarises the final model comparison results:
Final Rankings (by PR-AUC)
| Model | ROC-AUC | PR-AUC | Lift@50 | Binds in Top 50 |
|---|---|---|---|---|
| LightGBM | 0.929 | 0.702 | 8.59 | 15 |
| XGBoost | 0.952 | 0.676 | 8.02 | 14 |
| Gradient Boosting | 0.919 | 0.611 | 8.02 | 14 |
| Random Forest | 0.918 | 0.439 | 7.45 | 13 |
| Logistic Regression | 0.538 | 0.048 | 2.29 | 4 |
| SVM | 0.482 | 0.044 | 0.57 | 1 |
The LightGBM model achieved the best overall balance of recall and precision, with a PR-AUC of 0.70 and over eight-fold lift at the top 50 predictions compared to random chance. Although XGBoost recorded a slightly higher ROC-AUC, LightGBM demonstrated superior precision-recall performance, making it the preferred choice for deployment.
The results also illustrate the strength of gradient-boosting techniques on structured business data and the limitations of traditional linear approaches in this context. Logistic Regression and SVM models struggled to capture the complex interactions between features, while ensemble tree methods handled them naturally.
LightGBM was selected as the production model for its combination of high accuracy and speed.
The Solution
We packaged the model into a simple, production-ready API that fits seamlessly into the client’s workflow:
- API: A lightweight FastAPI endpoint that accepts a CSV of new quotes and returns them ranked by bind probability (see the sketch below).
- Demo Interface: A simple drag-and-drop tool that demonstrates the API: users upload a CSV file and view the ranked results.
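A minimal sketch of such an endpoint, assuming a hypothetical engineer_features helper standing in for the shared pipeline, plus an illustrative model path and route name:

```python
import io

import joblib
import pandas as pd
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = joblib.load("model.joblib")  # illustrative path to the trained LightGBM model

def engineer_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the shared feature engineering pipeline described above;
    at inference it must apply exactly the training-time transformations."""
    raise NotImplementedError

@app.post("/rank-quotes")  # illustrative route name
async def rank_quotes(file: UploadFile):
    """Accept a CSV of quotes and return them ranked by bind probability."""
    quotes = pd.read_csv(io.BytesIO(await file.read()))
    features = engineer_features(quotes)
    quotes["bind_probability"] = model.predict_proba(features)[:, 1]
    ranked = quotes.sort_values("bind_probability", ascending=False)
    return ranked.to_dict(orient="records")
```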

What’s Next
We’ll continue to collect data and periodically revisit the model comparison to make sure we’re extracting as much predictive power from the data as possible.
We also plan to introduce interpretability dashboards to help users understand which factors most influence each prediction, building transparency and trust in AI-assisted decision making.
Key Takeaways
- Identified high-probability quotes using AI-driven ranking
- Fast, self-contained API and upload tool
- ROC-AUC of 0.93 and PR-AUC of 0.70, indicating strong predictive performance
- Improved broker efficiency and decision focus


