What does it mean to enrich a CRM lead?

Enriching a lead means adding useful, structured details that the lead did not provide directly. Here you ask an AI model to infer fields like industry, company size, and buying intent from the raw text a lead left behind, then save those fields back to the contact record.

Will the AI invent facts about my leads?

It can, which is why you cap the model's freedom. You ask only for inferences it can reasonably draw from the supplied text, give it an explicit 'unknown' option for every field, and store a confidence score so your sales team knows when to trust a value.

Do I need a paid OpenAI plan for this?

You need a funded OpenAI account because lead enrichment runs through the paid API, not the free chat product. Costs are small: classifying a single lead with a compact model usually costs a fraction of a cent.

How do I force the model to return clean JSON every time?

Use the API's structured-output mode by passing a response_format that names a JSON schema. The model is then constrained to return valid JSON matching your fields, so you never have to parse messy free text.

Is it safe to send lead data to an AI model?

Send only the fields you need to make the inference, never passwords or payment details. The OpenAI API does not train on data sent through the API by default, but you should still strip sensitive values before the call and check your provider's retention policy.
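One way to act on "send only the fields you need" is an allow-list applied before the call. The sketch below is an assumption rather than part of this guide's scripts: ALLOWED_FIELDS, CARD_LIKE, and minimal_lead are hypothetical names, and the card-number pattern is a deliberately rough heuristic.

```python
import re

# Hypothetical allow-list: the only keys that ever leave your system.
ALLOWED_FIELDS = ("name", "email", "job_title", "message")

# Rough heuristic for card-like numbers: 13 to 16 digits, optionally
# separated by single spaces or hyphens. It can miss some formats and
# may over-match long reference numbers.
CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def minimal_lead(raw: dict) -> dict:
    """Keep allow-listed fields and mask card-like digit runs in the message."""
    lead = {key: str(raw.get(key) or "") for key in ALLOWED_FIELDS}
    lead["message"] = CARD_LIKE.sub("[redacted]", lead["message"])
    return lead
```

Anything not on the allow-list, such as a password field from a form export, is dropped before the lead reaches the model.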

Enrich CRM Leads with AI in Python

This guide shows you how to turn a messy CRM lead into a tidy, structured record in under fifteen minutes: an AI model reads the raw text a lead left behind, infers the industry, company size, and buying intent, writes a one-line summary, and your Python script saves all of it back to the contact.

Most leads arrive half-empty. Someone fills in a name and an email, drops a sentence into a "tell us about your project" box, and that is all your sales team has to work with. The useful facts are buried in that sentence, plus the email domain and a job title, and nobody has time to read and tag hundreds of them by hand. An AI model can do that reading and tagging in seconds, and it can hand back results in a fixed shape your CRM understands.

Prerequisites

You only need a few things beyond a working Python install. If you have not set up Python yet, start with Create a Python Virtual Environment for AI and come back. New to calling AI models from code? The plain-English walkthrough in Understanding LLM APIs covers the request-and-response pattern this guide builds on.

You need Python 3.10 or newer, a funded OpenAI account, and the two packages below.

python -m pip install "openai>=1.40" python-dotenv

Create a file named .env in your project folder and add your key:

OPENAI_API_KEY=sk-your-key-here

Add .env to your .gitignore so your key is never committed to version control.
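A missing key is easier to diagnose when the script stops with a readable message before any API call is made. The following is a minimal sketch; require_env is a hypothetical helper, not part of python-dotenv or the openai package.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, or stop with a readable message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise SystemExit(f"{name} is not set. Add it to .env or export it in your shell.")
    return value
```

Call require_env("OPENAI_API_KEY") right after load_dotenv() so a misplaced .env file surfaces immediately.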

This guide focuses on the AI enrichment itself and prints the result. To push results into a live CRM, pair it with Sync HubSpot Contacts with Python, which covers the authenticated update call in detail.

Step 1: Define the fields you want back

Before you call any model, decide exactly what "enriched" means for your business. Vague requests get vague answers. A schema is just a list of the fields you want, their types, and the allowed values, written so the model has no room to improvise. Notice that every category includes an unknown option, so the model has an honest answer when the text gives it nothing to go on.

# schema.py
LEAD_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "industry": {
            "type": "string",
            "description": "Best guess at the lead's industry, e.g. 'SaaS', 'Healthcare', 'E-commerce'. Use 'unknown' if unclear.",
        },
        "company_size": {
            "type": "string",
            "enum": ["1-10", "11-50", "51-200", "201-1000", "1000+", "unknown"],
        },
        "intent": {
            "type": "string",
            "enum": ["ready_to_buy", "evaluating", "researching", "just_browsing", "unknown"],
        },
        "summary": {
            "type": "string",
            "description": "One plain sentence describing who the lead is and what they want.",
        },
        "confidence": {
            "type": "number",
            "description": "How confident you are in these inferences, from 0.0 to 1.0.",
        },
    },
    "required": ["industry", "company_size", "intent", "summary", "confidence"],
}

The enum lists matter. They turn open-ended guesses into a small set of values your sales filters can actually count on, so "company size" is always one of six tidy buckets rather than free text like "smallish" or "a few hundred people".
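Because the enum lists are plain Python data, the same lists can double as a local sanity check, for example in unit tests or when loading enrichment that was saved earlier. This is a minimal sketch that assumes a trimmed copy of the schema; ENUM_FIELDS and enum_violations are hypothetical names.

```python
# Trimmed copy of the two enum fields from LEAD_SCHEMA, so the example runs alone.
ENUM_FIELDS = {
    "company_size": ["1-10", "11-50", "51-200", "201-1000", "1000+", "unknown"],
    "intent": ["ready_to_buy", "evaluating", "researching", "just_browsing", "unknown"],
}


def enum_violations(enrichment: dict) -> list[str]:
    """List the fields whose value is missing or outside the allowed set."""
    return [
        field
        for field, allowed in ENUM_FIELDS.items()
        if enrichment.get(field) not in allowed
    ]


print(enum_violations({"company_size": "51-200", "intent": "evaluating"}))    # prints []
print(enum_violations({"company_size": "smallish", "intent": "evaluating"}))  # prints ['company_size']
```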

Keep the schema lean. Every field you add is one more thing the model has to reason about and one more column your team has to act on. Start with the four or five fields that drive a real decision, ship them, and add more only when a teammate asks for them. A bloated schema produces slower, costlier calls and rarely earns its keep.

Step 2: Send one lead to the model in structured-output mode

Structured-output mode is the part that makes this reliable. When you attach your schema with response_format, the model is forced to return valid JSON that matches your fields exactly, so you never scrape an answer out of a paragraph of prose. If you have ever wrestled with broken parsing, the background in Fix JSONDecodeError with AI API Responses in Python shows why this approach removes the problem at the source.

# enrich.py
import json

from dotenv import load_dotenv
from openai import OpenAI

from schema import LEAD_SCHEMA

load_dotenv()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You enrich raw CRM leads. Infer each field only from the text provided. "
    "Never invent specific facts. When the text does not support a value, "
    "use 'unknown' and lower the confidence score."
)


def enrich_lead(lead: dict) -> dict:
    """Send one raw lead to the model and return structured enrichment."""
    user_message = (
        f"Name: {lead.get('name', '')}\n"
        f"Email: {lead.get('email', '')}\n"
        f"Job title: {lead.get('job_title', '')}\n"
        f"Message: {lead.get('message', '')}"
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "lead_enrichment",
                "schema": LEAD_SCHEMA,
                "strict": True,
            },
        },
    )

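    message = response.choices[0].message
    # The model can decline a request; content is then empty and refusal says why.
    if message.refusal or not message.content:
        raise ValueError(f"No enrichment returned: {message.refusal or 'empty response'}")
    return json.loads(message.content)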

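The email domain is doing quiet work here. An address like dana@acme-clinic.com nudges the model toward healthcare without you writing a single rule. Setting temperature=0 makes the output as repeatable as the API allows, so the same lead usually returns the same labels on Monday and Friday. Small variations between runs are still possible, so do not build logic that depends on byte-identical answers.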
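The domain hint cuts both ways: a consumer mailbox such as gmail.com says nothing about the company. One possible refinement, sketched under the assumption that you want to spell this out in the prompt, is a small helper that labels the domain. FREE_MAIL_DOMAINS and domain_hint are hypothetical, and the provider list is deliberately incomplete.

```python
# Hypothetical, incomplete list of consumer mail providers.
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com", "hotmail.com", "icloud.com"}


def domain_hint(email: str) -> str:
    """Describe the email domain in one line the prompt can include."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not domain:
        return "Email domain: none provided"
    if domain in FREE_MAIL_DOMAINS:
        return f"Email domain: {domain} (personal mailbox, not a company signal)"
    return f"Email domain: {domain}"
```

Appending domain_hint(lead.get("email", "")) to user_message is one way to use it; whether it improves labels on your data is something to measure rather than assume.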

The system prompt is your guardrail against invented facts. The two instructions that matter most are "infer only from the text provided" and "use 'unknown' when the text does not support a value". Without them, a model will happily guess a headcount it has no way of knowing. With them, a sparse lead returns honest unknown values and a low confidence score, which is exactly what you want feeding into the gate in the next step.

Step 3: Validate before you trust the result

The model returns valid JSON, but "valid JSON" is not the same as "good data". A quick check protects your CRM from low-confidence guesses landing in fields your team treats as fact. Route anything below your threshold to a human queue instead of writing it blindly.

# validate.py
def is_trustworthy(enrichment: dict, threshold: float = 0.6) -> bool:
    """Decide whether enrichment is confident enough to auto-apply."""
    if enrichment["confidence"] < threshold:
        return False
    # An 'unknown' industry on a high-confidence record is contradictory.
    if enrichment["industry"] == "unknown" and enrichment["confidence"] > 0.8:
        return False
    return True

Tune the threshold to your appetite for risk. A team that prizes clean data sets it high and reviews more leads by hand; a team that wants every record tagged sets it low and accepts a few rough guesses.
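To see what a threshold change does before committing to it, you can replay the gate over a batch of past results. The sketch below repeats the gate from validate.py so it runs alone; SAMPLE_RESULTS is invented for illustration and auto_apply_count is a hypothetical helper.

```python
def is_trustworthy(enrichment: dict, threshold: float = 0.6) -> bool:
    """Same gate as validate.py, repeated so this example runs alone."""
    if enrichment["confidence"] < threshold:
        return False
    if enrichment["industry"] == "unknown" and enrichment["confidence"] > 0.8:
        return False
    return True


# Invented results, for illustration only.
SAMPLE_RESULTS = [
    {"industry": "Healthcare", "confidence": 0.92},
    {"industry": "SaaS", "confidence": 0.71},
    {"industry": "E-commerce", "confidence": 0.55},
    {"industry": "unknown", "confidence": 0.30},
]


def auto_apply_count(results: list[dict], threshold: float) -> int:
    """Count the records that would be written back without review."""
    return sum(is_trustworthy(result, threshold) for result in results)


for threshold in (0.5, 0.6, 0.8):
    count = auto_apply_count(SAMPLE_RESULTS, threshold)
    print(f"threshold={threshold}: {count} auto, {len(SAMPLE_RESULTS) - count} review")
# threshold=0.5: 3 auto, 1 review
# threshold=0.6: 2 auto, 2 review
# threshold=0.8: 1 auto, 3 review
```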

Step 4: Write the enriched fields back to the CRM

The final step sends your parsed result to the CRM as an update on the matching contact. The shape of this call depends on your CRM, but the pattern is always the same: an authenticated request that maps your enrichment fields onto the right custom properties. Below is the HubSpot-style version; swap the URL and field names for your own system.

# writeback.py
import os

import httpx


def write_back(contact_id: str, enrichment: dict) -> None:
    """Save enrichment to custom properties on a CRM contact."""
    token = os.environ["HUBSPOT_TOKEN"]
    url = f"https://api.hubapi.com/crm/v3/objects/contacts/{contact_id}"
    payload = {
        "properties": {
            "ai_industry": enrichment["industry"],
            "ai_company_size": enrichment["company_size"],
            "ai_intent": enrichment["intent"],
            "ai_summary": enrichment["summary"],
        }
    }
    response = httpx.patch(
        url,
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30.0,
    )
    response.raise_for_status()

Create the custom properties (ai_industry, ai_intent, and the rest) in your CRM settings before you run this, or the update will be rejected for unknown fields. Add your HUBSPOT_TOKEN to the same .env file and keep that file out of version control. The script imports httpx, which pip already installed as a dependency of the openai package.
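Swapping field names for another CRM is easier when the mapping lives in one dictionary instead of inside the request code. The sketch below shows one way to do that; FIELD_MAP and build_payload are hypothetical, and skipping 'unknown' values is a design assumption, not something the write-back above does.

```python
# Hypothetical mapping from enrichment keys to CRM property names.
FIELD_MAP = {
    "industry": "ai_industry",
    "company_size": "ai_company_size",
    "intent": "ai_intent",
    "summary": "ai_summary",
}


def build_payload(enrichment: dict, field_map: dict[str, str] = FIELD_MAP) -> dict:
    """Translate enrichment keys into CRM property names.

    Values equal to 'unknown' are skipped so they do not overwrite
    something a salesperson already entered by hand. Drop the filter
    if you would rather store 'unknown' explicitly.
    """
    properties = {
        crm_name: enrichment[key]
        for key, crm_name in field_map.items()
        if key in enrichment and enrichment[key] != "unknown"
    }
    return {"properties": properties}
```

The returned dictionary has the same shape as the payload in writeback.py, so it can be passed as json=build_payload(enrichment).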

Worked example: enrich a batch end to end

This script ties the four steps together. It reads leads, enriches each one, applies the confidence gate, and either writes the result back or flags it for review. It uses a small sample list so you can run it immediately; replace SAMPLE_LEADS with rows pulled from your CRM.

# run.py
import time

from enrich import enrich_lead
from validate import is_trustworthy
# from writeback import write_back  # uncomment when your CRM fields exist

SAMPLE_LEADS = [
    {
        "id": "101",
        "name": "Dana Reyes",
        "email": "dana@acme-clinic.com",
        "job_title": "Operations Lead",
        "message": "We run three clinics and need to automate patient reminders soon.",
    },
    {
        "id": "102",
        "name": "Sam Cole",
        "email": "sam@gmail.com",
        "job_title": "",
        "message": "Just looking around, might come back later.",
    },
]


def main() -> None:
    for lead in SAMPLE_LEADS:
        enrichment = enrich_lead(lead)
        if is_trustworthy(enrichment):
            print(f"AUTO  {lead['name']}: {enrichment['summary']}")
            # write_back(lead["id"], enrichment)
        else:
            print(f"REVIEW {lead['name']}: failed confidence gate -> human queue")
        time.sleep(0.5)  # gentle pacing helps you stay under rate limits


if __name__ == "__main__":
    main()

Run it with python run.py. The clinic lead should come back as healthcare with a clear "evaluating" or "ready_to_buy" intent and a high confidence score, while the browsing lead should land in the review queue. The time.sleep(0.5) is your friend on large batches; for a deeper fix when volume climbs, see Fix the 429 Rate-Limit Error in Python.
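A fixed pause only reduces the chance of hitting a limit. A common complement is to retry a failed call with exponential backoff. The sketch below is generic and assumes nothing about the OpenAI client: call_with_backoff and its parameters are hypothetical, and max_attempts must be at least 1.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff and jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # 1s, 2s, 4s, ... plus up to 25% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + random.uniform(0, 0.25)))
    raise RuntimeError("max_attempts must be at least 1")
```

In run.py this could wrap the model call as call_with_backoff(lambda: enrich_lead(lead), retry_on=(openai.RateLimitError,)). The OpenAI client also retries some failures on its own and accepts a max_retries argument, so check which layer you want handling retries before stacking both.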

Parameter quick reference

Parameter	Type	Default	Effect
`model`	string	`gpt-4o-mini`	The model that reads and classifies each lead. The compact mini model is cheap and accurate enough for this task; upgrade only if labels feel weak.
`temperature`	float	`0`	Controls randomness. Keep it at `0` so the same lead gets the same labels in nearly every run; exact repeatability is not guaranteed.
`response_format`	object	none	Attaches your JSON schema. With `strict: True`, the model must return valid JSON matching your fields.
`threshold`	float	`0.6`	Your confidence cut-off. Records below it go to human review instead of being written back.

Troubleshooting

openai.AuthenticationError (401): The API rejected your key, usually because it is mistyped, revoked, or not the key you think is loaded. A key that is missing altogether fails earlier, when OpenAI() is constructed. Confirm .env sits in the folder you run from and that load_dotenv() is called before OpenAI(). The step-by-step cure is in Fix the 401 Unauthorized Error in OpenAI Python.
BadRequestError mentioning response_format or schema: The API rejected your schema. With strict: True, every property must appear in the required list and additionalProperties must be False. Match the schema in Step 1 exactly.
Every field comes back unknown: The model has nothing to work with. Check that the message text and email actually reach user_message; empty strings in, empty inferences out.
HubSpot returns 400 Property ... does not exist: You are writing to custom fields that have not been created yet. Add ai_industry, ai_intent, and the others in your CRM's property settings before running the write-back.
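The two strict-mode requirements behind the schema error can be checked locally before any request is sent. This is a minimal sketch covering the top level of the schema only; strict_schema_problems is a hypothetical helper, and nested objects would need the same check applied to each of them.

```python
def strict_schema_problems(schema: dict) -> list[str]:
    """Report violations of the two strict-mode rules, top level only."""
    problems = []
    if schema.get("additionalProperties") is not False:
        problems.append("additionalProperties must be False")
    missing = set(schema.get("properties", {})) - set(schema.get("required", []))
    for name in sorted(missing):
        problems.append(f"'{name}' is not listed in required")
    return problems
```

An empty list means the schema passes both checks; it does not prove the API will accept the schema, since strict mode has other constraints as well.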

When to use this vs. alternatives

Use AI enrichment when the signal lives in free text: a project description, a support note, or a job title the model can interpret. This is exactly where rigid rules fall down and a language model shines.
Use a data-enrichment provider (Clearbit, Apollo) when you need verified firmographics such as headcount, funding, or exact revenue. Those services look companies up in a database; the AI here only infers from the text in front of it, so prefer a database when you need hard facts rather than smart guesses.
Use simple if/else rules when the mapping is fixed, for example "any .edu email is the education segment". A rule is free, instant, and fully predictable, so do not reach for a model when a one-line condition already settles it.
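The rule-based option can sit in front of the model so that only undecided leads cost an API call. The sketch below uses the .edu example from the list above; rule_based_segment is a hypothetical helper.

```python
from typing import Optional


def rule_based_segment(email: str) -> Optional[str]:
    """Return a segment when a fixed rule applies, otherwise None."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if sep and domain.endswith(".edu"):
        return "education"
    return None
```

When the function returns None, fall through to the model; when it returns a segment, skip the model call for that field.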

CRM Data Integration with AI — the main guide covering the full fetch, clean, enrich, and write-back loop.
Sync HubSpot Contacts with Python — pull and update contacts so enriched data has somewhere to land.
Summarize Sales Calls to Your CRM with Python — another AI-to-CRM workflow that pairs naturally with lead enrichment.
Understanding LLM APIs — the foundations of calling AI models from Python.
