Why does json.loads raise JSONDecodeError on a perfectly good API response?

The HTTP response was valid JSON, but the model's text answer inside it was not. Language models often wrap their JSON in prose or markdown code fences, so the string you pass to json.loads is not pure JSON.

What does 'Expecting value: line 1 column 1 (char 0)' mean?

It means the very first character is not a valid JSON token, usually because the text starts with a word, a backtick fence, or is empty. Print the raw string before parsing to see what the model actually returned.

How do I force a model to return only JSON?

Set response_format to {'type': 'json_object'} on chat completions for OpenAI-compatible APIs and instruct the model to reply with JSON. This is the most reliable fix because the provider constrains the output to syntactically valid JSON, provided the reply is not cut off by the token limit.

Why do streamed responses break json.loads?

Streaming sends the answer in small chunks, so each chunk is only a fragment of the final JSON. You must collect every chunk into one string and parse only after the stream finishes.

Should I validate the parsed JSON even after json.loads succeeds?

Yes. json.loads only checks syntax, not whether the keys and types are what your code expects. A schema check with pydantic catches missing fields and wrong types before they cause errors deeper in your program.

Fix JSONDecodeError with AI API Responses

This guide shows you how to stop json.JSONDecodeError when you ask a language model (the AI behind tools like ChatGPT) for JSON and Python refuses to read it. You will get four ordered fixes with runnable Python 3.10+ code, and you can apply the first one in under five minutes.

The trap is subtle. The API call itself succeeds, and the HTTP response is valid JSON. But the text the model wrote inside that response is not: it added a sentence, wrapped the JSON in a markdown fence, or sent only half of it. When you feed that text to json.loads, Python raises JSONDecodeError. This is one of the most common errors people hit right after they finish Understanding LLM APIs, so it is worth fixing properly once.

The exact error you are seeing

You wrote something like this and it blew up:

import json

reply = response.choices[0].message.content
data = json.loads(reply)   # <- raises here

Traceback (most recent call last):
  File "app.py", line 12, in <module>
    data = json.loads(reply)
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

json.JSONDecodeError is the exception json.loads raises when a string is not valid JSON. The message tells you where parsing failed, which is the fastest clue to what the model did wrong. line 1 column 1 (char 0) means it failed on the very first character — typically because the model started with a word like Here or with a backtick before any {. A message like Extra data: line 5 column 1 means the JSON itself was fine but the model kept typing afterwards, so the parser hit unexpected text once the object closed. An Expecting ',' delimiter or Unterminated string message usually means the response was cut off partway through, which is common with streaming or a too-short token limit.

Note that requests.exceptions.JSONDecodeError (raised by response.json() in the requests library) inherits from the same standard-library exception unless simplejson is installed, so the same diagnosis applies whether you parse the body yourself or let requests do it.

The first thing to do, always, is look at the raw string:

print(repr(reply))

repr() shows you hidden characters and the exact wrapper text, which tells you which of the fixes below you need.
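As a quick illustration of the message families described above, the sketch below feeds a few hand-written strings to json.loads and prints the message and character position for each. The sample strings and the describe_failure helper are made up for this example; they are not real model output.

```python
import json

# Hand-written samples that imitate common model mistakes (illustrative only).
samples = {
    "prose first": 'Sure! Here is the data: {"ok": true}',
    "trailing text": '{"ok": true}\nHope that helps!',
    "cut off": '{"name": "Mar',
    "empty": "",
}


def describe_failure(text: str) -> str:
    """Return 'message @ char N' for invalid JSON, or 'valid' if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg is the short message; err.pos is the index where parsing stopped.
        return f"{err.msg} @ char {err.pos}"
    return "valid"


for label, text in samples.items():
    print(f"{label:14} -> {describe_failure(text)}")
```

The err.pos value tells you where to look in the repr() output: position 0 points at a wrapper before the JSON, while a position near the end points at trailing text or truncation.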

Quick reference: cause to fix

What the model returned	Error you get	Fix
Prose then JSON (`Sure! Here is...`)	`Expecting value: line 1 column 1`	JSON mode (Fix 1) or extraction (Fix 2)
Markdown fence (```json)	`Expecting value: line 1 column 1`	Strip fences (Fix 2)
Valid JSON plus trailing text	`Extra data: line N`	Extraction (Fix 2)
Half a response from streaming	`Expecting ',' delimiter` / truncated	Join chunks before parsing (Fix 2)
JSON parses but a key is missing	No `JSONDecodeError`, fails later	Schema validation (Fix 3)

Prerequisites

You need Python 3.10+, the openai SDK, and pydantic 2 or later for the validation step (the examples use model_validate and model_dump, which are pydantic 2 methods). Install them into a virtual environment so they stay isolated. If you have not set one up yet, see Create a Python Virtual Environment for AI.

pip install openai pydantic python-dotenv

Put your key in a .env file:

OPENAI_API_KEY=sk-your-key-here

Add .env to your .gitignore so you never commit your key. If your key itself is rejected, the symptom is different — see Fix the 401 Unauthorized Error in OpenAI Python.

Fix 1: Turn on JSON mode so the provider guarantees valid JSON

The strongest fix is to stop the model from ever wrapping its answer. OpenAI-compatible chat APIs accept a response_format parameter. Set it to {"type": "json_object"} and the provider guarantees the reply is syntactically valid JSON — no fences, no prose.

There is one rule: your prompt must contain the word "JSON", or the API rejects the request. Telling the model the shape you want is good practice anyway. Writing prompts that pin down the output shape is a skill in itself, covered in Write System Prompts that Control Output Format.

import json
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # the key line
    messages=[
        {"role": "system", "content": "Reply only with JSON."},
        {"role": "user", "content": "Give name and age for a fictional pilot as JSON."},
    ],
)

data = json.loads(response.choices[0].message.content)  # safe now
print(data)

With JSON mode on, json.loads should no longer fail on formatting noise, because the provider constrains the reply to valid JSON before sending it back. Use this whenever your provider supports it. Two caveats are worth knowing. First, JSON mode guarantees syntax only, not that the keys match what you asked for; Fix 3 handles that. Second, it does not stop a response from being cut off when the token limit is too low. In that case the choice's finish_reason is "length" and the JSON is incomplete; the context-length guide linked below covers that situation. The next fix is your safety net for the providers and models that ignore response_format entirely.
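JSON mode does not protect against a reply that the token limit cut short. One way to catch that before parsing is to inspect finish_reason on the returned choice. The sketch below assumes an OpenAI-style choice object; parse_if_complete is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the example runs without an API call.

```python
import json
from types import SimpleNamespace


def parse_if_complete(choice) -> dict:
    """Parse a chat completion choice only if the model finished on its own."""
    # finish_reason == "length" means the token limit ended the reply early,
    # so the JSON is very likely incomplete even with JSON mode on.
    if choice.finish_reason == "length":
        raise ValueError("Reply was truncated; raise the token limit or shorten the task.")
    return json.loads(choice.message.content)


# Stand-ins shaped like response.choices[0]; a real call would pass that object instead.
complete = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(content='{"name": "Mara", "age": 34}'),
)
truncated = SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content='{"name": "Mara", "ag'),
)

print(parse_if_complete(complete))
try:
    parse_if_complete(truncated)
except ValueError as err:
    print("Skipped:", err)
```

Raising a ValueError here keeps the helper compatible with code that already catches json.JSONDecodeError, which is itself a ValueError subclass.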

Fix 2: Extract JSON robustly when the model still adds noise

Some models, free tiers, and local servers ignore or do not support response_format. For those, clean the string before you parse it. Two patterns cover almost every case: strip markdown code fences, then slice out everything from the first { to the last } and parse that.

import json
import re


def extract_json(text: str) -> dict:
    """Pull a JSON object out of model text that may include prose or fences."""
    text = text.strip()

    # 1. Remove a leading/trailing markdown code fence if present.
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fence:
        text = fence.group(1).strip()

    # 2. Try parsing the cleaned text directly.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 3. Fall back to slicing the outermost { ... } and parse that.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end != -1 and end > start:
        return json.loads(text[start : end + 1])

    raise ValueError(f"No JSON object found in: {text!r}")

messy = "Sure! Here is the data:\n```json\n{\"name\": \"Mara\", \"age\": 34}\n```\nHope that helps!"
print(extract_json(messy))   # {'name': 'Mara', 'age': 34}
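Slicing from the first { to the last } can fail when the trailing prose itself contains a brace. A possible alternative, sketched here with the standard library's json.JSONDecoder.raw_decode, parses exactly one value starting at a given index and ignores whatever follows. The first_json_object helper and the sample string are hypothetical names for illustration.

```python
import json


def first_json_object(text: str) -> dict:
    """Return the first complete JSON object found anywhere in text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and returns (value, end_index); trailing text is ignored.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # This brace did not open a valid object; try the next one.
        start = text.find("{", start + 1)
    raise ValueError("No JSON object found")


tricky = 'Result: {"name": "Mara", "age": 34} (note: the closing } is required)'
print(first_json_object(tricky))   # {'name': 'Mara', 'age': 34}
```

On this input the first-brace-to-last-brace slice would include the prose up to the stray }, and json.loads would raise an Extra data error.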

This also handles streaming. When you stream a response, each chunk is only a fragment, so you must join every chunk into one string and parse after the loop ends — never inside it:

chunks = []
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    stream=True,
    messages=[{"role": "user", "content": "Return a JSON object with one key 'ok'."}],
)
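for event in stream:
    if not event.choices:          # some events (e.g. usage-only) carry no choices
        continue
    piece = event.choices[0].delta.content
    if piece:                      # content can be None on non-text events
        chunks.append(piece)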

data = extract_json("".join(chunks))   # parse once, at the end
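To see why parsing inside the loop fails, the sketch below simulates a stream with a hand-made list of fragments, so it runs without an API key. The fragment boundaries are invented for illustration; real chunk boundaries are arbitrary.

```python
import json

# Simulated stream fragments (hypothetical; real boundaries vary per response).
fragments = ['{"na', 'me": "Ma', 'ra", "a', 'ge": 34}']


def parses(text: str) -> bool:
    """Return True if text is a complete, valid JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Each fragment on its own is invalid JSON.
print([parses(piece) for piece in fragments])   # [False, False, False, False]

# The joined string is the full document and parses cleanly.
full = "".join(fragments)
print(json.loads(full))                         # {'name': 'Mara', 'age': 34}
```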

Fix 3: Validate the shape with a pydantic schema

json.loads only checks syntax. A reply like {"age": "thirty"} parses fine, then crashes later when your code does math on a string. pydantic lets you declare the exact shape you expect and validates in one line, with a clear error when the data is wrong.

from pydantic import BaseModel, ValidationError


class Pilot(BaseModel):
    name: str
    age: int


raw = extract_json('{"name": "Mara", "age": "34"}')

try:
    pilot = Pilot.model_validate(raw)   # coerces "34" -> 34, or raises
    print(pilot.name, pilot.age + 1)
except ValidationError as err:
    print("Bad shape from model:", err)

model_validate will coerce a clean numeric string into an int, but reject genuine nonsense — giving you a precise message instead of a confusing crash three functions later.

Fix 4: Retry automatically when parsing fails

Even with the fixes above, a model occasionally returns garbage. Instead of crashing the whole run, catch the failure and ask again. The loop below combines all four ideas: it requests JSON mode, extracts robustly, validates with pydantic, and retries on any failure.

import json
import os
from openai import OpenAI
from pydantic import BaseModel, ValidationError
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


class Pilot(BaseModel):
    name: str
    age: int


def get_pilot(prompt: str, max_tries: int = 3) -> Pilot:
    """Ask for a Pilot as JSON, retrying if parsing or validation fails."""
    for attempt in range(1, max_tries + 1):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": "Reply only with JSON: keys 'name' (string) and 'age' (integer)."},
                {"role": "user", "content": prompt},
            ],
        )
        text = response.choices[0].message.content
        try:
            return Pilot.model_validate(extract_json(text))
        except (json.JSONDecodeError, ValueError, ValidationError) as err:
            print(f"Attempt {attempt} failed: {err}")
            if attempt == max_tries:
                raise

    raise RuntimeError("unreachable")


pilot = get_pilot("Invent a fighter pilot as JSON.")
print(pilot.model_dump())

With this pattern, one bad reply no longer takes down your script. Each retry is an extra request, so if you start hitting 429 responses because retries fire too often, slow them down with the techniques in Fix the 429 Rate-Limit Error in Python.
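One common way to slow retries down is exponential backoff with a little random jitter. The sketch below is a generic, standard-library-only wrapper, not part of the openai SDK; retry_with_backoff, base_delay, and the injectable sleep parameter are names chosen for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_tries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on ValueError wait longer after each failure, then retry."""
    for attempt in range(1, max_tries + 1):
        try:
            return call()
        except ValueError:
            # json.JSONDecodeError and pydantic's ValidationError both subclass ValueError.
            if attempt == max_tries:
                raise
            delay = base_delay * 2 ** (attempt - 1)      # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise RuntimeError("unreachable")
```

A call such as retry_with_backoff(lambda: get_pilot("Invent a fighter pilot as JSON.", max_tries=1)) would move the retrying out of get_pilot and add the waits between attempts. Passing sleep as a parameter makes the wrapper easy to test without real delays.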

Key parameters

Parameter	Where	Effect
`response_format={"type": "json_object"}`	`chat.completions.create`	Provider guarantees syntactically valid JSON; prompt must mention "JSON"
`max_tries`	your retry loop	How many times to re-prompt before raising — 3 is a sane default
`model_validate()`	pydantic model	Validates and coerces the parsed dict against your declared schema

Troubleshooting

Expecting value: line 1 column 1 (char 0) - The string starts with prose or a backtick, or is empty. Cause: the model ignored JSON mode or you forgot it. Fix: turn on Fix 1, and wrap parsing in extract_json from Fix 2.
Extra data: line N column 1 - Valid JSON followed by trailing text like "Hope that helps!". Cause: the model kept talking after the object. Fix: use extract_json, which slices to the last } and ignores the rest.
'messages' must contain the word 'json' in some form - The API rejected the request with a 400 error before the model produced anything, so this is not a parse failure. The exact wording can vary between providers. Cause: JSON mode via response_format requires the word "JSON" somewhere in your messages. Fix: add "Reply with JSON" to your system prompt.
Parses fine but KeyError or wrong type later - json.loads passed, but a field is missing or holds a string where you expected a number. Cause: you validated syntax, not shape. Fix: add the pydantic check from Fix 3 right after parsing.

When to use this vs. alternatives

Use JSON mode (Fix 1) by default when your provider supports response_format — it removes the problem at the source and needs the least code.
Use extraction (Fix 2) when you are on a free tier, a local model, or an OpenAI-compatible endpoint that ignores response_format. Comparing those options is covered in Best Free AI APIs for Beginners.
Add pydantic plus retry (Fixes 3-4) for anything running unattended — a scheduled job, a chatbot, or a SaaS endpoint — where a single malformed reply must not crash the run. If the model returns truncated JSON because the answer is too long, that is a different problem; see Fix the Context-Length-Exceeded Error in Python.


Understanding LLM APIs — the main guide for this section.
Fix the 401 Unauthorized Error in OpenAI Python — when the key itself is rejected.
Fix the 429 Rate-Limit Error in Python — slow down retries that trip rate limits.
Fix the Context-Length-Exceeded Error in Python — when responses are truncated.
Write System Prompts that Control Output Format — get clean JSON at the source.
