AI Image Generation with Python and DALL·E

You need a dozen on-brand images for a campaign — a hero banner, a few social squares, a blog header — and you do not have a designer on call. Stock photos look generic, and a design tool eats an afternoon. With about forty lines of Python and the OpenAI images API, you can describe each image in plain English, generate it, resize it for every platform, and repeat the whole run tomorrow with new prompts. This guide shows you exactly how, even if you have never written an image-generation script before.

This is the visual half of AI Content Creation & Marketing Automation. The text half — headlines, captions, and body copy — lives in AI Copywriting Workflows, and once your images exist you can hand them off to Automated Social Media Posting to publish them on a schedule.

Who this is for and what you will build

This guide is for creators, marketers, and founders who can run a Python script but are not professional developers. By the end you will have a small, reusable image generator that:

  • Turns a text prompt into a saved PNG file.
  • Lets you control the image size, quality, and background.
  • Cleans up the output with Pillow — crop, resize, convert, watermark.
  • Runs over a list of prompts to produce a full set of assets in one pass.

We use the openai SDK throughout, because it gives you one consistent Python interface to OpenAI's image models. The newest model is gpt-image-1; the older dall-e-3 model uses the same client.images.generate call, so switching mostly means changing the model string, plus a few parameter values that differ for dall-e-3.

How AI image generation actually works

You do not need the math, but a one-paragraph mental model saves a lot of confused debugging. A text-to-image model has been trained on millions of image-and-caption pairs, so it has learned the statistical link between words and visual patterns. When you send a prompt, the model starts from random noise and gradually refines it into a picture that matches the words you gave it — a process called diffusion. Two practical consequences fall out of this. First, the same prompt can produce slightly different images each run, because the starting noise is random; that is a feature, not a bug, and it is how you get variations. Second, the model only knows what your words describe, so the quality of your output is mostly the quality of your prompt. The rest of this guide leans on that: most of your iteration time goes into wording, and Python handles the repetitive mechanics of calling, saving, and resizing.

Because the heavy computation happens on OpenAI's servers, your script does only three small jobs every time: send the prompt, wait for the response, and write the returned bytes to a file. That is why none of the code below needs a GPU or any special hardware — your laptop is just a thin client making an API call.

[Figure: Prompt-to-assets image pipeline. A text prompt in plain English flows into the OpenAI image model (gpt-image-1), which returns variants 1 to n that Pillow post-processes into platform-ready assets.]
One prompt becomes several variants, which Pillow trims and resizes into platform-ready assets.

Prerequisites

You need Python 3.10 or newer and a paid OpenAI account with image access enabled (image models are not on the free tier). If Python is not set up yet, start with Setting Up Python for AI, then come back here.

Create a project folder and a virtual environment so these packages stay isolated from the rest of your system:

python -m venv .venv
source .venv/bin/activate    # on Windows: .venv\Scripts\activate
pip install openai pillow python-dotenv

openai is the official SDK that talks to the image model. pillow (imported as PIL) is the standard Python library for opening, cropping, and saving images. python-dotenv loads your secret API key from a file so it never appears in your code.

Create a file named .env next to your script and paste your key into it:

OPENAI_API_KEY=sk-your-real-key-here

Immediately add .env to your .gitignore file so the key is never committed to version control or pushed to GitHub. A leaked key can run up real charges on your account. If you are new to API keys and authentication, Understanding LLM APIs walks through the whole flow.

One more thing before you write code: image generation is billed per image, with the price rising as you increase size and quality. A single test image costs cents, but a careless loop over a thousand prompts at high quality adds up. Set a spending limit in your OpenAI account dashboard, and during development always test prompts at low quality and in small batches. That way you pay for final, polished renders only once you are confident the wording is right.
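Because billing is per image, the arithmetic for a run is simple enough to check before you start it. The sketch below assumes a hypothetical helper named estimate_batch_cost; the price used in the example is a placeholder, not a real OpenAI price, so look up the current figure for your size and quality before relying on the result.

```python
def estimate_batch_cost(prompts: int, variants: int, price_per_image: float) -> float:
    """Rough spend for a run: prompts x variants x price per image.

    price_per_image is a number you look up yourself for your chosen size
    and quality; no real price is assumed here.
    """
    if prompts < 0 or variants < 0 or price_per_image < 0:
        raise ValueError("all inputs must be non-negative")
    return round(prompts * variants * price_per_image, 2)


# Placeholder price of 0.05 per image, for illustration only.
print(estimate_batch_cost(prompts=1000, variants=1, price_per_image=0.05))
```

Running the estimate for both your test settings and your final settings makes the gap between a preview run and a full-quality run concrete.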

Load the key once at the top of every script:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

load_dotenv() reads the .env file into your environment, and OpenAI(...) creates a client that sends your key with every request. You will reuse this client object in every step below.
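If the key is missing, os.getenv quietly returns None and the problem only shows up later as an authentication error. As a minimal sketch, assuming a hypothetical helper named require_api_key (it is not part of the openai SDK), you could fail early with a clearer message:

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or stop with a readable error.

    Hypothetical helper for illustration; call it after load_dotenv().
    """
    key = os.getenv(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Check that .env sits in the folder you run from."
        )
    return key
```

You would then pass the result to the client, for example OpenAI(api_key=require_api_key()).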

Step 1: Generate your first image

The image model takes a text prompt and returns the picture as base64-encoded data — a long text string that represents the raw bytes of the file. You decode that string and write it to disk. Here is the smallest complete example:

import base64
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def generate_image(prompt: str, output_path: str) -> None:
    """Generate one image from a prompt and save it as a file."""
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1024x1024",
    )
    image_bytes = base64.b64decode(result.data[0].b64_json)
    with open(output_path, "wb") as handle:
        handle.write(image_bytes)
    print(f"Saved {output_path}")


generate_image(
    "A friendly robot watering a small plant on a sunny desk, flat illustration",
    "robot_plant.png",
)

Run it with python your_script.py. After a few seconds you will have robot_plant.png in your folder. A few details worth knowing:

  • model="gpt-image-1" selects the current image model. You can swap in "dall-e-3" to use the older one, but it returns a URL by default, so add response_format="b64_json" to keep the decoding code working. dall-e-3 also only allows n=1.
  • result.data[0] is the first (and here, only) image returned. The .b64_json field holds the encoded image text.
  • Be specific in your prompt. "Flat illustration", "studio lighting", "muted pastel palette", and "no text" all steer the result. Vague prompts give vague images.

Writing prompts that produce usable assets

The single biggest lever on your results is the prompt. A useful prompt usually names four things in plain language: the subject (what is in the frame), the style (illustration, photo, 3D render, watercolor), the composition (top-down, close-up, centered, lots of empty space), and the mood or palette (warm, muted, high-contrast, pastel). A prompt like "A ceramic coffee mug on a wooden table, product photography, soft morning light, shallow depth of field, lots of negative space on the right" gives the model far more to work with than "a coffee mug".

Two habits make a real difference for marketing work. Add "no text" or "no lettering" when you plan to overlay your own headline later, because models often render garbled fake text inside images. And keep a short list of style phrases that match your brand — your go-to palette, lighting, and finish — so every batch looks like it belongs to the same family. You can even have a language model draft and tighten these prompts for you; the techniques in AI Copywriting Workflows apply directly to writing image prompts, not just body copy.
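The four-part structure and the shared brand phrases can be captured in a small helper so every prompt in a batch is assembled the same way. This is a sketch under stated assumptions: build_prompt and BRAND_STYLE are hypothetical names, and the style phrases are examples to replace with your own.

```python
# Example brand phrases; replace with your own palette, lighting, and finish.
BRAND_STYLE = "muted pastel palette, soft studio lighting, matte finish"


def build_prompt(subject: str, style: str, composition: str,
                 mood: str = BRAND_STYLE, no_text: bool = True) -> str:
    """Join subject, style, composition, and mood into one description."""
    parts = [subject, style, composition, mood]
    if no_text:
        parts.append("no text")
    # Drop empty parts so an omitted field does not leave a stray comma.
    return ", ".join(part.strip() for part in parts if part and part.strip())


print(build_prompt(
    "A ceramic coffee mug on a wooden table",
    "product photography",
    "lots of negative space on the right",
))
```

Keeping the brand phrases in one constant means a palette change is a one-line edit rather than a search through every prompt.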

Step 2: Control size, quality, and style

The same generate call accepts parameters that change the shape, polish, and look of the output. The three you will reach for most are size, quality, and background.

def generate_styled(prompt: str, output_path: str) -> None:
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1536x1024",       # wide landscape, good for blog headers
        quality="high",         # more detail, higher cost
        background="transparent",  # PNG with no backdrop, for logos and overlays
        n=1,
    )
    image_bytes = base64.b64decode(result.data[0].b64_json)
    with open(output_path, "wb") as handle:
        handle.write(image_bytes)


generate_styled(
    "A minimalist line-art coffee cup icon, single color, no background",
    "coffee_icon.png",
)

How to think about each one:

  • Size sets the aspect ratio. Use 1024x1024 for social squares, 1536x1024 for wide banners, and 1024x1536 for portrait stories. Pick the closest ratio to your target, then crop exactly in Step 3.
  • Quality trades cost for detail. Start at low while you iterate on prompts, then switch to high for the final render so you are not paying premium rates for throwaway tests.
  • Background can be transparent for icons and logos you want to layer over other designs, or opaque for a normal filled image. Transparent backgrounds need an output format that supports transparency: PNG (the default) or WebP, not JPEG.
  • Style is expressed in the prompt itself, not a separate parameter — words like "photorealistic", "watercolor", or "3D render" do the work.
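Picking the closest ratio to your target can be automated. The sketch below assumes a hypothetical helper named closest_size and compares aspect ratios against the three sizes listed above; it is one simple way to choose, not the only one.

```python
SUPPORTED_SIZES = ["1024x1024", "1536x1024", "1024x1536"]  # sizes listed above


def closest_size(target_w: int, target_h: int) -> str:
    """Return the supported size whose aspect ratio is nearest the target."""
    target_ratio = target_w / target_h

    def ratio_gap(size: str) -> float:
        w, h = (int(part) for part in size.split("x"))
        return abs(w / h - target_ratio)

    return min(SUPPORTED_SIZES, key=ratio_gap)


print(closest_size(1280, 720))   # a 16:9 thumbnail target
print(closest_size(1080, 1920))  # a 9:16 story target
```

A 16:9 target such as 1280x720 maps to 1536x1024, and a 9:16 story maps to 1024x1536; the exact dimensions are then produced by the crop in Step 3.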

Step 3: Post-process with Pillow

The model gives you a clean image, but real assets need exact dimensions, a specific format, or a watermark. Pillow handles all of that. The most common job is cropping a square down to a precise platform size:

from PIL import Image


def crop_to_size(input_path: str, output_path: str, size: tuple[int, int]) -> None:
    """Resize and center-crop an image to an exact width and height."""
    img = Image.open(input_path).convert("RGB")
    target_w, target_h = size
    # Scale so the image fully covers the target box, then crop the overflow.
    scale = max(target_w / img.width, target_h / img.height)
    new_size = (round(img.width * scale), round(img.height * scale))
    img = img.resize(new_size, Image.Resampling.LANCZOS)
    left = (img.width - target_w) // 2
    top = (img.height - target_h) // 2
    img = img.crop((left, top, left + target_w, top + target_h))
    img.save(output_path, "JPEG", quality=90)


crop_to_size("robot_plant.png", "robot_instagram.jpg", (1080, 1080))

This scales the image just enough to cover the target box, enlarging or shrinking as needed, then trims the overflow so you get an exact 1080x1080 with no stretching. To add a simple text watermark, draw onto the image before saving:

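from PIL import Image, ImageDraw, ImageFont


def add_watermark(input_path: str, output_path: str, text: str) -> None:
    """Draw a semi-transparent text watermark near the bottom-left corner."""
    img = Image.open(input_path).convert("RGBA")
    # Draw on a transparent overlay so the text's alpha blends with the picture.
    # Drawing straight onto the image replaces pixels instead of blending, and
    # the transparency would then be lost in the conversion to RGB.
    overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    font = ImageFont.load_default(size=36)  # the size argument needs Pillow 10.1+
    draw.text((20, img.height - 60), text, fill=(255, 255, 255, 200), font=font)
    img = Image.alpha_composite(img, overlay)
    img.convert("RGB").save(output_path, "PNG")


add_watermark("robot_instagram.jpg", "robot_branded.png", "@yourbrand")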

For platform-specific sizing — exact YouTube dimensions, safe zones, and overlay text — the Create YouTube Thumbnails with DALL·E 3 and Python guide builds on this same Pillow pattern.

A few Pillow habits keep your assets clean. Always convert("RGB") before saving a JPEG, because the model can return an image with an alpha (transparency) channel that JPEG cannot store, and the save will fail otherwise. Save your master copies as PNG to avoid the slight quality loss that JPEG compression introduces, then export JPEGs only for the final platform versions where file size matters. And when you crop, scale the image to cover the target box rather than fit inside it — covering and trimming the overflow, as the code above does, fills the whole frame without leaving empty bars or distorting the picture. These small rules are the difference between assets that look hand-finished and ones that look mechanically resized.
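The cover-and-trim rule is plain arithmetic, and it can help to see the numbers for a non-square source. The sketch below assumes a hypothetical helper named cover_box that returns the size to resize to and the box to crop, using the same formula as crop_to_size; it needs no image file to run.

```python
def cover_box(src_w: int, src_h: int, target_w: int, target_h: int):
    """Return (resize_size, crop_box) for a centered cover crop."""
    # The larger of the two ratios guarantees both sides reach the target.
    scale = max(target_w / src_w, target_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return (new_w, new_h), (left, top, left + target_w, top + target_h)


# A 1536x1024 landscape render cropped to a 1080x1080 square.
print(cover_box(1536, 1024, 1080, 1080))
```

For that landscape source the image is resized to 1620x1080 and 270 pixels are trimmed from each side, so the height is used in full and only the width overflows.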

Step 4: Batch-generate a set of images

The real time saving comes from generating many images in one run. With gpt-image-1 you can ask for several variations of a single prompt by setting n, or loop over a list of different prompts to build a whole campaign.

import base64
import os


def generate_batch(prompts: list[str], folder: str = "output") -> None:
    """Generate one image per prompt and save them with tidy filenames."""
    os.makedirs(folder, exist_ok=True)
    for index, prompt in enumerate(prompts, start=1):
        result = client.images.generate(
            model="gpt-image-1",
            prompt=prompt,
            size="1024x1024",
            quality="medium",
        )
        image_bytes = base64.b64decode(result.data[0].b64_json)
        path = os.path.join(folder, f"asset_{index:02d}.png")
        with open(path, "wb") as handle:
            handle.write(image_bytes)
        print(f"Saved {path}")


prompts = [
    "A cozy reading nook with autumn light, warm illustration",
    "A sleek laptop on a marble desk, product photography",
    "An abstract gradient background in teal and coral, no text",
]
generate_batch(prompts)

Each prompt becomes one numbered PNG inside an output folder. The {index:02d} format pads numbers to two digits (asset_01.png, asset_02.png) so files sort correctly. If you need dozens of product shots from a spreadsheet rather than a hardcoded list, the Batch-Generate Product Images with DALL·E and Python guide drives the same loop from a CSV file.
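The other batching route, several variations of one prompt through n, returns more than one entry in result.data. A minimal sketch, assuming a hypothetical helper named save_variants, decodes and saves each entry; the stand-in data at the bottom only mimics the shape of a response so the file handling can be tried without spending anything.

```python
import base64
import os
import tempfile
from types import SimpleNamespace


def save_variants(data, folder: str, stem: str) -> list[str]:
    """Decode every returned image and save it as stem_v01.png, stem_v02.png, ...

    `data` is the list found at result.data after a generate call with n > 1.
    """
    os.makedirs(folder, exist_ok=True)
    paths = []
    for number, item in enumerate(data, start=1):
        path = os.path.join(folder, f"{stem}_v{number:02d}.png")
        with open(path, "wb") as handle:
            handle.write(base64.b64decode(item.b64_json))
        paths.append(path)
    return paths


# Real use (billed per image, needs the client from earlier):
# result = client.images.generate(model="gpt-image-1", prompt="...", n=3)
# save_variants(result.data, "output", "hero")

# Offline stand-in with the same shape as result.data. The bytes are not a
# real image; this only exercises the naming and writing logic.
fake = [SimpleNamespace(b64_json=base64.b64encode(b"demo").decode()) for _ in range(3)]
with tempfile.TemporaryDirectory() as demo_folder:
    print(save_variants(fake, demo_folder, "hero"))
```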

When you batch, generate at medium quality first to confirm the prompts look right, then re-run the winners at high for the final assets. Generating many images quickly can trip rate limits — if that happens, add a short time.sleep() between calls, as covered in Fix the 429 Rate-Limit Error in Python.
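A fixed pause works for small runs; for larger ones a common alternative is to retry a failed call with a growing delay. The sketch below assumes a hypothetical helper named call_with_retries and catches every exception for simplicity; in real use you would likely narrow that to the SDK's RateLimitError.

```python
import time


def call_with_retries(make_call, attempts: int = 4, base_delay: float = 2.0,
                      sleep=time.sleep):
    """Run make_call(); on failure wait base_delay, then 2x, 4x, and retry."""
    for attempt in range(attempts):
        try:
            return make_call()
        except Exception as error:  # narrow to openai.RateLimitError in real use
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay}s")
            sleep(delay)


# Example (assumes the client from earlier):
# result = call_with_retries(
#     lambda: client.images.generate(model="gpt-image-1", prompt="...")
# )
```

The sleep parameter exists only so the waiting can be swapped out when you try the helper without real delays.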

Two more practices pay off as soon as your batches grow past a handful of images. First, wrap each generate call in a try/except block so a single failed prompt does not abort the whole run — log the failure, keep going, and re-run the misses afterward:

import time

for index, prompt in enumerate(prompts, start=1):
    try:
        result = client.images.generate(
            model="gpt-image-1", prompt=prompt, size="1024x1024"
        )
        # ... decode and save as before ...
    except Exception as error:
        # Log the failure and move on; the loop continues with the next prompt.
        print(f"Prompt {index} failed: {error}")
        time.sleep(2)   # brief pause before the next prompt

Second, give your files meaningful names. asset_01.png works for a throwaway run, but for an ongoing library a name that encodes the campaign, the platform, and the date — summer-sale_instagram_2026-06-18.png — makes assets findable months later without opening every file. A tiny naming convention now saves real searching effort once you have generated hundreds of images.
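A naming convention is easiest to keep when a function builds the name. As a sketch, assuming a hypothetical helper named asset_name and the campaign_platform_date pattern from the example above:

```python
import re
from datetime import date


def asset_name(campaign: str, platform: str, day: date, ext: str = "png") -> str:
    """Build a name like summer-sale_instagram_2026-06-18.png."""
    def slug(text: str) -> str:
        # Lowercase, and collapse anything that is not a letter or digit into "-".
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(campaign)}_{slug(platform)}_{day.isoformat()}.{ext}"


print(asset_name("Summer Sale", "Instagram", date(2026, 6, 18)))
```

ISO dates (year first) keep files in chronological order when a folder is sorted by name.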

Choosing between gpt-image-1 and dall-e-3

Both models run from the same client.images.generate call, so switching starts with changing the model string. The differences that matter in practice:

  • gpt-image-1 is the newer model. It follows complex prompts more faithfully, can render legible text when you actually want it, supports transparent backgrounds, and lets you request several images in one call with n. Use it as your default.
  • dall-e-3 is older and slightly cheaper for some sizes. It is capped at one image per call (n=1), so a batch means a loop of separate requests. It also uses its own parameter values: quality is standard or hd, the wide and tall sizes are 1792x1024 and 1024x1792, it does not accept background, and it returns a URL unless you pass response_format="b64_json". Reach for it only if you have an existing script built around it or a specific cost reason.

A common pattern is to draft and preview at low quality on gpt-image-1, lock the prompt, then render finals at high. Because both models share the same call, the structure of your pipeline stays the same when you switch; you change the model string and adjust the few parameter values that differ. If you are weighing OpenAI against other providers more broadly, OpenAI vs Anthropic API for Beginners covers how to think about that choice for whole projects.
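The two models differ in a few request details: dall-e-3 names its quality levels standard and hd, and it returns a URL unless you ask for base64. One way to keep those differences in a single place is sketched below; request_kwargs is a hypothetical helper, and the mapping of draft to low or standard and final to high or hd is an assumption about how you might want to work.

```python
def request_kwargs(model: str, prompt: str, final: bool = False) -> dict:
    """Build keyword arguments for client.images.generate for either model."""
    if model == "gpt-image-1":
        return {
            "model": model,
            "prompt": prompt,
            "size": "1024x1024",
            "quality": "high" if final else "low",
        }
    if model == "dall-e-3":
        return {
            "model": model,
            "prompt": prompt,
            "size": "1024x1024",
            "quality": "hd" if final else "standard",  # dall-e-3 quality names
            "response_format": "b64_json",  # otherwise dall-e-3 returns a URL
            "n": 1,  # dall-e-3 allows only one image per call
        }
    raise ValueError(f"Unsupported model: {model}")


# Example (assumes the client from earlier):
# result = client.images.generate(**request_kwargs("dall-e-3", "A red kite"))
print(request_kwargs("gpt-image-1", "A red kite", final=True))
```

With this in place the decoding code that reads result.data[0].b64_json works unchanged for both models.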

Parameter reference

These are the parameters you pass to client.images.generate. Defaults reflect the gpt-image-1 model.

Parameter | Type | Default | Effect
--- | --- | --- | ---
model | string | dall-e-2 if omitted | Which model to use: gpt-image-1 (newest) or dall-e-3 (older). Set it explicitly in every call.
prompt | string | none (required) | The plain-English description of the image to create.
size | string | auto | Output dimensions: 1024x1024, 1024x1536, 1536x1024, or auto to let the model choose.
quality | string | auto | Detail and cost level: low, medium, high, or auto.
n | integer | 1 | Number of images to return per call (dall-e-3 is fixed at 1).
background | string | auto | transparent produces an image with no backdrop (PNG or WebP output only); opaque fills it; auto lets the model decide.
output_format | string | png | File format of the returned bytes: png, jpeg, or webp.

Troubleshooting

These are the errors you are most likely to hit, with the exact cause and a one-line fix.

  1. AuthenticationError: Incorrect API key provided — Your key is missing or wrong. Confirm .env sits in the folder you run the script from and the line reads OPENAI_API_KEY=sk-.... See Fix the 401 Unauthorized Error in OpenAI Python.
  2. BadRequestError: ... safety system — Your prompt was blocked for policy reasons (real people, violence, trademarks). Rewrite it to describe a generic subject and try again.
  3. RateLimitError: ... requests per minute — You sent calls faster than your tier allows. Add import time; time.sleep(2) inside your batch loop, or request fewer images at once.
  4. TypeError: argument should be a bytes-like object or ASCII string, not 'NoneType' (raised by base64.b64decode) — The response held a URL instead of base64, so b64_json is None. gpt-image-1 always returns base64, but dall-e-3 returns a URL by default. When you use dall-e-3, pass response_format="b64_json", or download the image from result.data[0].url instead.
  5. PIL.UnidentifiedImageError — Pillow could not open the file because the write failed or the path is wrong. Confirm the generate step finished and the PNG actually exists before you post-process it.
  6. BadRequestError: invalid size — You passed a size the model does not support, such as 512x512. Use one of the listed sizes, then resize down with Pillow in Step 3.
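For the Pillow error in item 5, a quick check on the decoded bytes before writing them can catch a bad response earlier. This is a sketch, assuming a hypothetical helper named looks_like_png; it only applies when output_format is png, and it checks the file signature rather than validating the whole image.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def looks_like_png(data: bytes) -> bool:
    """Return True if the bytes start with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE


print(looks_like_png(PNG_SIGNATURE + b"rest of file"))
print(looks_like_png(b"not an image"))
```

Calling it on image_bytes right after base64.b64decode lets you log and skip a bad result instead of discovering it during post-processing.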

Worked example: a full campaign image script

This script ties everything together. It reads a list of prompts, generates each image, crops it to a square social size, adds a watermark, and saves the finished files into a dated folder — a complete run you can adapt for any campaign.

import base64
import os
from datetime import date
from dotenv import load_dotenv
from openai import OpenAI
from PIL import Image, ImageDraw, ImageFont

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

CAMPAIGN_PROMPTS = [
    "A warm flat-lay of artisan coffee and pastries, top-down, no text",
    "A bright minimal workspace with a notebook and plant, soft daylight",
    "A bold abstract gradient in orange and purple, smooth, no text",
]
WATERMARK = "@yourbrand"
TARGET = (1080, 1080)


def make_asset(prompt: str, out_path: str) -> None:
    """Generate, crop, and watermark a single campaign image."""
    result = client.images.generate(
        model="gpt-image-1", prompt=prompt, size="1024x1024", quality="high"
    )
    raw = base64.b64decode(result.data[0].b64_json)
    tmp = out_path + ".tmp.png"
    with open(tmp, "wb") as handle:        # save the raw model output first
        handle.write(raw)

    img = Image.open(tmp).convert("RGB")   # crop to an exact square
    scale = max(TARGET[0] / img.width, TARGET[1] / img.height)
    img = img.resize((round(img.width * scale), round(img.height * scale)),
                     Image.Resampling.LANCZOS)
    left, top = (img.width - TARGET[0]) // 2, (img.height - TARGET[1]) // 2
    img = img.crop((left, top, left + TARGET[0], top + TARGET[1]))

    draw = ImageDraw.Draw(img)             # stamp the brand watermark
    draw.text((24, TARGET[1] - 56), WATERMARK,
              fill=(255, 255, 255), font=ImageFont.load_default(size=34))
    img.save(out_path, "JPEG", quality=92)
    os.remove(tmp)
    print(f"Done {out_path}")


def run() -> None:
    folder = f"campaign_{date.today().isoformat()}"
    os.makedirs(folder, exist_ok=True)
    for i, prompt in enumerate(CAMPAIGN_PROMPTS, start=1):
        make_asset(prompt, os.path.join(folder, f"post_{i:02d}.jpg"))


if __name__ == "__main__":
    run()

Save it as campaign.py, run python campaign.py, and you will get a folder like campaign_2026-06-18 holding three finished, watermarked, exactly-sized social images. Change the CAMPAIGN_PROMPTS list and run it again for the next campaign.

Next steps

Now that you can generate, style, and batch images, follow these guides to apply the skill to specific jobs:

  1. Make click-worthy video art with Create YouTube Thumbnails with DALL·E 3 and Python.
  2. Produce a full catalog from a spreadsheet with Batch-Generate Product Images with DALL·E and Python.
  3. Pair your images with captions from AI Copywriting Workflows, then publish both through Automated Social Media Posting.
