This guide shows you how to connect to three genuinely free AI APIs from Python in under ten minutes, with one connector you can reuse everywhere. You will wire up Groq, OpenRouter, and the Hugging Face Inference API — all of which give you working keys with no credit card — and end with a single function that talks to any of them.
"API" just means a way for your code to send a request to a service and get an answer back. An "AI API" is that, where the service on the other end is a large language model (the kind of model that powers chat assistants). A "free tier" is an allowance you can keep using within set limits, as opposed to a trial that expires.
Prerequisites
You only need a working Python setup and three free accounts. If Python or virtual environments are new to you, start with Create a Python Virtual Environment for AI, then come back.
This guide uses Python 3.10 or newer, the official openai SDK (which works against any OpenAI-compatible endpoint, not just OpenAI), and httpx for the one provider that speaks a different format.
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install openai httpx python-dotenv
Now grab one free key from each provider. None of these ask for a card:
- Groq: sign up at console.groq.com and create an API key. Copy it as GROQ_API_KEY.
- OpenRouter: sign up at openrouter.ai, open Keys, and create a key. Copy it as OPENROUTER_API_KEY.
- Hugging Face: sign up at huggingface.co, open Settings then Access Tokens, and create a read token. Copy it as HUGGINGFACE_API_KEY.
Create a file named .env in your project root and paste them in:
GROQ_API_KEY=gsk_your_groq_key_here
OPENROUTER_API_KEY=sk-or-your_openrouter_key_here
HUGGINGFACE_API_KEY=hf_your_token_here
Add .env to your .gitignore immediately so you never commit your keys. One shell command, run from the project root, appends the entry:
echo ".env" >> .gitignore
A leaked key can be used by strangers and burn through your limits, so this single step matters more than anything else in the guide.
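Before making any calls, it can help to confirm that all three keys actually loaded. The following is a minimal sketch, not part of the original setup: the helper name missing_keys and the REQUIRED_KEYS tuple are illustrative assumptions, and the demo uses a plain dict in place of the real environment.

```python
import os

# The three variable names used in this guide's .env file.
REQUIRED_KEYS = ("GROQ_API_KEY", "OPENROUTER_API_KEY", "HUGGINGFACE_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to os.environ; pass a dict to check something else.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Demo with a plain dict standing in for the real environment.
example_env = {"GROQ_API_KEY": "gsk_example", "OPENROUTER_API_KEY": "  "}
print(missing_keys(example_env))
# ['OPENROUTER_API_KEY', 'HUGGINGFACE_API_KEY']
```

In a real script you would call load_dotenv() first and then missing_keys() with no arguments, stopping early with a clear message if the list is not empty.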
Step 1: Connect to Groq with the openai SDK
Groq runs open models on hardware tuned for speed, and its API copies OpenAI's format exactly. That means you can use the official openai SDK and only change one setting — the base_url — to point it at Groq instead.
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
groq = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
response = groq.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Explain an API in one sentence."}],
)
print(response.choices[0].message.content)
The messages list is the conversation so far. The role of "user" marks your input; the model replies in the choices array, and choices[0].message.content is the text you want. Run the file and you should see a one-sentence answer, typically within a second or so depending on your network and the provider's load.
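Because the messages list is the whole conversation, a follow-up question means sending the earlier turns again along with the new one. The sketch below shows one way to build that list; the helper add_turn is a hypothetical name, and the assistant text is a placeholder rather than real model output. No request is sent.

```python
# Roles used by the OpenAI-style chat format.
VALID_ROLES = {"system", "user", "assistant"}


def add_turn(messages, role, content):
    """Return a new list with one more message appended (input is not mutated)."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return messages + [{"role": role, "content": content}]


history = []
history = add_turn(history, "user", "Explain an API in one sentence.")
# Placeholder reply; in a real script this comes from choices[0].message.content.
history = add_turn(history, "assistant", "An API lets one program ask another for work.")
history = add_turn(history, "user", "Now give a concrete example.")

print([message["role"] for message in history])
# ['user', 'assistant', 'user']
```

Passing history as the messages argument would give the model the earlier exchange as context for the final question.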
Step 2: Connect to OpenRouter the same way
OpenRouter is a single doorway to dozens of models — some paid, several free. Because it is also OpenAI-compatible, the only things that change from Step 1 are the base_url, the key, and the model name. Free models on OpenRouter end in :free.
openrouter = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = openrouter.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct:free",
    messages=[{"role": "user", "content": "Name three free AI APIs."}],
)
print(response.choices[0].message.content)
This is the payoff of OpenAI-compatible APIs: once you know one, you know all of them. If you want a side-by-side on which of these two to reach for, see Groq vs OpenRouter Free Tier.
Step 3: Connect to Hugging Face Inference with httpx
Hugging Face is the odd one out. Its general Inference API takes a plain inputs string instead of a messages array, and it returns a list of results instead of a choices object. Because it is not OpenAI-compatible, you call it directly with httpx (a modern HTTP library) rather than the openai SDK.
import httpx
HF_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"
headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_API_KEY']}"}
payload = {"inputs": "Explain an API in one sentence."}
response = httpx.post(HF_URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json()[0]["generated_text"])
The model name lives in the URL here, not in the payload. The first call to a model can be slow because Hugging Face may need to load it onto a server, which is why the timeout is set generously to 60 seconds.
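Indexing straight into json()[0] works when the reply is the usual list, but it fails with a confusing error when the service sends back an object instead. The sketch below is one defensive way to read the reply. The helper name extract_generated_text is hypothetical, and two things are assumptions rather than documented behavior: that failures arrive as a JSON object, and that some models repeat the prompt at the start of generated_text.

```python
def extract_generated_text(data, prompt=""):
    """Pull the text out of a parsed Hugging Face reply, or raise a readable error."""
    # Assumption: failures arrive as a JSON object rather than a list.
    if isinstance(data, dict):
        raise RuntimeError(f"Hugging Face returned an object, not a list: {data}")
    if not (isinstance(data, list) and data and isinstance(data[0], dict)):
        raise RuntimeError(f"Unexpected response shape: {data!r}")
    text = data[0].get("generated_text", "")
    # Assumption: some models echo the prompt at the start of generated_text.
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Hand-written sample data, not a real API reply.
sample = [{"generated_text": "Explain an API. An API lets programs talk to each other."}]
print(extract_generated_text(sample, prompt="Explain an API."))
# An API lets programs talk to each other.
```

You could pass response.json() to this function in place of the direct index in the snippet above.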
Step 4: Wrap all three in one unified connector
You now have three working calls that look slightly different. The point of a connector is to hide those differences behind one function, so the rest of your program can say "ask provider X this prompt" and not care how each API is shaped.
import os
import httpx
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
CLIENTS = {
    "groq": OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    ),
    "openrouter": OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    ),
}
MODELS = {
    "groq": "llama-3.1-8b-instant",
    "openrouter": "meta-llama/llama-3.1-8b-instruct:free",
}
HF_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"
def ask(provider: str, prompt: str) -> str:
    """Send one prompt to any free provider and return clean text."""
    if provider == "huggingface":
        headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACE_API_KEY']}"}
        response = httpx.post(
            HF_URL, headers=headers, json={"inputs": prompt}, timeout=60
        )
        response.raise_for_status()
        return response.json()[0]["generated_text"]
    # OpenAI-compatible providers: Groq and OpenRouter
    completion = CLIENTS[provider].chat.completions.create(
        model=MODELS[provider],
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
Now any of these three lines works, and switching providers is a one-word change:
print(ask("groq", "Give me a fun fact about octopuses."))
print(ask("openrouter", "Give me a fun fact about octopuses."))
print(ask("huggingface", "Give me a fun fact about octopuses."))
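Once every provider sits behind the same call shape, trying them in order until one answers becomes straightforward. The following is a sketch of that idea, not something the connector above does on its own: ask_first_available is a hypothetical helper, and fake_ask is a stand-in so the example runs without keys or network access.

```python
from typing import Callable, Sequence


def ask_first_available(
    providers: Sequence[str],
    prompt: str,
    ask_fn: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each provider in order; return (provider, answer) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider, ask_fn(provider, prompt)
        except Exception as error:  # broad on purpose: any failure moves to the next provider
            errors.append(f"{provider}: {error}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


def fake_ask(provider: str, prompt: str) -> str:
    """Stand-in for the real ask(): pretends Groq is down."""
    if provider == "groq":
        raise ConnectionError("simulated outage")
    return f"[{provider}] answer to: {prompt}"


print(ask_first_available(["groq", "openrouter"], "Hello", fake_ask))
# ('openrouter', '[openrouter] answer to: Hello')
```

In a real script you would pass the connector's own ask function as ask_fn. Taking it as a parameter keeps the helper easy to test.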
Step 5: Add safe retries for rate limits
Free tiers cap how many requests you can send per minute. When you go over, the API replies with a 429 status, which means "too many requests, slow down." The fix is to wait and try again, doubling the wait each time — a pattern called exponential backoff. This keeps short bursts from crashing your script.
import time
from openai import RateLimitError
def safe_ask(provider: str, prompt: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            return ask(provider, prompt)
        except (RateLimitError, httpx.HTTPStatusError) as error:
            # Both exception types carry the HTTP response; read its status code.
            status = getattr(getattr(error, "response", None), "status_code", None)
            if status == 429 and attempt < retries - 1:
                time.sleep(2 ** attempt)  # with retries=3: wait 1s, then 2s
                continue
            raise  # not a rate limit, or the last attempt: surface the error
    raise RuntimeError(f"Failed after {retries} retries")
Call safe_ask exactly like ask, and it will quietly recover from the occasional rate-limit bump. For a deeper look at this error, read Fix the 429 Rate-Limit Error in Python.
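To see how long exponential backoff can make a script wait, it helps to compute the schedule on its own. This sketch mirrors the 2 ** attempt rule used in safe_ask, where the final attempt raises instead of sleeping; backoff_delays and its base parameter are illustrative names, not part of any library.

```python
def backoff_delays(retries: int, base: float = 1.0) -> list[float]:
    """Seconds slept between attempts: base * 2**attempt, one fewer than `retries`.

    The last attempt never sleeps, because a failure there is raised to the caller.
    """
    return [base * 2 ** attempt for attempt in range(max(retries - 1, 0))]


print(backoff_delays(3))       # [1.0, 2.0]
print(backoff_delays(5))       # [1.0, 2.0, 4.0, 8.0]
print(sum(backoff_delays(5)))  # 15.0
```

With the default of three attempts the worst case adds about three seconds of waiting, while five attempts adds about fifteen, so raising retries trades patience for resilience.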
Free API quick reference
The three providers differ in how you call them and what they are best at. Keep this table next to you while you experiment.
| Provider | Library to use | Request shape | Reads result from | Best for |
|---|---|---|---|---|
| Groq | openai SDK (base_url set) | messages array | choices[0].message.content | Fastest responses, easiest start |
| OpenRouter | openai SDK (base_url set) | messages array | choices[0].message.content | Trying many models from one key |
| Hugging Face | httpx | inputs string | json()[0]["generated_text"] | Open models for non-chat tasks |
Troubleshooting
A few errors trip up almost everyone on their first run. Here is what each one means and how to clear it.
- KeyError: 'GROQ_API_KEY': Python could not find that key. Cause: your .env file is missing the line, or load_dotenv() never ran. Fix: confirm the variable name matches exactly and that load_dotenv() is called before you read the key.
- 401 Unauthorized: the provider rejected your key. Cause: a typo, a trailing space, or a key copied for the wrong provider. Fix: regenerate the key, paste it freshly into .env, and check you are sending it to the matching service. See Fix the 401 Unauthorized Error in OpenAI Python.
- KeyError: 0 or TypeError on the Hugging Face response: the model returned an object, not the usual list. Cause: the model is still loading, or it returned an error message instead of text. Fix: print response.json() to see the raw reply; if it says the model is loading, wait a few seconds and retry.
- 429 Too Many Requests: you hit the free rate limit. Cause: too many calls in a short window. Fix: use the safe_ask retry wrapper from Step 5 and space your calls out.
When to use this vs. alternatives
- Use these free tiers when you are learning, prototyping, or running low-volume personal projects. They cost nothing and are more than fast enough to build real scripts.
- Reach for a paid OpenAI or Anthropic key when you need the strongest reasoning, larger context windows, or higher guaranteed rate limits for production traffic. The trade-offs are laid out in OpenAI vs Anthropic API for Beginners.
- Self-host an open model only once your volume is large enough that per-request fees outweigh the cost and effort of running your own server. For most beginners, that day is a long way off.
Related guides
- Understanding LLM APIs — the main guide this page sits under.
- Groq vs OpenRouter Free Tier — pick between the two fastest free options.
- OpenAI vs Anthropic API for Beginners — when a paid key is worth it.
- Fix the 429 Rate-Limit Error in Python — handle the most common free-tier error.