Open Inference API

Access powerful open-weight language models through an OpenAI-compatible API. Pay only for what you use.

Models & Pricing

Prices are in USD per million tokens. Use the model ID from the table in your API requests.

Model          Model ID       Provider     Input / M tokens   Output / M tokens
GPT OSS 120B   gpt-oss-120b   OpenAI OSS   $0.07              $0.27
GPT OSS 20B    gpt-oss-20b    OpenAI OSS   $0.05              $0.19
GLM-5          glm-5          Zhipu AI     $0.75              $2.40
GLM-4.7        glm-4.7        Zhipu AI     $0.45              $1.65
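
Because input and output tokens are priced separately, the cost of a request is the sum of the two token counts, each scaled by its per-million rate. The sketch below shows that arithmetic using the prices listed above. The names PRICES and estimate_cost are illustrative and not part of the API, and the result is only an estimate that ignores any rounding or billing rules the service may apply.

```python
# Prices from the table above: model ID -> (input, output) in USD per million tokens.
PRICES = {
    "gpt-oss-120b": (0.07, 0.27),
    "gpt-oss-20b": (0.05, 0.19),
    "glm-5": (0.75, 2.40),
    "glm-4.7": (0.45, 1.65),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    # Each token count is multiplied by its own rate, then scaled from
    # "per million tokens" down to "per token".
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, estimate_cost("glm-5", 2000, 500) prices 2,000 input tokens at the GLM-5 input rate and 500 output tokens at its output rate.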

Quick Start

The API is fully compatible with the OpenAI SDK: point the SDK at the base URL below and supply your API key.

Base URL: https://api.dinference.com/v1

Authentication

Pass your API key in the Authorization header of every request:

Authorization: Bearer your-api-key-here
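
With the OpenAI Python SDK, the base URL and API key are passed to the client constructor, and the SDK then sends the Authorization header for you. The following is a minimal sketch under a few assumptions: the openai package is installed, the key is stored in the DINFERENCE_API_KEY environment variable, and the helper names make_client and ask are illustrative only.

```python
import os

BASE_URL = "https://api.dinference.com/v1"


def make_client():
    """Return an OpenAI SDK client pointed at this API (hypothetical helper)."""
    # Imported inside the function so this module still loads where the
    # SDK is not installed; requires `pip install openai` to actually run.
    from openai import OpenAI

    return OpenAI(base_url=BASE_URL, api_key=os.environ["DINFERENCE_API_KEY"])


def ask(prompt, model="glm-5"):
    """Send a single user message and return the reply text."""
    client = make_client()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Calling ask("Hello!") would send one user message to glm-5 and return the text of the first choice.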

Chat Completions

Send a POST request to /chat/completions with a model ID and messages array.

curl https://api.dinference.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DINFERENCE_API_KEY" \
  -d '{
    "model": "glm-5",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
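
The same request can be made from Python without any SDK. The sketch below mirrors the curl command using only the standard library. The function names build_chat_request and chat are illustrative, and the response parsing assumes the usual OpenAI-style response shape (choices[0].message.content), which this page implies but does not spell out.

```python
import json
import os
import urllib.request

API_URL = "https://api.dinference.com/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build the POST request without sending it (hypothetical helper)."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(model, messages):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(os.environ["DINFERENCE_API_KEY"], model, messages)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the response follows the OpenAI chat completion format.
    return payload["choices"][0]["message"]["content"]
```

Separating request construction from sending makes it easy to inspect the URL, headers, and body before any network call is made.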

Streaming

Add "stream": true to your request body (or stream=True in the SDK) to receive a server-sent events stream using the standard OpenAI streaming format.
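
If you read the stream yourself rather than through the SDK, each event arrives as a line beginning with "data:" that carries a JSON chunk, and the stream ends with "data: [DONE]". The sketch below extracts the text deltas from such lines, assuming the standard OpenAI chunk shape (choices[].delta.content). The function name iter_content is illustrative, and the input is assumed to be already-decoded text lines.

```python
import json


def iter_content(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # sentinel that marks the end of the stream
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Some chunks carry only a role or a finish reason, so the
            # content key may be missing or null.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments in order reconstructs the full reply, while printing them as they arrive gives incremental output.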

Ready to start?

Create an account and start using the API. No credit card is required.