ModelBridge provides a fully OpenAI-compatible API. If you've used the OpenAI SDK before, you already know how to use ModelBridge: just change the base URL.
Get your first response in under 3 minutes. No credit card required.
1. Visit /auth/register.html and sign up with your email.
2. After logging in, go to the Dashboard. Your API key (starts with mb-) is displayed at the top.
3. Copy the code below (Python / curl / Node.js) and replace YOUR_API_KEY with your actual key.
4. Visit the Dashboard anytime to view your token usage, remaining quota, and billing status.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://aibridge-api.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
✅ Expected response: a normal chat reply from DeepSeek V3. If you get 401, double-check your API key.
Your API key (mb-xxxxxxxxxxxxxxxxx) is displayed at the top of the Dashboard.

All requests require an API key in the Authorization header:
Authorization: Bearer mb-xxxxxxxxxxxxx
Base URL: https://aibridge-api.com/v1
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/models | List available models |
| POST | /v1/chat/completions | Chat completion (supports streaming) |
| POST | /v1/embeddings | Generate text embeddings |
| GET | /health | Service health check |
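As a rough illustration of how the base URL, the Authorization header, and an endpoint from the table fit together, the sketch below builds a raw `GET /v1/models` request with Python's standard library. The helper names `build_request` and `list_models` are hypothetical, and the reply is assumed to follow OpenAI's model-list shape (a `data` array of objects with an `id`), which this page does not confirm.

```python
import json
import urllib.request

BASE_URL = "https://aibridge-api.com/v1"  # base URL from this page


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for `path` (hypothetical helper) with the Bearer key."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_models(api_key: str) -> list[str]:
    """Call GET /v1/models and return the model IDs.

    Assumes an OpenAI-style body: {"data": [{"id": "..."}, ...]}.
    """
    with urllib.request.urlopen(build_request("/models", api_key), timeout=30) as resp:
        body = json.load(resp)
    return [m["id"] for m in body.get("data", [])]


# Building the request does not touch the network.
req = build_request("/models", "mb-xxxxxxxxxxxxx")
print(req.full_url)
print(req.get_header("Authorization"))
```

Calling `list_models("YOUR_API_KEY")` with a real key performs the actual request; a 401 at that point usually means the header is malformed or the key is wrong.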
| Model ID | Type | Context Window | Best For |
|---|---|---|---|
| deepseek-chat | Chat | 64K | General purpose, coding (V3) |
| deepseek-reasoner | Reasoning | 64K | Complex reasoning, math, logic |
| deepseek-coder | Coding | 64K | Code generation & debugging (V2.5) |
| deepseek-v4-pro | Chat | 128K | Flagship reasoning model, top-tier performance |
| deepseek-v4-flash | Chat | 128K | Fast & lightweight, quick responses |
| qwen-max | Chat | 32K | Multilingual tasks, long context |
| qwen-plus | Chat | 131K | Cost-effective general usage |
| qwen3-235b-a22b | Chat | 128K | Flagship Qwen3 model, best overall |
| glm-4-plus | Chat | 128K | Advanced reasoning, complex tasks |
| glm-4-air | Chat | 128K | Balanced performance & speed |
| glm-4-flash | Chat | 128K | Fast & lightweight, cost-effective |
| moonshot-v1-8k | Chat | 8K | Quick conversations |
| moonshot-v1-32k | Chat | 32K | Medium context length |
| moonshot-v1-128k | Chat | 128K | Long documents, deep analysis |
The primary endpoint for generating text responses. Supports both regular and streaming (SSE) modes.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID (e.g. deepseek-chat) |
| messages | array | Required | List of message objects. Each has role (system/user/assistant) and content (string) |
| temperature | float | Optional | Sampling temperature (0–2). Higher = more random. Default: 1.0 |
| max_tokens | int | Optional | Max tokens to generate. Default varies by model |
| top_p | float | Optional | Nucleus sampling (0–1). Default: 1.0 |
| stream | boolean | Optional | If true, returns SSE stream. Default: false |
| stop | string/array | Optional | Up to 4 sequences where the API will stop generating |
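| presence_penalty | float | Optional | Penalizes tokens that have already appeared at least once, nudging the model toward new topics (-2 to 2). Default: 0 |
| frequency_penalty | float | Optional | Penalizes tokens in proportion to how often they have already appeared, reducing verbatim repetition (-2 to 2). Default: 0 |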
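The parameter table above can be sketched as a small payload builder that checks values against the documented ranges before anything is sent. The function name `build_chat_payload` is hypothetical, and whether the server rejects out-of-range values the same way is an assumption; this only mirrors the ranges listed in the table.

```python
import json


def build_chat_payload(model, messages, *, temperature=1.0, top_p=1.0,
                       max_tokens=None, stream=False, stop=None,
                       presence_penalty=0.0, frequency_penalty=0.0):
    """Validate against the documented ranges and return a JSON-ready dict."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for msg in messages:
        if msg.get("role") not in {"system", "user", "assistant"}:
            raise ValueError(f"unknown role: {msg.get('role')!r}")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be in [0, 1]")
    for name, value in (("presence_penalty", presence_penalty),
                        ("frequency_penalty", frequency_penalty)):
        if not -2 <= value <= 2:
            raise ValueError(f"{name} must be in [-2, 2]")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
        "presence_penalty": presence_penalty,
        "frequency_penalty": frequency_penalty,
    }
    # Optional fields are left out so the server-side defaults apply.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stop is not None:
        payload["stop"] = stop
    return payload


payload = build_chat_payload(
    "deepseek-chat",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=500,
)
print(json.dumps(payload, indent=2))
```

The resulting dict is what would be sent as the JSON body of `POST /v1/chat/completions`.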
| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID (e.g. chatcmpl-xxxxx) |
| object | string | Always "chat.completion" |
| created | int | Unix timestamp of creation |
| model | string | Model ID used for the completion |
| choices | array | List of completion choices. Each has index, message, finish_reason |
| usage | object | Token usage: prompt_tokens, completion_tokens, total_tokens |
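To make the response fields concrete, here is a minimal sketch that reads them from a raw JSON body. The sample values are invented for illustration; only the field names come from the table. Treating `finish_reason == "length"` as "cut off by `max_tokens`" follows the OpenAI convention and is assumed to carry over.

```python
# Illustrative payload only: values are made up, field names come from the table.
sample = {
    "id": "chatcmpl-xxxxx",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "deepseek-chat",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16},
}


def summarize(resp: dict) -> dict:
    """Pull out the fields most callers need (hypothetical helper)."""
    first = resp["choices"][0]
    return {
        "text": first["message"]["content"],
        "finish_reason": first["finish_reason"],
        # "length" is assumed to mean the reply hit max_tokens (OpenAI convention).
        "truncated": first["finish_reason"] == "length",
        "total_tokens": resp["usage"]["total_tokens"],
    }


summary = summarize(sample)
print(summary)
```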
# Reuses the `client` created in the quick-start example.
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."},
    ],
    temperature=0.7,
    max_tokens=500,
)
print(response.choices[0].message.content)
print("Tokens used:", response.usage.total_tokens)
Set stream: true to receive tokens as they are generated:
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
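If you consume the stream without the SDK, the wire format matters. The sketch below assumes the stream follows OpenAI's SSE convention: each event is a `data: {json}` line carrying a `delta`, and the stream ends with `data: [DONE]`. That convention is assumed from the compatibility claim, not confirmed for ModelBridge. The helper name `iter_sse_content` and the sample lines are illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text pieces carried by OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry no choices
        piece = choices[0].get("delta", {}).get("content")
        if piece:
            yield piece


# Hand-written sample events, standing in for lines read from the HTTP response.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "upon a time"}}]}',
    "",
    "data: [DONE]",
]
story = "".join(iter_sse_content(sample_lines))
print(story)
```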
Generate vector embeddings from text. Compatible with OpenAI's /v1/embeddings endpoint.
response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="The food was delicious",
)
print(response.data[0].embedding)  # list[float]
print(response.usage.prompt_tokens)
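A common next step with embedding vectors is comparing two of them. As a minimal sketch, the function below computes cosine similarity in plain Python. The toy 3-dimensional vectors stand in for real embeddings, which are much longer, and `cosine_similarity` is an illustrative helper, not part of the API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for response.data[i].embedding
v1 = [1.0, 0.0, 0.0]
v2 = [0.0, 1.0, 0.0]
v3 = [2.0, 0.0, 0.0]

same_direction = cosine_similarity(v1, v3)
orthogonal = cosine_similarity(v1, v2)
print(same_direction, orthogonal)
```

Values near 1 indicate vectors pointing the same way, and values near 0 indicate unrelated directions.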
| Plan | Requests / Min (RPM) | Tokens / Min (TPM) |
|---|---|---|
| Free | 20 | 50,000 |
| Pro | 60 | 200,000 |
| Custom | Custom | Custom |
If you exceed rate limits, you'll receive a 429 error. Implement exponential backoff in production.
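One way to implement the exponential backoff mentioned above is sketched here. `RateLimited` is a hypothetical stand-in for whatever error your HTTP client raises on a 429, and the retry count and delays are illustrative defaults rather than values recommended by ModelBridge.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_backoff(fn, *, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The ceiling doubles each attempt (1s, 2s, 4s, ...), capped at
            # max_delay; jitter keeps parallel clients from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


# Demo: a fake call that is rate limited twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited("429")
    return "ok"


delays = []  # record delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

With the OpenAI Python SDK you would likely catch `openai.RateLimitError` in place of `RateLimited` and wrap the `client.chat.completions.create(...)` call in `fn`.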
| Status Code | Error | Cause | Solution |
|---|---|---|---|
| 401 | Unauthorized | Invalid or missing API key | Check your Authorization header format: Bearer mb-xxx |
| 404 | Not Found | Invalid endpoint or model ID | Verify the URL and model name |
| 429 | Rate Limited | Too many requests | Reduce frequency; upgrade plan if needed |
| 500 | Internal Error | Upstream service issue | Retry after a few seconds |
| 502 | Bad Gateway | Upstream AI provider is unavailable | Retry after a short wait |
| 504 | Gateway Timeout | Request took too long | Try with shorter prompts |
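The table above can be turned into a small lookup that tells calling code whether a retry is worthwhile. Which codes count as retryable is an interpretation of the table, not a documented rule: 429, 500, and 502 are treated as transient here, while 504 is handled by shortening the prompt rather than by blind retry. The names are illustrative.

```python
# Advice strings paraphrase the "Solution" column of the error table.
ADVICE = {
    401: "Check the Authorization header format: Bearer mb-xxx",
    404: "Verify the URL and model name",
    429: "Reduce request frequency; upgrade plan if needed",
    500: "Retry after a few seconds",
    502: "Upstream AI provider is down; retry later",
    504: "Try with a shorter prompt",
}

# Assumption: these statuses are transient and worth retrying with backoff.
RETRYABLE = {429, 500, 502}


def handle_status(status: int) -> tuple[bool, str]:
    """Return (should_retry, advice) for an HTTP status code."""
    advice = ADVICE.get(status, "Unexpected status; inspect the response body")
    return status in RETRYABLE, advice


for code in (401, 429, 504):
    retry, advice = handle_status(code)
    print(code, "retry" if retry else "do not retry", "-", advice)
```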