API Documentation
Sage inference proxy — OpenAI-compatible chat completions with passkey auth, usage tracking, and credit-based rate limiting.
Overview
Sage is an AI inference proxy deployed as a Cloudflare Worker. It provides OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages endpoints, proxying requests to DeepSeek, OpenAI, Anthropic, and Groq based on the model name. Authentication uses WebAuthn passkeys (Touch ID, Face ID, security keys) with session cookies and API keys.
Base URL: https://sage-api.devblocktechnologies.com
Provider Routing
Sage automatically routes your request to the correct upstream provider based on the model field in the request body. No configuration changes needed — just specify the model name.
| Model Prefix | Upstream Provider | Endpoint |
|---|---|---|
| gpt-*, o1-*, o3-*, o4-* | OpenAI | api.openai.com/v1/chat/completions |
| claude-* | Anthropic | api.anthropic.com/v1/messages |
| llama-*, mixtral-*, gemma-*, deepseek-r1 | Groq | api.groq.com/openai/v1/chat/completions |
| deepseek-* | DeepSeek | api.deepseek.com/v1/chat/completions |
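The routing table can be read as a match on the model name. The sketch below is one hypothetical way to express it in Python. The name `route_model` and the rule order (the exact `deepseek-r1` entry is checked before the broader `deepseek-*` prefix) are assumptions for illustration, not Sage's actual implementation.

```python
# Hypothetical routing sketch. The rule order is an assumption: the exact
# "deepseek-r1" entry is tested before the broader "deepseek-" prefix.
ROUTES = [
    # (match kind, pattern, provider, upstream endpoint)
    ("exact", "deepseek-r1", "Groq", "api.groq.com/openai/v1/chat/completions"),
    ("prefix", "gpt-", "OpenAI", "api.openai.com/v1/chat/completions"),
    ("prefix", "o1-", "OpenAI", "api.openai.com/v1/chat/completions"),
    ("prefix", "o3-", "OpenAI", "api.openai.com/v1/chat/completions"),
    ("prefix", "o4-", "OpenAI", "api.openai.com/v1/chat/completions"),
    ("prefix", "claude-", "Anthropic", "api.anthropic.com/v1/messages"),
    ("prefix", "llama-", "Groq", "api.groq.com/openai/v1/chat/completions"),
    ("prefix", "mixtral-", "Groq", "api.groq.com/openai/v1/chat/completions"),
    ("prefix", "gemma-", "Groq", "api.groq.com/openai/v1/chat/completions"),
    ("prefix", "deepseek-", "DeepSeek", "api.deepseek.com/v1/chat/completions"),
]


def route_model(model: str) -> tuple:
    """Return (provider, endpoint) for a model name, or raise ValueError."""
    for kind, pattern, provider, endpoint in ROUTES:
        if kind == "exact" and model == pattern:
            return provider, endpoint
        if kind == "prefix" and model.startswith(pattern):
            return provider, endpoint
    raise ValueError(f"no upstream provider for model {model!r}")
```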
When using /v1/chat/completions with a claude-* model, Sage automatically translates the request to Anthropic's /v1/messages format, including extracting system prompts and remapping parameters.
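As a rough illustration of that translation, the sketch below converts an OpenAI-style request body into the shape Anthropic's Messages API expects. The function name `to_anthropic`, the default `max_tokens` of 1024, and the exact set of remapped parameters are assumptions; the parameters Sage actually remaps are not listed here.

```python
def to_anthropic(body: dict, default_max_tokens: int = 1024) -> dict:
    """Sketch: convert an OpenAI-style chat body to an Anthropic-style body."""
    system_parts = []
    messages = []
    for msg in body["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    out = {
        "model": body["model"],
        "messages": messages,
        # max_tokens is required by Anthropic; the default is an assumption.
        "max_tokens": body.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    # Parameters that share a name in both formats are copied through.
    for name in ("temperature", "top_p", "stream"):
        if name in body:
            out[name] = body[name]
    # OpenAI's "stop" (string or list) maps to Anthropic's "stop_sequences".
    if "stop" in body:
        stop = body["stop"]
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    return out
```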
Authentication
Sage supports three authentication methods:
1. Session Cookie (Browser)
After signing in via passkey, the server sets an HttpOnly; Secure; SameSite=Lax session cookie (sage_session). All subsequent requests from the browser automatically include this cookie. Sessions expire after 30 days.
2. Bearer Token (API)
Include a license key or API key in the Authorization header:
# License key (internal)
curl -H "Authorization: Bearer sage_abc123..." https://sage-api.../v1/chat/completions
# API key (user-created)
curl -H "Authorization: Bearer sk-abc123..." https://sage-api.../v1/chat/completions
3. Device Code (Desktop)
Generate a short-lived device code from the dashboard and enter it in the Sage Desktop app. The app exchanges the code for a session token.
Rate Limits
Sage enforces two independent limits: requests per minute and weighted tokens per window.
Request Rate Limits
| Lifetime Credits | Requests/min |
|---|---|
| Free trial | 20 |
| Starter ($10+) | 60 |
| Growth ($25+) | 60 |
| Scale ($50+) | 120 |
| Enterprise ($100+) | ∞ |
Weighted Token Limits
Tokens are weighted by model cost. For example, one claude-sonnet-4-20250514 token counts as 15 deepseek-v4-flash tokens.
| Lifetime Credits | 5-hour window | 7-day window |
|---|---|---|
| Free trial | 500K | 5M |
| Starter | 300K | 10M |
| Growth | 1M | 36M |
| Scale | 1M | 36M |
| Enterprise | ∞ | ∞ |
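The two limit tables can be combined into a single lookup. The sketch below assumes the dollar thresholds in the request-rate table are lifetime purchases in USD and that a tier applies once its threshold is reached; `limits_for` and the dictionary keys are hypothetical names.

```python
import math

# (minimum lifetime purchase in USD, tier, requests/min, 5-hour cap, 7-day cap)
# Values are copied from the two tables above; the ordering puts the highest
# threshold first so the first match wins.
TIERS = [
    (100, "Enterprise", math.inf, math.inf, math.inf),
    (50, "Scale", 120, 1_000_000, 36_000_000),
    (25, "Growth", 60, 1_000_000, 36_000_000),
    (10, "Starter", 60, 300_000, 10_000_000),
    (0, "Free trial", 20, 500_000, 5_000_000),
]


def limits_for(lifetime_usd: float) -> dict:
    """Return the tier name and limits for a lifetime purchase amount."""
    for minimum, name, rpm, cap_5h, cap_7d in TIERS:
        if lifetime_usd >= minimum:
            return {"tier": name, "requests_per_min": rpm, "5h": cap_5h, "7d": cap_7d}
    raise ValueError("lifetime_usd must be non-negative")
```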
Model Weights
| Model | Weight |
|---|---|
| deepseek-v4-flash | 1.0× |
| deepseek-v4-pro | 5.3× |
| deepseek-reasoner | 5.5× |
| llama-3.1-8b-instant | 0.2× |
| gpt-4o-mini | 1.5× |
| llama-3.3-70b-versatile | 1.5× |
| claude-sonnet-4-20250514 | 15.0× |
| gpt-4o | 25.0× |
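A minimal sketch of how weighted usage might be computed, assuming the weight multiplies the sum of input and output tokens. That formula is inferred, not stated: it matches the sample GET /api/dashboard response in this document if all of that usage was on deepseek-v4-pro ((1,200 + 3,400) × 5.3 = 24,380). The name `weighted_tokens` is hypothetical.

```python
# Weights copied from the Model Weights table.
MODEL_WEIGHTS = {
    "deepseek-v4-flash": 1.0,
    "deepseek-v4-pro": 5.3,
    "deepseek-reasoner": 5.5,
    "llama-3.1-8b-instant": 0.2,
    "gpt-4o-mini": 1.5,
    "llama-3.3-70b-versatile": 1.5,
    "claude-sonnet-4-20250514": 15.0,
    "gpt-4o": 25.0,
}


def weighted_tokens(model: str, tokens_input: int, tokens_output: int) -> int:
    """Assumed formula: (input + output) * model weight, rounded to an integer."""
    return round((tokens_input + tokens_output) * MODEL_WEIGHTS[model])
```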
POST /v1/chat/completions Auth
OpenAI-compatible chat completions. Routes to the correct upstream provider based on model (see Provider Routing).
Headers
| Name | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <license-key> or Bearer <api-key> |
| Content-Type | Yes | application/json |
| X-Sage-Agent | No | Client identifier (sage-desktop, claude-code, hermes-agent) |
Request Body
{
"model": "deepseek-v4-pro",
"messages": [
{"role": "user", "content": "Hello!"}
],
"stream": false,
"max_tokens": 1024,
"temperature": 0.7
}
Response
{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "deepseek-v4-pro",
"choices": [{
"index": 0,
"message": {"role": "assistant", "content": "..."},
"finish_reason": "stop"
}],
"usage": {"prompt_tokens": 10, "completion_tokens": 25}
}
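For clients that do not use curl, the request above can be built with Python's standard library. This is a sketch rather than an official client: `build_chat_request` and `extract_reply` are hypothetical helper names, and the API key is a placeholder.

```python
import json
import urllib.request

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of a non-streaming response."""
    return response_body["choices"][0]["message"]["content"]
```

To send the request, pass it to `urllib.request.urlopen` and decode the result with `json.load`.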
POST /v1/messages Auth
Anthropic-compatible messages endpoint. Routes to the correct upstream provider based on model (see Provider Routing). With a claude-* model, proxies directly to Anthropic's API.
Headers
| Name | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <license-key> |
| Content-Type | Yes | application/json |
| Anthropic-Version | No | API version date (default: 2023-06-01) |
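No request body is shown for this endpoint, so the sketch below assumes it accepts the same shape as Anthropic's Messages API (`model`, `max_tokens`, `messages`, and an optional top-level `system`). The helper name `build_messages_request` is hypothetical and the license key is a placeholder.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_messages_request(
    license_key: str, prompt: str, system: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/messages request."""
    body = {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system is not None:
        # Assumption: the system prompt is a top-level field, as in Anthropic's API.
        body["system"] = system
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {license_key}",
            "Content-Type": "application/json",
            "Anthropic-Version": "2023-06-01",  # documented default
        },
        method="POST",
    )
```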
POST /api/signup Public
Create a new account. Returns a signup token used for passkey registration.
curl -X POST https://sage-api.devblocktechnologies.com/api/signup -H "Content-Type: application/json" -d '{"email":"you@example.com"}'
Response (201)
{"signupToken":"...", "email":"you@example.com"}
POST /api/webauthn/register-begin Auth
Start WebAuthn passkey registration. Returns credential creation options.
Headers
Authorization: Bearer signup_<signupToken> (during signup) or standard Bearer auth.
Response
{"publicKey":{...}, "challengeId":"..."}
POST /api/webauthn/register-complete Auth
Complete passkey registration with the authenticator response. Sets session cookie on success.
Request Body
{
"id": "credential-id",
"challengeId": "from-register-begin",
"rawId": "...",
"response": {
"attestationObject": "...",
"clientDataJSON": "..."
},
"type": "public-key"
}
Response (200)
{"registered":true, "sessionToken":"...", "email":"...", "credit_balance":200}
POST /api/webauthn/auth-begin Public
Start WebAuthn authentication. Returns assertion options with a challenge.
curl -X POST https://sage-api.devblocktechnologies.com/api/webauthn/auth-begin
Response
{"publicKey":{"challenge":"...","rpId":"...","userVerification":"preferred"},"challengeId":"..."}
POST /api/webauthn/auth-complete Public
Complete WebAuthn authentication. Verifies signature, challenge, origin, and rpIdHash. Sets session cookie on success.
Request Body
{
"id": "credential-id",
"challengeId": "from-auth-begin",
"rawId": "...",
"response": {
"authenticatorData": "...",
"clientDataJSON": "...",
"signature": "...",
"userHandle": null
},
"type": "public-key"
}
Response (200)
{"sessionToken":"...", "dashboard":{"email":"...","credit_balance":200}}
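The challenge, origin, and rpIdHash checks named above follow the WebAuthn specification and can be sketched in a few lines. This is an illustration under stated assumptions, not Sage's server code: `check_assertion` is a hypothetical name, the inputs are assumed to be already base64url-decoded bytes, and signature verification (which needs the stored public key and a crypto library) is deliberately left out.

```python
import base64
import hashlib
import json


def b64url_decode(value: str) -> bytes:
    """Decode base64url text that may be missing its padding."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def check_assertion(
    client_data_json: bytes,
    authenticator_data: bytes,
    expected_challenge: str,
    expected_origin: str,
    rp_id: str,
) -> None:
    """Raise ValueError if the challenge, origin, or rpIdHash does not match."""
    client_data = json.loads(client_data_json)
    if client_data.get("type") != "webauthn.get":
        raise ValueError("unexpected clientData type")
    if b64url_decode(client_data.get("challenge", "")) != b64url_decode(expected_challenge):
        raise ValueError("challenge mismatch")
    if client_data.get("origin") != expected_origin:
        raise ValueError("origin mismatch")
    # authenticatorData is at least 37 bytes: rpIdHash (32), flags (1), counter (4).
    if len(authenticator_data) < 37:
        raise ValueError("authenticatorData too short")
    if authenticator_data[:32] != hashlib.sha256(rp_id.encode("utf-8")).digest():
        raise ValueError("rpIdHash mismatch")
    # Signature verification is intentionally omitted from this sketch.
```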
POST /api/login Auth
Log in with a license key. Returns a session token and sets the session cookie.
curl -X POST https://sage-api.devblocktechnologies.com/api/login \
-H "Content-Type: application/json" \
-d '{"key":"sage_abc123..."}'
POST /api/logout Public
Revoke the current session. Clears the session cookie.
curl -X POST https://sage-api.devblocktechnologies.com/api/logout
Response
{"ok":true}
GET /api/dashboard Auth
Get current usage stats, passkeys, and recent requests.
curl https://sage-api.devblocktechnologies.com/api/dashboard \
-H "Cookie: sage_session=..."
Response
{
"email": "you@example.com",
"credit_balance": 200,
"lifetime_credits_purchased": 0,
"usage_5h": {"tokens_input":1200,"tokens_output":3400,"requests":8,"weighted":24380},
"usage_7d": {"tokens_input":15000,"tokens_output":42000,"requests":52,"weighted":302100},
"limits": {"5h":300000,"7d":10000000},
"passkeys": [{"id":"...","device_name":"...","created_at":...,"last_used_at":...}],
"recent": [{"model":"deepseek-v4-pro","tokens_input":500,"tokens_output":1200,...}]
}
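A client can use this response to estimate how much weighted quota remains in each window. The sketch below assumes `limits` and `usage_*.weighted` use the same units, as the sample suggests; `remaining_quota` is a hypothetical helper name.

```python
def remaining_quota(dashboard: dict) -> dict:
    """Return remaining weighted tokens per window, never below zero."""
    return {
        window: max(0, dashboard["limits"][window] - dashboard[f"usage_{window}"]["weighted"])
        for window in ("5h", "7d")
    }
```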
GET /api/keys Auth
List active API keys for the authenticated user.
curl https://sage-api.devblocktechnologies.com/api/keys -H "Cookie: sage_session=..."
Response
{"keys":[{"id":1,"name":"My App","created_at":...,"last_used_at":...}]}
POST /api/keys Auth
Create a new API key. The full key is returned only once — save it immediately.
curl -X POST https://sage-api.devblocktechnologies.com/api/keys \
-H "Content-Type: application/json" \
-H "Cookie: sage_session=..." \
-d '{"name":"My App"}'
Response (201)
{"key":"sk-abc123...", "name":"My App", "created_at":1712345678000}
DELETE /api/keys/:id Auth
Revoke (deactivate) an API key. Cannot be undone.
curl -X DELETE https://sage-api.devblocktechnologies.com/api/keys/1 -H "Cookie: sage_session=..."
Response
{"ok":true}
GET /health Public
Health check endpoint. Returns database connectivity status.
curl https://sage-api.devblocktechnologies.com/health
Response
{"ok":true,"uptime":1712345678000}
Models & Pricing
Cost is computed per request and tracked in the usage log. Credits are deducted per request based on model pricing weights.
| Model | Input ($/1M tok) | Output ($/1M tok) | Weight |
|---|---|---|---|
| llama-3.1-8b-instant | $0.05 | $0.08 | 0.2× |
| deepseek-v4-flash | $0.10 | $0.40 | 1.0× |
| gpt-4o-mini | $0.15 | $0.60 | 1.5× |
| deepseek-v4-pro | $0.50 | $2.19 | 5.3× |
| deepseek-reasoner | $0.55 | $2.19 | 5.5× |
| llama-3.3-70b-versatile | $0.59 | $0.79 | 1.5× |
| gpt-4o | $2.50 | $10.00 | 25.0× |
| claude-sonnet-4-20250514 | $3.00 | $15.00 | 15.0× |
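The per-million-token prices make the dollar cost of a single request easy to estimate. The sketch below applies the table directly. How that dollar figure converts into credits is not specified here, so the function stops at USD; `request_cost_usd` is a hypothetical name.

```python
# (input $/1M tokens, output $/1M tokens), copied from the pricing table.
PRICES_USD_PER_MILLION = {
    "llama-3.1-8b-instant": (0.05, 0.08),
    "deepseek-v4-flash": (0.10, 0.40),
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-v4-pro": (0.50, 2.19),
    "deepseek-reasoner": (0.55, 2.19),
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
}


def request_cost_usd(model: str, tokens_input: int, tokens_output: int) -> float:
    """Estimate the upstream cost of one request in US dollars."""
    price_in, price_out = PRICES_USD_PER_MILLION[model]
    return (tokens_input * price_in + tokens_output * price_out) / 1_000_000
```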
Credit Bundles
Purchase credit bundles to continue using Sage after your free trial ends. Credits never expire.
| Bundle | Tokens per 5 hours | Tokens per week | Extras |
|---|---|---|---|
| Starter | 300K | 10M | |
| Growth | 1M | 36M | |
| Scale | 1M | 36M | |
| Enterprise | ∞ | ∞ | Priority support |
Error Codes
| Status | Type | Description |
|---|---|---|
| 400 | error | Invalid request body or parameters |
| 401 | error | Missing or invalid authentication |
| 401 | error | Signature verification failed (WebAuthn) |
| 401 | error | Challenge expired or invalid |
| 429 | rate_limit | Request rate limit exceeded |
| 429 | token_limit | Weighted token limit reached |
| 500 | error | Internal server error |
| 502 | error | Inference provider unavailable |
Rate Limit Response
{
"error": {
"type": "rate_limit",
"message": "Rate limit exceeded. Retry after 12s.",
"retryAfterMs": 12000
}
}
Token Limit Response
{
"error": {
"type": "token_limit",
"limit": "5h",
"message": "5-hour token limit reached (300K). Resets in ~45 min.",
"used": 302100,
"cap": 300000,
"resetMs": 1712345678000
}
}
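A client can turn either 429 body into a wait time before retrying. The sketch below assumes `retryAfterMs` is a relative delay in milliseconds and `resetMs` is an absolute Unix timestamp in milliseconds. The second assumption is inferred from the sample value and is not confirmed here. `retry_delay_seconds` is a hypothetical helper name.

```python
import time
from typing import Optional


def retry_delay_seconds(error_body: dict, now_ms: Optional[float] = None) -> float:
    """Return how long to wait before retrying, based on a 429 error body."""
    err = error_body["error"]
    if err["type"] == "rate_limit":
        # Relative delay, e.g. 12000 ms means wait 12 seconds.
        return err["retryAfterMs"] / 1000
    if err["type"] == "token_limit":
        # Assumption: resetMs is an absolute epoch timestamp in milliseconds.
        if now_ms is None:
            now_ms = time.time() * 1000
        return max(0.0, (err["resetMs"] - now_ms) / 1000)
    raise ValueError(f"not a limit error: {err['type']!r}")
```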
Device Codes
Used by the Sage Desktop app for OAuth-style pairing without copying API keys.
POST /api/device-code Auth
Generate a short-lived device code. Valid for 5 minutes.
curl -X POST https://sage-api.devblocktechnologies.com/api/device-code \
-H "Content-Type: application/json" \
-H "Cookie: sage_session=..." \
-d '{"label":"MacBook Pro"}'
Response
{"code":"A1B2C3D4","label":"MacBook Pro"}
POST /api/device-exchange Public
Exchange a device code for a session token. Called by the desktop app.
curl -X POST https://sage-api.devblocktechnologies.com/api/device-exchange \
-H "Content-Type: application/json" \
-d '{"code":"A1B2C3D4"}'
Response
{"valid":true,"sessionToken":"...","email":"...","credit_balance":200}
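The desktop side of the pairing flow might look like the sketch below. It only builds the exchange request and interprets the response. The helper names are hypothetical, and treating a missing or false `valid` field as a failure is an assumption about how rejected codes are reported.

```python
import json
import urllib.request

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_device_exchange_request(code: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/device-exchange request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/device-exchange",
        data=json.dumps({"code": code}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_device_exchange(response_body: dict) -> str:
    """Return the session token, or raise if the code was not accepted."""
    # Assumption: a rejected or expired code yields "valid": false or omits it.
    if not response_body.get("valid"):
        raise ValueError("device code was rejected or has expired")
    return response_body["sessionToken"]
```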