API Documentation

Sage inference proxy — OpenAI-compatible chat completions with passkey auth, usage tracking, and credit-based rate limiting.

Overview

Sage is an AI inference proxy deployed as a Cloudflare Worker. It provides OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages endpoints, proxying requests to DeepSeek, OpenAI, Anthropic, and Groq based on the model name. Authentication uses WebAuthn passkeys (Touch ID, Face ID, security keys) with session cookies and API keys.

Base URL: https://sage-api.devblocktechnologies.com

Provider Routing

Sage automatically routes each request to the correct upstream provider based on the model field in the request body. No configuration changes are needed: just specify the model name.

Model Prefix	Upstream Provider	Endpoint
`gpt-*`, `o1-*`, `o3-*`, `o4-*`	OpenAI	`api.openai.com/v1/chat/completions`
`claude-*`	Anthropic	`api.anthropic.com/v1/messages`
`llama-*`, `mixtral-*`, `gemma-*`, `deepseek-r1`	Groq	`api.groq.com/openai/v1/chat/completions`
`deepseek-*`	DeepSeek	`api.deepseek.com/v1/chat/completions`
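
The routing table can be read as an ordered prefix lookup. The following is a minimal sketch, not Sage's actual implementation: the function name `route_model` is hypothetical, and it assumes the specific `deepseek-r1` match is checked before the broader `deepseek-` prefix, since the table does not state a precedence.

```python
# Ordered (prefixes, provider, endpoint) rules transcribed from the routing table.
# Assumption: deepseek-r1 (Groq) is matched before the broader deepseek- prefix.
ROUTES = [
    (("gpt-", "o1-", "o3-", "o4-"), "OpenAI", "api.openai.com/v1/chat/completions"),
    (("claude-",), "Anthropic", "api.anthropic.com/v1/messages"),
    (("llama-", "mixtral-", "gemma-", "deepseek-r1"), "Groq",
     "api.groq.com/openai/v1/chat/completions"),
    (("deepseek-",), "DeepSeek", "api.deepseek.com/v1/chat/completions"),
]


def route_model(model: str) -> tuple[str, str]:
    """Return (provider, endpoint) for the first rule whose prefix matches."""
    for prefixes, provider, endpoint in ROUTES:
        if model.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return provider, endpoint
    raise ValueError(f"no upstream provider for model {model!r}")
```

Under this ordering, `route_model("deepseek-r1")` resolves to Groq while `route_model("deepseek-v4-pro")` resolves to DeepSeek.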

When using /v1/chat/completions with a claude-* model, Sage automatically translates the request to Anthropic's /v1/messages format, including extracting system prompts and remapping parameters.
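
To illustrate the kind of translation described here, the sketch below converts an OpenAI-style body into an Anthropic Messages body. It is an assumption-laden illustration: the function name, the default `max_tokens` of 1024, and the exact set of remapped parameters are hypothetical, since this document does not list which parameters Sage remaps. It relies only on two facts about the Anthropic Messages API: the system prompt is a top-level `system` field, and `max_tokens` is required.

```python
def openai_to_anthropic(body: dict, default_max_tokens: int = 1024) -> dict:
    """Translate an OpenAI chat-completions body to Anthropic's messages shape."""
    # Anthropic takes the system prompt as a top-level field, not as a message.
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    out = {
        "model": body["model"],
        "messages": [m for m in body["messages"] if m["role"] != "system"],
        # max_tokens is required by Anthropic; the default here is an assumption.
        "max_tokens": body.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    # Parameters that share a name in both APIs are copied through unchanged.
    for key in ("temperature", "top_p", "stream"):
        if key in body:
            out[key] = body[key]
    # OpenAI's "stop" (string or list) corresponds to Anthropic's "stop_sequences" (list).
    if "stop" in body:
        stop = body["stop"]
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    return out
```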

Authentication

Sage supports three authentication methods:

1. Session Cookie Browser

After signing in via passkey, the server sets an HttpOnly; Secure; SameSite=Lax session cookie (sage_session). All subsequent requests from the browser automatically include this cookie. Sessions expire after 30 days.

2. Bearer Token API

Include a license key or API key in the Authorization header:

# License key (internal)
curl -H "Authorization: Bearer sage_abc123..." https://sage-api.../v1/chat/completions

# API key (user-created)
curl -H "Authorization: Bearer sk-abc123..." https://sage-api.../v1/chat/completions

3. Device Code Desktop

Generate a short-lived device code from the dashboard and enter it in the Sage Desktop app. The app exchanges the code for a session token.

Rate Limits

Sage enforces two independent limits: requests per minute and weighted tokens per window.

Request Rate Limits

Lifetime Credits	Requests/min
Free trial	20
Starter ($10+)	60
Growth ($25+)	60
Scale ($50+)	120
Enterprise ($100+)	∞
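
The tier table maps to a simple threshold lookup. This is a sketch only: it assumes the thresholds are lifetime purchases in US dollars (as the `$10+` labels suggest), and the names `TIERS` and `requests_per_minute` are hypothetical.

```python
import math

# (minimum lifetime purchase in USD, requests per minute), highest tier first.
TIERS = [(100, math.inf), (50, 120), (25, 60), (10, 60)]
FREE_TRIAL_RPM = 20


def requests_per_minute(lifetime_credits_usd: float) -> float:
    """Return the per-minute request cap for a given lifetime purchase total."""
    for threshold, limit in TIERS:
        if lifetime_credits_usd >= threshold:
            return limit
    return FREE_TRIAL_RPM
```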

Weighted Token Limits

Tokens are weighted by model cost. For example, each claude-sonnet-4-20250514 token counts as 15 deepseek-v4-flash tokens toward the limit.

Lifetime Credits	5-hour window	7-day window
Free trial	500K	5M
Starter	300K	10M
Growth	1M	36M
Scale	1M	36M
Enterprise	∞	∞

Model Weights

Model	Weight
deepseek-v4-flash	1.0×
deepseek-v4-pro	5.3×
deepseek-reasoner	5.5×
llama-3.1-8b-instant	0.2×
gpt-4o-mini	1.5×
llama-3.3-70b-versatile	1.5×
claude-sonnet-4-20250514	15.0×
gpt-4o	25.0×
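
A weighted token count can be sketched as the model weight times the total tokens in a request. The formula is an assumption (this section does not spell it out), but it reproduces the `weighted` figures in the sample /api/dashboard response: (1200 + 3400) × 5.3 = 24380. The names `MODEL_WEIGHTS` and `weighted_tokens` are hypothetical.

```python
# Weights transcribed from the Model Weights table.
MODEL_WEIGHTS = {
    "deepseek-v4-flash": 1.0,
    "deepseek-v4-pro": 5.3,
    "deepseek-reasoner": 5.5,
    "llama-3.1-8b-instant": 0.2,
    "gpt-4o-mini": 1.5,
    "llama-3.3-70b-versatile": 1.5,
    "claude-sonnet-4-20250514": 15.0,
    "gpt-4o": 25.0,
}


def weighted_tokens(model: str, tokens_input: int, tokens_output: int) -> int:
    """Assumed formula: (input + output) tokens multiplied by the model weight."""
    return round((tokens_input + tokens_output) * MODEL_WEIGHTS[model])
```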

POST /v1/chat/completions Auth

OpenAI-compatible chat completions. Routes to the correct upstream provider based on model (see Provider Routing).

Headers

Name	Required	Description
Authorization	Yes	Bearer <license-key> or Bearer <api-key>
Content-Type	Yes	application/json
X-Sage-Agent	No	Client identifier (sage-desktop, claude-code, hermes-agent)

Request Body

{
  "model": "deepseek-v4-pro",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "max_tokens": 1024,
  "temperature": 0.7
}

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-v4-pro",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 10, "completion_tokens": 25}
}
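
As an illustration of the request and response shapes above, here is a minimal client sketch using only the Python standard library. The helper names `build_chat_request` and `first_reply` are hypothetical; the URL, headers, and JSON fields come from this page.

```python
import json
import urllib.request

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_chat_request(api_key: str, model: str, messages: list, **params) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {"model": model, "messages": messages, "stream": False, **params}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply(response_body: dict) -> str:
    """Extract the assistant text from a non-streaming response."""
    return response_body["choices"][0]["message"]["content"]
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body before calling `first_reply`.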

POST /v1/messages Auth

Anthropic-compatible messages endpoint. Routes to the correct upstream provider based on model (see Provider Routing). With a claude-* model, proxies directly to Anthropic's API.

Headers

Name	Required	Description
Authorization	Yes	Bearer <license-key>
Content-Type	Yes	application/json
Anthropic-Version	No	API version date (default: 2023-06-01)

POST /api/signup Public

Create a new account. Returns a signup token used for passkey registration.

curl -X POST /api/signup \
  -H "Content-Type: application/json" \
  -d '{"email":"you@example.com"}'

Response (201)

{"signupToken":"...", "email":"you@example.com"}

POST /api/webauthn/register-begin Auth

Start WebAuthn passkey registration. Returns credential creation options.

Headers

Authorization: Bearer signup_<signupToken> (during signup) or standard Bearer auth.

Response

{"publicKey":{...}, "challengeId":"..."}

POST /api/webauthn/register-complete Auth

Complete passkey registration with the authenticator response. Sets session cookie on success.

Request Body

{
  "id": "credential-id",
  "challengeId": "from-register-begin",
  "rawId": "...",
  "response": {
    "attestationObject": "...",
    "clientDataJSON": "..."
  },
  "type": "public-key"
}

Response (200)

{"registered":true, "sessionToken":"...", "email":"...", "credit_balance":200}

POST /api/webauthn/auth-begin Public

Start WebAuthn authentication. Returns assertion options with a challenge.

curl -X POST /api/webauthn/auth-begin

Response

{"publicKey":{"challenge":"...","rpId":"...","userVerification":"preferred"},"challengeId":"..."}

POST /api/webauthn/auth-complete Public

Complete WebAuthn authentication. Verifies signature, challenge, origin, and rpIdHash. Sets session cookie on success.

Request Body

{
  "id": "credential-id",
  "challengeId": "from-auth-begin",
  "rawId": "...",
  "response": {
    "authenticatorData": "...",
    "clientDataJSON": "...",
    "signature": "...",
    "userHandle": null
  },
  "type": "public-key"
}

Response (200)

{"sessionToken":"...", "dashboard":{"email":"...","credit_balance":200}}
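
The challenge, origin, and rpIdHash checks named above follow the WebAuthn specification and can be sketched as below. This is not Sage's code: it assumes the `clientDataJSON` and `authenticatorData` fields are base64url-encoded (this page does not state the encoding), the function names are hypothetical, and signature verification against the stored public key is deliberately left out.

```python
import base64
import hashlib
import json


def b64url_decode(value: str) -> bytes:
    """Decode base64url, restoring any stripped padding."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def check_assertion(client_data_b64: str, authenticator_data_b64: str,
                    expected_challenge: str, expected_origin: str, rp_id: str) -> bool:
    """Check type, challenge, origin, and rpIdHash. Signature check is omitted."""
    client_data = json.loads(b64url_decode(client_data_b64))
    if client_data.get("type") != "webauthn.get":
        return False
    if client_data.get("challenge") != expected_challenge:
        return False
    if client_data.get("origin") != expected_origin:
        return False
    # The first 32 bytes of authenticatorData are the SHA-256 hash of the RP ID.
    auth_data = b64url_decode(authenticator_data_b64)
    return auth_data[:32] == hashlib.sha256(rp_id.encode("utf-8")).digest()
```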

POST /api/login Auth

curl -X POST /api/login \
  -H "Content-Type: application/json" \
  -d '{"key":"sage_abc123..."}'

POST /api/logout Public

Revoke the current session. Clears the session cookie.

curl -X POST /api/logout

Response

{"ok":true}

GET /api/dashboard Auth

Get current usage stats, passkeys, and recent requests.

curl /api/dashboard \
  -H "Cookie: sage_session=..."

Response

{
  "email": "you@example.com",
  "credit_balance": 200,
  "lifetime_credits_purchased": 0,
  "usage_5h": {"tokens_input":1200,"tokens_output":3400,"requests":8,"weighted":24380},
  "usage_7d": {"tokens_input":15000,"tokens_output":42000,"requests":52,"weighted":302100},
  "limits": {"5h":300000,"7d":10000000},
  "passkeys": [{"id":"...","device_name":"...","created_at":...,"last_used_at":...}],
  "recent": [{"model":"deepseek-v4-pro","tokens_input":500,"tokens_output":1200,...}]
}

GET /api/keys Auth

List active API keys for the authenticated user.

curl /api/keys -H "Cookie: sage_session=..."

Response

{"keys":[{"id":1,"name":"My App","created_at":...,"last_used_at":...}]}

POST /api/keys Auth

Create a new API key. The full key is returned only once — save it immediately.

curl -X POST /api/keys \
  -H "Content-Type: application/json" \
  -H "Cookie: sage_session=..." \
  -d '{"name":"My App"}'

Response (201)

{"key":"sk-abc123...", "name":"My App", "created_at":1712345678000}

DELETE /api/keys/:id Auth

Revoke (deactivate) an API key. Cannot be undone.

curl -X DELETE /api/keys/1 -H "Cookie: sage_session=..."

Response

{"ok":true}

GET /health Public

Health check endpoint. Returns database connectivity status.

curl /health

Response

{"ok":true,"uptime":1712345678000}

Models & Pricing

Cost is computed per request and tracked in the usage log. Credits are deducted per request based on model pricing weights.

Model	Input ($/1M tok)	Output ($/1M tok)	Weight
llama-3.1-8b-instant	$0.05	$0.08	0.2×
deepseek-v4-flash	$0.10	$0.40	1.0×
gpt-4o-mini	$0.15	$0.60	1.5×
deepseek-v4-pro	$0.50	$2.19	5.3×
deepseek-reasoner	$0.55	$2.19	5.5×
llama-3.3-70b-versatile	$0.59	$0.79	1.5×
gpt-4o	$2.50	$10.00	25.0×
claude-sonnet-4-20250514	$3.00	$15.00	15.0×
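
The per-request dollar cost implied by the pricing table can be sketched as follows. The formula (token counts times the per-million prices) is the conventional reading of the table and is an assumption here; how Sage converts that cost into credits is not documented on this page. The names `PRICES` and `request_cost_usd` are hypothetical.

```python
# (input $/1M tokens, output $/1M tokens), transcribed from the pricing table.
PRICES = {
    "llama-3.1-8b-instant": (0.05, 0.08),
    "deepseek-v4-flash": (0.10, 0.40),
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-v4-pro": (0.50, 2.19),
    "deepseek-reasoner": (0.55, 2.19),
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
}


def request_cost_usd(model: str, tokens_input: int, tokens_output: int) -> float:
    """Assumed formula: tokens times the per-million price, summed over both directions."""
    price_in, price_out = PRICES[model]
    return (tokens_input * price_in + tokens_output * price_out) / 1_000_000
```

For example, the recent-request sample of 500 input and 1,200 output tokens on deepseek-v4-pro works out to roughly $0.0029 under this formula.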

Credit Bundles

Purchase credit bundles to continue using Sage after your free trial ends. Credits never expire.

Bundle	Price	Messages	Requests/min	Tokens per 5h	Tokens per week
Starter	$10	1,200	60	300K	10M
Growth	$25	3,500	60	1M	36M
Scale	$50	8,000	120	1M	36M
Enterprise	$100	20,000	∞	∞	∞

The Enterprise bundle also includes priority support.

Error Codes

Status	Type	Description
400	error	Invalid request body or parameters
401	error	Missing or invalid authentication
401	error	Signature verification failed (WebAuthn)
401	error	Challenge expired or invalid
429	rate_limit	Request rate limit exceeded
429	token_limit	Weighted token limit reached
500	error	Internal server error
502	error	Inference provider unavailable

Rate Limit Response

{
  "error": {
    "type": "rate_limit",
    "message": "Rate limit exceeded. Retry after 12s.",
    "retryAfterMs": 12000
  }
}
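
A client can use `retryAfterMs` to decide how long to wait before retrying. The helper below is a sketch with a hypothetical name and a hypothetical 1-second fallback for responses that omit the field.

```python
from typing import Optional


def retry_delay_seconds(status: int, body: dict, default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if this is not a rate_limit error."""
    error = body.get("error", {})
    if status != 429 or error.get("type") != "rate_limit":
        return None
    retry_ms = error.get("retryAfterMs")
    # Fall back to a default delay if the field is missing or malformed (assumption).
    return retry_ms / 1000 if isinstance(retry_ms, (int, float)) else default
```

A caller would typically sleep for the returned number of seconds and resend the same request. A `None` result means the error is not a request-rate limit and needs different handling.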

Token Limit Response

{
  "error": {
    "type": "token_limit",
    "limit": "5h",
    "message": "5-hour token limit reached (300K). Resets in ~45 min.",
    "used": 302100,
    "cap": 300000,
    "resetMs": 1712345678000
  }
}
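
The fields in a token-limit error are enough to tell a user how far over the cap they are and how long to wait. This sketch assumes `resetMs` is an absolute Unix timestamp in milliseconds, which the sample value suggests but this page does not confirm. The function name is hypothetical.

```python
def token_limit_summary(error: dict, now_ms: int) -> dict:
    """Summarize a token_limit error body relative to the current time in ms."""
    # Assumption: resetMs is an absolute epoch timestamp, not a duration.
    wait_ms = max(0, error["resetMs"] - now_ms)
    return {
        "window": error["limit"],
        "over_by": max(0, error["used"] - error["cap"]),
        "wait_minutes": wait_ms / 60_000,
    }
```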

Device Codes

Used by the Sage Desktop app for OAuth-style pairing without copying API keys.

POST /api/device-code Auth

Generate a short-lived device code. Valid for 5 minutes.

curl -X POST /api/device-code \
  -H "Content-Type: application/json" \
  -H "Cookie: sage_session=..." \
  -d '{"label":"MacBook Pro"}'

Response

{"code":"A1B2C3D4","label":"MacBook Pro"}

POST /api/device-exchange Public

Exchange a device code for a session token. Called by the desktop app.

curl -X POST /api/device-exchange \
  -H "Content-Type: application/json" \
  -d '{"code":"A1B2C3D4"}'

Response

{"valid":true,"sessionToken":"...","email":"...","credit_balance":200}
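
From the desktop app's side, the exchange step might look like the sketch below. The helper names are hypothetical, and the handling of a failed exchange assumes the server returns `"valid": false` in that case; only the success response is documented here.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_exchange_request(code: str) -> urllib.request.Request:
    """Build (but do not send) the public POST to /api/device-exchange."""
    return urllib.request.Request(
        f"{BASE_URL}/api/device-exchange",
        data=json.dumps({"code": code}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def session_token_from(response_body: dict) -> Optional[str]:
    """Return the session token on success, or None otherwise (assumed failure shape)."""
    if response_body.get("valid") is True:
        return response_body.get("sessionToken")
    return None
```

Because device codes are valid for only 5 minutes, the app should perform the exchange promptly after the user enters the code.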
