API Documentation

Sage inference proxy — OpenAI-compatible chat completions with passkey auth, usage tracking, and credit-based rate limiting.

Overview

Sage is an AI inference proxy deployed as a Cloudflare Worker. It provides an OpenAI-compatible /v1/chat/completions endpoint and an Anthropic-compatible /v1/messages endpoint, proxying each request to DeepSeek, OpenAI, Anthropic, or Groq based on the model name. Users sign in with WebAuthn passkeys (Touch ID, Face ID, security keys); requests are then authenticated with session cookies, license keys, or API keys.

Base URL: https://sage-api.devblocktechnologies.com

Provider Routing

Sage automatically routes each request to the correct upstream provider based on the model field in the request body. No configuration changes are needed; just specify the model name.

Model prefix                              Upstream provider  Endpoint
gpt-*, o1-*, o3-*, o4-*                   OpenAI             api.openai.com/v1/chat/completions
claude-*                                  Anthropic          api.anthropic.com/v1/messages
llama-*, mixtral-*, gemma-*, deepseek-r1  Groq               api.groq.com/openai/v1/chat/completions
deepseek-*                                DeepSeek           api.deepseek.com/v1/chat/completions
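A minimal Python sketch of this routing, useful for predicting where a model name will land. It is not Sage's source code. It assumes `deepseek-r1` is an exact match checked before the generic `deepseek-*` prefix (otherwise the Groq row could never win); if it is meant as a prefix, it would move into the prefix list ahead of `deepseek-`. The names `route`, `EXACT_ROUTES`, and `PREFIX_ROUTES` are hypothetical.

```python
# Hypothetical sketch of Sage's model-prefix routing; not the actual Worker code.
from typing import Tuple

# Assumption: "deepseek-r1" is an exact match that takes precedence over "deepseek-*".
EXACT_ROUTES = {"deepseek-r1": "groq"}

# Prefix rules, checked in order.
PREFIX_ROUTES = [
    (("gpt-", "o1-", "o3-", "o4-"), "openai"),
    (("claude-",), "anthropic"),
    (("llama-", "mixtral-", "gemma-"), "groq"),
    (("deepseek-",), "deepseek"),
]

# Upstream endpoints from the routing table.
UPSTREAM_URLS = {
    "openai": "https://api.openai.com/v1/chat/completions",
    "anthropic": "https://api.anthropic.com/v1/messages",
    "groq": "https://api.groq.com/openai/v1/chat/completions",
    "deepseek": "https://api.deepseek.com/v1/chat/completions",
}


def route(model: str) -> Tuple[str, str]:
    """Return (provider, upstream URL) for a model name, or raise ValueError."""
    provider = EXACT_ROUTES.get(model)
    if provider is None:
        for prefixes, candidate in PREFIX_ROUTES:
            if model.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
                provider = candidate
                break
    if provider is None:
        raise ValueError(f"no upstream provider for model {model!r}")
    return provider, UPSTREAM_URLS[provider]
```

For example, `route("deepseek-v4-pro")` yields the DeepSeek endpoint, while `route("deepseek-r1")` yields Groq because the exact table is consulted first.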

When using /v1/chat/completions with a claude-* model, Sage automatically translates the request to Anthropic's /v1/messages format, including extracting system prompts and remapping parameters.
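The translation can be sketched in Python as follows. This is an illustration under stated assumptions, not Sage's implementation: system messages are assumed to carry plain-string content and are joined with blank lines, OpenAI's `stop` is mapped to Anthropic's `stop_sequences`, and `DEFAULT_MAX_TOKENS` is a hypothetical fallback, needed because Anthropic's Messages API requires `max_tokens` while OpenAI's does not. Tool calls and image content are not handled, and `openai_to_anthropic` is a hypothetical name.

```python
from typing import Any, Dict

# Hypothetical fallback: Anthropic requires max_tokens, OpenAI treats it as optional.
DEFAULT_MAX_TOKENS = 4096


def openai_to_anthropic(body: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style chat request into an Anthropic /v1/messages body (sketch)."""
    messages = body["messages"]
    # Anthropic takes the system prompt as a top-level field, not as a message.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    out: Dict[str, Any] = {
        "model": body["model"],
        "max_tokens": body.get("max_tokens", DEFAULT_MAX_TOKENS),
        "messages": [
            {"role": m["role"], "content": m["content"]}
            for m in messages
            if m["role"] != "system"
        ],
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    # Parameters with the same name in both APIs are copied unchanged.
    for key in ("temperature", "top_p", "stream"):
        if key in body:
            out[key] = body[key]
    # OpenAI's stop may be a string or a list; Anthropic expects a list.
    stop = body.get("stop")
    if stop is not None:
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    return out
```

Note that this sketch copies `temperature` as-is even though Anthropic's documented range (0 to 1) is narrower than OpenAI's (0 to 2); a production translator might clamp or reject out-of-range values.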

Authentication

Sage supports three authentication methods:

1. Session Cookie (Browser)

After signing in via passkey, the server sets an HttpOnly; Secure; SameSite=Lax session cookie (sage_session). All subsequent requests from the browser automatically include this cookie. Sessions expire after 30 days.

2. Bearer Token (API)

Include a license key or API key in the Authorization header:

# License key (internal)
curl -H "Authorization: Bearer sage_abc123..." https://sage-api.../v1/chat/completions

# API key (user-created)
curl -H "Authorization: Bearer sk-abc123..." https://sage-api.../v1/chat/completions

3. Device Code (Desktop)

Generate a short-lived device code from the dashboard and enter it in the Sage Desktop app, which exchanges the code for a session token.

Rate Limits

Sage enforces two independent limits: requests per minute and weighted tokens per window.

Request Rate Limits

Lifetime credits    Requests/min
Free trial          20
Starter ($10+)      60
Growth ($25+)       60
Scale ($50+)        120
Enterprise ($100+)  Unlimited

Weighted Token Limits

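Tokens are weighted by model cost: each model's token count is multiplied by its weight (see Model Weights), so one claude-sonnet-4-20250514 token counts 15 times as much as one deepseek-v4-flash token.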

Lifetime credits  5-hour window  7-day window
Free trial        500K           5M
Starter           300K           10M
Growth            1M             36M
Scale             1M             36M
Enterprise        Unlimited      Unlimited
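Together, the two tables above map a lifetime purchase total to a set of limits. A small Python sketch of that lookup, assuming "lifetime credits" means US dollars purchased (as the dollar thresholds suggest) and using `None` for the Enterprise "unlimited" entries; `Limits` and `limits_for` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    tier: str
    requests_per_min: Optional[int]  # None = unlimited
    tokens_5h: Optional[int]  # weighted tokens per 5-hour window
    tokens_7d: Optional[int]  # weighted tokens per 7-day window


# Highest threshold first; values copied from the two rate limit tables.
TIERS = [
    (100, Limits("Enterprise", None, None, None)),
    (50, Limits("Scale", 120, 1_000_000, 36_000_000)),
    (25, Limits("Growth", 60, 1_000_000, 36_000_000)),
    (10, Limits("Starter", 60, 300_000, 10_000_000)),
    (0, Limits("Free trial", 20, 500_000, 5_000_000)),
]


def limits_for(lifetime_usd: float) -> Limits:
    """Return the limits for a lifetime purchase total (assumed to be in USD)."""
    for threshold, limits in TIERS:
        if lifetime_usd >= threshold:
            return limits
    raise ValueError("lifetime purchase total cannot be negative")
```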

Model Weights

Model                     Weight
deepseek-v4-flash         1.0×
deepseek-v4-pro           5.3×
deepseek-reasoner         5.5×
llama-3.1-8b-instant      0.2×
gpt-4o-mini               1.5×
llama-3.3-70b-versatile   1.5×
claude-sonnet-4-20250514  15.0×
gpt-4o                    25.0×
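A Python sketch of the weighting. The exact formula is not spelled out here; this assumes input and output tokens are summed and multiplied by the model's weight. The /api/dashboard example below is consistent with that if all of its usage was deepseek-v4-pro: (1,200 + 3,400) × 5.3 = 24,380 and (15,000 + 42,000) × 5.3 = 302,100. Weights are stored as `Decimal` so the arithmetic is exact; truncating to an integer is an assumption about rounding, and `weighted_tokens` is a hypothetical name.

```python
from decimal import Decimal

# Weights from the Model Weights table, as strings so Decimal arithmetic is exact.
MODEL_WEIGHTS = {
    "deepseek-v4-flash": Decimal("1.0"),
    "deepseek-v4-pro": Decimal("5.3"),
    "deepseek-reasoner": Decimal("5.5"),
    "llama-3.1-8b-instant": Decimal("0.2"),
    "gpt-4o-mini": Decimal("1.5"),
    "llama-3.3-70b-versatile": Decimal("1.5"),
    "claude-sonnet-4-20250514": Decimal("15.0"),
    "gpt-4o": Decimal("25.0"),
}


def weighted_tokens(model: str, tokens_input: int, tokens_output: int) -> int:
    """Weighted token count for one request (assumed: (input + output) x weight)."""
    raw = Decimal(tokens_input + tokens_output) * MODEL_WEIGHTS[model]
    return int(raw)  # truncation toward zero is an assumed rounding rule
```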

POST /v1/chat/completions (Auth)

OpenAI-compatible chat completions. Routes to the correct upstream provider based on model (see Provider Routing).

Headers

Name           Required  Description
Authorization  Yes       Bearer <license-key> or Bearer <api-key>
Content-Type   Yes       application/json
X-Sage-Agent   No        Client identifier (sage-desktop, claude-code, hermes-agent)

Request Body

{
  "model": "deepseek-v4-pro",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "max_tokens": 1024,
  "temperature": 0.7
}

Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-v4-pro",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 10, "completion_tokens": 25}
}
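A standard-library Python sketch of calling this endpoint. The base URL is the one given in Overview; `build_chat_request` and `reply_text` are hypothetical helper names. The request is built separately from sending it so it can be inspected without network access.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_chat_request(
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    agent: Optional[str] = None,
    **params: Any,
) -> urllib.request.Request:
    """Build POST /v1/chat/completions; extra params (max_tokens, ...) go in the body."""
    body = {"model": model, "messages": messages, **params}
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if agent is not None:
        headers["X-Sage-Agent"] = agent  # optional client identifier
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def reply_text(response: Dict[str, Any]) -> str:
    """Extract the assistant message from a non-streaming response."""
    return response["choices"][0]["message"]["content"]
```

To send the request, use `with urllib.request.urlopen(req) as resp: print(reply_text(json.load(resp)))`. Non-2xx responses raise `urllib.error.HTTPError`, whose body carries the error JSON described under Error Codes.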

POST /v1/messages (Auth)

Anthropic-compatible messages endpoint. Routes to the correct upstream provider based on model (see Provider Routing). With a claude-* model, the request is proxied directly to Anthropic's API.

Headers

Name               Required  Description
Authorization      Yes       Bearer <license-key>
Content-Type       Yes       application/json
Anthropic-Version  No        API version date (default: 2023-06-01)
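No request body is shown for this endpoint. Since it is described as Anthropic-compatible, the Python sketch below assumes Anthropic's Messages format (`model`, `max_tokens`, `messages`, and an optional top-level `system`). `build_messages_request` is a hypothetical helper, and it sends the Anthropic-Version header explicitly with the documented default.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_messages_request(
    license_key: str,
    model: str,
    messages: List[Dict[str, Any]],
    max_tokens: int,
    system: Optional[str] = None,
    anthropic_version: str = "2023-06-01",
) -> urllib.request.Request:
    """Build POST /v1/messages in (assumed) Anthropic Messages format."""
    body: Dict[str, Any] = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if system is not None:
        body["system"] = system  # Anthropic-style top-level system prompt
    headers = {
        "Authorization": f"Bearer {license_key}",
        "Content-Type": "application/json",
        "Anthropic-Version": anthropic_version,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

`urllib` normalizes header names with `str.capitalize()`, so the header is stored as `Anthropic-version`; HTTP header names are case-insensitive, so this does not change what the server sees.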

POST /api/signup (Public)

Create a new account. Returns a signup token used for passkey registration.

curl -X POST https://sage-api.devblocktechnologies.com/api/signup \
  -H "Content-Type: application/json" \
  -d '{"email":"you@example.com"}'

Response (201)

{"signupToken":"...", "email":"you@example.com"}

POST /api/webauthn/register-begin (Auth)

Start WebAuthn passkey registration. Returns credential creation options.

Headers

Authorization: Bearer signup_<signupToken> (during signup) or standard Bearer auth.

Response

{"publicKey":{...}, "challengeId":"..."}

POST /api/webauthn/register-complete (Auth)

Complete passkey registration with the authenticator response. Sets the session cookie on success.

Request Body

{
  "id": "credential-id",
  "challengeId": "from-register-begin",
  "rawId": "...",
  "response": {
    "attestationObject": "...",
    "clientDataJSON": "..."
  },
  "type": "public-key"
}

Response (200)

{"registered":true, "sessionToken":"...", "email":"...", "credit_balance":200}

POST /api/webauthn/auth-begin (Public)

Start WebAuthn authentication. Returns assertion options with a challenge.

curl -X POST https://sage-api.devblocktechnologies.com/api/webauthn/auth-begin

Response

{"publicKey":{"challenge":"...","rpId":"...","userVerification":"preferred"},"challengeId":"..."}

POST /api/webauthn/auth-complete (Public)

Complete WebAuthn authentication. Verifies the signature, challenge, origin, and rpIdHash, and sets the session cookie on success.

Request Body

{
  "id": "credential-id",
  "challengeId": "from-auth-begin",
  "rawId": "...",
  "response": {
    "authenticatorData": "...",
    "clientDataJSON": "...",
    "signature": "...",
    "userHandle": null
  },
  "type": "public-key"
}

Response (200)

{"sessionToken":"...", "dashboard":{"email":"...","credit_balance":200}}
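The non-signature checks listed above can be sketched in Python following the WebAuthn data layout: `clientDataJSON` carries `type`, `challenge`, and `origin`, and `authenticatorData` starts with the 32-byte SHA-256 hash of the RP ID followed by a flags byte and a 4-byte signature counter. The sketch assumes the binary fields arrive base64url-encoded without padding (the examples above elide the values). It deliberately omits the signature check, which requires verifying `signature` over `authenticatorData || SHA-256(clientDataJSON)` with the stored public key using a crypto library. `check_assertion` and `b64url_decode` are hypothetical names.

```python
import base64
import hashlib
import json


def b64url_decode(value: str) -> bytes:
    """Decode unpadded base64url (assumed encoding of the binary fields)."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def check_assertion(
    client_data_b64: str,
    authenticator_data_b64: str,
    expected_challenge: str,
    expected_origin: str,
    rp_id: str,
) -> bool:
    """Type, challenge, origin, rpIdHash, and UP-flag checks; the signature is NOT verified."""
    client_data = json.loads(b64url_decode(client_data_b64))
    if client_data.get("type") != "webauthn.get":
        return False
    # Must be the challenge issued by auth-begin (compared as base64url text).
    if client_data.get("challenge") != expected_challenge:
        return False
    if client_data.get("origin") != expected_origin:
        return False
    auth_data = b64url_decode(authenticator_data_b64)
    # Layout: rpIdHash (32 bytes) | flags (1 byte) | signCount (4 bytes) | ...
    if len(auth_data) < 37:
        return False
    if auth_data[:32] != hashlib.sha256(rp_id.encode("utf-8")).digest():
        return False
    # Flags bit 0 is "user present" (UP); it must be set.
    return bool(auth_data[32] & 0x01)
```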

POST /api/login (Auth)

Log in with a license key. Returns a session token and sets the session cookie.

curl -X POST https://sage-api.devblocktechnologies.com/api/login \
  -H "Content-Type: application/json" \
  -d '{"key":"sage_abc123..."}'

POST /api/logout (Public)

Revoke the current session. Clears the session cookie.

curl -X POST https://sage-api.devblocktechnologies.com/api/logout

Response

{"ok":true}

GET /api/dashboard (Auth)

Get current usage stats, passkeys, and recent requests.

curl https://sage-api.devblocktechnologies.com/api/dashboard \
  -H "Cookie: sage_session=..."

Response

{
  "email": "you@example.com",
  "credit_balance": 200,
  "lifetime_credits_purchased": 0,
  "usage_5h": {"tokens_input":1200,"tokens_output":3400,"requests":8,"weighted":24380},
  "usage_7d": {"tokens_input":15000,"tokens_output":42000,"requests":52,"weighted":302100},
  "limits": {"5h":300000,"7d":10000000},
  "passkeys": [{"id":"...","device_name":"...","created_at":...,"last_used_at":...}],
  "recent": [{"model":"deepseek-v4-pro","tokens_input":500,"tokens_output":1200,...}]
}
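Remaining headroom in each window can be derived from this response by subtracting the weighted usage from the matching cap. A Python sketch, where `remaining_budget` is a hypothetical helper and a `null` cap is assumed to mean unlimited (the response shape for uncapped tiers is not shown here):

```python
from typing import Any, Dict, Optional


def remaining_budget(dashboard: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Weighted tokens left in the 5-hour and 7-day windows (None = no cap)."""
    remaining: Dict[str, Optional[int]] = {}
    for window, usage_key in (("5h", "usage_5h"), ("7d", "usage_7d")):
        cap = dashboard["limits"].get(window)
        used = dashboard[usage_key]["weighted"]
        # Clamp at zero: usage can overshoot the cap on the request that crosses it.
        remaining[window] = None if cap is None else max(cap - used, 0)
    return remaining
```

For the example response above this gives 300,000 - 24,380 = 275,620 weighted tokens left in the 5-hour window and 10,000,000 - 302,100 = 9,697,900 in the 7-day window.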

GET /api/keys (Auth)

List active API keys for the authenticated user.

curl https://sage-api.devblocktechnologies.com/api/keys -H "Cookie: sage_session=..."

Response

{"keys":[{"id":1,"name":"My App","created_at":...,"last_used_at":...}]}

POST /api/keys (Auth)

Create a new API key. The full key is returned only once — save it immediately.

curl -X POST https://sage-api.devblocktechnologies.com/api/keys \
  -H "Content-Type: application/json" \
  -H "Cookie: sage_session=..." \
  -d '{"name":"My App"}'

Response (201)

{"key":"sk-abc123...", "name":"My App", "created_at":1712345678000}

DELETE /api/keys/:id (Auth)

Revoke (deactivate) an API key. This cannot be undone.

curl -X DELETE https://sage-api.devblocktechnologies.com/api/keys/1 -H "Cookie: sage_session=..."

Response

{"ok":true}

GET /health (Public)

Health check endpoint. Returns database connectivity status.

curl https://sage-api.devblocktechnologies.com/health

Response

{"ok":true,"uptime":1712345678000}

Models & Pricing

Cost is computed per request and tracked in the usage log. Credits are deducted per request based on model pricing weights.

Model                     Input ($/1M tok)  Output ($/1M tok)  Weight
llama-3.1-8b-instant      $0.05             $0.08              0.2×
deepseek-v4-flash         $0.10             $0.40              1.0×
gpt-4o-mini               $0.15             $0.60              1.5×
deepseek-v4-pro           $0.50             $2.19              5.3×
deepseek-reasoner         $0.55             $2.19              5.5×
llama-3.3-70b-versatile   $0.59             $0.79              1.5×
gpt-4o                    $2.50             $10.00             25.0×
claude-sonnet-4-20250514  $3.00             $15.00             15.0×
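Per-request dollar cost follows directly from the per-million-token prices above. A Python sketch using `Decimal` to avoid floating-point rounding; `request_cost_usd` is a hypothetical helper, and because the exact rule for converting this into a credit deduction is not given here, the sketch stops at dollars.

```python
from decimal import Decimal
from typing import Dict, Tuple

# (input, output) price in USD per 1M tokens, from the Models & Pricing table.
PRICES: Dict[str, Tuple[Decimal, Decimal]] = {
    "llama-3.1-8b-instant": (Decimal("0.05"), Decimal("0.08")),
    "deepseek-v4-flash": (Decimal("0.10"), Decimal("0.40")),
    "gpt-4o-mini": (Decimal("0.15"), Decimal("0.60")),
    "deepseek-v4-pro": (Decimal("0.50"), Decimal("2.19")),
    "deepseek-reasoner": (Decimal("0.55"), Decimal("2.19")),
    "llama-3.3-70b-versatile": (Decimal("0.59"), Decimal("0.79")),
    "gpt-4o": (Decimal("2.50"), Decimal("10.00")),
    "claude-sonnet-4-20250514": (Decimal("3.00"), Decimal("15.00")),
}


def request_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Dollar cost of one request from its usage counts."""
    input_price, output_price = PRICES[model]
    return (input_price * prompt_tokens + output_price * completion_tokens) / Decimal(1_000_000)
```

For the example response under POST /v1/chat/completions (10 prompt and 25 completion tokens on deepseek-v4-pro), this gives (10 × 0.50 + 25 × 2.19) / 1,000,000 = $0.00005975.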

Credit Bundles

Purchase credit bundles to continue using Sage after your free trial ends. Credits never expire.

Bundle      Price  Messages  Requests/min  Tokens/5h  Tokens/week
Starter     $10    1,200     60            300K       10M
Growth      $25    3,500     60            1M         36M
Scale       $50    8,000     120           1M         36M
Enterprise  $100   20,000    Unlimited     Unlimited  Unlimited

Enterprise also includes priority support.

Error Codes

Status  Type         Description
400     error        Invalid request body or parameters
401     error        Missing or invalid authentication
401     error        Signature verification failed (WebAuthn)
401     error        Challenge expired or invalid
429     rate_limit   Request rate limit exceeded
429     token_limit  Weighted token limit reached
500     error        Internal server error
502     error        Inference provider unavailable

Rate Limit Response

{
  "error": {
    "type": "rate_limit",
    "message": "Rate limit exceeded. Retry after 12s.",
    "retryAfterMs": 12000
  }
}

Token Limit Response

{
  "error": {
    "type": "token_limit",
    "limit": "5h",
    "message": "5-hour token limit reached (300K). Resets in ~45 min.",
    "used": 302100,
    "cap": 300000,
    "resetMs": 1712345678000
  }
}
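A client can use the `type` field to decide whether to retry. A Python sketch under stated assumptions: `rate_limit` errors are retried after `retryAfterMs`; `token_limit` errors are not retried automatically, since the window stays exhausted until it resets; and the 2-second back-off for 500/502 and the 1-second fallback are hypothetical choices, not documented values. `retry_delay_seconds` is a hypothetical helper.

```python
import json
from typing import Optional

TRANSIENT_BACKOFF_S = 2.0  # hypothetical back-off for 500/502; not documented by Sage


def retry_delay_seconds(status: int, body: str) -> Optional[float]:
    """Seconds to wait before retrying, or None if the request should not be retried."""
    if status in (500, 502):
        return TRANSIENT_BACKOFF_S
    if status != 429:
        return None  # 400/401: fix the request or credentials instead of retrying
    try:
        parsed = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        return None
    if error.get("type") == "rate_limit":
        return error.get("retryAfterMs", 1000) / 1000  # 1 s fallback is an assumption
    # token_limit: the weighted-token window is exhausted; surface it to the user.
    return None
```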

Device Codes

Used by the Sage Desktop app for OAuth-style pairing without copying API keys.

POST /api/device-code (Auth)

Generate a short-lived device code, valid for 5 minutes.

curl -X POST https://sage-api.devblocktechnologies.com/api/device-code \
  -H "Content-Type: application/json" \
  -H "Cookie: sage_session=..." \
  -d '{"label":"MacBook Pro"}'

Response

{"code":"A1B2C3D4","label":"MacBook Pro"}

POST /api/device-exchange (Public)

Exchange a device code for a session token. Called by the desktop app.

curl -X POST https://sage-api.devblocktechnologies.com/api/device-exchange \
  -H "Content-Type: application/json" \
  -d '{"code":"A1B2C3D4"}'

Response

{"valid":true,"sessionToken":"...","email":"...","credit_balance":200}
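On the desktop side, the exchange is a single unauthenticated POST. A Python sketch with hypothetical helper names (`build_exchange_request`, `session_from_exchange`). It treats any response without `"valid": true` as a failure, since the error shape for an expired or unknown code is not shown here, and it only trims surrounding whitespace from the code rather than guessing at case rules.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://sage-api.devblocktechnologies.com"


def build_exchange_request(code: str) -> urllib.request.Request:
    """Build POST /api/device-exchange; no auth header, the code itself is the credential."""
    return urllib.request.Request(
        f"{BASE_URL}/api/device-exchange",
        data=json.dumps({"code": code.strip()}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def session_from_exchange(response: Dict[str, Any]) -> str:
    """Return the session token from a successful exchange, else raise ValueError."""
    if response.get("valid") is not True:
        raise ValueError("device code was rejected (codes expire after 5 minutes)")
    return response["sessionToken"]
```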