API Documentation
Calculate LLM token costs programmatically. Perfect for integrating cost estimation into your applications, CI/CD pipelines, or monitoring dashboards. No API key required.
Rate Limits
All endpoints are rate limited to 100 requests per minute per IP address. Rate limit headers are included in every response: X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
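A client can watch these headers and pause before the limit trips. A minimal sketch, assuming X-RateLimit-Reset is a Unix timestamp (check a live response if the format differs); the helper name is illustrative, not part of the API:

```python
import time

def wait_if_near_limit(headers: dict) -> float:
    """Return how long to sleep (in seconds) based on rate-limit headers.

    If no requests remain in the current window, wait until the
    X-RateLimit-Reset time; otherwise continue immediately.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset_at = float(headers.get("X-RateLimit-Reset", 0))  # assumed Unix timestamp
    if remaining > 0:
        return 0.0
    return max(0.0, reset_at - time.time())

# Example: 0 requests left, window resets 30 seconds from now
delay = wait_if_near_limit({
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": str(time.time() + 30),
})
```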
Base URL
https://tokenbudget.edwardiaz.dev/api/v1
Endpoints
/api/v1/calculate
Calculate the cost for a specific model given input, output, and (optionally) cached tokens.
Request Body
{
"model": "gpt-4o",
"inputTokens": 1000,
"outputTokens": 500,
"cachedTokens": 200 // optional - tokens served from cache, billed at the cached rate instead of the input rate
}
Response
{
"model": "gpt-4o",
"provider": "OpenAI",
"inputTokens": 1000,
"outputTokens": 500,
"cachedTokens": 200,
"costs": {
"input": 0.002,
"output": 0.005,
"cached": 0.00025,
"total": 0.00725
},
"prices": {
"inputPer1M": 2.5,
"outputPer1M": 10.0,
"cachedPer1M": 1.25
}
}
cURL Example
curl -X POST https://tokenbudget.edwardiaz.dev/api/v1/calculate \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","inputTokens":1000,"outputTokens":500,"cachedTokens":200}'
/api/v1/best
Find the cheapest model for your token usage, with optional filters for capabilities and providers.
Request Body
{
"inputTokens": 1000,
"outputTokens": 500,
"cachedTokens": 0,
"filters": {
"providers": ["OpenAI", "Anthropic"], // optional
"minContext": 128000, // optional
"capabilities": ["vision", "functions"] // optional
}
}
Response
{
"recommendation": {
"model": "gpt-4o-mini",
"provider": "OpenAI",
"costs": {
"input": 0.00015,
"output": 0.0003,
"cached": 0,
"total": 0.00045
},
"contextWindow": 128000,
"capabilities": {
"vision": true,
"functions": true,
"reasoning": false
}
},
"alternatives": [
{
"model": "claude-3-5-haiku",
"provider": "Anthropic",
"totalCost": 0.0028,
"priceDiff": "+522%"
}
]
}
Available Capability Filters
- vision - Models that can analyze images
- functions - Models that support function calling
- reasoning - Models with enhanced reasoning (o1, o3, etc.)
- coding - Models optimized for code generation
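The selection that /api/v1/best performs can be approximated client-side from a price list. A sketch under the assumption that cost is computed the same way as /api/v1/calculate (cached tokens billed at the cached rate in place of the input rate); the model table is abbreviated sample data mirroring the response shapes above, not live prices:

```python
def total_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Per-request cost in USD from per-1M-token prices."""
    billed_input = input_tokens - cached_tokens
    return (billed_input * model["input"]
            + cached_tokens * model.get("cached", 0)
            + output_tokens * model["output"]) / 1_000_000

def cheapest(models, input_tokens, output_tokens, cached_tokens=0,
             min_context=0, capabilities=()):
    """Pick the lowest-cost model satisfying the context and capability filters."""
    eligible = [m for m in models
                if m["context"] >= min_context
                and all(m["capabilities"].get(c) for c in capabilities)]
    return min(eligible,
               key=lambda m: total_cost(m, input_tokens, output_tokens,
                                        cached_tokens))

# Sample data (prices for gpt-4o taken from the examples above)
models = [
    {"id": "gpt-4o", "input": 2.5, "output": 10.0, "cached": 1.25,
     "context": 128000, "capabilities": {"vision": True, "functions": True}},
    {"id": "gpt-4o-mini", "input": 0.15, "output": 0.6, "cached": 0.075,
     "context": 128000, "capabilities": {"vision": True, "functions": True}},
]
best = cheapest(models, 1000, 500, min_context=128000, capabilities=("vision",))
# best["id"] == "gpt-4o-mini"; its total for 1000 in / 500 out is 0.00045,
# matching the recommendation in the example response.
```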
/api/v1/models
List all available models with their pricing. Supports filtering by provider and caching capability.
Query Parameters
- provider - Filter by provider (e.g., openai, anthropic)
- hasCache - Only show models with cached pricing (true/false)
- search - Search by model name
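These parameters are plain query-string fields, so any HTTP client can compose the URL; for example, with Python's standard library:

```python
from urllib.parse import urlencode

BASE = "https://tokenbudget.edwardiaz.dev/api/v1/models"

# Build the same request as the cURL example below
params = {"provider": "openai", "hasCache": "true"}
url = f"{BASE}?{urlencode(params)}"
# https://tokenbudget.edwardiaz.dev/api/v1/models?provider=openai&hasCache=true
```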
Response
{
"count": 65,
"providers": ["OpenAI", "Anthropic", "Google", ...],
"models": [
{
"id": "gpt-4o",
"provider": "OpenAI",
"input": 2.5,
"output": 10.0,
"cached": 1.25,
"context": 128000
},
...
],
"lastUpdated": "2026-03-20T12:00:00.000Z"
}
Example
curl "https://tokenbudget.edwardiaz.dev/api/v1/models?provider=openai&hasCache=true"
Error Handling
All error responses follow a consistent format:
{
"error": "Rate limit exceeded",
"message": "Too many requests. Please try again later.",
"retryAfter": 45 // seconds until rate limit resets
}
HTTP Status Codes
- 200 - Success
- 400 - Bad request (invalid parameters)
- 404 - Model not found
- 429 - Rate limit exceeded
- 500 - Internal server error
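A client can branch on these codes and honour the retryAfter field on 429s. A sketch with a hypothetical send function standing in for the actual HTTP call:

```python
import time

def call_with_retry(send, payload, max_attempts=3):
    """Call the API, sleeping for `retryAfter` seconds on 429 responses.

    `send` is any callable returning (status_code, body_dict); it is a
    placeholder for whatever HTTP client you use.
    """
    for _ in range(max_attempts):
        status, body = send(payload)
        if status == 429:
            time.sleep(body.get("retryAfter", 1))
            continue
        if status == 400:
            raise ValueError(body.get("message", "Bad request"))
        return body
    raise RuntimeError("Rate limited on every attempt")

# Usage with a fake transport: one 429, then success
responses = iter([(429, {"retryAfter": 0}), (200, {"ok": True})])
result = call_with_retry(lambda p: next(responses), {})
# result == {"ok": True}
```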
Performance
- Cached Pricing Data: Model prices are cached and refreshed hourly from LiteLLM, ensuring fast response times (~50ms).
- Edge Deployment: API runs on Vercel Edge Functions for low latency globally.
- Minimal Payload: Responses are optimized to include only essential data, keeping bandwidth low.
Use Cases
CI/CD Integration
Add cost checks to your deployment pipeline to prevent expensive model regressions.
Cost Monitoring
Log token usage and calculate costs in real-time for your dashboard metrics.
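The dashboard math reduces to the same per-1M-token pricing used by /api/v1/calculate. A minimal accumulator sketch, assuming cached tokens are billed at the cached rate in place of the input rate (consistent with the /api/v1/calculate example response); the class is illustrative, not part of the API:

```python
class CostMeter:
    """Accumulate dollar cost across requests from per-1M-token prices."""

    def __init__(self, input_per_1m, output_per_1m, cached_per_1m=0.0):
        self.prices = (input_per_1m, output_per_1m, cached_per_1m)
        self.total = 0.0

    def record(self, input_tokens, output_tokens, cached_tokens=0):
        """Add one request's cost to the running total and return it."""
        inp, outp, cached = self.prices
        cost = ((input_tokens - cached_tokens) * inp
                + output_tokens * outp
                + cached_tokens * cached) / 1_000_000
        self.total += cost
        return cost

# gpt-4o prices from the /api/v1/calculate example above
meter = CostMeter(input_per_1m=2.5, output_per_1m=10.0, cached_per_1m=1.25)
cost = meter.record(1000, 500, cached_tokens=200)
# cost == 0.00725, matching the "total" field in the example response
```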