# Cost Tracking
Every response from the Lunar SDK includes detailed usage and cost information, so you can monitor spending and optimize your usage in real time.

## Usage Object
### Available Fields

| Field | Type | Description |
|---|---|---|
| `prompt_tokens` | int | Tokens in the input/prompt |
| `completion_tokens` | int | Tokens in the generated output |
| `total_tokens` | int | Total tokens used |
| `input_cost_usd` | float | Cost of input tokens (USD) |
| `output_cost_usd` | float | Cost of output tokens (USD) |
| `cache_input_cost_usd` | float | Cost of cached input tokens, if applicable (USD) |
| `total_cost_usd` | float | Total request cost (USD) |
| `latency_ms` | float | Total request latency (milliseconds) |
| `ttft_ms` | float | Time to first token (milliseconds) |
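The usage object can be modeled as a plain record. The sketch below is a stand-in dataclass mirroring the fields above — the real SDK returns its own usage type, so the class name and example values here are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    """Stand-in for the usage object attached to every response."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    input_cost_usd: float
    output_cost_usd: float
    cache_input_cost_usd: float
    total_cost_usd: float
    latency_ms: float
    ttft_ms: float

# Illustrative values for a small request (not real pricing)
usage = Usage(
    prompt_tokens=120,
    completion_tokens=80,
    total_tokens=200,
    input_cost_usd=0.00018,
    output_cost_usd=0.00048,
    cache_input_cost_usd=0.0,
    total_cost_usd=0.00066,
    latency_ms=850.0,
    ttft_ms=210.0,
)
assert usage.total_tokens == usage.prompt_tokens + usage.completion_tokens
```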
## Accessing Cost Data
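The typical pattern reads fields off `response.usage` after a call. The original example is not shown here, so this sketch substitutes a `SimpleNamespace` for the response object; the attribute names match the field table above, but the response shape itself is an assumption:

```python
from types import SimpleNamespace

# Hypothetical response stand-in; a real Lunar SDK response would expose
# the same fields on `response.usage` (names per the Available Fields table).
response = SimpleNamespace(usage=SimpleNamespace(
    prompt_tokens=120,
    completion_tokens=80,
    total_cost_usd=0.00066,
    latency_ms=850.0,
    ttft_ms=210.0,
))

u = response.usage
print(f"Tokens: {u.prompt_tokens} in / {u.completion_tokens} out")
print(f"Cost:   ${u.total_cost_usd:.6f}")
print(f"TTFT:   {u.ttft_ms:.0f} ms (total {u.latency_ms:.0f} ms)")
```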
## Tracking Total Spend
Track cumulative costs across multiple requests:
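In place of the lost example, here is a minimal tracker. `record` accepts any object exposing `total_tokens` and `total_cost_usd`, so a real response's `usage` can be passed straight in; the `SimpleNamespace` objects below only simulate that:

```python
from types import SimpleNamespace

class SpendTracker:
    """Accumulates token and dollar totals across requests."""

    def __init__(self) -> None:
        self.requests = 0
        self.total_tokens = 0
        self.total_cost_usd = 0.0

    def record(self, usage) -> None:
        # `usage` is any object with total_tokens / total_cost_usd fields,
        # e.g. the usage object on a response.
        self.requests += 1
        self.total_tokens += usage.total_tokens
        self.total_cost_usd += usage.total_cost_usd

tracker = SpendTracker()
for cost in (0.0004, 0.0007, 0.0012):  # simulated per-request costs
    tracker.record(SimpleNamespace(total_tokens=150, total_cost_usd=cost))

print(f"{tracker.requests} requests, {tracker.total_tokens} tokens, "
      f"${tracker.total_cost_usd:.4f}")
```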
## Budget Monitoring
Set a budget limit and monitor usage against it:
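One simple shape for this, sketched here since the original example is missing: check remaining budget before each call, then record the actual cost from the response's usage object afterwards. The per-request cost below is simulated:

```python
class BudgetGuard:
    """Stops issuing requests once cumulative spend reaches a USD limit."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def can_spend(self) -> bool:
        return self.spent_usd < self.limit_usd

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        if self.spent_usd >= self.limit_usd:
            print(f"Budget reached: ${self.spent_usd:.4f} "
                  f"of ${self.limit_usd:.4f}")

guard = BudgetGuard(limit_usd=0.01)
while guard.can_spend():
    # In a real loop, call the SDK here and read the cost
    # from response.usage.total_cost_usd instead.
    guard.record(0.003)  # simulated per-request cost
```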
## Cost Optimization Tips
### 1. Choose the Right Model
| Model | Cost Level | Best For |
|---|---|---|
| `gpt-4o-mini` | Low | Simple tasks |
| `claude-3-haiku` | Low | Fast responses |
| `llama-3.1-8b` | Very Low | High volume |
| `gpt-4o` | High | Complex reasoning |
### 2. Optimize Prompts

Shorter prompts use fewer input tokens. Trim boilerplate, redundant instructions, and unneeded context to reduce `input_cost_usd`, often without hurting response quality.
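A back-of-envelope check makes the effect of prompt trimming concrete. The price below is illustrative (not a quoted Lunar rate), as are the token counts and request volume:

```python
PRICE_PER_M_INPUT = 1.50  # illustrative $/1M input tokens, not a real quote

def input_cost(tokens: int, requests: int) -> float:
    """Total input cost in USD for `requests` calls of `tokens` each."""
    return tokens * requests * PRICE_PER_M_INPUT / 1_000_000

# Trimming a 900-token prompt template down to 400 tokens, at volume:
verbose = input_cost(tokens=900, requests=100_000)
trimmed = input_cost(tokens=400, requests=100_000)
print(f"verbose: ${verbose:.2f}, trimmed: ${trimmed:.2f}, "
      f"saved: ${verbose - trimmed:.2f}")
```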
### 3. Limit Output Length

Output tokens usually cost more per token than input tokens, so capping generation length directly bounds `output_cost_usd`.
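The lost example presumably set a maximum output length on the request. Most chat-style APIs accept a `max_tokens`-style parameter — check the Lunar SDK reference for the exact name — and capping it puts a ceiling on worst-case output cost. The price here is illustrative:

```python
PRICE_PER_M_OUTPUT = 6.00  # illustrative $/1M output tokens, not a real quote

def max_output_cost(max_tokens: int) -> float:
    """Worst-case output_cost_usd when generation is capped at max_tokens."""
    return max_tokens * PRICE_PER_M_OUTPUT / 1_000_000

# Request sketch (parameter name is an assumption; consult the SDK reference):
# response = client.chat.create(model="gpt-4o-mini", messages=..., max_tokens=256)
print(f"cap 256:  ${max_output_cost(256):.6f} max per request")
print(f"cap 2048: ${max_output_cost(2048):.6f} max per request")
```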
### 4. Use Caching (When Available)
Some models support prompt caching, which reduces input costs for repeated prompt prefixes. Check `cache_input_cost_usd` to see what the cached tokens cost.
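To confirm a cache hit, inspect `cache_input_cost_usd` on the usage object. The sketch below uses a stand-in usage payload with illustrative costs; the split between cached and uncached cost fields follows the field table above:

```python
from types import SimpleNamespace

# Hypothetical usage payload from a response that hit the prompt cache.
usage = SimpleNamespace(
    prompt_tokens=1_000,
    input_cost_usd=0.00030,        # uncached portion of the input
    cache_input_cost_usd=0.00006,  # cached portion (billed at a discount)
)

cache_spend = usage.cache_input_cost_usd
if cache_spend > 0:
    print(f"cache hit: paid ${cache_spend:.5f} for cached tokens")

total_input = usage.input_cost_usd + cache_spend
print(f"total input cost: ${total_input:.5f}")
```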