What is a 429 error?
A 429 Too Many Requests error means you have exceeded your API provider's rate limit. The response usually includes a Retry-After header indicating how long to wait before retrying. Your application should implement exponential backoff (ideally with jitter) to handle these gracefully.
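As a rough sketch, backoff handling might look like the following. The `make_request` callable and its `requests`-style `.status_code`/`.headers` attributes are assumptions for illustration, not a specific provider SDK:

```python
import random
import time
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-indexed).

    Honors the server's Retry-After value when present; otherwise
    falls back to capped exponential backoff: base * 2**attempt.
    """
    if retry_after is not None:
        return retry_after
    return min(cap, base * (2 ** attempt))


def call_with_retries(make_request, max_attempts: int = 5):
    """Call make_request() until it succeeds or attempts run out.

    `make_request` is a hypothetical zero-argument callable returning a
    response object with .status_code and .headers (as in `requests`).
    """
    for attempt in range(max_attempts):
        resp = make_request()
        if resp.status_code != 429:
            return resp
        header = resp.headers.get("Retry-After")
        delay = backoff_delay(attempt, float(header) if header else None)
        time.sleep(delay + random.uniform(0, 0.5))  # jitter avoids thundering herds
    raise RuntimeError("still rate limited after max retries")
```

The jitter matters when many workers hit the limit at once: without it, they all retry in lockstep and trigger the same 429 again.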
What is the difference between RPM and TPM?
RPM (Requests Per Minute) limits how many API calls you can make, regardless of size. TPM (Tokens Per Minute) limits the total volume of text processed. Either limit can trigger a 429 — whichever you hit first is your bottleneck. Apps with many short messages are RPM-bound. Apps with long system prompts or large context windows are typically TPM-bound.
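You can estimate which limit binds with a one-line comparison: TPM caps you at tpm_limit / avg_tokens_per_request requests per minute, and whichever ceiling is lower wins. A minimal sketch (limit values here are made up for illustration):

```python
def binding_limit(rpm_limit: int, tpm_limit: int,
                  avg_tokens_per_request: int) -> str:
    """Return which limit you would hit first at full throttle.

    TPM allows at most tpm_limit / avg_tokens_per_request requests
    per minute; if that is below the RPM limit, TPM is the bottleneck.
    """
    tpm_bound_rpm = tpm_limit / avg_tokens_per_request
    return "TPM" if tpm_bound_rpm < rpm_limit else "RPM"
```

For example, at 500 RPM / 30,000 TPM with 200-token requests, TPM only permits 150 requests per minute, so TPM binds; shrink the requests to chat-sized messages and RPM becomes the ceiling instead.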
How do I find my current tier?
OpenAI: platform.openai.com → Settings → Limits. Anthropic: console.anthropic.com → Settings → Limits. Google: aistudio.google.com → API keys. Tiers are usually assigned automatically based on cumulative spend.
What is "concurrent users" — is it the same as total users?
No. Concurrent users means users actively making API requests at the same moment. For most SaaS apps, real concurrency is 5–15% of your total user base. If you have 1,000 registered users, expect 50–150 concurrent at peak hours.
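The 5–15% rule of thumb above translates directly into a capacity band:

```python
def peak_concurrency(total_users: int, low: float = 0.05,
                     high: float = 0.15) -> tuple:
    """Rough peak-concurrency band: 5-15% of registered users."""
    return (round(total_users * low), round(total_users * high))
```

So 1,000 registered users maps to roughly 50–150 concurrent requests at peak; size your rate-limit tier against the high end of the band.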
How often is the pricing data updated?
We update the model and pricing data manually when providers make changes. Data was last verified in April 2026. Always cross-check with your provider's official documentation before making architecture or budget decisions.
Does prompt caching affect TPM limits?
Yes — significantly. Anthropic and OpenAI both exclude cached input tokens from TPM calculations. If your system prompt is large and consistent across requests, enabling prompt caching can effectively multiply your TPM capacity by 5–10x. This planner does not account for caching, so if you use it, your real-world capacity will be higher than the planner estimates.
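The multiplier is simple arithmetic, assuming (as above) that cached input tokens are excluded from the TPM count — the token figures below are illustrative:

```python
def caching_multiplier(tokens_per_request: int, cached_tokens: int) -> float:
    """Factor by which caching stretches TPM capacity, assuming the
    provider excludes cached input tokens from the TPM count."""
    billable = tokens_per_request - cached_tokens
    return tokens_per_request / billable
```

With a 2,000-token cached system prompt out of 2,400 tokens per request, only 400 tokens count against TPM — a 6x multiplier on effective capacity.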