Feature
Rate Limits
Cap input and output traffic globally and per model for organizations, teams, users, and keys. Limits set at a parent scope act as ceilings that child keys cannot override.
Limit: Input + output
Scope: Global + model
Policy: Inherited
Input and output ceilings at org, team, and key scope.
Global (default limit): Baseline rate limit for all models.
Model (override): A specific limit for one model route that replaces the global default.
Key (capped): A child key cannot exceed its parent's ceiling.
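The scope rules above amount to a simple resolution step. The Python sketch here is one way to model it. It assumes that each scope stores an optional default limit plus per-model overrides, and that a key's effective limit is the tightest value found anywhere up its parent chain. `Scope`, `effective_limit`, the model names, and the numbers are hypothetical illustrations, not Concentrate's API. In practice the same resolution would run separately for the input ceiling and the output ceiling.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """One level of a hypothetical org -> team -> key hierarchy."""
    name: str
    parent: Optional["Scope"] = None
    default_limit: Optional[int] = None  # baseline for every model at this scope
    model_limits: dict[str, int] = field(default_factory=dict)  # per-model overrides

    def own_limit(self, model: str) -> Optional[int]:
        # A model-specific override replaces this scope's default for that model.
        return self.model_limits.get(model, self.default_limit)


def effective_limit(scope: Scope, model: str) -> Optional[int]:
    """Return the tightest limit on the path from `scope` up to the root.

    A child can set a lower value but never a higher one, because every
    ancestor's limit is also applied. Returns None if no scope sets a limit.
    """
    limits = []
    node: Optional[Scope] = scope
    while node is not None:
        own = node.own_limit(model)
        if own is not None:
            limits.append(own)
        node = node.parent
    return min(limits) if limits else None


# Example hierarchy (illustrative numbers, e.g. tokens per minute).
org = Scope("org", default_limit=1000, model_limits={"large-model": 100})
team = Scope("team", parent=org, default_limit=500)
key = Scope("key", parent=team, default_limit=2000)  # asks for more than its parents allow

print(effective_limit(key, "small-model"))  # 500: capped by the team default
print(effective_limit(key, "large-model"))  # 100: capped by the org's model override
```

Taking the minimum is what lets policy flow downhill in this sketch: a new key created under `team` with no settings of its own resolves to 500 automatically, and a key that requests 2000 is still held to its parents' ceilings.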
New capabilities
What your team gains with Concentrate
Input and output ceilings
Set global and per-model limits on input and output, so a single workload can't flood a provider or run up cost with runaway request volume.
Policy that flows downhill
Set caps at organization and team scope so new keys inherit them automatically instead of being configured one at a time.
Guard budgets and quotas
Keep one key from consuming more traffic than intended, protecting both your spend and your shared provider rate limits.
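To show how separate input and output ceilings might keep one key from overrunning a shared quota, here is a minimal fixed-window counter in Python. This sketch rests on several assumptions. Usage is measured in tokens per window. Output is charged up front at the request's requested maximum, because the real output size is unknown until the response arrives. A request is admitted only if the key and every ancestor scope have room, and usage is recorded all or nothing. `WindowCounter` and `try_consume` are hypothetical names, and Concentrate's actual accounting algorithm is not described on this page. The sketch is single-threaded, while a real gateway would need atomic updates.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WindowCounter:
    """Hypothetical fixed-window counter tracking one scope's input and output tokens."""
    input_limit: int
    output_limit: int
    window_seconds: float = 60.0
    clock: Callable[[], float] = time.monotonic
    _start: float = field(default=0.0, init=False)
    _input_used: int = field(default=0, init=False)
    _output_used: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self._start = self.clock()

    def _roll(self) -> None:
        # Start a fresh window (and reset usage) once the current one has expired.
        now = self.clock()
        if now - self._start >= self.window_seconds:
            self._start = now
            self._input_used = 0
            self._output_used = 0

    def has_room(self, input_tokens: int, output_tokens: int) -> bool:
        self._roll()
        return (self._input_used + input_tokens <= self.input_limit
                and self._output_used + output_tokens <= self.output_limit)

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self._input_used += input_tokens
        self._output_used += output_tokens


def try_consume(chain: list[WindowCounter], input_tokens: int, output_tokens: int) -> bool:
    """Admit a request only if the key and every ancestor scope have room.

    `output_tokens` is assumed to be the request's maximum output, reserved up
    front. Usage is recorded on all counters or on none, so a rejected request
    never eats into a parent's shared quota. Not thread-safe: this is a sketch.
    """
    if not all(c.has_room(input_tokens, output_tokens) for c in chain):
        return False
    for c in chain:
        c.record(input_tokens, output_tokens)
    return True
```

For example, if two keys share one team counter, a request from either key is refused once the team's window is full, even when that key's own counter still has room.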
Who Concentrate is designed for
Rate limits that protect spend and shared quotas
Rate limits cap how much input and output traffic a scope can send, globally and per model. When a key hits a provider rate limit during routing, Concentrate can skip that path and try the next healthy route. Org-level limits act earlier: they stop one workload from consuming the whole quota in the first place.
Platform engineering
Set ceilings at org or team scope so new keys inherit policy automatically.
High-traffic apps
Cap a single key before it exhausts shared provider quotas or your prepaid balance.
Per-model controls
Tighten limits on expensive or scarce models without throttling every workload equally.
Works with routing
Pair limits with request routing so over-limit providers are skipped in the failover chain.
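The routing behavior described here can be sketched in Python as an ordered failover loop: if a path hits a provider rate limit, the loop skips it and tries the next healthy route. This sketch assumes a caller-supplied `send` function that raises `RateLimited` when a provider returns a rate-limit error. It also assumes a fixed cooldown before a limited route is tried again. `Route`, `send_with_failover`, `RateLimited`, `NoRouteAvailable`, and the 30-second cooldown are all hypothetical. They are not part of any documented Concentrate interface.

```python
import time
from dataclasses import dataclass
from typing import Callable


class RateLimited(Exception):
    """Raised by the hypothetical `send` callable when a provider rate-limits a request."""


class NoRouteAvailable(Exception):
    """Raised when every route in the chain is unhealthy or cooling down."""


@dataclass
class Route:
    provider: str
    healthy: bool = True
    limited_until: float = float("-inf")  # clock time before which this route is skipped


def send_with_failover(
    chain: list[Route],
    send: Callable[[str], str],
    clock: Callable[[], float] = time.monotonic,
    cooldown_seconds: float = 30.0,  # assumed fixed cooldown after a rate-limit error
) -> str:
    """Try routes in order, skipping unhealthy or rate-limited ones."""
    for route in chain:
        now = clock()
        if not route.healthy or now < route.limited_until:
            continue  # skip this path and fall through to the next route
        try:
            return send(route.provider)
        except RateLimited:
            # Remember the limit so later requests skip this route until it cools down.
            route.limited_until = now + cooldown_seconds
    raise NoRouteAvailable("every route is unhealthy or rate limited")
```

The chain order expresses provider preference. The cooldown keeps later requests from repeatedly hitting a provider that has just signaled a limit. Failover only reacts after a provider pushes back, which is why the page pairs it with scope-level ceilings that cap traffic before it reaches the provider.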
Feature basics