Feature

Rate Limits

Cap input and output traffic globally and per model for organizations, teams, users, and keys, with parent scopes enforcing limits that child keys cannot override.

Rate limit settings

Limit: Input + output

Scope: Global + model

Policy: Inherited

Input and output ceilings at org, team, and key scope.

Global (Default limit): Baseline rate limit for all models.

Model (Override): Specific limit for a model route.

Key (Capped): Child key follows parent ceiling.
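
To make the Global and Model rows concrete, here is a minimal sketch of how a per-model override might be resolved against a global default. The `Limit` and `ScopeLimits` names, the model names, and the numeric units are assumptions for illustration only; they are not Concentrate's actual configuration schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Limit:
    """Hypothetical ceiling pair; the units (e.g. tokens per minute) are assumed."""
    input: int
    output: int


@dataclass
class ScopeLimits:
    """Limits set at one scope: a global default plus per-model overrides."""
    default: Limit
    per_model: Dict[str, Limit] = field(default_factory=dict)

    def for_model(self, model: str) -> Limit:
        # A model-specific override wins; otherwise use the global default.
        return self.per_model.get(model, self.default)


# Hypothetical team settings: a baseline for all models, tighter for one model.
team = ScopeLimits(
    default=Limit(input=100_000, output=50_000),
    per_model={"large-model": Limit(input=20_000, output=10_000)},
)

overridden = team.for_model("large-model")   # uses the override
baseline = team.for_model("small-model")     # falls back to the default
```

In this sketch the override replaces the default for that model only, which is one plausible reading of "Specific limit for a model route".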

New capabilities

What your team gains with Concentrate

Input and output ceilings

Set global and per-model limits on input and output, so a single workload can't flood a provider or run up cost with runaway request volume.
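
One common way to enforce separate input and output ceilings is a fixed-window counter. The sketch below is illustrative only: the source does not say which algorithm or units Concentrate uses, so `WindowCounter`, the 60-second window, and the unit counts are all assumptions. It also assumes the output amount is known when the check runs (for example, a requested maximum).

```python
from dataclasses import dataclass


@dataclass
class WindowCounter:
    """Fixed-window counter with separate input and output ceilings (assumed units)."""
    input_limit: int
    output_limit: int
    window_seconds: float = 60.0
    window_start: float = 0.0
    input_used: int = 0
    output_used: int = 0

    def allow(self, input_units: int, output_units: int, now: float) -> bool:
        # Start a fresh window once the current one has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.input_used = 0
            self.output_used = 0
        # Reject if either ceiling would be exceeded; rejected traffic is not counted.
        if (self.input_used + input_units > self.input_limit
                or self.output_used + output_units > self.output_limit):
            return False
        self.input_used += input_units
        self.output_used += output_units
        return True


counter = WindowCounter(input_limit=1000, output_limit=500)
results = [
    counter.allow(600, 200, now=0.0),   # fits under both ceilings
    counter.allow(600, 100, now=10.0),  # would exceed the input ceiling
    counter.allow(300, 400, now=20.0),  # would exceed the output ceiling
    counter.allow(600, 100, now=61.0),  # new window, counters reset
]
```

Passing `now` explicitly keeps the example deterministic; a real limiter would read a clock instead.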

Policy that flows downhill

Cap child scopes from organization and team settings, so limits are inherited by new keys instead of configured one at a time.
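
The inheritance rule can be sketched as "the tightest ceiling on the path from a key up to its organization wins". The `Scope` class and the numbers below are hypothetical and only illustrate that rule under this assumption; they do not describe Concentrate's internal model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    """Hypothetical scope node: organization, team, user, or key."""
    name: str
    limit: Optional[int] = None       # None means nothing is set here, so inherit
    parent: Optional["Scope"] = None

    def effective_limit(self) -> Optional[int]:
        # Walk up to the root, keeping the tightest ceiling seen on the way.
        tightest = self.limit
        node = self.parent
        while node is not None:
            if node.limit is not None:
                tightest = node.limit if tightest is None else min(tightest, node.limit)
            node = node.parent
        return tightest


org = Scope("org", limit=1_000_000)
team = Scope("team", limit=200_000, parent=org)
new_key = Scope("new-key", parent=team)                       # no limit of its own
greedy_key = Scope("greedy-key", limit=900_000, parent=team)  # asks for more than the team allows
strict_key = Scope("strict-key", limit=50_000, parent=team)   # tighter than the team
```

Here `new_key` picks up the team ceiling without any per-key configuration, `greedy_key` is capped at the team ceiling, and `strict_key` keeps its own tighter value.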

Guard budgets and quotas

Keep one key from consuming more traffic than intended, protecting both your spend and your shared provider rate limits.

Who Concentrate is designed for

Rate limits that protect spend and shared quotas

Rate limits cap how much input and output traffic a scope can send, globally and per model. When a key hits a provider rate limit during routing, Concentrate can skip that path and try the next healthy route. Org-level limits go a step further: they stop one workload from consuming the whole quota in the first place.

Platform engineering

Set ceilings at org or team scope so new keys inherit policy automatically.

High-traffic apps

Cap a single key before it exhausts shared provider quotas or your prepaid balance.

Per-model controls

Tighten limits on expensive or scarce models without throttling every workload equally.

Works with routing

Pair limits with request routing so over-limit providers are skipped in the failover chain.
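
As a rough illustration of skipping over-limit providers in a failover chain, the sketch below returns the first route that is not currently over its limit. The `pick_route` function, the provider names, and the `is_over_limit` check are assumptions made for this example, not Concentrate's routing API.

```python
from typing import Callable, Iterable, Optional


def pick_route(routes: Iterable[str], is_over_limit: Callable[[str], bool]) -> Optional[str]:
    """Return the first route in the failover chain that is not over its limit."""
    for route in routes:
        if is_over_limit(route):
            continue  # skip this path and try the next one
        return route
    return None  # every route is currently over limit


chain = ["provider-a", "provider-b", "provider-c"]  # hypothetical route names
over_limit = {"provider-a"}                         # hypothetical current state

chosen = pick_route(chain, lambda r: r in over_limit)
none_left = pick_route(chain, lambda r: True)
```

Returning `None` when every route is over limit leaves the caller to decide whether to queue, retry later, or reject the request.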

Feature basics

Frequently asked questions

What rate limits can teams set?

Teams can set global and per-model limits on input and output traffic. Limits set at organization or team scope are inherited by the users and keys beneath them.

How do rate limits protect AI usage?

Rate limits keep keys and workloads from consuming more request volume than the owner intended.
