Agent rate limits & observability

Agent traffic gets its own accounting so that it doesn't compete with human traffic for the same rate-limit budget.

Separate buckets

api-v1 — REST API. Keyed by API key ID / Clerk user ID.
mcp-tools — MCP tool calls. Keyed by agentClientId when the token came from a DCR-registered client; falls back to the API-key ID / user ID.
mcp-output-tokens — per-agent, 1-hour window, counts response tokens from the MCP wrapper. Lets us cap runaway output even when request counts look fine.
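The bucket rules above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the Caller shape, the bucketKey helper, the OutputTokenWindow class, and the cap value are all hypothetical. It assumes that API key ID takes precedence over user ID, that mcp-output-tokens uses the same fallback as mcp-tools, and that the 1-hour window is a fixed window rather than a sliding one.

```typescript
// Hypothetical caller shape; only agentClientId is named in this document.
interface Caller {
  agentClientId?: string; // set when the token came from a DCR-registered client
  apiKeyId?: string;
  userId?: string;
}

type Bucket = "api-v1" | "mcp-tools" | "mcp-output-tokens";

// Pick the identity a bucket is keyed by.
// Assumption: mcp-output-tokens falls back the same way mcp-tools does.
function bucketKey(bucket: Bucket, caller: Caller): string {
  const direct = caller.apiKeyId ?? caller.userId;
  const id = bucket === "api-v1" ? direct : (caller.agentClientId ?? direct);
  if (id === undefined) throw new Error("caller has no usable identity");
  return `${bucket}:${id}`;
}

const HOUR_MS = 60 * 60 * 1000;

// In-memory stand-in for the output-token counter (fixed 1-hour window assumed).
class OutputTokenWindow {
  private readonly capPerHour: number;
  private readonly windows = new Map<string, { start: number; used: number }>();

  constructor(capPerHour: number) {
    this.capPerHour = capPerHour;
  }

  // Records response tokens and reports whether the key is still within its cap.
  record(key: string, tokens: number, now: number): boolean {
    const w = this.windows.get(key);
    if (w === undefined || now - w.start >= HOUR_MS) {
      // First call, or the previous window has expired: start a new one.
      this.windows.set(key, { start: now, used: tokens });
      return tokens <= this.capPerHour;
    }
    w.used += tokens;
    return w.used <= this.capPerHour;
  }
}
```

Because the token counter is separate from the request counter, an agent that makes few calls but produces very large responses still hits a limit.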

Agent-native identity

When an MCP client registers via DCR (Dynamic Client Registration), its tokens carry a client_id claim, which is stamped onto the caller as agentClientId. Audit log entries record it so reports can distinguish "Claude Desktop acting for alice" from "alice directly." The apiKeyAudit table has a by_agent_time index; query it via Convex for a per-agent activity view.
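As a rough sketch of the stamping and reporting described above: the TokenClaims and AuditActor shapes, the use of the sub claim for the user, and the clientNames lookup are assumptions for illustration. Only client_id and agentClientId come from this document.

```typescript
// Hypothetical subset of the token claims; client_id is present only for
// tokens issued to a DCR-registered client.
interface TokenClaims {
  sub: string; // assumed to carry the user ID
  client_id?: string;
}

// Hypothetical actor record, as it might be written to an audit entry.
interface AuditActor {
  userId: string;
  agentClientId?: string;
}

// Stamp the client_id claim onto the caller as agentClientId.
function actorFromClaims(claims: TokenClaims): AuditActor {
  return claims.client_id !== undefined
    ? { userId: claims.sub, agentClientId: claims.client_id }
    : { userId: claims.sub };
}

// Render an actor for a report; clientNames maps client IDs to display names.
function describeActor(actor: AuditActor, clientNames: Map<string, string>): string {
  if (actor.agentClientId === undefined) return `${actor.userId} directly`;
  const name = clientNames.get(actor.agentClientId) ?? actor.agentClientId;
  return `${name} acting for ${actor.userId}`;
}
```

With a lookup of registered client names, an entry stamped with an agentClientId renders as "Claude Desktop acting for alice", while an entry without one renders as "alice directly".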

Gateway integration

Point a gateway (Kong, Traefik Hub, Databricks Unity AI Gateway) at https://api.exayard.com/mcp and key the gateway's per-client policies off the same client_id. Our rate limit is the baseline that always applies; the gateway layer can be stricter for enterprise customers without our app needing to know about them.

Responses already carry RateLimit and RateLimit-Policy headers that reflect the most constraining counter, so a gateway can enforce its own budget without double-counting.
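One way to pick the counter that feeds those headers is sketched below. The CounterState shape and the mostConstraining helper are hypothetical, and the definition of "most constraining" is an assumption: here it means the smallest fraction of the limit remaining, with ties going to the counter that resets later. The real selection rule may differ.

```typescript
// Hypothetical snapshot of one rate-limit counter after a request is counted.
interface CounterState {
  bucket: string;
  limit: number; // total budget in the window (requests or tokens)
  remaining: number; // budget left in the current window
  resetSeconds: number; // seconds until the window resets
}

// Fraction of the budget still available; comparable across requests and tokens.
function headroom(c: CounterState): number {
  return c.limit > 0 ? c.remaining / c.limit : 0;
}

// Assumption: the most constraining counter has the least headroom;
// on a tie, prefer the one whose window resets later.
function mostConstraining(counters: CounterState[]): CounterState {
  if (counters.length === 0) throw new Error("no counters evaluated");
  return counters.reduce((worst, c) => {
    const hc = headroom(c);
    const hw = headroom(worst);
    if (hc < hw) return c;
    if (hc === hw && c.resetSeconds > worst.resetSeconds) return c;
    return worst;
  });
}
```

A gateway that reads the resulting headers sees one number per response and does not need to track each of our buckets itself.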