What is Cypress Vision?
For Business Leaders
Cypress Vision is AI spend management. It sits between your product and your AI providers — OpenAI, Anthropic, and Google — and automatically routes every AI call to the most cost-effective model that can handle it. Simple tasks go to efficient models. Complex tasks stay on premium. Result: 50–70% lower AI API bills with no change in output quality. Every agent, bot, workflow, or team member gets its own budget with real-time spend tracking, email and Slack alerts, and hard caps that block overspend before it happens. Every call is logged with a full audit trail. One URL change. 30 seconds to integrate.
For Engineering Teams
Cypress Vision is a drop-in OpenAI/Anthropic/Google-compatible proxy. Change base_url — nothing else. Every request is scored in under 1ms across complexity signals (token count, tool use, code markers, conversation depth, output length, JSON mode, keyword analysis), routed to the optimal model, checked against Redis budget state, served from cache if available, and logged asynchronously to ClickHouse. Full API compatibility — same request shape, same response shape, same streaming support. Works with every current model across all three providers.
Quick Start
Up and running in 30 seconds
Three steps. No SDK changes. No infrastructure work.
Create your account
Sign up at cypress-production-36c0.up.railway.app/signup. Your proxy URL appears immediately after signup.
Add your provider key
Settings → Provider Keys → paste your OpenAI, Anthropic, or Google key
Stored encrypted. Never logged.
Change one line in your code
Python

    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="https://YOUR-TENANT.cypressvision.app/v1",  # ← this line only
    )

Node.js

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
      baseURL: "https://YOUR-TENANT.cypressvision.app/v1", // ← this line only
    });

cURL

    curl https://YOUR-TENANT.cypressvision.app/v1/chat/completions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

How the Router Works
Intelligent routing — decided in under 1ms
Every prompt arriving at Cypress Vision is scored by a multi-signal complexity classifier before it reaches any AI provider. The classifier runs in under 1ms and produces a complexity score. Simple tasks route automatically to the most cost-effective model in the same provider family. Complex tasks stay on the premium model requested. The goal is unchanged response quality: the only thing that changes is the cost of calls that do not need a premium model.
The scoring engine
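As a rough illustration of how the listed signals (token count, tool use, code markers, conversation depth, output length, JSON mode, keyword analysis) could combine into one score, here is a minimal sketch. The weights, the threshold, the keyword list, and the names `Request`, `complexity_score`, and `pick_tier` are all assumptions made for this example and are not the actual classifier.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
PREMIUM_THRESHOLD = 0.5
CODE_MARKERS = ("def ", "function ", "#include", "SELECT ")
REASONING_KEYWORDS = ("analyze", "prove", "refactor", "step by step")


@dataclass
class Request:
    messages: list          # [{"role": ..., "content": ...}, ...]
    uses_tools: bool = False
    json_mode: bool = False
    max_tokens: int = 256   # requested output length


def complexity_score(req: Request) -> float:
    """Combine the signals into a score between 0.0 and 1.0."""
    text = " ".join(m["content"] for m in req.messages)
    approx_tokens = len(text) / 4  # rough heuristic, not a real tokenizer
    lowered = text.lower()

    score = 0.0
    score += min(approx_tokens / 4000, 1.0) * 0.25                     # token count
    score += 0.25 if req.uses_tools else 0.0                           # tool use
    score += 0.20 if any(m in text for m in CODE_MARKERS) else 0.0     # code markers
    score += min(len(req.messages) / 20, 1.0) * 0.10                   # conversation depth
    score += min(req.max_tokens / 4000, 1.0) * 0.10                    # output length
    score += 0.05 if req.json_mode else 0.0                            # JSON mode
    score += 0.05 if any(k in lowered for k in REASONING_KEYWORDS) else 0.0  # keywords
    return min(score, 1.0)


def pick_tier(req: Request) -> str:
    """Map the score to a model tier within the requested provider family."""
    return "premium" if complexity_score(req) >= PREMIUM_THRESHOLD else "efficient"
```

A short translation request scores close to zero and lands on the efficient tier, while a request that uses tools, contains code, and asks for a long output crosses the threshold and stays on premium.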
Model routing table — May 2026
Routing Playground
The Routing Playground in your dashboard lets you test exactly how any prompt scores before it goes live. Enter any prompt, see the score breakdown, the target model, and the estimated cost saving. No live API call is made.
Spend Guards & Budgets
Hard limits enforced at the proxy layer
Set a daily cap, a monthly cap, or both, for your whole account or per individual asset. At 70% of the cap you get an email and Slack alert. At 90%, another alert. At 100% the asset is hard-blocked: the proxy returns HTTP 429 before any call reaches the AI provider, so your bill cannot exceed what you set. This is enforced at the proxy layer and is a hard block, not a soft warning.
Technical flow
- check_budget() reads the Redis key tg:budget:{tenant_id}:{budget_id}:{YYYY-MM-DD}
- If spent_usd >= limit_usd, the proxy returns HTTP 429 immediately and no upstream call is made
- record_spend() increments the Redis counter by cost_usd × 1,000,000 (stored as integer micro-dollars, so there is no float drift)
Alert thresholds
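The budget check, the integer spend counter, and the 70% and 90% alert levels described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for Redis, and the function signatures and return values are hypothetical, since only the names check_budget() and record_spend() and the key format come from the text.

```python
from datetime import date

MICROS_PER_USD = 1_000_000
ALERT_THRESHOLDS = (0.70, 0.90)  # alert levels from the text; 100% is the hard block

# Stand-in for Redis: key -> integer micro-dollars spent.
store = {}


def budget_key(tenant_id: str, budget_id: str, day: date) -> str:
    """Build the per-day budget key in the documented format."""
    return f"tg:budget:{tenant_id}:{budget_id}:{day.isoformat()}"


def check_budget(key: str, limit_usd: float) -> int:
    """Return the status the proxy would answer with: 429 if blocked, else 200."""
    spent = store.get(key, 0)
    return 429 if spent >= round(limit_usd * MICROS_PER_USD) else 200


def record_spend(key: str, cost_usd: float, limit_usd: float) -> list:
    """Add the cost as integer micro-dollars and return alert levels newly crossed."""
    limit = round(limit_usd * MICROS_PER_USD)
    before = store.get(key, 0)
    after = before + round(cost_usd * MICROS_PER_USD)
    store[key] = after  # against real Redis this would be an atomic INCRBY
    return [t for t in ALERT_THRESHOLDS if before < t * limit <= after]
```

Storing millionths of a dollar as an integer means repeated small increments add up exactly, which is the "no float drift" property the flow relies on.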
Reset options
Reset a single asset from Settings → Assets → Reset Spend. Reset all assets from Budgets page → Reset All (confirmation required). Resets clear the Redis spend counter only — ClickHouse event logs are permanent.
Asset Tracking
Know exactly what every agent, bot, and workflow costs
In Cypress Vision an asset is any named entity making AI calls through the proxy — an agent, a bot, a workflow, a team member, a client, or a feature. Each asset gets its own API key (passed as the Authorization header), its own daily and monthly budget caps, its own real-time spend dashboard, and its own status. The system tracks everything automatically from the moment the first call comes through.
What each asset shows
Response Cache
Zero cost on repeated prompts
Cypress Vision includes a Redis-backed exact-match response cache. When the same prompt is sent more than once by the same tenant, the cached response is returned immediately — no upstream API call, no token cost, sub-millisecond response time from the cache.
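A minimal sketch of exact-match caching along these lines, using the tg:cache:{tenant_id}:{md5_hash} key format this cache uses and skipping streaming requests. What exactly gets hashed is not specified here, so hashing a canonical JSON form of the request body is an assumption, as are the names `cache_key` and `cached_call` and the dictionary standing in for Redis.

```python
import hashlib
import json

# Stand-in for Redis: key -> cached response.
cache = {}


def cache_key(tenant_id: str, body: dict) -> str:
    """Hash a canonical form of the request so key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"tg:cache:{tenant_id}:{digest}"


def cached_call(tenant_id: str, body: dict, call_upstream):
    """Serve from cache when possible; otherwise call upstream and store."""
    if body.get("stream"):
        # Streaming requests bypass the cache: a complete response is required.
        return call_upstream(body)
    key = cache_key(tenant_id, body)
    if key not in cache:
        cache[key] = call_upstream(body)
    return cache[key]
```

Because the tenant ID is part of the key, an identical prompt from a different tenant is a cache miss and never returns another tenant's response.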
How it works
- Cache key format: tg:cache:{tenant_id}:{md5_hash}
- Streaming requests (stream: true) are not cached, because caching requires a complete response
Ask AI — Spend Copilot
Built into your dashboard
Every Cypress Vision dashboard includes a conversational AI assistant with full context of your live tenant data — your assets, models, spend history, routing decisions, and budget state. Ask it questions in plain English and get instant answers backed by your real usage data.
Example questions
Infrastructure
What runs under the hood
Cypress Vision is built on four production-grade services. Each is chosen for a specific role in the request path. Every component that adds latency runs in-process or in Redis — under 1ms. ClickHouse logging runs asynchronously and never touches your response time.
Security
Built into every layer
Encrypted key storage
OpenAI, Anthropic, and Google API keys are stored encrypted at rest in Supabase Postgres. Never returned in API responses. Never written to any log. Never accessible outside the proxy's secure environment.
Complete tenant isolation
Every object in the system — Redis budget keys, Redis cache keys, ClickHouse events, Postgres records — is namespaced by tenant_id. Format: tg:{type}:{tenant_id}:{...}. One tenant cannot read, write, or affect another tenant's data at any layer.
TLS everywhere
All traffic between your application and the proxy is HTTPS/TLS. All traffic between the proxy and AI providers is HTTPS/TLS. No plaintext at any point in the chain.
No prompt storage by default
Only metadata is stored by default: model, token counts, cost_usd, latency_ms, routing decision, cache status. Prompt content is never logged unless audit mode is explicitly enabled. Your data is never used for training or shared with any third party.
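To make the metadata-only default concrete, here is a hypothetical sketch of a logged event. The field names follow the metadata listed above (model, token counts, cost_usd, latency_ms, routing decision, cache status), but the `CallEvent` shape and the `build_event` helper are assumptions for illustration and not the actual schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CallEvent:
    tenant_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    latency_ms: float
    routing_decision: str
    cache_hit: bool
    prompt: Optional[str] = None  # filled only when audit mode is enabled


def build_event(meta: dict, prompt: str, audit_mode: bool = False) -> dict:
    """Build the record to log; prompt text is dropped unless audit mode is on."""
    event = CallEvent(**meta, prompt=prompt if audit_mode else None)
    return asdict(event)
```

With audit mode off, the prompt text never enters the record, so nothing downstream of this step can log it.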
Budget enforcement cannot be bypassed
Budget checks execute at the proxy layer before any upstream call. There is no API path that skips budget enforcement for AI calls. Dashboard and management endpoints are excluded by design.
CORS locked to your domain
The proxy accepts requests only from your registered dashboard origin. Cross-origin requests from other domains are rejected by middleware.
Self-hosted deployment available
For teams with strict data residency requirements — LegalTech, HealthTech, FinTech — Cypress Vision can be deployed entirely within your own infrastructure. Your own Railway, AWS, GCP, or Azure. Your own Redis, ClickHouse, and Postgres. AI call data never leaves your network. Available on Scale ($399/mo) and Enterprise.
Compliance & Audit Logs
Every call logged, immutably, forever
Every API call through Cypress Vision is written to ClickHouse — a column-oriented analytical database designed for exactly this workload: append-only, immutable, fast on large datasets, and queryable in milliseconds even at millions of rows.
ClickHouse schema
ClickHouse properties
- Append-only — events cannot be modified or deleted through normal operations
- Column-oriented — sum, group, and filter over millions of rows in milliseconds
- Real-time — events appear in your dashboard within seconds of the API call
Compliance use cases
Full AI usage audit trail. Which model was used for which task, by which user, at what time. Exportable for legal discovery or client reporting.
Document AI-assisted decisions with model version, timestamp, and asset. Data residency available via self-hosted deployment.
Model risk management documentation. Every inference logged with model, version, cost, and latency. Exportable for regulatory review.
Scaling
Scales from your first agent to your entire company
Startups & Small Teams
1 to 50 assets. Free and Starter plans. One integration covers your whole product. Routing starts saving money on day one. Fully managed — no ops work required.
Growing Companies
50 to 1,000 assets. Growth plan. Unlimited assets and routing rules. Per-project and per-client cost tracking. Spend Copilot answers cost questions instantly. CSV export for finance and reporting.
Agencies & Enterprise
Unlimited assets. Scale and Enterprise plans. White-label option — your brand, your domain. Self-hosted for data residency. SLA with dedicated support. Compliance exports. Custom contracts.
Pricing
Simple, transparent pricing
Pay one flat fee. We save you multiples of that every month.
FAQ
Frequently asked questions
Does this work with my existing OpenAI SDK?
Yes. One line — base_url. Your SDK, your API key, your model names all stay exactly the same. Works with openai-python, openai-node, and any HTTP client that accepts a base URL.
Does it work with Anthropic and Google too?
Yes. Add your Anthropic or Google key in Settings → Provider Keys. Cypress Vision detects the provider from the model name. claude-* goes to Anthropic. gemini-* goes to Google. gpt-* and o* go to OpenAI. All three simultaneously.
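The prefix rules in this answer can be sketched as a small lookup. The function name `detect_provider` is hypothetical, and reading "o*" as o-series names that start with "o" followed by a digit (for example "o1") is an assumption.

```python
import re


def detect_provider(model: str) -> str:
    """Pick the upstream provider from the model name prefix."""
    name = model.lower()
    if name.startswith("claude-"):
        return "anthropic"
    if name.startswith("gemini-"):
        return "google"
    # Assumption: "o*" means o-series names such as "o1" or "o3-mini".
    if name.startswith("gpt-") or re.match(r"o\d", name):
        return "openai"
    raise ValueError(f"unknown model family: {model}")
```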
Will routing change the quality of my responses?
It should not. The classifier is calibrated so that tasks requiring reasoning, long context, code generation, or tool use stay on premium models. Only genuinely simple tasks (short factual questions, translations, basic lookups) route to efficient models. Test any prompt in the Routing Playground before it goes live.
Does Cypress Vision see my prompt content?
By default, no. Only metadata is stored — model, tokens, cost, latency, routing decision. Full prompt logging is opt-in on Scale and Enterprise plans for compliance purposes only.
What happens if Cypress Vision goes down?
Point base_url back to your provider's own endpoint (for OpenAI, https://api.openai.com/v1); the change takes about 30 seconds. Keep your original provider URL in an environment variable as a fallback. The Scale plan includes a 99.9% uptime SLA.
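One way to keep that fallback ready is to resolve the base URL from environment variables at startup. This is a sketch only: the variable names CYPRESS_BASE_URL, CYPRESS_DISABLED, and PROVIDER_FALLBACK_URL are hypothetical, and the default shown is OpenAI's public endpoint.

```python
import os

OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(env=os.environ) -> str:
    """Use the proxy unless it is switched off or no proxy URL is configured."""
    proxy_url = env.get("CYPRESS_BASE_URL")  # hypothetical variable name
    fallback = env.get("PROVIDER_FALLBACK_URL", OPENAI_DEFAULT_BASE_URL)
    if env.get("CYPRESS_DISABLED") == "1" or not proxy_url:
        return fallback
    return proxy_url
```

The result can be passed as base_url when constructing the client, so switching back to the provider is an environment change and a restart, with no code edit.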
How are my provider API keys protected?
Stored encrypted at rest in Supabase Postgres. Never returned in API responses. Never written to any log. Used only by the proxy to forward your requests.
Can I write my own routing rules?
Yes. Custom rules override the automatic classifier — by agent ID, workflow ID, model name, or token range. Evaluated in priority order before the classifier runs. Available on Growth and above.
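A minimal sketch of priority-ordered rules that run before the classifier. The `Rule` fields mirror the match criteria named in this answer (agent ID, workflow ID, model name, token range); the class itself, the `route` function, and the convention that a lower priority number is evaluated first are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    priority: int                      # assumption: lower number is evaluated first
    target_model: str
    agent_id: Optional[str] = None     # None means "match any"
    workflow_id: Optional[str] = None
    model: Optional[str] = None
    min_tokens: int = 0
    max_tokens: Optional[int] = None

    def matches(self, req: dict) -> bool:
        """True when every criterion set on the rule agrees with the request."""
        if self.agent_id is not None and req.get("agent_id") != self.agent_id:
            return False
        if self.workflow_id is not None and req.get("workflow_id") != self.workflow_id:
            return False
        if self.model is not None and req.get("model") != self.model:
            return False
        tokens = req.get("tokens", 0)
        if tokens < self.min_tokens:
            return False
        return self.max_tokens is None or tokens <= self.max_tokens


def route(req: dict, rules: list, classifier) -> str:
    """Return the first matching rule's model, else fall back to the classifier."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(req):
            return rule.target_model
    return classifier(req)
```

The first matching rule wins, and the classifier is consulted only when no rule matches.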
How many assets can I have?
Unlimited on Growth and above. No performance impact at any scale.
Is there a free trial?
Yes. Free plan — 10,000 calls, 7 days full access, no credit card.
Can I white-label this for my clients?
Yes. Scale plan includes white-label — your domain, your brand. Contact info@cypressvision.xyz.