What is Cypress Vision?

For Business Leaders

Cypress Vision is AI spend management. It sits between your product and your AI providers — OpenAI, Anthropic, and Google — and automatically routes every AI call to the most cost-effective model that can handle it. Simple tasks go to efficient models. Complex tasks stay on premium. Result: 50–70% lower AI API bills with no change in output quality. Every agent, bot, workflow, or team member gets its own budget with real-time spend tracking, email and Slack alerts, and hard caps that block overspend before it happens. Every call is logged with a full audit trail. One URL change. 30 seconds to integrate.

For Engineering Teams

Cypress Vision is a drop-in OpenAI/Anthropic/Google-compatible proxy. Change base_url — nothing else. Every request is scored in under 1ms across complexity signals (token count, tool use, code markers, conversation depth, output length, JSON mode, keyword analysis), routed to the optimal model, checked against Redis budget state, served from cache if available, and logged asynchronously to ClickHouse. Full API compatibility — same request shape, same response shape, same streaming support. Works with every current model across all three providers.

Quick Start

Up and running in 30 seconds

Three steps. No SDK changes. No infrastructure work.

STEP 1

Create your account

Your tenant endpoint: https://YOUR-TENANT.cypressvision.app/v1

STEP 2

Add your provider key

Settings → Provider Keys → paste your OpenAI, Anthropic, or Google key

Stored encrypted. Never logged.

STEP 3

Change one line in your code

Python

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://YOUR-TENANT.cypressvision.app/v1",  # ← this line only
)

Node.js

import OpenAI from "openai";
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://YOUR-TENANT.cypressvision.app/v1", // ← this line only
});

curl

curl https://YOUR-TENANT.cypressvision.app/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

Works identically with Anthropic and Google. Cypress Vision detects the provider from the model name automatically. claude-* → Anthropic. gemini-* → Google. gpt-* and o* → OpenAI.
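The prefix rules above amount to a small lookup. The sketch below is minimal. `detect_provider` is a hypothetical name, and raising an error for an unrecognized model name is an assumption, not documented proxy behavior. It follows "o*" literally, so any name starting with "o" is treated as OpenAI.

```python
def detect_provider(model: str) -> str:
    """Map a model name to its provider using the documented prefix rules."""
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "google"
    # gpt-* and o* (o3, o4-mini, ...) both go to OpenAI.
    if model.startswith("gpt-") or model.startswith("o"):
        return "openai"
    # Assumption: unknown names are rejected rather than guessed.
    raise ValueError(f"cannot infer provider for model {model!r}")


if __name__ == "__main__":
    for name in ["claude-sonnet-4-6", "gemini-2.5-pro", "gpt-4o", "o4-mini"]:
        print(name, "->", detect_provider(name))
```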

How the Router Works

Intelligent routing — decided in under 1ms

Every prompt arriving at Cypress Vision is scored by a multi-signal complexity classifier before it reaches any AI provider. The classifier runs in under 1ms and produces a single score. Simple tasks are routed automatically to the most cost-effective model in the same provider family. Moderate tasks pass through unchanged, and complex tasks stay on the premium model requested. Response quality does not change. The only difference is a lower cost on calls that never needed a premium model.

The scoring engine

Signal	Weight	Triggered by
Tool / function calls	+4	Any tools array in the request
Code markers	+3	Presence of ``` blocks, def, class, function, import, SELECT
Very long context (3000+ tokens)	+3	Estimated prompt token count
Deep conversation (12+ turns)	+3	Message count in the request
Hard complexity keywords	+4	"architect", "fault-tolerant", "distributed system", "microservice", "active-active"
Long context (1500–3000 tokens)	+2	Estimated prompt token count
Multi-turn (6–12 messages)	+2	Message count
Moderate keywords	+2 each, max +4	"implement", "refactor", "debug", "optimize", "analyze", "compare and contrast"
JSON / structured output	+1	response_format: json_object
Long output requested (1000+ tokens)	+1	max_tokens in request
Simple question indicators	−2 each	"what is", "define", "who is", "translate", "calculate", "yes or no"
Very short prompt (under 20 chars)	−3	e.g. "4+4", "hello", "what time is it"

Tier	Score	Action	Effect
SIMPLE	2 or below (including negative scores)	Routes to efficient model	Saves 73–98% on input price, per the routing table
MODERATE	3–5	Passes through unchanged	No change
COMPLEX	6 or above	Keeps premium model	No downgrade
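The scoring table and tier thresholds can be sketched as a single function. This is a simplified illustration, not the production classifier. The keyword lists and weights come from the table. Several details are assumptions: the 4-characters-per-token estimate, whole-word matching of code markers, counting hard keywords once, and giving exactly 12 messages the +3 weight, since the table lists 12 in both conversation rows. `score_request` and `tier` are hypothetical names.

```python
import re
from typing import Any

HARD_KEYWORDS = ["architect", "fault-tolerant", "distributed system", "microservice", "active-active"]
MODERATE_KEYWORDS = ["implement", "refactor", "debug", "optimize", "analyze", "compare and contrast"]
SIMPLE_INDICATORS = ["what is", "define", "who is", "translate", "calculate", "yes or no"]
# Triple backticks, or whole-word code keywords (so "define" does not match "def").
CODE_MARKER_RE = re.compile(re.escape("`" * 3) + r"|\b(?:def|class|function|import|SELECT)\b")


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token.
    return len(text) // 4


def score_request(request: dict[str, Any]) -> int:
    messages = request.get("messages", [])
    text = "\n".join(str(m.get("content", "")) for m in messages)
    lower = text.lower()
    tokens = estimate_tokens(text)
    score = 0

    if request.get("tools"):
        score += 4
    if CODE_MARKER_RE.search(text):
        score += 3
    if tokens >= 3000:
        score += 3
    elif tokens >= 1500:
        score += 2
    if len(messages) >= 12:  # assumption: 12 counts as a deep conversation
        score += 3
    elif len(messages) >= 6:
        score += 2
    if any(k in lower for k in HARD_KEYWORDS):
        score += 4  # assumption: counted once, however many appear
    score += min(4, 2 * sum(k in lower for k in MODERATE_KEYWORDS))
    if (request.get("response_format") or {}).get("type") == "json_object":
        score += 1
    if (request.get("max_tokens") or 0) >= 1000:
        score += 1
    score -= 2 * sum(k in lower for k in SIMPLE_INDICATORS)
    if len(text.strip()) < 20:
        score -= 3
    return score


def tier(score: int) -> str:
    # Negative scores, possible through the minus-weight signals, count as SIMPLE.
    if score <= 2:
        return "SIMPLE"
    if score <= 5:
        return "MODERATE"
    return "COMPLEX"
```

Under these assumptions, "hello" scores −3 (SIMPLE), "Please debug and optimize my loop" scores 4 (MODERATE), and a request with a tools array asking to "Refactor this function to be fault-tolerant" scores 13 (COMPLEX).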

Model routing table — May 2026

You request	Simple task routes to	Saving on input price
OpenAI
gpt-5.5 ($8.00/M)	gpt-4.1-mini ($0.40/M)	95% cheaper
gpt-5 ($5.00/M)	gpt-4.1-mini ($0.40/M)	92% cheaper
gpt-4o ($2.50/M)	gpt-4o-mini ($0.15/M)	94% cheaper
gpt-4.1 ($2.00/M)	gpt-4.1-mini ($0.40/M)	80% cheaper
gpt-4.1-mini ($0.40/M)	gpt-4.1-nano ($0.10/M)	75% cheaper
o3 ($10.00/M)	o4-mini ($1.10/M)	89% cheaper
Anthropic
claude-opus-4-6 ($15.00/M)	claude-haiku-4-5 ($0.80/M)	95% cheaper
claude-sonnet-4-6 ($3.00/M)	claude-haiku-4-5 ($0.80/M)	73% cheaper
Google
gemini-2.5-pro ($3.50/M)	gemini-2.5-flash ($0.075/M)	98% cheaper
gemini-2.0-flash ($0.10/M)	gemini-2.5-flash-lite ($0.02/M)	80% cheaper

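The routing table is updated continuously as providers release new models. All model names above are current production models as of May 2026. Prices are USD per million input tokens. The saving column compares input prices only, so the actual saving on a call also depends on its output tokens.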
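The saving column is plain arithmetic on the two input prices. The sketch below reproduces it for a few rows and converts a $/M price into the input cost of a single call. The function names are hypothetical, and the sketch ignores output tokens, as the table does.

```python
def input_price_saving(requested_per_m: float, routed_per_m: float) -> int:
    """Percent saved on input price, rounded to a whole number."""
    return round(100 * (1 - routed_per_m / requested_per_m))


def input_cost_usd(prompt_tokens: int, price_per_m: float) -> float:
    """Input cost of one call, given a price in USD per million input tokens."""
    return prompt_tokens * price_per_m / 1_000_000


# A few rows from the table above: requested model and price, routed model and price.
ROWS = [
    ("gpt-4o", 2.50, "gpt-4o-mini", 0.15),
    ("claude-sonnet-4-6", 3.00, "claude-haiku-4-5", 0.80),
    ("gemini-2.5-pro", 3.50, "gemini-2.5-flash", 0.075),
]

if __name__ == "__main__":
    for requested, req_price, routed, routed_price in ROWS:
        print(f"{requested} -> {routed}: {input_price_saving(req_price, routed_price)}% cheaper")
```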

Routing Playground

The Routing Playground in your dashboard lets you test exactly how any prompt scores before it goes live. Enter any prompt, see the score breakdown, the target model, and the estimated cost saving. No live API call is made.

Spend Guards & Budgets

Hard limits enforced at the proxy layer

Set a daily cap, a monthly cap, or both, for your whole account or for individual assets. At 70% of a cap you get an email and Slack alert, and at 90% another alert. At 100% the asset is hard-blocked: the proxy returns HTTP 429 before any call reaches the AI provider. Once a cap is reached, no further calls go upstream. Because spend is recorded after each response, the call that crosses the cap (plus any calls already in flight) can take spend slightly past it. This is enforced in the proxy itself, not as a soft warning.

Technical flow

Request arrives at Cypress Vision proxy

check_budget() reads Redis key → tg:budget:{tenant_id}:{budget_id}:{YYYY-MM-DD}

spent_usd >= limit_usd? → YES: return HTTP 429 immediately, no upstream call made

NO → forward request to OpenAI / Anthropic / Google

Response received from provider

record_spend() increments the Redis counter by cost_usd × 1,000,000 (stored as integer micro-dollars, so there is no float drift)

Check thresholds 70% / 90% / 100% → fire Resend email + Slack webhook in background thread if newly crossed

Return response to your application
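The flow above fits in a few lines of Python. This is a minimal sketch, not the proxy's code. A dict-backed `BudgetStore` stands in for Redis so the example runs without a server; in Redis the increment would be an atomic `INCRBY`. `handle_call` and its `call_provider` callback are hypothetical, and the callback is assumed to return the response together with its cost in USD.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MICRO = 1_000_000  # micro-dollars per dollar


@dataclass
class BudgetStore:
    """In-memory stand-in for Redis: key -> spend in integer micro-dollars."""
    counters: dict[str, int] = field(default_factory=dict)

    def get(self, key: str) -> int:
        return self.counters.get(key, 0)

    def incrby(self, key: str, amount: int) -> int:
        # Redis INCRBY is atomic; this dict update is only safe in one thread.
        self.counters[key] = self.get(key) + amount
        return self.counters[key]


def check_budget(store: BudgetStore, key: str, limit_usd: float) -> bool:
    """True if the call may go upstream; False means answer with HTTP 429."""
    return store.get(key) < round(limit_usd * MICRO)


def record_spend(store: BudgetStore, key: str, cost_usd: float) -> int:
    """Convert the call's cost to integer micro-dollars and add it to the counter."""
    return store.incrby(key, round(cost_usd * MICRO))


def handle_call(
    store: BudgetStore,
    key: str,
    limit_usd: float,
    call_provider: Callable[[], tuple[str, float]],
) -> tuple[int, Optional[str]]:
    if not check_budget(store, key, limit_usd):
        return 429, None  # blocked before any upstream call
    response, cost_usd = call_provider()  # hypothetical: (body, cost in USD)
    record_spend(store, key, cost_usd)
    return 200, response


if __name__ == "__main__":
    store = BudgetStore()
    key = "tg:budget:t1:b1:2026-05-03"
    for _ in range(3):
        print(handle_call(store, key, 0.01, lambda: ("ok", 0.006)))
    # (200, 'ok'), (200, 'ok'), (429, None): the second call overshoots the cap.
```

Because spend is checked before the call and recorded after it, the second $0.006 call is allowed and takes the counter to $0.012 against a $0.01 cap. Only the third call is blocked.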

Daily caps reset at 00:00 UTC. Monthly caps reset on the 1st of each month. Redis key TTLs: daily = 87,600s (24 hours plus a 20-minute buffer), monthly = 31 days plus a 1-hour buffer.
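One plausible reading of the reset rule is that the date in the key does the resetting. At 00:00 UTC the proxy starts writing to a new, empty key, and the TTL only clears out the old counter. The sketch below works under that assumption. The helper names are hypothetical, and the monthly key format (YYYY-MM) is an assumption, since only the daily format is documented above.

```python
from datetime import datetime, timezone

DAILY_TTL_SECONDS = 87_600                  # from the text: 24 h plus 20 min
MONTHLY_TTL_SECONDS = 31 * 86_400 + 3_600   # 31 days plus 1 hour


def daily_budget_key(tenant_id: str, budget_id: str, now: datetime) -> str:
    """Key for the current UTC day; `now` must be timezone-aware."""
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"tg:budget:{tenant_id}:{budget_id}:{day}"


def monthly_budget_key(tenant_id: str, budget_id: str, now: datetime) -> str:
    # Assumption: monthly keys use YYYY-MM in place of YYYY-MM-DD.
    month = now.astimezone(timezone.utc).strftime("%Y-%m")
    return f"tg:budget:{tenant_id}:{budget_id}:{month}"
```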

Management endpoints — dashboard, settings, analytics — are never blocked. Only AI proxy calls are subject to budget enforcement.

Budget state is stored in Redis as integer micro-dollars (millionths of a dollar). $1.234567 is stored as 1,234,567. No floating-point precision loss.

Alert thresholds

70%	Email + Slack alert. Calls continue.
90%	Email + Slack alert. Calls continue.
100%	Hard block. HTTP 429. No upstream call. Email + Slack alert.
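To make each alert fire only when a threshold is newly crossed, compare the counter before and after an increment. The sketch below assumes integer micro-dollar counters. `notify` is a hypothetical stand-in for the Resend email and Slack webhook calls. Starting a thread per batch is one simple way to keep alerts off the request path, not necessarily how the proxy does it.

```python
import threading
from typing import Callable

THRESHOLDS = (70, 90, 100)  # percent of the cap, from the table above


def newly_crossed(before_micro: int, after_micro: int, limit_micro: int) -> list[int]:
    """Thresholds passed by the increment from before_micro to after_micro."""
    crossed = []
    for pct in THRESHOLDS:
        mark = limit_micro * pct // 100  # integer math, no float rounding
        if before_micro < mark <= after_micro:
            crossed.append(pct)
    return crossed


def fire_alerts(crossed: list[int], notify: Callable[[int], None]) -> threading.Thread:
    """Send alerts on a background thread so the response is not delayed."""
    def run() -> None:
        for pct in crossed:
            notify(pct)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```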

Reset options

Reset a single asset from Settings → Assets → Reset Spend. Reset all assets from the Budgets page → Reset All (confirmation required). Resets clear only the Redis spend counter; ClickHouse event logs are permanent.

Asset Tracking

Know exactly what every agent, bot, and workflow costs

In Cypress Vision an asset is any named entity making AI calls through the proxy — an agent, a bot, a workflow, a team member, a client, or a feature. Each asset gets its own API key (passed as the Authorization header), its own daily and monthly budget caps, its own real-time spend dashboard, and its own status. The system tracks everything automatically from the moment the first call comes through.

What each asset shows

✓ Total spend today and this month
✓ Total API calls and routed calls
✓ Routing efficiency % (routed / total calls)
✓ Live budget progress bar: green under 70%, amber 70–99%, red at 100%
✓ Status badge: Healthy / Warning / Blocked
✓ Cost breakdown by model
✓ CSV export of full usage history
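The two derived figures in the list can be computed as shown here. The function names are hypothetical, and spend is taken in integer micro-dollars to match the budget counters. The text implies, but does not state, that bar colors pair with status badges (green/Healthy, amber/Warning, red/Blocked); the sketch treats that pairing as an assumption.

```python
def routing_efficiency(routed_calls: int, total_calls: int) -> float:
    """Routed calls as a percentage of all calls (0.0 when there are none)."""
    return 100.0 * routed_calls / total_calls if total_calls else 0.0


def budget_status(spent_micro: int, limit_micro: int) -> tuple[str, str]:
    """Return (bar color, status badge); the badge pairing is an assumption."""
    if spent_micro >= limit_micro:
        return "red", "Blocked"
    if spent_micro * 100 >= limit_micro * 70:  # integer comparison at 70%
        return "amber", "Warning"
    return "green", "Healthy"
```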

Assets scale without limit. 1 asset or 10,000 — performance is identical. AI consultants use assets to track per-client spend. Startups use assets per agent or feature. Internal teams use assets per department or workflow. There is no technical ceiling.

Response Cache

Zero cost on repeated prompts

Cypress Vision includes a Redis-backed exact-match response cache. When the same prompt is sent more than once by the same tenant, the cached response is returned immediately — no upstream API call, no token cost, sub-millisecond response time from the cache.

How it works

1. The incoming messages array is extracted and joined into prompt text.
2. The text is normalized (trimmed, lowercased) and hashed with MD5.
3. Redis lookup on key: tg:cache:{tenant_id}:{md5_hash}
4. HIT → cached response returned instantly. No provider call. No token cost.
5. MISS → request forwarded normally. Response stored with 24-hour TTL.
6. Streaming requests (stream: true) are not cached, because caching requires a complete response.
7. Prompts under 4 words are not cached.

Cache HIT	Response returned in under 1ms. No upstream call. No token cost. Zero provider latency.
Cache MISS	Request forwarded normally. Response cached for 24 hours for future identical requests.

Cache is per-tenant — one tenant's cache never affects another. Cache stats (total cached prompts, hit rate) are visible on your Overview page.
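Steps 1–3, 6 and 7 above produce either a cache key or a decision not to cache. The following minimal sketch assumes message contents are joined with newlines; the function name `cache_key` is hypothetical.

```python
import hashlib
from typing import Any, Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL on stored responses


def cache_key(tenant_id: str, request: dict[str, Any]) -> Optional[str]:
    """Return the Redis cache key for a request, or None if it is not cacheable."""
    if request.get("stream"):
        return None  # streaming responses are never cached
    # Assumption: message contents are joined with newlines.
    text = "\n".join(str(m.get("content", "")) for m in request.get("messages", []))
    normalized = text.strip().lower()
    if len(normalized.split()) < 4:
        return None  # prompts under 4 words are not cached
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"tg:cache:{tenant_id}:{digest}"
```

On a miss, the provider's response would be stored under this key with CACHE_TTL_SECONDS (in Redis, for example, `SET` with the `EX` option). Because the steps hash only the prompt text, this key does not distinguish requests that differ only in model or sampling parameters. The text does not say whether the production key includes them.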

Ask AI — Spend Copilot

Built into your dashboard

Every Cypress Vision dashboard includes a conversational AI assistant with full context of your live tenant data — your assets, models, spend history, routing decisions, and budget state. Ask it questions in plain English and get instant answers backed by your real usage data.

Example questions

"What would switching my recommendation agent from claude-sonnet to claude-haiku save me this month?"

"Which asset is driving the most cost this week?"

"Show me my 7-day spend trend by model"

"How much did smart routing save me last month?"

"Which agents are near their budget cap right now?"

"What is my projected monthly spend at current burn rate?"

"Which of my calls are being routed and which are passing through?"

"Compare my OpenAI vs Anthropic spend this month"

Spend Copilot is available on the Growth plan and above. During soft launch it is available to all premium accounts.

Infrastructure

What runs under the hood

Cypress Vision is built from the six components below, each chosen for a specific role. Everything the proxy does before forwarding a call runs in-process or in Redis and adds under 1ms. ClickHouse logging runs asynchronously and never touches your response time.

Component	Technology	Role
Proxy / API gateway	FastAPI (Python) on Railway	Receives every AI call. Scores complexity, checks Redis budget, looks up cache, routes to optimal model, forwards to provider, logs event async to ClickHouse. Your application talks only to this.
Event store	ClickHouse (cloud, columnar)	Immutable append-only log of every API call. Stores 13 fields per event: timestamp, client_id, agent_id, model_requested, model_used, prompt_tokens, completion_tokens, total_tokens, cost_usd, cache_hit, was_routed, blocked, latency_ms. Powers all real-time analytics. Column-oriented — analytical queries over millions of rows complete in milliseconds.
Budget + Cache	Redis	Two uses: (1) Budget enforcement: spend counters stored as integer micro-dollars per tenant per period, checked in under 1ms on every request. (2) Prompt cache: MD5-keyed response store with 24hr TTL. Both use namespaced keys for full tenant isolation.
Auth & config	Supabase (Postgres)	User accounts, tenant records, API keys (encrypted at rest), routing rules, budget configuration, provider key storage. Never in the request hot path.
Dashboard	Next.js on Railway	Real-time spend analytics, routing performance, budget management, asset tracking, Spend Copilot, CSV export, ROI report. Polls proxy API every 30 seconds for live data.
Alerts	Resend (email) + Slack webhook	Fired in a background thread at 70%, 90%, 100% of budget. Never blocks the request path. Zero latency impact.

The request path (budget check in Redis, cache lookup in Redis, routing decision in-process) adds under 1ms to every call. ClickHouse logging runs asynchronously in a background thread. Apart from the extra network hop to the proxy, your application sees essentially the same response time as calling the AI provider directly.
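One way to keep logging off the request path is a queue drained by a background thread. In this minimal sketch, `AsyncEventLogger` and its `sink` callback are hypothetical, and the sink stands in for whatever ClickHouse client the proxy actually uses.

```python
import queue
import threading
from typing import Any, Callable, Optional


class AsyncEventLogger:
    """Queue events on the request path; write them from a background thread."""

    def __init__(self, sink: Callable[[dict[str, Any]], None]) -> None:
        self._queue: "queue.Queue[Optional[dict[str, Any]]]" = queue.Queue()
        self._sink = sink  # hypothetical stand-in for a ClickHouse insert
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, event: dict[str, Any]) -> None:
        # The queue is unbounded, so this never waits on the sink.
        self._queue.put_nowait(event)

    def close(self) -> None:
        # Sentinel: the worker stops after writing everything queued before it.
        self._queue.put(None)
        self._worker.join()

    def _run(self) -> None:
        while (event := self._queue.get()) is not None:
            self._sink(event)
```

ClickHouse generally performs best with batched inserts, so a production worker would more likely collect events and insert them in groups. The one-at-a-time sink keeps the example short.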

Security

Built into every layer

✓ Encrypted key storage

OpenAI, Anthropic, and Google API keys are stored encrypted at rest in Supabase Postgres. Never returned in API responses. Never written to any log. Never accessible outside the proxy's secure environment.

✓ Complete tenant isolation

Every object in the system (Redis budget keys, Redis cache keys, ClickHouse events, Postgres records) is scoped to a single tenant. Redis keys are namespaced by tenant_id in the format tg:{type}:{tenant_id}:{...}, and every ClickHouse event carries the tenant identifier in client_id. One tenant cannot read, write, or affect another tenant's data at any layer.

✓ TLS everywhere

All traffic between your application and the proxy is HTTPS/TLS. All traffic between the proxy and AI providers is HTTPS/TLS. No plaintext at any point in the chain.

✓ No prompt storage by default

Only metadata is stored by default: model, token counts, cost_usd, latency_ms, routing decision, cache status. Prompt content is never logged unless audit mode is explicitly enabled. Cypress Vision never uses your data for training and never shares it with any third party other than the AI provider that serves each call.

✓ Budget enforcement cannot be bypassed

Budget checks execute at the proxy layer before any upstream call. There is no API path that skips budget enforcement for AI calls. Dashboard and management endpoints are excluded by design.

✓ CORS locked to your domain

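The proxy's CORS middleware accepts browser requests only from your registered dashboard origin. Cross-origin browser requests from other domains are rejected. CORS applies only to browsers, so server-side SDK calls are unaffected by it and are authenticated by API key instead.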

✓ Self-hosted deployment available

For teams with strict data residency requirements — LegalTech, HealthTech, FinTech — Cypress Vision can be deployed entirely within your own infrastructure. Your own Railway, AWS, GCP, or Azure. Your own Redis, ClickHouse, and Postgres. AI call data never leaves your network. Available on Scale ($399/mo) and Enterprise.

Compliance & Audit Logs

Every call logged, immutably, forever

Every API call through Cypress Vision is written to ClickHouse — a column-oriented analytical database designed for exactly this workload: append-only, immutable, fast on large datasets, and queryable in milliseconds even at millions of rows.

ClickHouse schema

Field	Type	What it captures
timestamp	DateTime UTC	Exact moment of the API call
client_id	String	Your tenant identifier
agent_id	String	The asset that made the call
model_requested	String	The model your code asked for
model_used	String	The model actually used after routing
prompt_tokens	Int	Input token count
completion_tokens	Int	Output token count
total_tokens	Int	Combined token count
cost_usd	Float	Actual USD cost of this call
cache_hit	Boolean	Was this returned from cache?
was_routed	Boolean	Did routing change the model?
blocked	Boolean	Was this call blocked by budget?
latency_ms	Int	Full round-trip latency in milliseconds
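The schema maps naturally onto an immutable record type. In this minimal Python sketch, `CallEvent` is a hypothetical name, the Python types approximate the column types listed, and the timezone and token-total checks are added assumptions, not documented constraints. The record has no field for prompt content, matching metadata-only logging by default.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CallEvent:
    """One row of the call log: the 13 fields from the schema table above."""
    timestamp: datetime      # DateTime UTC
    client_id: str           # tenant identifier
    agent_id: str            # asset that made the call
    model_requested: str     # model the caller asked for
    model_used: str          # model actually used after routing
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cost_usd: float
    cache_hit: bool
    was_routed: bool
    blocked: bool
    latency_ms: int

    def __post_init__(self) -> None:
        # Sanity checks added for the sketch (assumptions).
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware (UTC)")
        if self.total_tokens != self.prompt_tokens + self.completion_tokens:
            raise ValueError("total_tokens must equal prompt_tokens + completion_tokens")


if __name__ == "__main__":
    # Illustrative values only.
    event = CallEvent(datetime.now(timezone.utc), "t1", "agent-7", "gpt-4o",
                      "gpt-4o-mini", 120, 30, 150, 0.00002, False, True, False, 412)
    print(asdict(event))
```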

ClickHouse properties

Append-only — events cannot be modified or deleted through normal operations
Column-oriented — sum, group, and filter over millions of rows in milliseconds
Real-time: events appear in your dashboard on its next refresh, within about 30 seconds of the API call (the dashboard polls every 30 seconds)

Compliance use cases

LegalTech

Full AI usage audit trail. Which model was used for which task, by which user, at what time. Exportable for legal discovery or client reporting.

HealthTech

Document AI-assisted decisions with model version, timestamp, and asset. Data residency available via self-hosted deployment.

FinTech

Model risk management documentation. Every inference logged with model, version, cost, and latency. Exportable for regulatory review.

Scaling

Scales from your first agent to your entire company

Startups & Small Teams

1 to 10 assets. Free (1 asset) and Starter (up to 10 assets) plans. One integration covers your whole product. Routing starts saving money on day one. Fully managed, with no ops work required.

Growing Companies

50 to 1,000 assets. Growth plan. Unlimited assets and routing rules. Per-project and per-client cost tracking. Spend Copilot answers cost questions instantly. CSV export for finance and reporting.

Agencies & Enterprise

Unlimited assets. Scale and Enterprise plans. White-label option — your brand, your domain. Self-hosted for data residency. SLA with dedicated support. Compliance exports. Custom contracts.

There is no technical limit on assets, routing rules, or API call volume. The system is designed to scale horizontally. Contact info@cypressvision.xyz for enterprise volume pricing.

Pricing

Simple, transparent pricing

Pay one flat fee. We save you multiples of that every month.

Plan	Price	Calls/month	Assets	Key features
Free	$0	10,000	1	Auto-Router, basic analytics, 7-day full access, no credit card required
Starter	$49/mo	100,000	10	Full routing + caching + audit logs, all providers, email alerts
Growth	$149/mo	1,000,000	Unlimited	+ Spend Copilot, CSV export, custom routing rules, priority support
Scale	$399/mo	Unlimited	Unlimited	+ SLA, compliance exports, white-label, self-hosted option
Enterprise	Custom	Unlimited	Unlimited	+ On-premise, SSO, dedicated support, custom contracts


FAQ

Frequently asked questions

Does this work with my existing OpenAI SDK?

Yes. One line — base_url. Your SDK, your API key, your model names all stay exactly the same. Works with openai-python, openai-node, and any HTTP client that accepts a base URL.

Does it work with Anthropic and Google too?

Yes. Add your Anthropic or Google key in Settings → Provider Keys. Cypress Vision detects the provider from the model name. claude-* goes to Anthropic. gemini-* goes to Google. gpt-* and o* go to OpenAI. All three simultaneously.

Will routing change the quality of my responses?

For simple tasks — no. The classifier is calibrated so that tasks requiring reasoning, long context, code generation, or tool use stay on premium models. Only genuinely simple tasks (short factual questions, translations, basic lookups) route to efficient models. Test any prompt in the Routing Playground before it goes live.

Does Cypress Vision see my prompt content?

By default, no. Only metadata is stored — model, tokens, cost, latency, routing decision. Full prompt logging is opt-in on Scale and Enterprise plans for compliance purposes only.

What happens if Cypress Vision goes down?

Point base_url back to your provider's endpoint (https://api.openai.com/v1 for OpenAI) in 30 seconds. Keep your original provider URL as a fallback environment variable. The Scale plan includes a 99.9% uptime SLA.
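If the fallback lives in configuration, switching back only means changing one variable. This is a minimal sketch. The environment variable names CYPRESS_BASE_URL and OPENAI_FALLBACK_BASE_URL are hypothetical.

```python
import os

DEFAULT_PROVIDER_URL = "https://api.openai.com/v1"


def resolve_base_url() -> str:
    """Use the Cypress Vision endpoint when set, else the provider's own URL."""
    # Hypothetical variable names; only the fallback idea comes from the text.
    return os.environ.get("CYPRESS_BASE_URL") or os.environ.get(
        "OPENAI_FALLBACK_BASE_URL", DEFAULT_PROVIDER_URL
    )
```

Pass the result as `base_url=resolve_base_url()` in the Quick Start client. Unsetting CYPRESS_BASE_URL then sends traffic straight to OpenAI.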

How are my provider API keys protected?

Stored encrypted at rest in Supabase Postgres. Never returned in API responses. Never written to any log. Used only by the proxy to forward your requests.

Can I write my own routing rules?

Yes. Custom rules override the automatic classifier — by agent ID, workflow ID, model name, or token range. Evaluated in priority order before the classifier runs. Available on Growth and above.
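Priority-ordered rules with a classifier fallback can be sketched as a first-match search. This sketch rests on several assumptions. `RoutingRule` and `choose_model` are hypothetical names. A rule matches only when every condition it sets matches, lower priority numbers are evaluated first, and a matching rule names an explicit target model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RoutingRule:
    priority: int                      # assumption: lower runs first
    target_model: str                  # model to use when the rule matches
    agent_id: Optional[str] = None     # None means "any"
    workflow_id: Optional[str] = None
    model: Optional[str] = None
    min_tokens: Optional[int] = None
    max_tokens: Optional[int] = None

    def matches(self, agent_id: str, workflow_id: Optional[str], model: str, tokens: int) -> bool:
        return (
            (self.agent_id is None or self.agent_id == agent_id)
            and (self.workflow_id is None or self.workflow_id == workflow_id)
            and (self.model is None or self.model == model)
            and (self.min_tokens is None or tokens >= self.min_tokens)
            and (self.max_tokens is None or tokens <= self.max_tokens)
        )


def choose_model(rules: list[RoutingRule], agent_id: str, workflow_id: Optional[str],
                 model: str, tokens: int, classifier: Callable[[], str]) -> str:
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(agent_id, workflow_id, model, tokens):
            return rule.target_model  # first match wins; classifier is skipped
    return classifier()
```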

How many assets can I have?

Unlimited on Growth and above. No performance impact at any scale.

Is there a free trial?

Yes. Free plan — 10,000 calls, 7 days full access, no credit card.

Can I white-label this for my clients?

Yes. Scale plan includes white-label — your domain, your brand. Contact info@cypressvision.xyz.