Platform

GPU EdgeGateway

Most gateways sit in front of the models and treat inference as a black box. This one is built from the inference engine out. One routing contract turns signals into projections, projections into decisions, and decisions into a model choice across a mesh of local, private and frontier engines. Along the way the prefix cache is protected, context is selected rather than pasted, and every turn takes the least-cost path that still meets the need. The gateway is session-aware across long-running agents and sandboxed for tool safety, and every policy is shadow-tested before it goes live. It is OpenAI- and Anthropic-compatible, multimodal, governed like production, and runs on hardware you own.

signal→model

one routing contract

prefix-cache

reuse-protected

least cost

per-turn path

01 — What it does

Route, reason, act

/route

Signal-driven routing

Intent, complexity, modality and risk become projections and policy bands, then route across a local-to-frontier mesh — reasoning only when it pays.

intent · projections · when-to-reason

/session

Session-aware agentic routing

Stateful guards keep multi-turn agents coherent: hard locks block unsafe model switches mid-tool-loop, and any other switch is weighed against quality gap, prefix locality and turn priors.

session-aware · tool-loop locks · continuity
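
A minimal sketch of how such a guard could combine a hard lock with a soft score. Everything here is an assumption made for illustration: the names `SessionState` and `should_switch`, the weights, the 10,000-token normaliser and the threshold are hypothetical and do not come from the gateway itself.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    current_model: str
    in_tool_loop: bool  # True while a tool call is still awaiting its result


def should_switch(state, candidate, quality_gap, cached_prefix_tokens,
                  turn_prior=0.0, threshold=0.5):
    """Decide whether the next turn may move to `candidate`.

    quality_gap: expected quality gain from switching, in 0..1 (assumed scale).
    cached_prefix_tokens: prefix tokens already cached on the current model.
    turn_prior: bias learned from earlier turns (hypothetical signal).
    """
    if candidate == state.current_model:
        return False
    if state.in_tool_loop:
        # Hard lock: never change model in the middle of a tool loop.
        return False
    # Soft guard: a switch throws away prefix locality, so penalise it.
    locality_penalty = min(cached_prefix_tokens / 10_000, 1.0)
    score = quality_gap - 0.5 * locality_penalty + turn_prior
    return score > threshold
```

The hard lock is checked before any scoring, so no quality gap, however large, can override it while a tool loop is open.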

/sandbox

Sandboxed & governed

Tools and code run in policy-governed MicroVM sandboxes — no unauthorized file, credential or network access.

MicroVM · policy-as-code · no-exfil

02 — How it works

Signal to decision

Signal

Score intent, risk, modality, context.

Project

Normalise into policy bands.

Decide

Pick model, agent or tool.

Serve

Sandboxed, observed, metered.
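
The first three steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the gateway's implementation: the `Signals` fields, the band thresholds and the engine names are hypothetical, and the scoring step is assumed to have already produced the numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    intent: str
    modality: str
    risk: float        # assumed to be scored in 0..1
    complexity: float  # assumed to be scored in 0..1


def band(score):
    """Normalise a raw 0..1 score into a coarse policy band."""
    if score < 0.33:
        return "low"
    if score < 0.66:
        return "medium"
    return "high"


def project(signals):
    """Project raw signals into the bands a policy can reason about."""
    return {
        "intent": signals.intent,
        "modality": signals.modality,
        "risk": band(signals.risk),
        "complexity": band(signals.complexity),
    }


def decide(bands):
    """Pick an engine from the bands; rule order encodes priority."""
    if bands["risk"] == "high":
        return "private-engine"   # keep risky traffic on owned hardware
    if bands["complexity"] == "high":
        return "frontier-engine"  # pay for capability only when needed
    return "local-slm"


def route(signals):
    return decide(project(signals))
```

For example, `route(Signals("chat", "text", 0.1, 0.2))` falls through both rules and lands on the local model.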

Control plane · data plane

A self-improving router.

A control plane governs policy, identity and guardrails. A data plane serves fast, observable, cost-aware inference. Between them, a self-improving router uses the outcome of every request to make a better next decision, protecting the prefix cache and selecting context so that cost stays low as the workload grows.

03 — Architecture

Inside the gateway

/contract

One routing contract

Signals become projections, projections drive decisions, decisions choose the model — the same pipeline whether configured in YAML, the console, the CLI or Kubernetes.

signals · projections · decisions

/mesh

Mixture-of-models mesh

Token- and capability-aware routing spans self-hosted engines, local SLMs and frontier APIs with semantic caching; classifiers run on any accelerator — one control plane, any backend.

self-hosted · semantic cache · any accelerator
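
One way to picture capability-aware, least-cost selection across such a mesh is the sketch below. The `Engine` record, the capability labels and the prices are all made up for illustration. Semantic caching is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    capabilities: frozenset
    max_context_tokens: int
    cost_per_1k_tokens: float  # illustrative price, not a real tariff


def cheapest_capable(engines, needed, prompt_tokens):
    """Return the cheapest engine that meets the need, or None."""
    eligible = [
        e for e in engines
        if set(needed) <= e.capabilities          # has every capability
        and prompt_tokens <= e.max_context_tokens  # prompt fits the window
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda e: e.cost_per_1k_tokens)


# A hypothetical three-tier mesh.
MESH = [
    Engine("local-slm", frozenset({"text"}), 8_000, 0.0),
    Engine("self-hosted", frozenset({"text", "tools"}), 32_000, 0.2),
    Engine("frontier-api", frozenset({"text", "tools", "vision"}), 200_000, 3.0),
]
```

With this mesh, a short text-only prompt stays on `local-slm`, while the same prompt at 50,000 tokens is pushed to `frontier-api` because no cheaper engine has a large enough window.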

/safe

Safety & protocol

History-aware PII, jailbreak and prompt-injection scanning across every turn — behind an OpenAI- and Anthropic-compatible ingress with explicit, lossless translation.

PII · jailbreak · OpenAI/Anthropic

/cache

Prefix-cache discipline

Stable prompt epochs, deterministic tool-schema ordering and bounded, append-only context keep reusable prefixes intact, so cached tokens are reused across a long session at a fraction of the price instead of being re-billed every turn.

prompt epochs · stable schema · cache reuse
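
To make the idea concrete, here is a minimal sketch of deterministic tool-schema ordering and an append-only check. The function names and the tool dictionary shape (a `name` key plus arbitrary parameters) are assumptions for illustration. Prompt epochs are not modelled.

```python
import hashlib
import json


def canonical_prefix(system_prompt, tools):
    """Serialise the reusable prefix the same way on every turn."""
    ordered = sorted(tools, key=lambda t: t["name"])  # stable tool order
    return json.dumps(
        {"system": system_prompt, "tools": ordered},
        sort_keys=True,           # stable key order inside each schema
        separators=(",", ":"),    # no whitespace drift
    )


def prefix_key(system_prompt, tools):
    """Hash of the canonical prefix; equal keys mean a reusable prefix."""
    data = canonical_prefix(system_prompt, tools).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def is_append_only(previous_turns, current_turns):
    """True if the new context only adds turns after the old ones."""
    return current_turns[:len(previous_turns)] == previous_turns
```

Two requests that list the same tools in a different order, or with schema keys in a different order, produce the same `prefix_key`. Without canonicalisation, either difference would change the serialised prefix and defeat prefix matching.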

/lifecycle

Shadow, activate, revert

Every routing policy is versioned and shadow-tested on replayed traffic before activation, with one-click rollback — routing never drifts silently.

shadow · replay · rollback
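
A rough sketch of that lifecycle, assuming a policy is simply a function from a request to an engine name. `shadow_compare` and `PolicyRegistry` are hypothetical names, and a real system would compare quality and cost as well as the routing choice.

```python
def shadow_compare(active, candidate, replayed_requests):
    """Run both policies on replayed traffic; only `active` would serve.

    Returns the fraction of requests routed differently and the
    list of (request, active_choice, candidate_choice) disagreements.
    """
    diffs = []
    for request in replayed_requests:
        a, c = active(request), candidate(request)
        if a != c:
            diffs.append((request, a, c))
    total = len(replayed_requests)
    rate = len(diffs) / total if total else 0.0
    return rate, diffs


class PolicyRegistry:
    """Keeps every activated policy version so rollback is one step."""

    def __init__(self, initial):
        self._history = [initial]

    @property
    def active(self):
        return self._history[-1]

    def activate(self, policy):
        self._history.append(policy)

    def rollback(self):
        if len(self._history) > 1:  # never drop the initial version
            self._history.pop()
        return self.active
```

A candidate would be activated only if its disagreement rate and the individual diffs look acceptable. `rollback()` restores the previous version without re-deploying anything.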

04 — Agent-first delivery

Multimodal in, action out

/multimodal

Every modality, one path

Text, voice, image and event inputs are normalised, routed to the right modality model, and turned into grounded responses or tool actions.

text·voice·image · normalise · actions

/context

Context selected, not pasted

Graph-shaped code evidence, bounded tool output and domain-aware compression extract the signal a turn actually needs and drop the rest — fewer prompt and tool-output tokens, without losing continuity across a long task.

select not paste · graph context · bounded output
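
Of the three techniques, bounded tool output is the simplest to sketch. The helper below is an illustrative assumption, not the gateway's method: it keeps the head and tail of a long tool result and replaces the middle with a marker. Graph-shaped evidence and domain-aware compression are not shown.

```python
def bound_output(text, max_lines=40, head=20, tail=10):
    """Cap tool output at roughly head + tail lines.

    Output that already fits within max_lines is returned unchanged.
    The defaults are arbitrary illustration values.
    """
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - head - tail
    marker = f"[... {omitted} lines omitted ...]"
    kept_tail = lines[len(lines) - tail:]  # also correct when tail == 0
    return "\n".join(lines[:head] + [marker] + kept_tail)
```

Keeping both ends reflects a common pattern in command output, where the invocation context sits at the top and the error or summary at the bottom.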

/observe

Topology & token ledger

A console traces every signal → projection → decision with replay, and a live ledger shows cache reuse, context savings and per-route latency, tokens and cost — spend is accountable while the task runs, not after.

topology · savings ledger · metering
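
A per-route ledger of this kind could be sketched as follows. The class names, the per-1k-token prices and the cached-token discount are hypothetical. The only assumption about pricing is that cached prompt tokens are billed at some fraction of the normal input price.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RouteTotals:
    requests: int = 0
    prompt_tokens: int = 0
    cached_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    cost: float = 0.0


class Ledger:
    def __init__(self, price_in, price_out, cached_discount):
        self.price_in = price_in                # per 1k fresh prompt tokens
        self.price_out = price_out              # per 1k output tokens
        self.cached_discount = cached_discount  # e.g. 0.1 = 10% of price_in
        self.totals = defaultdict(RouteTotals)

    def record(self, route, prompt_tokens, cached_tokens, output_tokens,
               latency_ms):
        """Meter one turn as it completes and return its cost."""
        fresh = prompt_tokens - cached_tokens
        cost = (
            fresh * self.price_in
            + cached_tokens * self.price_in * self.cached_discount
            + output_tokens * self.price_out
        ) / 1000
        t = self.totals[route]
        t.requests += 1
        t.prompt_tokens += prompt_tokens
        t.cached_tokens += cached_tokens
        t.output_tokens += output_tokens
        t.latency_ms += latency_ms
        t.cost += cost
        return cost

    def cache_reuse(self, route):
        """Share of prompt tokens on this route served from cache."""
        t = self.totals[route]
        return t.cached_tokens / t.prompt_tokens if t.prompt_tokens else 0.0

    def mean_latency_ms(self, route):
        t = self.totals[route]
        return t.latency_ms / t.requests if t.requests else 0.0
```

Because `record` is called per turn, the running totals are available while a task is still in flight, rather than only in an end-of-month report.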