01
AI Engineering
02
Solutions
03
Platform
04
Device Platform
05
Field Guide (eBook)
06
Whitepaper
07
Start a project →
Home/Platform/GPU EdgeGateway
Platform

GPU EdgeGateway

Most gateways sit in front of the models and treat inference as a black box. This one is built from the inference engine out: one routing contract turns signals into projections, projections into decisions, and decisions into the model — across a mesh of local, private and frontier engines — while the prefix cache is protected, context is selected rather than pasted, and every turn takes the least-cost path that still meets the need. Session-aware across long-running agents, sandboxed for tool safety, and shadow-tested before any policy goes live. OpenAI- and Anthropic-compatible, multimodal, governed like production, on hardware you own.

signal→model
one routing contract
prefix-cache
reuse-protected
least cost
per-turn path
01 — What it does

Route, reason, act

/route

Signal-driven routing

Intent, complexity, modality and risk become projections and policy bands, then route across a local-to-frontier mesh — reasoning only when it pays.

intentprojectionswhen-to-reason
/session

Session-aware agentic routing

Stateful guards keep multi-turn agents coherent: hard locks block unsafe model switches mid-tool-loop, weighing quality gap, prefix locality and turn priors.

session-awaretool-loop lockscontinuity
/sandbox

Sandboxed & governed

Tools and code run in policy-governed MicroVM sandboxes — no unauthorized file, credential or network access.

MicroVMpolicy-as-codeno-exfil
02 — How it works

Signal to decision

01

Signal

Score intent, risk, modality, context.

02

Project

Normalise into policy bands.

03

Decide

Pick model, agent or tool.

04

Serve

Sandboxed, observed, metered.

Control plane · data plane

A self-improving router.

A control plane governs policy, identity and guardrails; a data plane serves fast, observable, cost-aware inference; and a self-improving router between them turns every request into a better next decision — protecting the prefix cache and selecting context so the work stays cheap as it grows.

CONTROL PLANE Policies Identities API keys Guardrails DATA PLANE Fast inference Model routing Observability Cost-aware SAAR Evals CLI-first Router models Self-improvingRouter research loop → closes the loop
03 — Architecture

Inside the gateway

/contract

One routing contract

Signals become projections, projections drive decisions, decisions choose the model — the same pipeline whether configured in YAML, the console, the CLI or Kubernetes.

signalsprojectionsdecisions
/mesh

Mixture-of-models mesh

Token- and capability-aware routing spans self-hosted engines, local SLMs and frontier APIs with semantic caching; classifiers run on any accelerator — one control plane, any backend.

self-hostedsemantic cacheany accelerator
/safe

Safety & protocol

History-aware PII, jailbreak and prompt-injection scanning across every turn — behind an OpenAI- and Anthropic-compatible ingress with explicit, lossless translation.

PIIjailbreakOpenAI/Anthropic
/cache

Prefix-cache discipline

Stable prompt epochs, deterministic tool-schema ordering and bounded, append-only context keep reusable prefixes intact — so cached tokens are reused across a long session at a fraction of the price instead of re-billed every turn.

prompt epochsstable schemacache reuse
/lifecycle

Shadow, activate, revert

Every routing policy is versioned and shadow-tested on replayed traffic before activation, with one-click rollback — routing never drifts silently.

shadowreplayrollback
04 — Agent-first delivery

Multimodal in, action out

/multimodal

Every modality, one path

Text, voice, image and event inputs are normalised, routed to the right modality model, and turned into grounded responses or tool actions.

text·voice·imagenormaliseactions
/context

Context selected, not pasted

Graph-shaped code evidence, bounded tool output and domain-aware compression extract the signal a turn actually needs and drop the rest — fewer prompt and tool-output tokens, without losing continuity across a long task.

select not pastegraph contextbounded output
/observe

Topology & token ledger

A console traces every signal → projection → decision with replay, and a live ledger shows cache reuse, context savings and per-route latency, tokens and cost — spend is accountable while the task runs, not after.

topologysavings ledgermetering
05 — By the numbers

Governed like production

<1ms
signal → decision
~90%
cached-token discount
least cost
per-turn path
shadow→activate
policy lifecycle
06 — Further reading

Whitepapers for your team

The architecture and the economics behind this platform — read in the browser or export to PDF.

Let's build

Serve models safely.

Turnkey Edge-AI — fixed time, fixed cost, full responsibility.