Platform

Edge Streaming Intelligence

Real-time vision and audio intelligence over live streams, at fleet scale. A zero-copy GPU pipeline decodes hundreds of feeds straight into device memory, runs a catalog of detectors on every frame, and turns raw video into a verified, queryable signal — with sub-millisecond validation and an agent layer that reasons and acts on top. Proven on broadcast-grade media: 1,000+ concurrent 4K streams across racks of edge GPUs.

1,000+

4K streams, concurrent

<1ms

high-frequency validation

128

streams / rack

01 — What it does

Vision and audio on every frame

/detect

A catalog of live detectors

Logos, freezes, macro-blocking, blank and splash screens, lip-sync and on-screen errors are scored per frame — video and audio anomalies caught the instant they appear.

video · audio · per-frame

/read

Reads the screen, not just watches it

OCR and vision-language models extract guide data, clocks, version strings and error dialogs; object detectors track focus, icons and UI state.

OCR/VLM · object-det · UI state

/act

Closes the loop

An agent layer plans and acts — driving devices through an IR / Bluetooth control plane and verifying every step against the live stream.

agentic · device-control · verify

02 — How it works

Stream to decision

Decode

Feeds decode into GPU memory.

Detect

A model graph scores every frame.

Decide

Validate sub-ms; reason in minutes.

Act

Drive devices, publish, alert.

03 — Detection catalog

A model graph, not a single model

/anomaly

Video anomaly detection

Freezes (consecutive pixel-difference), macro-blocking and pixelation (block-variance + Sobel edge density), tearing and stutter — flagged inside the stream buffer.

optical-flow · Sobel · block-variance
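As a rough illustration of the freeze and block-variance checks named above, here is a minimal sketch in plain Python. It assumes grayscale frames given as lists of rows; the function names, the thresholds (`threshold`, `min_run`) and the 8-pixel tile size are hypothetical, and the Sobel edge-density step is omitted. The production pipeline runs on the GPU and may work quite differently.

```python
from statistics import pvariance

Frame = list[list[int]]  # grayscale rows, pixel values 0-255 (assumed layout)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two same-sized frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def is_frozen(frames: list[Frame], threshold: float = 1.0, min_run: int = 3) -> bool:
    """True if at least `min_run` consecutive frame pairs are near-identical.

    `threshold` and `min_run` are illustrative values, not tuned ones.
    """
    run = 0
    for prev, cur in zip(frames, frames[1:]):
        run = run + 1 if mean_abs_diff(prev, cur) < threshold else 0
        if run >= min_run:
            return True
    return False


def block_variances(frame: Frame, block: int = 8) -> list[float]:
    """Pixel variance inside each block x block tile, in row-major order."""
    out = []
    for y in range(0, len(frame) - block + 1, block):
        for x in range(0, len(frame[0]) - block + 1, block):
            tile = [frame[y + dy][x + dx] for dy in range(block) for dx in range(block)]
            out.append(pvariance(tile))
    return out
```

One plausible use of the second function is to treat an unusually high share of near-zero-variance tiles as a hint of macro-blocking, ideally combined with an edge-density measure along tile boundaries. That combination rule is an assumption here, not something stated on this page.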

/logo

Logo & UI object detection

An RF-DETR detector with a CLIP refiner confirms logos, app tiles and widgets with bounding-box precision.

RF-DETR · CLIP · bbox
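One way a refiner stage like the one above could confirm a detection is to compare an embedding of the cropped box against a reference embedding for the claimed label. The sketch below assumes the embeddings have already been computed elsewhere (for example by a CLIP image encoder); the `refine` function, the dictionary keys and the `min_sim` cutoff are hypothetical and only illustrate the idea.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def refine(detections: list[dict], references: dict[str, list[float]],
           min_sim: float = 0.8) -> list[dict]:
    """Keep detections whose crop embedding matches the reference for their label.

    Each detection is assumed to carry "label", "bbox" and "embedding" keys;
    `min_sim` is an illustrative cutoff, not a tuned value.
    """
    kept = []
    for det in detections:
        ref = references.get(det["label"])
        if ref is not None and cosine(det["embedding"], ref) >= min_sim:
            kept.append(det)
    return kept
```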

/ocr

OCR & VLM reading

GPU OCR (docTR) and vision-language models read guide grids, clocks, version strings and error dialogs — signal-loss, auth and tune failures included.

docTR · VLM · regex
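Once OCR has produced text, the regex step can be pictured roughly as follows. This sketch assumes the OCR output is a plain string; every pattern, rule name and the `parse_screen_text` helper are hypothetical examples, and real screens would need patterns written for their own wording.

```python
import re

# Hypothetical patterns for illustration only.
VERSION_RE = re.compile(r"\bv?(\d+(?:\.\d+)+)\b")
CLOCK_RE = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")
ERROR_RULES = {
    "signal_loss": re.compile(r"no signal|signal (?:lost|loss)", re.I),
    "auth": re.compile(r"sign[- ]?in (?:failed|required)|not authori[sz]ed", re.I),
    "tune": re.compile(r"channel (?:unavailable|not available)|unable to tune", re.I),
}


def parse_screen_text(text: str) -> dict:
    """Extract a version string, a clock reading and error classes from OCR text."""
    version = VERSION_RE.search(text)
    clock = CLOCK_RE.search(text)
    errors = [name for name, rx in ERROR_RULES.items() if rx.search(text)]
    return {
        "version": version.group(1) if version else None,
        "clock": clock.group(0) if clock else None,
        "errors": errors,
    }
```

For instance, `parse_screen_text("Software v4.12.3  21:05  No Signal")` yields the version `4.12.3`, the clock `21:05` and the single error class `signal_loss`.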

/audio

Audio & sync checks

Audio-presence and lip-sync checks run beside the video probe, so silent feeds and A/V drift are caught too.

audio-probe · lip-sync · ffprobe
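An audio-presence check of the kind mentioned above can be approximated with an RMS level test. The sketch assumes decoded samples normalized to the range -1.0 to 1.0; the function names and the -60 dBFS cutoff are hypothetical, and lip-sync measurement is out of scope here.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level in dBFS for samples normalized to [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def is_silent(samples: list[float], threshold_dbfs: float = -60.0) -> bool:
    """True if the window's RMS level falls below the (illustrative) threshold."""
    return rms_dbfs(samples) < threshold_dbfs
```

In practice such a check would run over short sliding windows, and a feed would only be flagged after several silent windows in a row, so that natural pauses in programme audio are not reported.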

04 — Architecture

Inside the pipeline

/pipeline

Zero-copy vision pipeline

GStreamer + DeepStream pull RTSP / H.265 into the GPU via NVDEC; composite grids map to regions once, then a swappable model graph scores each region every frame.

GStreamer · DeepStream · NVDEC
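The "map composite grids to regions once" step can be sketched as a one-time computation of tile rectangles that later stages reuse for every frame. The example assumes a uniform grid with equally sized cells; `Region` and `grid_regions` are hypothetical names, and real composites may use irregular layouts.

```python
from typing import NamedTuple


class Region(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def grid_regions(frame_w: int, frame_h: int, cols: int, rows: int) -> list[Region]:
    """Split a composite frame into cols x rows equal tiles, in row-major order."""
    cell_w, cell_h = frame_w // cols, frame_h // rows
    return [
        Region(c * cell_w, r * cell_h, cell_w, cell_h)
        for r in range(rows)
        for c in range(cols)
    ]
```

For a 3840x2160 composite laid out as a 4x4 grid, this gives sixteen 960x540 regions, each of which can then be scored independently.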

/serverless

Serverless model serving

Detectors run as auto-scaling GPU functions (Nuclio) drawn from a continuously trained catalog — new models deploy without touching the pipeline.

Nuclio · auto-scale · registry

/backbone

Event & knowledge backbone

Detections stream over NATS JetStream into ClickHouse for sub-second OLAP, with a knowledge graph, vectors and a fine-tuned vision-action model driving next-best-action.

NATS · ClickHouse · knowledge-graph

/learn

A closed training loop

Misses become flagged frames, and flagged frames become new annotation tasks: frames are captured and versioned in COCO format, models are retrained on them, and retrained models are promoted through a registry.

CVAT · COCO · feedback
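To make the COCO step concrete, the sketch below packs flagged frames into a COCO-style detection dataset. The top-level `images`, `annotations` and `categories` keys and the `[x, y, width, height]` box layout follow the public COCO format; the input frame dictionaries and the `to_coco` helper are hypothetical, not the platform's actual schema.

```python
import json


def to_coco(frames: list[dict], categories: list[str]) -> dict:
    """Build a COCO-style dict from flagged frames.

    Each frame is assumed to look like:
    {"file_name": str, "width": int, "height": int,
     "boxes": [(label, (x, y, w, h)), ...]}
    """
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    ann_id = 1
    for image_id, frame in enumerate(frames, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": frame["file_name"],
            "width": frame["width"],
            "height": frame["height"],
        })
        for label, (x, y, w, h) in frame["boxes"]:
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": cat_ids[label],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


if __name__ == "__main__":
    flagged = [{"file_name": "miss_0001.jpg", "width": 960, "height": 540,
                "boxes": [("logo", (40, 30, 120, 60))]}]
    print(json.dumps(to_coco(flagged, ["logo", "error_dialog"]), indent=2))
```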