The self-hosted stack, edge to cloud
Agentyk runs on infrastructure we own and operate in Europe: zero-dependency Pure-C engines on a fleet that ranges from edge devices to multi-GPU servers. No hyperscaler in the path, no data leaving the EU.
Three Pure-C engines
One zero-dependency runtime design shared by inference, forecasting, and retrieval: small binaries, broad hardware support, lower energy per answer.
Engine
Inference
Runs the AgentykLM models on any hardware, from a Raspberry Pi to an NVIDIA data center, using one small, self-contained binary.
Engine
Forecasting
Edge inference for foundation time-series models (TimesFM, Chronos-2, Moirai-2, TiRex) with the same zero-dependency runtime.
Engine
Knowledge
A vector store and retriever in pure C — the on-device tier of Agentyk Knowledge, running alongside any LLM on any device.
No C++, no external dependencies
GPU backends: Vulkan, Metal, CUDA
SIMD: NEON (ARM), AVX2, AVX-512
Binaries of hundreds of KB, fully self-contained
Tuned for energy efficiency
Nothing to pull at runtime
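To make the Knowledge engine's "vector store and retriever in pure C" concrete, here is a minimal sketch of brute-force top-k retrieval using only the C standard library. It is an illustration, not Agentyk's implementation: the names `flat_store` and `top_k` are hypothetical, and it assumes the stored vectors and the query are already L2-normalized, so a plain dot product equals cosine similarity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat store: `count` vectors of `dim` floats, row-major.
 * Assumption: every row is L2-normalized when it is inserted. */
typedef struct {
    const float *data;
    size_t count;
    size_t dim;
} flat_store;

/* Dot product; equals cosine similarity for normalized vectors. */
static float dot(const float *a, const float *b, size_t dim) {
    float acc = 0.0f;
    for (size_t i = 0; i < dim; i++) acc += a[i] * b[i];
    return acc;
}

/* Scans every row and keeps the k best matches, best first.
 * out_idx and out_score must each have room for k entries.
 * Returns the number of results written (min(k, count)). */
static size_t top_k(const flat_store *s, const float *query, size_t k,
                    size_t *out_idx, float *out_score) {
    assert(s && query && out_idx && out_score && s->dim > 0);
    size_t n = 0;
    for (size_t r = 0; r < s->count; r++) {
        float sc = dot(s->data + r * s->dim, query, s->dim);
        /* Result list is full and this row is no better than the worst. */
        if (n == k && (k == 0 || sc <= out_score[n - 1])) continue;
        /* Append if there is room, otherwise overwrite the worst entry. */
        size_t pos = (n < k) ? n++ : n - 1;
        /* Shift lower-scoring entries down to keep the list sorted. */
        while (pos > 0 && out_score[pos - 1] < sc) {
            out_score[pos] = out_score[pos - 1];
            out_idx[pos] = out_idx[pos - 1];
            pos--;
        }
        out_score[pos] = sc;
        out_idx[pos] = r;
    }
    return n;
}
```

A linear scan like this needs no index structure and no allocation, which suits small on-device collections; larger stores would typically add an approximate index on top.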
The self-hosted stack
Open runtimes, owned hardware, and EU jurisdiction end to end.
Runtimes
Open inference engines
Beyond our Pure-C engines, the fleet runs proven open runtimes — llama.cpp, vLLM, and TEI — picked per model and per box for the best throughput on the hardware it lands on.
Fleet
Edge to data center
A spread of EU-located machines, from low-power edge devices to multi-GPU servers, lets each request run on right-sized hardware instead of one over-provisioned tier.
Self-hosted
Owned, not rented
The stack runs on infrastructure we operate in EU jurisdiction — no US hyperscaler in the path, no CLOUD Act exposure, no third-party sub-processor for inference.
Reliable
Watched and self-healing
Per-model watchdogs automatically relaunch a model that is down or wedged, and temperature and health telemetry page the team when something goes wrong, so the model lineup stays up.
Health and temperature, watched continuously
Every box reports reachability, latency, and GPU/CPU temperature on a short interval. Operators see per-box health and temperatures in the admin console, and overheating or a downed model pages the team automatically.
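The watchdog and paging behavior described above can be sketched as a small decision function over one box report. This is a hedged illustration only: the `box_report` and `limits` types, the `evaluate` function, and the rule that a reachable model with excessive latency counts as "wedged" are all assumptions, not the actual Agentyk monitoring code.

```c
#include <stdbool.h>

/* Actions are bit flags so one report can trigger several. */
enum { ACT_NONE = 0, ACT_RELAUNCH = 1 << 0, ACT_PAGE = 1 << 1 };

/* Hypothetical per-box report, sent at a short interval. */
typedef struct {
    bool reachable;     /* did the box answer the probe at all? */
    double latency_ms;  /* round-trip latency of the probe */
    double temp_c;      /* hottest GPU/CPU sensor reading */
    bool model_up;      /* is the model process serving requests? */
} box_report;

/* Hypothetical thresholds; real values would be tuned per box. */
typedef struct {
    double max_temp_c;
    double max_latency_ms;
} limits;

/* Decide what to do with one report. */
static unsigned evaluate(const box_report *r, const limits *lim) {
    /* An unreachable box cannot be relaunched remotely: page a human. */
    if (!r->reachable) return ACT_PAGE;

    unsigned act = ACT_NONE;
    if (r->temp_c > lim->max_temp_c) act |= ACT_PAGE;      /* overheating */
    if (!r->model_up) {
        act |= ACT_RELAUNCH | ACT_PAGE;                    /* downed model */
    } else if (r->latency_ms > lim->max_latency_ms) {
        act |= ACT_RELAUNCH;                               /* assumed wedged */
    }
    return act;
}
```

Returning flags rather than acting directly keeps the policy easy to test in isolation; the caller decides how to relaunch a model or deliver a page.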
Sovereign infrastructure, end to end
Owned hardware in EU jurisdiction, right-sized models, and open runtimes — build on a stack with no hyperscaler in the path.