AI Inference Server Observability in Kubernetes: The Four Signals MLOps Tools Don't Capture
In August 2025, a vulnerability chain was disclosed in NVIDIA Triton Inference Server that allowed an unauthenticated remote attacker to achieve full remote code execution in four steps: send a single crafted inference request, leak the name of an internal shared memory region, register that region for subsequent requests, and gain read-write primitives into the Triton Python backend’s private memory. The exploit chain ran entirely through Triton’s standard inference API. No anomalous traffic volume.