
Kubernetes UserNamespaces: the overhyped GA feature

Written by ~ zwindler ~

The infographic that triggered me

Over the past few days, LinkedIn has been flooded with the same kind of infographic. Kubernetes 1.36 is out, and one of the most talked-about features is the GA release of UserNamespaces.

It’s a topic I’ve been following since 2018 (talk The Route to rootless containers at KubeCon EU 2018), so I’m genuinely glad to see this long journey finally reach the finish line. That said, I’m appalled by the way it’s being marketed on LinkedIn, apparently by people who have no idea how it actually works — and frankly don’t care.

“Kubernetes just made root safer. Just add hostUsers: false to your Pod spec.”

The visual: an all-powerful king “inside the container” and a helpless beggar “outside on the host”. The promise: “No Host Access. No Privilege Escalation. No Lateral Movement. No Node Takeover.”

Catchy.

But presenting it this way is genuinely dangerous, because it obscures entire areas of application and operational security. Selling hostUsers: false as the universal fix for the “root in containers” problem is a dramatic oversimplification that will push teams to ignore the real security priorities.

What UserNamespaces actually do

The threat model: container escape

First, what are we actually talking about? A container escape is when an attacker manages to break out of their container and directly access the host’s kernel or filesystem — completely bypassing the normal isolation mechanisms.

This type of vulnerability is rare, but real-world examples exist:

  • CVE-2019-5736 (runc): write to /proc/self/exe of the host process from inside the container
  • CVE-2022-0492 (cgroups v1): escape via unshare in certain configurations
  • CVE-2024-21626 (runc, “Leaky Vessels”): file descriptor leak to the host working directory

If there’s a vulnerability of this type on your node AND a process is compromised AND it runs as root in the container AND it doesn’t use UserNamespaces, the attacker gets root on the host. Game over: full access to every file on the node, every secret mounted by other pods, the ability to install a rootkit or exfiltrate data from all tenants running on that node.

It remains possible, but that’s a lot of “ifs”. Either way, this is exactly the scenario UserNamespaces address. They introduce UID mapping:

  • UID 0 inside the container is mapped to an unprivileged UID on the host (e.g. 100000, unique per pod)
  • If an attacker successfully escapes the container via a kernel exploit, they land as nobody on the node — the escape succeeds, but the post-escape impact is dramatically reduced

This is the “Breakouts Lose Impact” scenario from the infographic, and on that point, the infographic is right. That’s the real value of the feature.
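In concrete terms, opting in is a single field in the Pod spec. A minimal sketch (pod and image names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo         # hypothetical name
spec:
  hostUsers: false          # the pod gets its own user namespace
  containers:
  - name: app
    image: my-app:latest    # placeholder image
```

Once the pod is running, `cat /proc/self/uid_map` inside the container shows the mapping, e.g. `0 100000 65536`: container UID 0 starts at an unprivileged host UID, for a range of 65536 IDs.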

Edge case: multi-tenancy even with non-root containers

Even without root containers, UserNamespaces provide something extra in a truly multi-tenant context (multiple customers on the same cluster). Without UserNamespaces, if two pods from different customers both run with runAsUser: 1000, they share the same UID 1000 on the node. If one escapes, the attacker can access files from the other pod with the same owner. UserNamespaces, by assigning a unique UID offset per pod, isolates UIDs between pods even when they use the same value inside the container.

For internal clusters where you control all workloads, this scenario is theoretical. For a multi-tenant SaaS platform or a public build service, it’s a real line of defense.
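To make the UID collision concrete, here is a sketch with two tenants (all names are hypothetical). Without `hostUsers: false`, both pods’ processes run as the same UID 1000 on the node; with it, each pod is mapped to its own host-side range:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-app          # hypothetical
spec:
  hostUsers: false            # tenant A gets its own UID range on the host
  securityContext:
    runAsUser: 1000           # same value as tenant B inside the container...
  containers:
  - name: app
    image: registry.example.com/tenant-a:latest  # placeholder
---
apiVersion: v1
kind: Pod
metadata:
  name: tenant-b-app          # hypothetical
spec:
  hostUsers: false            # ...but mapped to a different range than tenant A
  securityContext:
    runAsUser: 1000
  containers:
  - name: app
    image: registry.example.com/tenant-b:latest  # placeholder
```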

Technical requirements

There are a few prerequisites, but most up-to-date clusters should qualify.

  • Linux kernel ≥ 5.19
  • Compatible runtime (containerd ≥ 1.7, CRI-O ≥ 1.25)
  • Idmapped mounts support for persistent volumes (XFS, ext4 — not NFS in all cases)
  • Kubernetes ≥ 1.33 (Beta), ≥ 1.36 (GA)

What the infographic exaggerates (and leaves out)

The infographic is right about one specific thing: UserNamespaces reduce the impact of a successful container escape. That’s real. The problem is that it sells the feature as a universal solution to “root in containers” — and that’s just wrong.

1. UID isolation is not application privilege isolation

The infographic promises “No Lateral Movement”. That’s false — completely false.

A root container with hostUsers: false can still read the ServiceAccount Token mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. If that token has broad RBAC permissions (which happens — we may cover this in a future post), the attacker can call the API Server, enumerate cluster resources, and move laterally — all without ever touching the host node.

UID mapping protects the host. It does not protect the cluster.
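That risk calls for a mitigation orthogonal to UserNamespaces: if the workload never talks to the Kubernetes API, don’t mount the ServiceAccount token at all. A minimal sketch (names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-token-app                   # hypothetical name
spec:
  automountServiceAccountToken: false  # no SA token inside the container at all
  containers:
  - name: app
    image: my-app:latest               # placeholder image
```

The same field also exists on the ServiceAccount object itself, which lets you make “no token” the default for every pod using that ServiceAccount rather than setting it per pod.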

2. A root container is still root inside the container

“Install Anything ✅” — it’s literally written in the infographic, presented as a feature 😖.

In a root container (even with UserNS), an attacker who gains control can:

  • Install nmap, curl, nc to scan the internal network
  • Modify application files, binaries, configurations
  • Read all files mounted as volumes
  • Persist in the container across restarts if the filesystem is writable

UserNamespaces removes none of these attack vectors. The ability to install software is a fast track to lateral movement.

3. It’s not that easy, especially for storage

Enabling hostUsers: false breaks existing storage in most cases.

Container UID 0 is mapped to UID 100000+ on the host (each container has its own offset). If a persistent volume (NFS, EBS, Ceph RBD) is owned by UID 1000, the root container can’t read or write it. The result: counterintuitive Permission Denied errors that can be hard to diagnose, since the application was probably never designed to run as root and yet be denied access to its own files.

The technical solution exists (idmapped mounts), but it requires a recent kernel and a compatible filesystem. See the official idmapped mounts documentation for details.

4. Same story, but for networking

hostUsers: false is incompatible with hostNetwork: true. It’s a corner case, but it catches networking workloads (monitoring agents, CNI plugins, etc.).

Note: that said, running containers with hostNetwork is its own security problem, so…


Honest comparison: UserNS vs the real alternatives

| Attack vector | UserNS (root inside) | Non-root (UID 1000) | Distroless / Scratch |
|---|---|---|---|
| Post-escape impact after successful container escape | ✅ Nobody on host | ⚠️ UID 1000 on host | ⚠️ UID 1000 on host |
| UID isolation between pods (multi-tenant) | ✅ Unique offset per pod | ❌ Shared UID on node | ❌ Shared UID on node |
| Malware installation inside the container | ❌ Trivial | ❌ Possible | ✅ Near impossible |
| Write scope in ephemeral container FS | ❌ Full filesystem | ❌ App directory only | ✅ Near impossible |
| Lateral movement via SA Token | ❌ Possible | ❌ Possible | ⚠️ Potentially difficult |
| Operational complexity | ❌ Sometimes high | ✅ Often near zero | ✅ Often low |
| Compatibility with existing storage | ❌ Sometimes problematic | ✅ Standard | ✅ Standard |

Reading the table reveals the true nature of UserNamespaces: it excels on exactly two rows:

  • post-escape impact
  • UID isolation in multi-tenant environments

On everything else, non-root + distroless does better, or just as well, without the operational complexity. And that “everything else” — write scope in ephemeral FS, malware installation, lateral movement via SA Token — represents the vast majority of real-world attack vectors, far more common than a container escape. We’ll come back to this in the Where to invest your security budget section.

Real use cases

It would be dishonest to dismiss the feature entirely. There are three scenarios where UserNamespaces aren’t a lazy option but a genuine technical necessity (with caveats).

1. Build-as-a-Service (Buildah, rootless Podman)

To build a Docker image, the build engine needs to perform chown, chmod and mknod. These operations require capabilities such as CAP_CHOWN, CAP_FOWNER and CAP_MKNOD. Before UserNamespaces, the solution was to run the pod as --privileged — an obvious open door to the host.

With hostUsers: false, the build engine believes it’s root for its own file manipulation, but it can’t touch the host. This is the only case where “root inside” is a technical constraint rather than technical debt.
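As a sketch, such a build pod could look like this (image tag, registry and workspace layout are placeholders; real setups typically also need to tune Buildah’s storage configuration, e.g. via the BUILDAH_ISOLATION or STORAGE_DRIVER environment variables):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: image-build               # hypothetical
spec:
  hostUsers: false                # "root" for the build engine, unprivileged on the host
  restartPolicy: Never
  containers:
  - name: buildah
    image: quay.io/buildah/stable # upstream Buildah image
    command: ["buildah", "build", "-t", "registry.example.com/app:dev", "/workspace"]
    env:
    - name: BUILDAH_ISOLATION
      value: chroot               # avoid nested-namespace tricks during RUN steps
    securityContext:
      runAsNonRoot: false         # root *inside* the pod's user namespace only
    volumeMounts:
    - name: workspace
      mountPath: /workspace       # assumes the build context was placed here
  volumes:
  - name: workspace
    emptyDir: {}
```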

Note: Kaniko, long the go-to for in-cluster builds, has been archived since October 2025 and no longer receives security updates. Buildah or rootless Podman are the active alternatives.

My take: it can be useful for shared CI/CD platforms (GitLab Runners, Tekton) that refuse privileged pods. But if isolation is critical (public platform, aggressive multi-tenancy), microVMs (Kata Containers, Firecracker) offer far stronger guarantees for an overhead that has become quite manageable.

2. Hostile multi-tenancy (user code platforms)

If your business is running code provided by strangers (PaaS, online code editors, public CI/CD), you know upfront that users will try to escalate their privileges. In this context, UserNS is an extra barrier against kernel 0-days.

My take: honestly, if the environment is truly hostile, UserNS alone isn’t enough. MicroVMs (Kata Containers, Firecracker) provide real hardware isolation and are the right choice here. UserNS can be a complement, not a substitute.

3. Hard-coded legacy (Postfix, Dovecot, BIND)

Some old UNIX daemons start as root to open a privileged port (< 1024) or read sensitive config files, then drop privileges via setuid(). This mechanism fails in a classic non-root container.

UserNamespaces let these processes believe they can make their identity management syscalls, because they are root inside their namespace.

Here’s a concrete example written by a colleague (thanks Louis 😘):

apiVersion: v1
kind: Pod
metadata:
  name: postfix
spec:
  hostUsers: false        # UID mapping: root in container → nobody on host
  securityContext:
    runAsNonRoot: false   # allowed under PSS Restricted *only* because of hostUsers: false
    fsGroup: 103          # postfix GID
  containers:
  - name: postfix
    image: postfix:latest
    securityContext:
      runAsNonRoot: false # same — cf. https://kubernetes.io/docs/concepts/workloads/pods/user-namespaces/#integration-with-pod-security-admission-checks
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop: ["ALL"]     # drop everything first
        add:
          - SETUID        # privilege drop via setuid() done by postfix itself at startup
          - SETGID        # same for groups
          - CHOWN         # chown on mail queues at startup
          - FOWNER        # file operations without being the owner
          - FSETID        # preserve setuid bit after write
          - DAC_OVERRIDE  # MANDATORY: root in UserNS is not "real" root —
                          # DAC checks are not automatically bypassed

This manifest illustrates several important things.

First, making a legacy application actually secure with UserNS is painful and requires compromises — especially around capabilities. This is a far cry from the “magic feature that secures root apps” the LinkedIn wanna-be influencers imply.

Then there are some interesting surprises. Normally, the Restricted Pod Security Standard forbids runAsNonRoot: false. Kubernetes makes an exception when hostUsers: false is present. This is documented here. Without UserNamespaces, this pod would be rejected by the admission controller.

There’s also the DAC_OVERRIDE capability, which is counterintuitive. Root in a UserNS is not real root from the kernel’s perspective for DAC (Discretionary Access Control) checks. When Postfix runs set-permissions to chown its queues, the kernel still verifies permissions — and denies them if DAC_OVERRIDE isn’t present. This is exactly the kind of operational surprise that stays invisible until the first production deployment.

Worth noting: we were still able to keep readOnlyRootFilesystem: true and allowPrivilegeEscalation: false — legacy doesn’t justify throwing everything overboard.

My take: this is the only use case where UserNS is genuinely acceptable. No untrusted third-party code, no hostile platform — just well-identified legacy with a migration plan. The other two cases are “acceptable under conditions”; the legacy case is the cleanest of the three.

Some counterarguments

I see you coming with objections, so let’s save everyone some time with a quick Q&A:

“It’s defense in depth.” True — but defense in depth assumes the foundational layers are already in place. If you haven’t migrated your images to non-root yet, investing energy in UserNS is putting the cart before the horse. And once you’re non-root, the marginal gain of UserNS is negligible compared to the complexity it introduces.

“We don’t control third-party images.” Somewhat weak, in my opinion: if a proprietary vendor’s black-box image is hardcoded to run as root, there’s a good chance it either genuinely needs it (as is the case for some proprietary security tooling) or it will break with UID mapping (see the storage problem above). UserNS is not a magic wand that makes any third-party image compatible and secure.

“It’s a centralized safeguard against human error.” It’s just as easy to forget hostUsers: false as it is to forget runAsNonRoot: true. The real centralized solution is Pod Security Standards or an Admission Controller (Kyverno, OPA) that outright rejects root pods. Simpler, more reliable, and it doesn’t break storage.
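For illustration, such a rejection policy can be sketched with Kyverno (modeled on the require-run-as-non-root sample from the Kyverno policy library — prefer the library version, which also covers initContainers, over this simplified sketch):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-non-root
spec:
  validationFailureAction: Enforce   # reject, don't just audit
  rules:
  - name: run-as-non-root
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Running as root is not allowed: set runAsNonRoot to true."
      anyPattern:
      # either the pod-level securityContext forbids root...
      - spec:
          securityContext:
            runAsNonRoot: true
          containers:
          - =(securityContext):
              =(runAsNonRoot): true
      # ...or every container sets it explicitly
      - spec:
          containers:
          - securityContext:
              runAsNonRoot: true
```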

“We need it for SOC2/PCI-DSS/… compliance.” If your compliance requires strict tenant isolation, UserNS will likely be deemed insufficient by your auditors. VMs or microVMs remain the gold standard. Using UserNS for compliance means choosing the most complex tool to maintain for a result that remains debatable.

Where to invest your security budget

Setting aside the marketing, here’s where effort actually pays off — from highest impact to most niche:

Priority 1 — Non-root images + nobody (UID 65534)

Move images to non-root, ideally using the nobody user (the least privileged on the system). If an application is compromised under nobody, the attacker can do almost nothing, even on the container filesystem. Combine with readOnlyRootFilesystem: true and capabilities: drop: ["ALL"].

spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534   # nobody
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: my-app:distroless
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
      readOnlyRootFilesystem: true

Priority 2 — Pod Security Standards (PSS) at Baseline or Restricted

Block root and privileges without breaking anything at the infra level. It requires having done Priority 1 first, but it’s free, standard, and applies cluster-wide via a namespace label (with per-namespace overrides when needed). No more risk of forgetting. Already enabled by default on several Kubernetes distributions (Talos being one, but not the only one).
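In practice, enforcement is just a namespace label. A sketch (the namespace name is hypothetical; the three modes are the standard PSS labels):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                                     # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted    # warn the client on apply
    pod-security.kubernetes.io/audit: restricted   # record violations in audit logs
```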

Priority 3 — MicroVMs (Kata Containers, Firecracker)

For truly untrusted workloads. Real hardware isolation, with overhead that is now quite reasonable on recent generations.

Priority 4 — UserNamespaces

When all else fails. Only for the legitimate cases identified above (builds, legacy, hostile multi-tenancy). This is genuinely the last thing to do.

Conclusion

Kubernetes 1.36 UserNamespaces are the result of a project that officially took five years (KEP-127 dates back to 2021) and has been discussed since nearly the dawn of Kubernetes. For shared build platforms and multi-tenant SaaS running user-provided code, it’s a potentially useful building block — particularly to prevent one customer’s app from reading another’s in the event of a container escape without privilege escalation.

For everything else — that is to say, 99% of production clusters — that’s not where container security starts. And that’s precisely the problem with this kind of infographic.

LinkedIn infographics selling effortless security are dangerous: “keep your 800MB root image full of tools, just add hostUsers: false, and you’re protected.” That’s exactly the wrong approach. Real container security is built in the Dockerfile, not in the PodSpec.

If you’re enabling UserNamespaces to secure an application whose source code you own, you’ve probably missed a step in your secure software development lifecycle.

Licensed under CC BY-SA 4.0


All content on this blog belonging to Denis Germain (alias zwindler), including texts, code, images, diagrams and conference talk materials, is distributed under the CC BY-SA 4.0 license.
