
zeropod: scale-to-zero with container checkpointing

Testing zeropod, a Kubernetes runtime that does scale-to-zero with automatic container checkpointing

Written by ~ zwindler ~

What is zeropod?

I kept the intro paragraph from the project documentation as is because I find it perfect. It says everything you need to know about the tool, neither too much nor too little.

Zeropod is a Kubernetes runtime (more specifically a containerd shim) that automatically checkpoints containers to disk after a certain amount of time since the last TCP connection.

While in scaled down state, it will listen on the same port the application inside the container was listening on and will restore the container on the first incoming connection.

Depending on the memory size of the checkpointed program this happens in tens to a few hundred milliseconds, virtually unnoticeable to the user.

As all the memory contents are stored to disk during checkpointing, all state of the application is restored.

It adjusts resource requests in scaled down state in-place if the cluster supports it.

To prevent huge resource usage spikes when draining a node, scaled down pods can be migrated between nodes without needing to start up.

TL;DR: it will freeze your app if it doesn’t receive TCP calls, and restore it when a call arrives.

If you want to understand in more detail HOW it really works, I invite you to read the “How it works” section of the official project documentation (https://github.com/ctrox/zeropod), which has the merit of being quite clear.

Prerequisites

Let’s not waste time, let’s dive into the experiment. As prerequisites, I needed:

  • an Ubuntu server with vanilla k3s (flannel + traefik, single node). If you don’t know how to install it, you can always check out my article on the subject.
  • cert-manager. Not necessarily required but I like having valid HTTPS certificates.
# Install cert-manager CRDs and namespace
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.0/cert-manager.yaml

# Wait for cert-manager to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=cert-manager -n cert-manager --timeout=60s
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=cainjector -n cert-manager --timeout=60s
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=webhook -n cert-manager --timeout=60s

ClusterIssuer configuration:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your.email@example.org
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: traefik
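
Once the manifest is applied, the issuer should quickly report Ready (the filename here is just an example):

kubectl apply -f clusterissuer-letsencrypt.yaml
kubectl get clusterissuer letsencrypt-prod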

Also optional: for easier access, I modified the traefik Service NodePorts to 30080 and 30443, since I don’t have LoadBalancer Service support on this cluster.
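
A minimal sketch of that change, assuming the first two entries in the Service’s spec.ports are the HTTP and HTTPS ports (check with kubectl get svc -n kube-system traefik -o yaml first):

kubectl -n kube-system patch svc traefik --type=json -p '[
  {"op": "replace", "path": "/spec/ports/0/nodePort", "value": 30080},
  {"op": "replace", "path": "/spec/ports/1/nodePort", "value": 30443}
]'

Which gives: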

kubectl get svc -A
NAMESPACE      NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
cert-manager   cert-manager           ClusterIP      10.43.212.125   <none>          9402/TCP                     21h
cert-manager   cert-manager-webhook   ClusterIP      10.43.134.29    <none>          443/TCP                      21h
default        kubernetes             ClusterIP      10.43.0.1       <none>          443/TCP                      21h
kube-system    kube-dns               ClusterIP      10.43.0.10      <none>          53/UDP,53/TCP,9153/TCP       21h
kube-system    metrics-server         ClusterIP      10.43.83.225    <none>          443/TCP                      21h
kube-system    traefik                LoadBalancer   10.43.208.56    192.168.1.242   80:30080/TCP,443:30443/TCP   21h

Installing zeropod

Once we have our functional cluster with all the prerequisites, we can install zeropod. The first step “just” involves applying the following kustomize manifest, which will create a customized DaemonSet with the right paths so it can hook into / patch containerd.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: zeropod-node
  namespace: zeropod-system
spec:
  template:
    spec:
      volumes:
        - name: containerd-etc
          hostPath:
            path: /var/lib/rancher/k3s/agent/etc/containerd/
        - name: containerd-run
          hostPath:
            path: /run/k3s/containerd/
        - name: zeropod-opt
          hostPath:
            path: /var/lib/rancher/k3s/agent/containerd

To apply it:

kubectl apply -k https://github.com/ctrox/zeropod/config/k3s

Then we’ll label the Node and verify the controller pod is working:

kubectl label node zeropod zeropod.ctrox.dev/node=true
kubectl -n zeropod-system wait --for=condition=Ready pod -l app.kubernetes.io/name=zeropod-node

The documentation indicates that in the case of k3s you need to restart the node, because containerd is bundled into the k3s binary (probably the same for k0s). This is probably not necessary for most “normal” distributions.
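
On a systemd-based install, that boils down to restarting the k3s service (or just rebooting the machine), something like:

sudo systemctl restart k3s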

kubectl get pods -A
NAMESPACE        NAME                                       READY   STATUS      RESTARTS      AGE
...
zeropod-system   zeropod-node-qntzh                         1/1     Running     1 (21h ago)   21h

Interesting point: zeropod will add its own runtimeClass (I’ll let you check out the Kubernetes documentation, it’s worth a look if you’re not familiar):

kubectl get runtimeclass
NAME                  HANDLER               AGE
crun                  crun                  21h
[...]
zeropod               zeropod               21h

Deploying a WordPress application

What’s the best use case for Kubernetes?

Hosting a personal blog with WordPress and autoscaling, of course!! Everyone knows that.

Beyond the joke, the idea was to test a stateful application, preferably with a database, to see how far we can push the tool. Because one of the limitations of scale-to-zero tools in Kubernetes is precisely that they work great for stateless workloads (or FaaS), but it’s more complicated when you have state.

First thing to know: beyond the label we put on the node, we need to add two configuration points to the applications we want to scale to zero.

  1. The zeropod annotations, which are pretty self-explanatory: we give it the port number, the container name, and the duration after which, if there has been no connection, it scales down:
annotations:
  zeropod.ctrox.dev/ports-map: "wordpress=80"
  zeropod.ctrox.dev/container-names: wordpress
  zeropod.ctrox.dev/scaledown-duration: 10s
  2. The runtimeClass that must be defined:
runtimeClassName: zeropod

Application manifests

Here’s roughly what it could look like. We could do something cleaner (Helm charts), but I went quick and dirty; it’s enough for this PoC:

WordPress Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php
spec:
  selector:
    matchLabels:
      app: php
  template:
    metadata:
      labels:
        app: php
      annotations:
        zeropod.ctrox.dev/ports-map: "wordpress=80"
        zeropod.ctrox.dev/container-names: wordpress
        zeropod.ctrox.dev/scaledown-duration: 10s
    spec:
      runtimeClassName: zeropod
      initContainers:
      - command:
        - sh
        - -c
        - |
          until mysql -h mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SELECT 1"; do
            echo "Waiting for MySQL to be ready..."
            sleep 5
          done
          echo "MySQL is ready!"
          mysql -h mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "CREATE DATABASE IF NOT EXISTS wordpress;"
          mysql -h mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "GRANT ALL PRIVILEGES ON wordpress.* TO 'root'@'%';"
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: verySecurePassword
        image: mysql
        imagePullPolicy: IfNotPresent
        name: wait-for-mysql
      containers:
      - env:
        - name: WORDPRESS_DB_HOST
          value: mysql
        - name: WORDPRESS_DB_USER
          value: root
        - name: WORDPRESS_DB_PASSWORD
          value: verySecurePassword
        - name: WORDPRESS_DB_NAME
          value: wordpress
        image: wordpress:latest
        imagePullPolicy: Always
        name: wordpress
        ports:
        - containerPort: 80
          protocol: TCP

MySQL StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: "mysql"
  replicas: 1
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - image: mysql
          name: mysql
          ports:
            - containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: verySecurePassword
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi

Services and Ingress:

apiVersion: v1
kind: Service
metadata:
  name: php
  labels:
    app: php
spec:
  ports:
    - port: 8080
      name: http
      targetPort: 80
  selector:
    app: php
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
    - port: 3306
      name: mysql
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: zeropod-ingress
  namespace: default
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  rules:
  - host: zeropod.example.org
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: php
            port:
              number: 8080
  tls:
  - hosts:
    - zeropod.example.org
    secretName: zeropod-tls
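
Assuming everything above is saved in a single manifest (the filename is mine), deploying it is just:

kubectl apply -f wordpress-zeropod.yaml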

Observing the behavior

Once deployed, let’s quickly check the state of pods and services:

kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
mysql-0               1/1     Running   0          17h
php-dc7cb9cff-29hzb   1/1     Running   0          17h
kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
[...]
mysql        ClusterIP   None            <none>        3306/TCP   21h
php          ClusterIP   10.43.165.131   <none>        8080/TCP   21h

Detecting the absence of traffic

Shortly after the Apache/PHP (WordPress) pod is deployed, zeropod notices that there hasn’t been a connection for a while and checkpoints the container.

The funny thing is that from Kubernetes’ point of view, nothing happened! The php-dc7cb9cff-29hzb pod still exists and is listed in the Node’s “Non-terminated Pods”:

kubectl get pods php-dc7cb9cff-29hzb
NAME                  READY   STATUS    RESTARTS   AGE
php-dc7cb9cff-29hzb   1/1     Running   0          17h

kubectl describe nodes
[...]
Non-terminated Pods:          (11 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
[...]
  default                     mysql-0                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         17h
  default                     php-dc7cb9cff-29hzb                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         17h
[...]

However, it no longer appears in the “top pods” metrics collected by metrics-server:

kubectl top pods
NAME      CPU(cores)   MEMORY(bytes)   
mysql-0   4m           460Mi    

And if we search for the apache2 process on the node, we won’t find it:

sudo ps -ef | grep apache2

Under the hood: zeropod logs

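The logs below come from the zeropod-node DaemonSet pod; they can be followed with something like:

kubectl -n zeropod-system logs -f -l app.kubernetes.io/name=zeropod-node
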
In the zeropod logs, we first notice the detection of a container eligible for scale to zero:

{"time":"2025-06-20T17:31:46.222502407Z","level":"INFO","msg":"subscribing to status events","sock":"/run/zeropod/s/5858a327ae5a70c2e12b5dad1e8320a4670ff11152476ad49605ebca5327f7d6.sock"}
{"time":"2025-06-20T17:31:47.572065537Z","level":"INFO","msg":"status event","component":"podlabeller","container":"wordpress","pod":"php-dc7cb9cff-29hzb","namespace":"default","phase":1}
{"time":"2025-06-20T17:31:47.5737556Z","level":"INFO","msg":"attaching redirector for sandbox","pid":64980,"links":["eth0","lo"]}

And after 10 seconds without a connection, zeropod scales it down (“phase”:0):

{"time":"2025-06-20T17:31:57.932464147Z","level":"INFO","msg":"status event","component":"podlabeller","container":"wordpress","pod":"php-dc7cb9cff-29hzb","namespace":"default","phase":0}

Performance tests

When the process is checkpointed, we’ll see that the first curl takes a bit of time to be served. Nothing dramatic, but still:

time curl https://zeropod.example.org:30443 -I
HTTP/2 200 
content-type: text/html; charset=UTF-8
date: Fri, 20 Jun 2025 17:41:29 GMT
link: <https://zeropod.example.org:30443/wp-json/>; rel="https://api.w.org/"
server: Apache/2.4.62 (Debian)
x-powered-by: PHP/8.2.28

real	0m0.454s
user	0m0.052s
sys	0m0.010s

However, for all subsequent connections, the times are correct for an empty (and unoptimized) WordPress:

time curl https://zeropod.example.org:30443 -I
HTTP/2 200 
content-type: text/html; charset=UTF-8
date: Fri, 20 Jun 2025 17:41:42 GMT
link: <https://zeropod.example.org:30443/wp-json/>; rel="https://api.w.org/"
server: Apache/2.4.62 (Debian)
x-powered-by: PHP/8.2.28

real	0m0.088s
user	0m0.053s
sys	0m0.008s

So, it works!

The first connection takes a bit longer while the eBPF program that “listens” for traffic while the container is down restores it and hands control back over (here, +350-400 ms on a small, cheap, and fairly loaded VM).
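
To reproduce the cold vs. warm comparison, a simple sketch is to wait past the scaledown-duration and then time two consecutive requests:

sleep 30   # longer than the 10s scaledown-duration
time curl -sI -o /dev/null https://zeropod.example.org:30443   # cold: the container gets restored
time curl -sI -o /dev/null https://zeropod.example.org:30443   # warm: the container is already running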

Once the container is restored, we can actually see the apache2 processes reappear with a ps on the Node:

ps -ef | grep apache2
root       67038   64955  1 18:56 ?        00:00:00 apache2 -DFOREGROUND
www-data   67055   67038  0 18:56 ?        00:00:00 apache2 -DFOREGROUND
www-data   67056   67038  0 18:56 ?        00:00:00 apache2 -DFOREGROUND
www-data   67057   67038  0 18:56 ?        00:00:00 apache2 -DFOREGROUND
www-data   67058   67038  0 18:56 ?        00:00:00 apache2 -DFOREGROUND
www-data   67059   67038  0 18:56 ?        00:00:00 apache2 -DFOREGROUND
www-data   67060   67038  0 18:56 ?        00:00:00 apache2 -DFOREGROUND

Going further: testing with MySQL

Well, on the other hand, scaling an HTTP server to zero is fun, but it’s not revolutionary. There are already scale-to-zero solutions for Kubernetes, especially for FaaS or stateless workloads. But what if we push the experiment all the way?

We rarely want to scale a database to 0 in real life. Stopping it and then restarting it can take time, and potentially cause errors in our apps if the scaling is done poorly.

However, with zeropod, the principle is a bit different, since we’re not really going to stop the process, just freeze it.

For science (don’t do this in prod), I therefore added zeropod to the MySQL database too!

As with WordPress, we add the annotations and the runtimeClass:

template:
  metadata:
    labels:
      app: mysql
    annotations:
      zeropod.ctrox.dev/ports-map: "mysql=3306"
      zeropod.ctrox.dev/container-names: mysql
      zeropod.ctrox.dev/scaledown-duration: 10s
  spec:
    runtimeClassName: zeropod
    containers:
    # ...

After a few seconds, zeropod moves the mysql database to phase “0”:

{"time":"2025-06-20T18:06:14.92372877Z","level":"INFO","msg":"status event","component":"podlabeller","container":"mysql","pod":"mysql-0","namespace":"default","phase":1}
{"time":"2025-06-20T18:06:14.925316023Z","level":"INFO","msg":"attaching redirector for sandbox","pid":69570,"links":["eth0","lo"]}
{"time":"2025-06-20T18:06:25.766097339Z","level":"INFO","msg":"status event","component":"podlabeller","container":"mysql","pod":"mysql-0","namespace":"default","phase":0}

kubectl top pods no longer reports any pods (and an error…):

kubectl top pods
error: Metrics not available for pod default/php-dc7cb9cff-29hzb, age: 35m59.026573861s

But the pods remain visible and “Running” from Kubernetes’ point of view:

kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
mysql-0               1/1     Running   0          2m30s
php-dc7cb9cff-29hzb   1/1     Running   0          36m

Final test: cascading wake-up

The final test: will an HTTP call to php wake it up, which will trigger a connection to the mysql database which will in turn wake it up too?

Drum roll

time curl https://zeropod.example.org:30443 -I
HTTP/2 200 
content-type: text/html; charset=UTF-8
date: Fri, 20 Jun 2025 18:09:24 GMT
link: <https://zeropod.example.org:30443/wp-json/>; rel="https://api.w.org/"
server: Apache/2.4.62 (Debian)
x-powered-by: PHP/8.2.28

real	0m0.978s
user	0m0.053s
sys	0m0.008s

Victory!!

Limitations

Beyond this somewhat silly example (who hasn’t wanted to host a wordpress on Kubernetes with scale to zero?), we realize that the technology “works” but remains a bit flaky.

I had several cases where scale-to-zero kicked in on the php pod even while I was running a “while true; do curl” loop (maybe related to the ingress -> service -> pod chain?). And the checkpoint/restore time is still visible on my test VM (400 ms per container, that’s not nothing).
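
The loop was essentially something like this (adjust the URL to your setup):

while true; do curl -sI -o /dev/null https://zeropod.example.org:30443; sleep 1; done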

One point that is not addressed in the project documentation is that it’s almost impossible to have proper liveness / readiness probes when you use zeropod.

If you put a liveness probe on a web service, each probe triggers a call on the TCP port that zeropod listens on, and thus restores the app, which will then never stay checkpointed. Same thing with a readiness probe.
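
For example, a perfectly ordinary probe like this one on the wordpress container (the values are illustrative) would hit port 80 every ten seconds, wake the container up each time, and effectively prevent it from ever staying scaled down:

livenessProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 10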

And if you plan to get around this with a liveness / readiness probe that doesn’t trigger an HTTP call on the port monitored by zeropod, you’ll end up with an app that Kubernetes sees as (respectively) unhealthy or “Not Ready”, since the container won’t be able to answer while it’s checkpointed.

Technically, zeropod is very fun and quite clever; I’m quite impressed.

But I don’t really see in what world we would want to have containers in prod without liveness / readiness probes, so I’m quite skeptical about using this technology as is, except for very non-critical examples. This limitation seems too big to me.

Licensed under CC BY-SA 4.0

