
Kubernetes 1.29 - sidecar containers - what are they good for?

Written by ~ zwindler ~

Introduction

A friend of mine needs to run a periodic job. This job runs as code in a container (on a GKE cluster) and needs to query a Cloud SQL database.

For this kind of use case, Google Cloud provides an image to deploy alongside your application (sidecar pattern) that acts as a proxy so the application can connect to the Cloud SQL instance on Google Cloud.
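
To make the setup concrete, the pattern looks roughly like this. This is a minimal sketch, not my friend's actual manifest: the application image, proxy version tag, instance connection name, and port are all placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: app-with-cloud-sql-proxy
spec:
  containers:
  - name: app
    image: myorg/my-app # the application that needs the database
  - name: cloud-sql-proxy
    # Google's Cloud SQL Auth Proxy image; arguments are placeholders
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
    args:
    - "--port=5432"
    - "my-project:europe-west1:my-instance"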

Unfortunately, this image sometimes takes a while to start, and its absence causes my friend’s application to crash. We end up with a race condition and he asked me if I had any ideas to solve this problem.

Among other suggestions (which I detail in the very last paragraph), I suggested he try a brand new feature that went beta in Kubernetes 1.29 and that I hadn’t tested myself yet: sidecar containers!

Fun fact: while searching for documentation on cloud sql sidecar, I stumbled upon this article from someone who has the exact same problem as my friend.

For once, so you can properly understand the problem and its resolution, I'll walk you through discovering this feature with a demo.

All the code and instructions are available on the GitHub repository github.com/zwindler/sidecar-container-example.

The idea is as follows: we’ll simulate my buddy’s problem with two Docker images created for the occasion:

  • zwindler/slow-sidecar: a basic hello world written in V (vhelloworld) that sleeps for 5 seconds before listening on port 8081.
  • zwindler/sidecar-user: a bash script that curls the sidecar and exits 1 if the curl fails (sketched just below).
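
To give an idea of the failure mode, here is a minimal sketch of what the sidecar-user script could look like (an assumption for illustration; the real script is in the GitHub repo):

#!/bin/sh
# Try the sidecar once; exit 1 immediately if it isn't reachable yet.
# Both containers share the pod's network namespace, hence localhost.
curl -s http://localhost:8081/ || exit 1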

Prerequisites

The feature was introduced in Kubernetes 1.28 as an alpha feature. If you're using that version and want to test it, you need to explicitly enable the SidecarContainers feature gate.
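
If that's your situation, something like this should work for a local test cluster (a sketch; minikube is just one way to set feature gates, and the version tag is a placeholder):

# Start a local 1.28 cluster with the alpha feature gate enabled.
# On a real cluster, the gate must be set on the API server and kubelets.
minikube start --kubernetes-version=v1.28.0 --feature-gates=SidecarContainers=true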

Starting with Kubernetes 1.29, this feature moved to beta and should be enabled by default on your cluster.

Without sidecar containers

First, let’s try to deploy the CronJob naively on a cluster:

$ cat 1-cronjob-without-sidecar-container.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sidecar-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sidecar-user
            image: zwindler/sidecar-user
          - name: slow-sidecar
            image: zwindler/slow-sidecar
            ports:
            - containerPort: 8081
          restartPolicy: Never

$ kubectl apply -f 1-cronjob-without-sidecar-container.yaml

This should fail, because the “slow sidecar” container won't be ready when the “sidecar user” container tries to curl it.

$ kubectl get pods
NAME                             READY   STATUS   RESTARTS   AGE
sidecar-cronjob-28689938-5n5x9   1/2     Error    0          9s

$ kubectl describe pods sidecar-cronjob-28689938-5n5x9
[...]
Containers:
  slow-sidecar:
[...]
    State:          Running
      Started:      Fri, 19 Jul 2024 15:38:03 +0200
    Ready:          True
[...]
  sidecar-user:
[...]
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 19 Jul 2024 15:38:05 +0200
      Finished:     Fri, 19 Jul 2024 15:38:05 +0200
    Ready:          False
    Restart Count:  0
[...]

slow-sidecar is running fine but our sidecar-user request failed because the sidecar was too slow to start.

Quick cleanup before we try again:

kubectl delete cronjob sidecar-cronjob 

Using a plain init container isn't an option either: init containers must terminate before the main containers start, and slow-sidecar never will (it's a web server, terminating isn't its purpose), so the “sidecar user” container would wait forever for its turn. If you want to try, just convert slow-sidecar to an initContainer.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sidecar-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sidecar-user
            image: zwindler/sidecar-user
+         initContainers:
          - name: slow-sidecar
            image: zwindler/slow-sidecar
            ports:
            - containerPort: 8081
          restartPolicy: Never

And run it:

$ kubectl apply -f 2-cronjob-with-init-container.yaml

$ kubectl get pods
NAME                             READY   STATUS     RESTARTS   AGE
sidecar-cronjob-28689955-lzbnf   0/1     Init:0/1   0          27s

And we’re stuck at this step until the end of tiiiiiime.

With sidecar containers

To avoid this type of race condition, let’s update the manifest by converting slow-sidecar to an initContainer BUT ALSO adding restartPolicy: Always in the slow-sidecar container declaration.

This trick tells Kubernetes to start this container as an initContainer BUT NOT to wait for it to finish (which it never will, since it's a web server listening on port 8081 until the end of time) before starting the main application.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sidecar-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sidecar-user
            image: zwindler/sidecar-user
+         initContainers:
          - name: slow-sidecar
            image: zwindler/slow-sidecar
+           restartPolicy: Always
            ports:
            - containerPort: 8081
          restartPolicy: Never

Note: this is the official way to declare a sidecar container in Kubernetes. I haven't read the KEP yet, so I can't say why the development team reused the existing initContainers list instead of introducing a new sidecarContainers keyword in the Pod spec schema.

$ kubectl apply -f 3-cronjob-with-sidecar-container.yaml

This time, the init container should start first, and ONLY THEN the application:

$ kubectl get pods -w
NAME                             READY   STATUS    RESTARTS   AGE
sidecar-cronjob-28689958-zrmhh   0/2     Pending   0          0s
sidecar-cronjob-28689958-zrmhh   0/2     Pending   0          0s
sidecar-cronjob-28689958-zrmhh   0/2     Init:0/1   0          0s
sidecar-cronjob-28689958-zrmhh   1/2     PodInitializing   0          2s
sidecar-cronjob-28689958-zrmhh   1/2     Error             0          3s

We can see it’s better (sidecar-user starts in a second phase) but in this particular example, it still fails…

With sidecar containers AND a startupProbe

By default, the kubelet considers a sidecar container started as soon as its process is running; once all the other initContainers have finished (or if there are none), it moves on to the main phase of starting containers.

Unfortunately, in our case, the sidecar container is very slow to become ready (sleep 5), so the fact that its process is running says nothing about the sidecar's actual state…

We need to add a startupProbe so Kubernetes knows WHEN to move past the init phase and start the main phase.

As the Kubernetes documentation puts it: “After a sidecar-style init container is running (the kubelet has set the started status for that init container to true), the kubelet then starts the next init container from the ordered .spec.initContainers list. That status either becomes true because there is a process running in the container and no startup probe defined, or as a result of its startupProbe succeeding.”

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sidecar-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sidecar-user
            image: zwindler/sidecar-user
          initContainers:
          - name: slow-sidecar
            image: zwindler/slow-sidecar
            restartPolicy: Always
            ports:
            - containerPort: 8081
+           startupProbe:
+             httpGet:
+               path: /
+               port: 8081
+             initialDelaySeconds: 5
+             periodSeconds: 1
+             failureThreshold: 5
          restartPolicy: Never

One last time:

$ kubectl apply -f 4-cronjob-with-sidecar-container-and-startup-probe.yaml && kubectl get pods -w
cronjob.batch/sidecar-cronjob created
NAME                             READY   STATUS    RESTARTS   AGE
sidecar-cronjob-28689977-lt77c   0/2     Pending   0          0s
sidecar-cronjob-28689977-lt77c   0/2     Pending   0          0s
sidecar-cronjob-28689977-lt77c   0/2     Init:0/1   0          0s
sidecar-cronjob-28689977-lt77c   0/2     Init:0/1   0          1s
sidecar-cronjob-28689977-lt77c   0/2     PodInitializing   0          6s
sidecar-cronjob-28689977-lt77c   1/2     PodInitializing   0          6s
sidecar-cronjob-28689977-lt77c   1/2     Completed         0          7s

Hooray!

Bonus: if you don’t have SidecarContainers enabled

If you’re still on Kubernetes 1.28 (or worse) and can’t enable alpha feature gates, you’ll need to find another method.

Unfortunately, the solution will likely involve modifying your main application’s code or its Docker image. You can:

  • add a retry policy in the sidecar-user application
  • add a script in the sidecar-user application that waits a bit (sleep) before trying to contact the sidecar

The first is a good practice when dealing with microservices and you should consider it anyway to handle temporary database connection issues.
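
Here is a hedged sketch of that first option, a simple retry loop around the curl (attempt count and delay are arbitrary):

#!/bin/sh
# Retry a few times with a short pause instead of failing on the
# first refused connection, then give up with a clear error.
for attempt in 1 2 3 4 5; do
  curl -s http://localhost:8081/ && exit 0
  sleep 2
done
echo "sidecar never became reachable" >&2
exit 1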

The second is a band-aid on a wooden leg. I strongly advise against it: the sidecar's startup time can vary, and adding extra delay to the application is also bad when you need to handle incidents and bugs in production (potentially inducing other problems).

Licensed under CC BY-SA 4.0
