Introduction
A friend of mine needs to run a periodic job. This job runs as code in a container (on a GKE cluster) and needs to query a Cloud SQL database.
For this kind of use case, Google Cloud provides an image to deploy alongside your application (sidecar pattern) that acts as a proxy so the application can connect to the Cloud SQL instance on Google Cloud.
Unfortunately, this proxy container sometimes takes a while to start, and if it isn’t up yet when the application tries to connect, the application crashes. We end up with a race condition, and he asked me if I had any ideas to solve it.
Among other suggestions (which I detail in the very last paragraph), I suggested he try a brand new feature that went beta in Kubernetes 1.29 and that I hadn’t tested myself yet: sidecar containers!
Fun fact: while searching for documentation on the Cloud SQL sidecar, I stumbled upon this article from someone who has the exact same problem as my friend.
For once, so you can properly understand the problem and its resolution, I’ll walk you through this feature discovery with a demo.
All the code and instructions are available on the GitHub repository github.com/zwindler/sidecar-container-example.
The idea is as follows: we’ll simulate my buddy’s problem with two Docker images created for the occasion:
- zwindler/slow-sidecar: a basic helloworld in V lang (vhelloworld) that sleeps for 5 seconds before listening on port 8081.
- zwindler/sidecar-user: a bash script that curls the sidecar and exits 1 if the curl fails.
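To give you an idea, the sidecar-user image boils down to something like this (a minimal sketch only; the real script is in the GitHub repository, and the http://localhost:8081/ URL is my assumption based on the port the sidecar listens on):
#!/bin/sh
# Minimal sketch: call the sidecar once and fail the Job if it is not reachable.
if curl -sf http://localhost:8081/; then
  echo "sidecar reached"
  exit 0
else
  echo "sidecar unreachable" >&2
  exit 1
fi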
Prerequisites
As mentioned earlier, the feature was introduced in Kubernetes 1.28 as an alpha feature. If you’re using this version and want to test it, you need to explicitly enable the SidecarContainers feature gate.
Starting with Kubernetes 1.29, this feature moved to beta and should be enabled by default on your cluster.
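For example, assuming you test on a local minikube cluster (my assumption, any local distribution with configurable feature gates works), enabling the gate on a 1.28 cluster could look like this:
# start a local 1.28 cluster with the SidecarContainers alpha feature gate enabled
$ minikube start --kubernetes-version=v1.28.0 --feature-gates=SidecarContainers=true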
Without sidecar containers
First, let’s try to deploy the CronJob naively on a cluster:
$ cat 1-cronjob-without-sidecar-container.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sidecar-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sidecar-user
            image: zwindler/sidecar-user
          - name: slow-sidecar
            image: zwindler/slow-sidecar
            ports:
            - containerPort: 8081
          restartPolicy: Never
$ kubectl apply -f 1-cronjob-without-sidecar-container.yaml
This should fail because the “slow sidecar” container won’t be ready when the “sidecar user” container tries to curl.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
sidecar-cronjob-28689938-5n5x9 1/2 Error 0 9s
$ kubectl describe pods sidecar-cronjob-28689938-5n5x9
[...]
Containers:
  slow-sidecar:
    [...]
    State:          Running
      Started:      Fri, 19 Jul 2024 15:38:03 +0200
    Ready:          True
    [...]
  sidecar-user:
    [...]
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 19 Jul 2024 15:38:05 +0200
      Finished:     Fri, 19 Jul 2024 15:38:05 +0200
    Ready:          False
    Restart Count:  0
    [...]
slow-sidecar is running fine but our sidecar-user request failed because the sidecar was too slow to start.
Quick cleanup before we try again:
kubectl delete cronjob sidecar-cronjob
Using a classic init container isn’t an option either: Kubernetes waits for each init container to terminate before starting the main containers, and slow-sidecar never terminates (that’s not its purpose, it’s a web server), so the “sidecar user” container will wait forever for its turn. If you want to try, just convert slow-sidecar to an initContainer.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sidecar-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sidecar-user
            image: zwindler/sidecar-user
+         initContainers:
          - name: slow-sidecar
            image: zwindler/slow-sidecar
            ports:
            - containerPort: 8081
          restartPolicy: Never
And run it:
$ kubectl apply -f 2-cronjob-with-init-container.yaml
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
sidecar-cronjob-28689955-lzbnf 0/1 Init:0/1 0 27s
And we’re stuck at this step until the end of tiiiiiime.
With sidecar containers
To avoid this type of race condition, let’s update the manifest by converting slow-sidecar to an initContainer BUT ALSO adding restartPolicy: Always in the slow-sidecar container declaration.
This trick is the way to tell Kubernetes to start this container as an initContainer but NOT to wait for it to finish (which it will never do since it’s a web server listening on 8081 until the end of time) before starting the main application.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sidecar-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sidecar-user
            image: zwindler/sidecar-user
+         initContainers:
          - name: slow-sidecar
            image: zwindler/slow-sidecar
+           restartPolicy: Always
            ports:
            - containerPort: 8081
          restartPolicy: Never
Note: This is the official way to declare a sidecar container in Kubernetes. I haven’t read the KEP yet, so I can’t say why the development team reused the existing initContainers field rather than introducing a new sidecarContainers keyword in the Pod spec schema.
$ kubectl apply -f 3-cronjob-with-sidecar-container.yaml
This time, the init container should start and ONLY THEN, the application:
$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
sidecar-cronjob-28689958-zrmhh 0/2 Pending 0 0s
sidecar-cronjob-28689958-zrmhh 0/2 Pending 0 0s
sidecar-cronjob-28689958-zrmhh 0/2 Init:0/1 0 0s
sidecar-cronjob-28689958-zrmhh 1/2 PodInitializing 0 2s
sidecar-cronjob-28689958-zrmhh 1/2 Error 0 3s
We can see it’s better (sidecar-user starts in a second phase) but in this particular example, it still fails…
With sidecar containers AND a startupProbe
By default, the kubelet considers the sidecar container started as soon as its process is running; then, once all the other initContainers have finished (or if there are none), it moves on to the main phase of starting containers.
Unfortunately, in our case, the sidecar container is very slow (sleep 5), so the fact that the process is running is not an indication of the sidecar’s state…
We need to add a startupProbe so Kubernetes knows WHEN to move past the init phase and start the main phase. The Kubernetes documentation describes this behavior:
After a sidecar-style init container is running (the kubelet has set the started status for that init container to true), the kubelet then starts the next init container from the ordered .spec.initContainers list. That status either becomes true because there is a process running in the container and no startup probe defined, or as a result of its startupProbe succeeding.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sidecar-cronjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sidecar-user
            image: zwindler/sidecar-user
          initContainers:
          - name: slow-sidecar
            image: zwindler/slow-sidecar
            restartPolicy: Always
            ports:
            - containerPort: 8081
+           startupProbe:
+             httpGet:
+               path: /
+               port: 8081
+             initialDelaySeconds: 5
+             periodSeconds: 1
+             failureThreshold: 5
          restartPolicy: Never
One last time:
$ kubectl apply -f 4-cronjob-with-sidecar-container-and-startup-probe.yaml && kubectl get pods -w
cronjob.batch/sidecar-cronjob created
NAME READY STATUS RESTARTS AGE
sidecar-cronjob-28689977-lt77c 0/2 Pending 0 0s
sidecar-cronjob-28689977-lt77c 0/2 Pending 0 0s
sidecar-cronjob-28689977-lt77c 0/2 Init:0/1 0 0s
sidecar-cronjob-28689977-lt77c 0/2 Init:0/1 0 1s
sidecar-cronjob-28689977-lt77c 0/2 PodInitializing 0 6s
sidecar-cronjob-28689977-lt77c 1/2 PodInitializing 0 6s
sidecar-cronjob-28689977-lt77c 1/2 Completed 0 7s
Hooray!
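If you want to double-check what happened, you can look at the logs of each container in the completed pod (the pod name will obviously differ on your cluster):
# logs of the main container (the curl) and of the sidecar
$ kubectl logs sidecar-cronjob-28689977-lt77c -c sidecar-user
$ kubectl logs sidecar-cronjob-28689977-lt77c -c slow-sidecar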
Bonus: if you don’t have SidecarContainers enabled
If you’re still on Kubernetes 1.28 (or worse) and don’t have the ability to enable alpha feature gates, you’ll need to find another method.
Unfortunately, the solution will likely involve modifying your main application’s code or its Docker image. You can:
- add a retry policy in the sidecar-user application
- add a script in the sidecar-user application that waits a bit (sleep) before trying to contact the sidecar
The first is good practice when dealing with microservices, and you should consider it anyway to handle transient database connection issues (a quick sketch follows below).
The second is a band-aid on a wooden leg. I strongly advise against it: the sidecar’s startup time can vary, and adding too much delay in the application also hurts when you need to handle incidents and bugs in production (potentially inducing other problems).
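For the record, the retry option could look something like this in a shell wrapper (a minimal sketch under my own assumptions, not the actual sidecar-user image; the URL, number of attempts, and delay are placeholders):
#!/bin/sh
# Retry the call to the sidecar a few times before giving up,
# instead of failing on the first attempt.
for i in $(seq 1 10); do
  if curl -sf http://localhost:8081/; then
    exit 0
  fi
  echo "sidecar not ready yet (attempt $i), retrying..." >&2
  sleep 2
done
exit 1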
