Zwindler's Reflection

Conferences

Mon, 03 Nov 2025 18:00:00 +0000

Alongside this blog, I enjoy sharing my passion for computer science in general through conferences.

Presentation Materials

Note: You can contact me for meetups and Brown Bag Lunches (BBLs) for any of the talks you’ll find below. I’m also available for podcast recordings on these topics.

Handy for following live or finding links 😉

Conference List

I have been (or will soon be) a speaker at the following events:

Meetup TADx : Kubernetes : 5 façons créatives de flinguer sa prod 🔫 (video)
DevFest Nantes 2025 : Limits, Requests, QoS, PriorityClasses, on balaie ce que vous pensiez savoir sur le scheduling dans Kubernetes (with Quentin Joly) (video, feedbacks)
DevoxxFR 2025 : Kubernetes : 5 façons créatives de flinguer sa prod 🔫 (video)
DevoxxFR 2025 : Ne perdez plus vos photos de vacances 🔥🏠🔥 (ou tout autre fichier important) (video)
VoxxedDays Luxembourg 2024 : Démystifions le fonctionnement interne de Kubernetes (video)
Archilocus 2024 : Table ronde “Comment je monitore ma prod ?” video
FOSDEM 2024 : Putting an end to Makefiles in go projects with GoReleaser (video / feedbacks here and here)
BBL ManoMano January 2024 : Demystifying the Internal Components of Kubernetes
Conference at Université de Pau et des Pays de l’Adour, January 2024 : SRE ! SRE Partout !
OSXP 2023 : Démystifions le fonctionnement interne de Kubernetes (shortened version)
BBL HelloAsso November 2023 : Démystifions le fonctionnement interne de Kubernetes
BDX I/O 2023 : En finir avec les Makefile en Go avec GoReleaser (video / feedbacks)
Cloud Nord 2023 : Démystifions le fonctionnement interne de Kubernetes (video, feedbacks)
DevoxxFR 2023 : Démystifions le fonctionnement interne de Kubernetes (video, feedbacks (4.36/5 - 33 votes))
BBL Zenika Bordeaux March 2023 : Démystifions le fonctionnement interne de Kubernetes
Meetup ToursJUG December 2022 : Démystifions le fonctionnement interne de Kubernetes
BBL Sellsy Bordeaux September 2022 : SRE ! SRE Partout !
BBL Ippon Bordeaux July 2022 : SRE ! SRE Partout !
VoxxedDays Luxembourg 2022 : Du code Terraform VRAIMENT factorisé avec Terragrunt 👹 (video)
Devoxx France 2022 : Ciel ! Mon Kubernetes mine des Bitcoins ! (video)
Cloud Sud 2022 : SRE ! SRE Partout ! (video)
Cloud Sud 2022 : Du code Terraform VRAIMENT factorisé avec Terragrunt
BBL Malt (remote) March 2022 : SRE ! SRE Partout !
GDG Nantes March 2022 : SRE ! SRE Partout ! (video)
Touraine Tech 2022 : Dis papa ! C’est quoi un SRE ? (video, feedback)
Open Source Experience 2021 : Ciel mon Kubernetes mine des Bitcoins ! (shortened format / 20 minutes, video)
Cloud Est 2021 : Dis papa ! C’est quoi un SRE ? (video)
Cloud Nord 2021 : Dis papa ! C’est quoi un SRE ? (video, feedback)
RabbitMQ Summit 2021 : 101 ways to break RabbitMQ (in English)
GDG Lille Meetup May 2021 : Ciel, mon Kubernetes mine des Bitcoins ! (video)
RRLL 2020 : Le logiciel libre a-t-il de beaux jours devant lui ? (video)
Hack-it-N 2019 : Ciel, mon Kubernetes mine des Bitcoins !
BDX I/O 2019 : Le logiciel libre a-t-il de beaux jours devant lui ? (video, feedbacks)
CNCF Meetup Bordeaux #6 : Besoin de métriques Prometheus à long terme ? Thanos fera des Marvels ! (meetup link)
Pas Sage En Seine 2019 : Le logiciel libre a-t-il de beaux jours devant lui ? (video)
CNCF Meetup Bordeaux #3 : Dans ton Kube : retour sur 2 ans d’incidents en production (meetup link)
BDX I/O 2018 : Ami développeur, deviens un Ops sans effort avec Ansible (talk video)

Podcasts and Interviews

Author

In addition to this blog and these conferences, I’m also an author:

of an article in GNU/Linux Magazine · GLMF-265; En finir avec les Makefiles en Go avec GoReleaser
of a book published by Eyrolles titled Kubernetes : 50 solutions pour les postes de développement et les clusters de production

Regarding the book, I also have a separate website that traces the entire adventure of creating this book, from the idea to the release date. It’s available at 50ndk.zwindler.fr

Organizer

I’m currently a member of the CNCF Bordeaux Meetup organization team, led by Alexis Fala
I reviewed the CFP for the BDX I/O 2025 conference
I participated in organizing the BDX I/O 2024 conference (I talk about it in this article)
I participated in organizing the KCD France 2023 conference (I talk about it in this article)

All My Replays on My Peertube

Talks available as replays are also available on my personal PeerTube instance

Cilium's new policy log field: our use case

Mon, 03 Nov 2025 12:00:00 +0200

TL;DR

Cilium 1.18 added a log field to CiliumNetworkPolicies to tag flows with custom labels. Great for filtering out expected blocked traffic from your monitoring dashboards!

But there’s a catch, unrelated to this feature, that made this irrelevant in our use case: you can’t use it with egressDeny + toFQDNs.

And there is a bug that makes the log field visible only on “allowed” traffic.

Here’s why we ran into this wall and what we learned.

The problem: monitoring all the things (but not too much)

Like any good ops team should, we monitor our Kubernetes cluster network flows using Hubble. We (mostly my colleague Nicolas Nativel) push all AUDIT and DROPPED flows to a dashboard so we can quickly spot when something’s blocked and decide:

Is this legitimate? → Open the flow
Is this suspicious? → Sound the alarm 🚨

This works pretty well… until you start explicitly blocking things that you know should be blocked.

In our case, we wanted to prevent a third-party application from phoning home with its “telemetry” (yeah, let’s call it that 😏). We’re talking about calls to external tracking domains.

The issue? If we just block these flows, they’ll show up as DROPPED in Hubble, trigger our monitoring, and we’ll end up with alerts for something we intentionally blocked.

That’s noise we don’t want.

Enter Cilium 1.18’s policy log field

Good news! Cilium 1.18 introduced exactly what we needed: the ability to add custom log fields to your network policies.

Check out the official announcement.

The idea is simple: you add a log field to your CiliumNetworkPolicy:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: my-policy
spec:
  endpointSelector:
    matchLabels:
      app: my-app
  egress:
    - toFQDNs:
        - matchName: "example.com"
  log:
    value: "my-custom-log-tag"

Then, when you observe flows in Hubble, you can filter them out using CEL (Common Expression Language):

hubble observe \
 --verdict AUDIT \
 --not \
 --cel-expression "(_flow.policy_log.endsWith('my-custom-log-tag'))" \
 --print-raw-filters

Output:

allowlist:
- '{"verdict":["AUDIT"]}'
denylist:
- '{"experimental":{"cel_expression":["(_flow.policy_log.endsWith(''my-custom-log-tag''))"]}}'

Perfect! This is exactly what we need. We can now tag our “expected blocks” and exclude them from our monitoring.
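The same exclusion logic can also be applied after the fact, on flows that have already been exported. Here is a minimal Python sketch of that idea, not Hubble's own implementation. It assumes each flow is a dictionary with a verdict key and a policy_log key. The exact shape of policy_log in exported flows is an assumption here, so the sketch accepts either a single string or a list of strings. All function names are hypothetical.

```python
EXPECTED_TAG = "my-custom-log-tag"


def policy_log_values(flow):
    """Return the policy_log entries of a flow as a list of strings."""
    raw = flow.get("policy_log") or []
    # Assumption: the field may be a single string or a list of strings.
    return [raw] if isinstance(raw, str) else list(raw)


def is_expected_block(flow, tag=EXPECTED_TAG):
    """True when any policy_log entry ends with the tag (mirrors endsWith)."""
    return any(value.endswith(tag) for value in policy_log_values(flow))


def flows_to_alert_on(flows, verdicts=("AUDIT", "DROPPED"), tag=EXPECTED_TAG):
    """Keep flows with a monitored verdict that are not tagged as expected."""
    return [
        flow
        for flow in flows
        if flow.get("verdict") in verdicts and not is_expected_block(flow, tag)
    ]


if __name__ == "__main__":
    sample = [
        {"verdict": "AUDIT", "policy_log": ["my-custom-log-tag"]},
        {"verdict": "AUDIT", "policy_log": []},
        {"verdict": "FORWARDED"},
    ]
    # Only the untagged AUDIT flow should remain.
    print(flows_to_alert_on(sample))
```

Keeping the tag check in one small function makes it easy to swap the matching rule (suffix, exact match, prefix) without touching the verdict filter.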

The plan: block telemetry elegantly

Armed with this new feature, we crafted our strategy:

Use egressDeny to explicitly block telemetry domains
Add a custom log field: app-explicit-traffic-blocked
Configure Hubble to filter out flows with this tag
Profit! 🎉

Here’s what we tried:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: app-external-block-policy
  namespace: my-namespace
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: my-app
  # note: egressDeny takes precedence over egress rules
  # https://docs.cilium.io/en/stable/security/policy/language/#deny-policies
  egressDeny:
    # Block all external traffic and log it with an arbitrary log field
    # This is used to prevent the app from sending telemetry data externally
    # without triggering an AUDIT/DROPPED alert
    # feature added in cilium 1.18.0 https://github.com/cilium/cilium/pull/39902
    - toFQDNs:
        - matchPattern: "*.telemetry.example.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
            - port: "80"
              protocol: TCP
  log:
    value: "app-explicit-traffic-blocked"

This should work, right? We’re using egressDeny (which takes precedence over the allow rules we keep for legitimate calls, which is good!), and we’re tagging it with our custom log.

Reality check: you can’t have nice things

And then… patatra (as we say in French 🇫🇷).

While reading the Cilium documentation on deny policies, we stumbled upon this little gem:

Deny policies do not support:

policy enforcement at L7, i.e., specifically denying an URL

toFQDNs, i.e., specifically denying traffic to a specific domain name.

Wait, what?

You cannot use toFQDNs with egressDeny. Our entire plan just collapsed 😱.

Why this is a problem

The issue is the precedence model in Cilium:

egressDeny rules take precedence over egress rules (by design, and that’s good!)
But if we use egressDeny without toFQDNs, we have to block by IP or CIDR
These telemetry services probably use dynamic IPs for their endpoints (good luck maintaining a list…)
If we block all 80/443 traffic in egressDeny, we can’t make exceptions for legitimate traffic in egress rules because… deny takes precedence over allow!

We’re stuck between a rock and a hard place:

Use egress with toFQDNs → works, but we can’t deny, only allow other traffic
Use egressDeny with IPs → we’ll be playing whack-a-mole with rotating IP ranges
Use egressDeny to block all 80/443 → we block everything, including legitimate traffic
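The dead end is easier to see with a toy model of the precedence rule. The sketch below is not Cilium's policy engine. It is a simplified illustration in which a rule matches on a destination name and a port, deny rules are checked first, and anything not explicitly allowed is dropped. The Rule class and the verdict function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A toy rule: matches a destination name and a port."""

    destination: str  # "*" matches any destination
    port: int

    def matches(self, destination, port):
        return self.port == port and self.destination in ("*", destination)


def verdict(destination, port, allow_rules, deny_rules):
    """Deny rules are evaluated first and always win over allow rules."""
    if any(rule.matches(destination, port) for rule in deny_rules):
        return "DROPPED"
    if any(rule.matches(destination, port) for rule in allow_rules):
        return "FORWARDED"
    # Simplification: anything not explicitly allowed is dropped.
    return "DROPPED"


if __name__ == "__main__":
    allow = [Rule("api.example.com", 443)]
    deny = [Rule("*", 443)]  # "block all 443" written as a deny rule
    # The allow rule for the legitimate destination can never apply.
    print(verdict("api.example.com", 443, allow, deny))
    # Without the blanket deny, the legitimate call goes through.
    print(verdict("api.example.com", 443, allow, []))
```

In this model no allow rule can carve an exception out of a blanket deny, which is exactly the third situation in the list above.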

Potential workarounds

While waiting for Cilium to support toFQDNs in egressDeny policies, here are some alternative approaches you might consider:

Find a way to disable telemetry in the app directly

That’s the best option but sadly not always on the table.

DNS-based blocking

Bend the DNS server to return NXDOMAIN for telemetry domains, like a personal pi-hole server would do with ads. The application will fail to resolve the domain and won’t send data.
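As a rough illustration of the decision such a DNS server makes, here is a minimal Python sketch. It is not a DNS server. It only models the choice between answering NXDOMAIN and forwarding the query. The pattern list, the function names and the upstream callable are assumptions made for the example, and fnmatch globbing is used for simplicity (its wildcard semantics are not guaranteed to be identical to Cilium's matchPattern).

```python
from fnmatch import fnmatchcase

# Hypothetical block list, reusing the example domain from this article.
BLOCKED_PATTERNS = ["*.telemetry.example.com"]


def should_block(qname, patterns=BLOCKED_PATTERNS):
    """True when the queried name matches one of the blocked patterns."""
    name = qname.rstrip(".").lower()  # normalise trailing dot and case
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def answer(qname, upstream):
    """Return "NXDOMAIN" for blocked names, else defer to the upstream resolver."""
    if should_block(qname):
        return "NXDOMAIN"
    return upstream(qname)


if __name__ == "__main__":
    fake_upstream = lambda name: "192.0.2.10"  # stand-in for a real resolver
    print(answer("eu.telemetry.example.com.", fake_upstream))
    print(answer("www.example.com", fake_upstream))
```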

Use IP-based egressDeny (with maintenance overhead)

Resolve the telemetry FQDNs to their current IP ranges and block them with egressDeny:

egressDeny:
  - toCIDRSet:
      - cidr: 203.0.113.0/24  # Example telemetry IP range
    toPorts:
      - ports:
          - port: "443"

If the list doesn’t evolve too often, this is a good option.
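Keeping that list up to date can be scripted. The following Python sketch is one possible approach, not something we run in production: it resolves a list of fully qualified names and turns the results into entries shaped like the toCIDRSet items shown above. The function names are hypothetical, the resolver is injectable so the logic can be exercised without network access, and wildcard patterns cannot be resolved this way, only concrete host names.

```python
import ipaddress
import socket


def system_resolver(fqdn):
    """Resolve a name to its current IPv4 addresses using the OS resolver."""
    infos = socket.getaddrinfo(
        fqdn, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def cidrs_for(fqdns, resolver=system_resolver):
    """Return sorted, de-duplicated /32 CIDRs for all the given names."""
    networks = set()
    for fqdn in fqdns:
        for address in resolver(fqdn):
            networks.add(ipaddress.ip_network(address + "/32"))
    return [str(network) for network in sorted(networks)]


def to_cidr_set(cidrs):
    """Build a toCIDRSet-shaped list as plain dictionaries."""
    return [{"cidr": cidr} for cidr in cidrs]


if __name__ == "__main__":
    # Fake resolver with documentation addresses, so no DNS query is made.
    records = {
        "a.telemetry.example.com": ["203.0.113.7", "203.0.113.5"],
        "b.telemetry.example.com": ["203.0.113.5"],
    }
    print(to_cidr_set(cidrs_for(records, resolver=records.__getitem__)))
```

The output still has to be rendered into the policy and re-applied whenever the addresses change, which is the maintenance overhead mentioned in the heading.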

Ok, but let’s assume there is no legitimate traffic. Can we use the feature to add a log on dropped traffic?

Sadly no, not right now.

There is a bug in this new Cilium feature that only logs the policy_log field on “allowed” flows, not on audit/dropped flows.

Policy log does not work for DROPPED/AUDIT flow

When defining a CiliumNetworkPolicy with the spec.log field configured, I expect the relevant hubble flows to have the policy_log field. It works for allowed flow.

But for denied/audited flow resulting from the rule (implicit or explicit), policy_log is never available.

Note: I observe the same issue with --print-policy-names option of hubble, the k8s:io.cilium.k8s.policy.derived-from label is not set for denied flows (but correctly set for allowed flows).

There is also a related issue, “[Hubble CLI] --print-policy-names flag does not do anything”, opened by someone else.

Since two tickets are open and maintainers have started to acknowledge the issue, we can hope this will be fixed, though.

Conclusion

In our use case, we ended up not using this new Cilium feature, but being able to add details to flows (and to filter on them) is always nice.

A shout-out to my colleague Nicolas Nativel, who did most of the work around CiliumNetworkPolicies, including the dashboards, exploratory work on this feature, and took the time to create the issue on the Cilium repository.

References

$> whoami (about)

Mon, 03 Nov 2025 00:00:00 +0000

About me.

Who am I?

I’m a fan of science fiction, fantasy, and symphonic metal. I love running and wide open spaces.

But what I enjoy most is tinkering with my home servers and installing new open source software. And since anyone who tinkers often needs external help to achieve their goals, I’m well placed to know how nice it is to find accessible information on the Internet.

Hence my contribution: this blog.

Speaker / Conferences / Podcasts / Press

Alongside this blog, I enjoy sharing my passion for computer science in general through conferences. You can find more details (and replays/materials/etc) on this page

I co-organized the Kubernetes Community Days France (Paris), which took place for the first time in March 2023, and also the 2024 edition of BDX I/O (Bordeaux, France).

Finally, I’ve written an article in GNU/Linux Magazine about GoReleaser, written a whole book (in French) about 50 different ways to deploy Kubernetes (site of the book / publisher’s official web page), and been invited on several podcasts.

Note: You can contact me for meetups and BBLs for any of the talks you’ll find in the Conferences tab mentioned above. I’m also available for podcast recordings on these topics.

My Professional Background

I graduated as a Computer Science Engineer from ENSEIRB-MATMECA, specializing in Information Systems through the Networks and Distributed Systems option.

After several years of Unix/Linux/Windows system administration in small and large organizations, consulting firms or end clients, I’m now a jack-of-all-trades in system engineering.

This generalist approach allowed me to quickly evolve to a systems architect position at a major French retail giant, then a cloud engineer position at a major French industrial company, and SRE at a famous French music streaming company.

I’m currently a Platform Engineer at a company that publishes an HR SaaS.

Recompile Mimir’s "MetaMonitoring" Grafana Dashboards for Kubernetes

Thu, 12 Dec 2024 18:00:00 +0200

Context

When working on observability, there is a tool that always comes to mind first: Grafana.

Grafana is an open source visualization tool developed by Grafana Labs, and I’m sure you all know it (I have also written about it in French quite a few times). But aside from this, they also develop a lot of other useful tools in the observability landscape, to the point that you can in theory build your whole o11y stack with only Grafana Labs tools.

To address Prometheus’s lack of long-term storage and high availability features (I have NEVER understood why the Prometheus team refuses to work on this), Grafana Labs forked Cortex a few years back and renamed it Mimir.

I won’t cover the installation of Mimir here; there are plenty of tutorials on the Internet, as well as official documentation.

Instead, I’ll talk about an issue that I have with the official Mimir helm chart, and more precisely with the built-in Grafana dashboards that come along with it.

Dashboards, you say?

Mimir is shipped with a lot of useful Grafana dashboards to help ensure that the components are running fine.

These dashboards are compatible with the various deployment modes of Mimir. In Kubernetes, if you use the mimir-distributed official helm chart this can be enabled by a simple value:

metaMonitoring:
  dashboards:
    enabled: true

But, by default, all dashboards installed using the metaMonitoring value in the mimir helm charts are precompiled JSON manifests using jsonnet/mixin.

For example, here is the precompiled version of the “Mimir / Overview dashboard”:

github.com/grafana/mimir/blob/2640b8f72127548e9e3da281a763476b03fb4aae/operations/mimir-mixin-compiled/dashboards/mimir-overview.json

By design, you can’t change things like the prefix name of the mimir pods, which makes these precompiled dashboards useless in a helm-like environment where release name (mimir-) is a prefix of the pod.

kubectl -n monitoring get pods
NAME READY STATUS RESTARTS AGE
mimir-alertmanager-0 1/1 Running 0 24h
mimir-alertmanager-1 1/1 Running 0 24h
mimir-compactor-0 1/1 Running 0 24h
mimir-distributor-5d668b479f-ksltr 1/1 Running 1 (24h ago) 6d1h
...

In this case, all dashboards will be broken, all showing “no data” in Grafana because data will be incorrectly filtered. For example, the “Write requests / sec” panel in “Mimir / Overview dashboard”, has a label job=~"($namespace)/((distributor..., but our pod is mimir-distributor, not distributor:

sum by (status) (
label_replace(label_replace(rate(cortex_request_duration_seconds_count{cluster=~"$cluster", job=~"($namespace)/((distributor.*|cortex|mimir|mimir-write.*))", route=~"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics"}[$__rate_interval]),
"status", "${1}xx", "status_code", "([0-9]).."),
"status", "${1}", "status_code", "([a-zA-Z]+)"))

The solution is to disable the metaMonitoring flag from the chart, and build / ship the dashboards separately.
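To see why the default matcher misses the prefixed pods, the job regular expression can be reproduced outside Grafana. This is a small Python sketch under two assumptions: the job label value is namespace/pod-basename (here monitoring/mimir-distributor), and the regex is evaluated fully anchored, as Prometheus does for label matchers. The helper names are hypothetical.

```python
import re

NAMESPACE = "monitoring"
# Component alternatives copied from the panel query above.
COMPONENTS = "((distributor.*|cortex|mimir|mimir-write.*))"


def matches(job_prefix, job):
    """Apply the job matcher the way Prometheus does: fully anchored."""
    return re.fullmatch(job_prefix + COMPONENTS, job) is not None


default_prefix = "(" + NAMESPACE + ")/"
patched_prefix = "(" + NAMESPACE + ")/mimir-"
job = "monitoring/mimir-distributor"  # assumed job label value

if __name__ == "__main__":
    print(matches(default_prefix, job))  # False: "mimir-distributor" fits no alternative
    print(matches(patched_prefix, job))  # True: the prefix absorbs "mimir-"
```

The second call uses a prefix that already contains the release name, which is the change the rebuilt dashboards need.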

Procedure

Get the Mimir sources:

git clone https://github.com/grafana/mimir.git

Fortunately, the jsonnet/mixin files include a job_prefix variable that will help us fix this:

sed -i.bak "s/job_prefix: '(\$namespace)\/',/job_prefix: '(\$namespace)\/mimir-',/" operations/mimir-mixin/config.libsonnet
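One drawback of sed is that it stays silent when the pattern no longer matches, for instance if the file layout changes upstream. As a hedged alternative, here is a Python sketch of the same substitution that fails loudly instead. It assumes the line looks exactly like the one targeted by the sed command above. The function name and the release_prefix parameter are hypothetical.

```python
OLD_LINE = "job_prefix: '($namespace)/',"


def patch_job_prefix(text, release_prefix="mimir-"):
    """Insert the release prefix into job_prefix, or raise if it is not found."""
    if OLD_LINE not in text:
        raise ValueError("job_prefix line not found; the file layout may have changed")
    new_line = "job_prefix: '($namespace)/" + release_prefix + "',"
    return text.replace(OLD_LINE, new_line)


if __name__ == "__main__":
    sample = "  job_prefix: '($namespace)/',\n"
    print(patch_job_prefix(sample), end="")
```

To use it on the real file, read operations/mimir-mixin/config.libsonnet, pass its content through patch_job_prefix, and write the result back.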

Rebuild the dashboards

make build-mixin

podman image inspect grafana/mimir-build-image:pr9491-80f5778956 >/dev/null 2>&1 || podman pull grafana/mimir-build-image:pr9491-80f5778956
podman tag grafana/mimir-build-image:pr9491-80f5778956 grafana/mimir-build-image:latest
[...]
make: Leaving directory '/go/src/github.com/grafana/mimir'
 10,10 real 0,02 user 0,01 sys

Note: If you don’t have docker on your machine (I use podman), the make command will fail because it can’t find docker and the docker binary is hardcoded in the make commands. Modify the Makefile to replace docker by podman.

The json files in operations/mimir-mixin-compiled/dashboards are now built with the correct pod names.

Create a grafana-dashboards helm chart (called yourDashboardsChart here).

helm create yourDashboardsChart

In this chart, create a src/dashboards/mimir directory (for json dashboard sources) alongside the classic templates directory containing the actual go-templated YAML manifests. We will create the gotemplate helm files just after:

cp operations/mimir-mixin-compiled/dashboards/* ../yourDashboardsChart/src/dashboards/mimir

Now, for each json file generated by jsonnet, we are going to create a helm gotemplated yaml file, which in turn will create a ConfigMap for each dashboard in our Kubernetes cluster. They will look like this:

---
# Source: mimir-distributed/templates/metamonitoring/grafana-dashboards.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mimir-alertmanager-dashboard
  namespace: '{{ $.Release.Namespace }}'
  labels:
    grafana_dashboard: "1"
  annotations:
    k8s-sidecar-target-directory: /tmp/dashboards/Mimir Dashboards
data:
  mimir-alertmanager.json: |-
    {{ $.Files.Get "src/dashboards/mimir/mimir-alertmanager.json" | fromJson | toJson }}

To speed up the process, you can reuse the helm template and a few bash commands to generate all the helm gotemplate files for you:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

mkdir -p mimir

helm -n monitoring template mimir grafana/mimir-distributed --set metaMonitoring.dashboards.enabled=true > helm-output.yaml

# Count the number of document separators
doc_count=$(grep -c '^---$' helm-output.yaml)

# Split the YAML file into separate files for each document
csplit -f mimir/helm-output- helm-output.yaml '/---/' "{$((doc_count - 2))}" >/dev/null

# Some trimming/cleaning: keep only the dashboard ConfigMaps
for file in mimir/helm-output-*; do
  if grep -q 'kind: ConfigMap' "$file" && grep -q 'dashboard' "$file"; then
    name=$(yq eval '.metadata.name' "$file")
    # Drop the labels injected by the original chart
    yq eval -i 'del(.metadata.labels."helm.sh/chart", .metadata.labels."app.kubernetes.io/name", .metadata.labels."app.kubernetes.io/instance", .metadata.labels."app.kubernetes.io/version", .metadata.labels."app.kubernetes.io/managed-by")' "$file"
    # Let our own chart decide the namespace
    yq eval -i '.metadata.namespace = "{{ $.Release.Namespace }}"' "$file"
    # Replace the inlined JSON with a reference to the rebuilt dashboard file
    yq eval -i '.data |= with_entries(.value = "{{ $.Files.Get \"src/dashboards/mimir/" + .key + "\" | fromJson | toJson }}")' "$file"
    mv "$file" "mimir/${name}.yaml"
  else
    rm "$file"
  fi
done

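# Make sure the target directory exists before moving the templates
mkdir -p ../yourDashboardsChart/templates/mimir
mv mimir/* ../yourDashboardsChart/templates/mimir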
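If you would rather not depend on csplit and yq, the templates can also be generated directly from the dashboard file names. The Python sketch below is an alternative under stated assumptions, not the method used above: it assumes every ConfigMap is named after the JSON file with a -dashboard suffix (as in the mimir-alertmanager example), and the function name is hypothetical. Plain placeholder replacement is used instead of str.format so that the Helm double braces stay untouched.

```python
from pathlib import PurePosixPath

TEMPLATE = """---
apiVersion: v1
kind: ConfigMap
metadata:
  name: __NAME__
  namespace: '{{ $.Release.Namespace }}'
  labels:
    grafana_dashboard: "1"
  annotations:
    k8s-sidecar-target-directory: /tmp/dashboards/Mimir Dashboards
data:
  __FILE__: |-
    {{ $.Files.Get "src/dashboards/mimir/__FILE__" | fromJson | toJson }}
"""


def configmap_template(dashboard_file):
    """Render the Helm template text for one dashboard JSON file."""
    path = PurePosixPath(dashboard_file)
    # Assumption: ConfigMap name is "<file stem>-dashboard".
    name = path.stem + "-dashboard"
    return TEMPLATE.replace("__NAME__", name).replace("__FILE__", path.name)


if __name__ == "__main__":
    print(configmap_template("src/dashboards/mimir/mimir-alertmanager.json"))
```

Looping over the files in src/dashboards/mimir and writing each result to templates/mimir/ gives the same kind of output as the shell pipeline.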

Now, you should have all the files to re-generate the grafana dashboard in your Kubernetes cluster, with the correct prefix.

Enjoy!

Source

AI manifesto

Fri, 15 Nov 2024 12:00:00 +0200

All written content on this blog has been thought out, written, and reviewed by a human being

I operate on the principle (like Cassidy Williams) that if I don’t take the time to write the content of this blog myself, you shouldn’t take the time to read it.

It’s imperfect (in typos, errors, awkward phrasing) because it’s human, which makes it authentic and alive.

The only small exception I allow myself: when I translate articles that I’ve previously written (by myself, without LLMs) in one language (French or English) and want to make a version in the other. In that case, I sometimes use an LLM to speed up the translation of certain paragraphs (but I often end up rewriting them anyway 🙃).

For non-written content, it’s more complicated

Finding beautiful and relevant illustrations is often complex, especially to be sure they’re copyright-free. I say this because several amateur bloggers have been taken advantage of by scammers assisted by unscrupulous lawyers in the past.

While I generally try to create my own illustrations (screenshots, montages of software logos I’m testing, homemade diagrams, sometimes even my own drawings), I have occasionally used GenAI tools to generate illustration images to save time.

Aware of the ethical issues towards artists, I commit to restricting this use to a minimum in the future.

Are LLMs / GenAI tools bad?

I don’t wish to say that LLMs / GenAI tools are a bad thing, nor that I don’t use them. I use them to learn, to assist me in my daily life. However, what I wish to convey through this blog, I want it to come from me.

Inspiration

Note: this page is inspired by an idea from Damola Morenikeji (www.bydamo.la/p/ai-manifesto), which I discovered via Cassidy Williams (through this post bsky.app/profile/cassidoo.co/post/3l7aqai5vsg2w)

Run AIDungeon 2 on Ubuntu 18.04

Wed, 08 Jan 2020 11:30:00 +0000

AIDungeon 2

A few weeks ago, I stumbled upon AIDungeon 2, a hilarious project mixing text-based adventure, (heavy) machine learning and big (big BIG) CUDA GPUs.

AIDungeon2 is a first of its kind AI generated text adventure. Using a 1.5B parameter machine learning model called GPT-2 AIDungeon2 generates the story and results of your actions as you play in this virtual world. Unlike virtually every other game in existence, you are not limited by the imagination of the developer in what you can do. Any thing you can express in language can be your action and the AI dungeon master will decide how the world responds to your actions. https://www.aidungeon.io/

So, if you haven’t heard of it yet, it’s a machine learning project created by Nick Walton, a college student. It’s based on the GPT-2 text model from OpenAI, which you may have already seen in other fun projects, and which was trained to predict the next word using 40 GB of Internet text (you can also check TalkTotransformer by Adam D King).

So… What does it do ?

That’s where the fun begins. Like the TalkTotransformer generator, from a simple semi-randomly generated background, the machine learning model builds the start of a new story, different every time. And your actions steer the story one way or the other.

I won’t lie, it’s far from perfect. The model tends to run in circles, or forgets what the other characters did just a line before, which can be really annoying.

But… aside from this, the possibilities seem to be limitless, and that’s REALLY impressive.

After all, that’s not really surprising. You’re only limited by the knowledge and writing style of 40 GB of Internet text! If you need examples, I invite you to take a look at Nick Walton’s twitter feed or AIDungeon subreddit to find out the most hilarious adventures the AIDungeon community came upon.

In this example, my inputs are the 2 lines preceded by the « > » symbol, the machine did the rest

What’s the catch ?

I’ve already said it.

That game is probably the most GPU intensive game you’ve run in your life. For a text based adventure, even that is already ironically fun.

The game requires nearly 9 GB of GPU VRAM and a lot of CUDA cores, ruling out all AMD cards and nearly every NVIDIA card costing less than 1500$.

Bummer…

Community to the rescue

Fortunately, the community’s enthusiasm was so intense that Nick Walton and his brother decided to drop everything else to improve it. During December, they built mobile apps and now a web-based one, so you can play on every device.

Of course, these apps run on AWS servers featuring Tesla GPUs and cost around 65k$ a month in hosting. They have managed to raise nearly 15k$/month on their Patreon account, but there may come a day (probably very soon) when they won’t be able to provide free access for everyone.

So, after you read the article, if you like the game, don’t forget to support them!

So what now?

It also turns out that Nick Walton published the game as an open source project. That’s actually what got me most interested in this game. You can find the sources in the AIDungeon project on GitHub.

Starting from there, I asked myself:

Hey! Wouldn’t it be nice to run it on a cloud instance on a random cloud provider GPU powered VM?

So I decided to try it.

For my test, I chose to run AIDungeon on an NC6 virtual machine (6 vCPUs, 56 GiB memory, Tesla K80) on Microsoft Azure, using a free test account. This machine costs approximately 0.43€ per hour (or even 0.21€ if you use preemptible VMs), so even if you don’t use free credit, it won’t cost you too much.
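To put those hourly rates in perspective, here is a tiny Python sketch of the arithmetic. The rates are the ones quoted above. The session lengths are hypothetical, and the function name is made up for the example.

```python
def session_cost(hours, hourly_rate_eur):
    """Cost in euros of keeping the VM running for a number of hours."""
    return round(hours * hourly_rate_eur, 2)


if __name__ == "__main__":
    regular, preemptible = 0.43, 0.21  # € per hour, as quoted above
    for hours in (1, 3, 8):  # hypothetical session lengths
        print(hours, "h:", session_cost(hours, regular), "€ /",
              session_cost(hours, preemptible), "€")
```

For example, a three-hour evening of play comes to about 1.29€ at the regular rate, or 0.63€ on a preemptible VM. Remember to stop or delete the VM afterwards, since billing is per hour of uptime.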

Side note: Of course, if you have a GTX 1080 Ti, GTX 2080 Ti or GTX 2080ti super (or a K80), you can also run it on your own machine…

On my NC6, I deployed Ubuntu 18.04.

And that’s where this tutorial begins ;-)

Side note: If you follow this guide and start on a fresh Ubuntu 18.04, the installation process should take 30 minutes to 45 minutes.

Install updates and prerequisites

Once connected on the machine, update and upgrade the OS.

sudo apt-get update
sudo apt-get upgrade

Then, install the prerequisite packages:

sudo apt-get install git aria2 unzip python3-pip

Dependency hellscape

Now, the real fun begins. AIDungeon uses TensorFlow and CUDA drivers to run. But here’s the catch: not every version will work!

To run AIDungeon, you have to install specifically tensorflow 1.15 (no more, no less). And tensorflow==1.15 specifically requires cuda10.0 (not cuda10.1 nor cuda10.2) and Python 3.4 to 3.7!

The dependency nightmare begins…
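The constraints above can be written down as a small pre-flight check. This Python sketch only encodes the versions listed in this article (TensorFlow 1.15, CUDA 10.0, Python 3.4 to 3.7). It does not detect anything on the machine by itself, and the function names are hypothetical.

```python
def python_ok(version):
    """version is a (major, minor) tuple, e.g. sys.version_info[:2]."""
    return (3, 4) <= tuple(version) <= (3, 7)


def cuda_ok(version):
    """Only CUDA 10.0 is accepted, not 10.1 nor 10.2."""
    return version == "10.0"


def tensorflow_ok(version):
    """Accept 1.15 and its patch releases (1.15.x)."""
    return version == "1.15" or version.startswith("1.15.")


def check_stack(python, cuda, tensorflow):
    """Return a list of human-readable problems (empty when all is fine)."""
    problems = []
    if not python_ok(python):
        problems.append("Python %d.%d is outside 3.4-3.7" % tuple(python))
    if not cuda_ok(cuda):
        problems.append("CUDA %s is not 10.0" % cuda)
    if not tensorflow_ok(tensorflow):
        problems.append("TensorFlow %s is not 1.15" % tensorflow)
    return problems


if __name__ == "__main__":
    # Ubuntu 18.04 ships Python 3.6, which is inside the supported range.
    print(check_stack((3, 6), "10.0", "1.15"))
    print(check_stack((3, 8), "10.1", "2.0.0"))
```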

Install NVidia drivers and Cuda and machine learning modules

Add the cuda10.0 repos:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

Now, we can install nvidia-driver, and reboot:

sudo apt-get install --no-install-recommends nvidia-driver-440 xserver-xorg-video-nvidia-440
sudo reboot

After reboot, install cuda10.0 libs

sudo apt-get install --no-install-recommends \
cuda-10-0 cuda-runtime-10-0 cuda-demo-suite-10-0 cuda-drivers \
libcudnn7=7.6.2.24-1+cuda10.0 libcudnn7-dev=7.6.2.24-1+cuda10.0
sudo apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
libnvinfer-dev=5.1.5-1+cuda10.0

Get the sources

Download the sources from GitHub and hop into the directory:

git clone https://github.com/AIDungeon/AIDungeon/
cd AIDungeon/

Install Python dependencies through pip

Sadly, the installation is not over yet. Now that we have the Python project on deck, we need to install the Python dependencies… By default, Ubuntu 18.04 still ships Python 2 as its default Python interpreter, which has been deprecated since January 1st. Fortunately, Python 3 is easily available (unlike on CentOS 7).

Also, pip (the Python package manager) should be updated, as the version installed by Ubuntu is not compatible with tensorflow 1.15.

Upgrading pip in place can be tedious as this often leads to the “pip ImportError: cannot import name ‘main’ after update” error message. To work around this, use the script given on the official “upgrading pip” page and you should be fine.

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
sudo python3 get-pip.py

Install AIDungeon for good

When I wrote the draft of this article a few weeks ago, a dependency was missing (gsutil) and the install script was not perfect. It now seems to work better and even uses a venv for a clean install of the Python dependencies.

sudo ./install.sh

If it doesn’t work, you can install it yourself with the following commands:

python3 -m pip install -r requirements.txt --user

The next script allows you to download the AIDungeon machine learning model through a torrent file (at first, Nick had a terrifying GDrive bill due to enormous egress traffic).

./download_model.sh
[...]
Status Legend:
(OK):download completed.
Download Complete!

Run AIDungeon 2

Finally, you can now sit back and enjoy the game!

If you used the install.sh script, use the following command (with venv):

source ./venv/bin/activate
./play.py

If not, skip the venv step:

cd ~/AIDungeon/
python3 play.py

The initialisation should take a few minutes (don’t panic, it’s “normal”), depending on your setup. In December, initialisation took 5-10 minutes, but it seems to have been optimised since, as it took only a minute or two last time I checked.

Bonus: Useful command to check GPU consumption

Install gpustat to check if GPU usage is working

pip3 install gpustat --user
gpustat
gpustat -cp
aidungeon2 Mon Dec 9 13:22:24 2019 430.50
[0] Tesla K80 | 69'C, 17 % | 0 / 11441 MB |

Or use integrated nvidia tool (a little crude)

nvidia-smi --loop=1

Bonus: Check that GPU is working with tensorflow

See Tensorflow GPU guide

python3
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Example working GPU setup

2019-12-10 16:25:39.714605: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-10 16:25:39.743758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 81c7:00:00.0
2019-12-10 16:25:39.744001: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
[...]
2019-12-10 16:25:39.754396: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
Num GPUs Available: 1

Example of non working GPU setup

[...]
Num GPUs Available: 0

If you get this, the game will be very slow (1-2 minutes of waiting between each answer) but will not crash. Check that CUDA and TensorFlow are properly installed.

Bonus: gsutil and tensorflow errors

If you forgot or failed to upgrade pip, you will get this error :

Collecting tensorflow==1.15 (from -r requirements.txt (line 6))
Could not find a version that satisfies the requirement tensorflow==1.15 (from -r requirements.txt (line 6)) (from versions: 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.3.0rc0, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.4.0rc0, 1.4.0rc1, 1.4.0, 1.4.1, 1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1, 1.8.0rc0, 1.8.0rc1, 1.8.0, 1.9.0rc0, 1.9.0rc1, 1.9.0rc2, 1.9.0, 1.10.0rc0, 1.10.0rc1, 1.10.0, 1.10.1, 1.11.0rc0, 1.11.0rc1, 1.11.0rc2, 1.11.0, 1.12.0rc0, 1.12.0rc1, 1.12.0rc2, 1.12.0, 1.12.2, 1.12.3, 1.13.0rc0, 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 2.0.0a0, 2.0.0b0, 2.0.0b1)
No matching distribution found for tensorflow==1.15 (from -r requirements.txt (line 6))
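
A quick way to confirm that pip is the culprit is to compare its version against the one TensorFlow expects. The sketch below is only an illustration: the helper name pip_is_recent_enough is hypothetical, and the 19.0 threshold is an assumption based on the TensorFlow install notes for the 1.15 / 2.0 generation of wheels, so check it against the documentation for your version.

```python
def pip_is_recent_enough(version, minimum=(19, 0)):
    """Return True if a pip version string such as '9.0.1' is at least `minimum`.

    Hypothetical helper; the (19, 0) default is an assumption, not a value
    taken from the AIDungeon requirements.
    """
    parts = []
    # Keep only the leading purely numeric components ("19.0.3" -> [19, 0, 3])
    for chunk in version.split("."):
        if not chunk.isdigit():
            break
        parts.append(int(chunk))
    # Pad short versions ("19" -> [19, 0]) so the tuple comparison is fair
    while len(parts) < len(minimum):
        parts.append(0)
    return tuple(parts) >= tuple(minimum)
```

You can feed it the string reported by pip itself (for instance pip.__version__ from a python3 prompt); if it returns False, upgrade pip before running the install again.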

You should not come across this anymore, but if you get the error below, install the gsutil Python module to avoid a stack trace when saving or exiting:

> ^C
Traceback (most recent call last):
File "play.py", line 211, in <module>
play_aidungeon_2()
File "play.py", line 97, in play_aidungeon_2
action = input("> ")
KeyboardInterrupt
Exception ignored in: <bound method Story.__del__ of <story.story_manager.Story object at 0x7f797f4af0f0>>
Traceback (most recent call last):
File "/home/zwindler/AIDungeon/story/story_manager.py", line 35, in __del__
self.save_to_storage()
File "/home/zwindler/AIDungeon/story/story_manager.py", line 131, in save_to_storage
p = Popen(['gsutil', 'cp', file_name, 'gs://aidungeonstories'], stdout=FNULL, stderr=subprocess.STDOUT)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'gsutil': 'gsutil'
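
The traceback shows Popen failing because the gsutil executable cannot be found on the PATH. If you would rather not install it, a guard along these lines would avoid the crash. This is a minimal sketch, not the actual AIDungeon code: upload_story is a hypothetical helper, and only the gsutil command and bucket name are taken from the traceback above.

```python
import shutil
import subprocess


def upload_story(file_name, bucket="gs://aidungeonstories", tool="gsutil"):
    """Hypothetical helper: copy file_name to the bucket, or skip if the tool is missing."""
    # shutil.which returns None when the executable is not found on the PATH
    if shutil.which(tool) is None:
        print("{} not found on PATH, skipping upload of {}".format(tool, file_name))
        return False
    # Same command as in the traceback, with its output discarded
    subprocess.Popen(
        [tool, "cp", file_name, bucket],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.STDOUT,
    )
    return True
```

The function returns False instead of raising FileNotFoundError, so exiting with Ctrl+C would stay quiet even on a machine without gsutil.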

Sources

Open source bashing still has a bright future ahead

Mon, 29 Oct 2018 19:30:26 +0000

This article is a translation of an article I previously wrote in French, done at the request of a friend of mine.

[French]For French speakers who have not read the article in French, it is available here[/French]

Once is not custom

(or maybe twice, as I already wrote an opinion piece about a weird issue between comics authors and Creative Commons)

Yeah, I said that I wouldn’t write opinion pieces, but who can blame me when I find this in my Twitter feed, from the SAP France account (sponsored, of course)?

This comic series (I couldn’t find the author’s name anywhere) tells the story of a project manager who has to implement a big IT project. In this particular episode, he thought it was a good idea to try to do it with open source solutions.

But clearly his team fails to deliver, running into all sorts of « open source » related issues. In the end, it’s SO MUCH EASIER to choose SAP.

Riiiight. As if none of us knew anybody in our professional circles who has struggled with SAP ;-).

Did you say bashing ?

I don’t know for sure whether the word bashing still only has its colorful original meaning in English (like bashing someone’s head in), but in French we have borrowed the word heavily to describe the systematic denigration of a person or a subject.

A quick search for « bashing » in my favorite search engine gave me: (French politician) Mélenchon bashing, but also (French footballer) Benzema bashing, and even plastic bashing. This has gone too far… :p

If you are an IT professional, you probably already know that, for a long time, free and open source software was frowned upon in the professional world.

More than once, I have heard:

« It’s free, it’ll never work… »
« Why do you contribute to this free project? You should get paid for this work rather than give it away freely ».
« It’s free open source software developed by bearded men in a garage » (being bearded is clearly an issue here. Which is funny if you consider that hipster fashionistas now all wear beards. And that Microsoft started in a garage…)

If that’s not open source bashing, I don’t know what is…

He’s the one who started it!

Maybe he’s not the one who started it, but the ultimate open source bashing quote will probably be attributed for eternity to Steve Ballmer of Microsoft, for his world-famous « Linux is a cancer » in 2001.

Oh boy, how I laugh when I look back on it now.

first they ignore you
then they laugh at you
then they fight you
then you win
-So many people think it’s Gandhi who said that, but that’s probably not true ;)

Tough luck for Ballmer, that was a big mistake. Most companies, starting with the biggest ones but now more and more small and medium-sized businesses, rely heavily on open source solutions and on solutions that only work on Linux/Unix systems.

As long as we lived in the world of physical, on-premises servers, this was not much of a problem.

Turn. Turn! Turn aroouuuuuuuuund!

Virtualization was the first nail in the coffin.

The first version of Hyper-V, Microsoft’s attempt (ahah) at a professional virtualization platform, was excruciatingly slow, broken, or simply not suited to Linux/Unix workloads. This mostly benefited VMware (and, to some extent, Citrix and KVM), which was already the leader back then, and it also resulted in a considerable loss of market share for Microsoft in this area.

If I remember correctly, that’s around the time Microsoft began to make less vindictive statements about Linux. It even resulted in a partnership, first with SUSE Linux and then with other distributions, and I remember it because it surprised me a lot at the time.

Nowadays, Microsoft can’t shake off its image as some kind of open source archenemy, no matter how much it spends on contributions to open source projects. Microsoft communicates heavily on the fact that it is (as a company) one of the biggest (if not the biggest) contributors to open source, far ahead of Google or Red Hat (I don’t know whether the IBM acquisition changes anything here). They even have a website proclaiming their love for open source: open.microsoft.com!

We are all in on open source
–Satya Nadella (current CEO of Microsoft)

Nothing seems to matter. When they bought GitHub for nearly 8 billion dollars, the Internet nearly took the pitchforks out. « Why the GitHub acquisition by Microsoft shocks the Internet », ran the newspaper headlines. In the process, a lot of open source teams left GitHub in a hurry, never realizing that they had been on a closed platform all along and that Microsoft’s decision changed virtually nothing. There is a great article about this called Le danger Github, but sadly (for you) it’s only in French…

And the others?

Many followed the trend. It’s commonly accepted (even though some might call it a little bit unethical) to play on both sides of the source (open/closed).

One example among many others is VMware, which acquired Pivotal a few years ago, contributes to open source projects, and has open sourced some of its own, while at the same time keeping a large part of its portfolio closed source and making a lot of money with it.

Still, with all these possibilities, some big companies selling uber-complex software continue to choose open source bashing to promote their products. Over the years, I came across these examples:

A big company selling an ERP/ETL, claiming that should you use the open source alternative, you will fail. The image chosen for the illustration was a donkey showing its teeth, looking really dumb and silly (sadly, I didn’t think to take a screenshot at the time)…
A company selling a platform to secure file transfers (one of the most costly pieces of software I ever came across), explaining to us that you will in fact lose money if you choose an open source solution rather than their own. They made the comparison between their software and … plain FTP. Well, for a paid solution, I do hope that their solution is better! But choosing to ignore the open source solutions that DO compete with their product is not really fair (to say the least).
And then, there is SAP

Should we take a look again at the tweet?

Today’s comic is characteristic of the caricatured comparisons we get from those companies.

So here’s what is said in the first bubble:

So? How is your data platform based on open source software progressing?

From the look of our hero, I guess it’s not going well…

« We need to install the new release AGAIN! » / « 2.65 beta! »

The main idea conveyed here is that open source software is unreliable. Versions change quickly to make up for blocking bugs. It is even so buggy that you need to deploy betas just to hope to make it run.

Now that’s really in bad faith. There are many open source projects run by teams adamant about stability, delivery process, and proper quality checks and tools.

Conversely, many software projects, open source or not, do ship badly tested code, sometimes with catastrophic impact. One example very close to us is the latest October update of Windows 10, which skipped one of the usual validation steps to be released to production early (maybe just to be on time) and deleted some users’ directories!

« Wait! It’s not compatible with my module! » / « Why is the text highlighted in yellow in the new version? »

One part of this is true. Building a complex solution from multiple components is indeed a challenge (open source or not… but let’s not stray) and does indeed bring a certain number of problems. You can’t improvise this, and you have to build procedures to avoid major issues.

But let’s consider for a second the alternative being offered here. Let’s take a totally random example, say, a big bad costly closed source ERP. How does its lifecycle play out?

I’ve seen it countless times. There are 2 types of companies:

Heavily staffed and organized. They take this big software and they do it right. Meaning: they set up a whole team whose sole purpose is to keep the whole platform up to date. They don’t do anything else, and it’s really costly.
Small and/or not organized. In the mind of the board, keeping things up to date is « just a thing ». Why commit so many resources when the existing admins can do it when they have time? And that’s where the fun begins. In those companies, at the first bug, the vendor’s support will tell the admin to update before even taking a look at the case. You end up with an ERP that is kept in a working state but never updated, in case something breaks. Hackers/security auditors will be so happy ;).

« And the fork, what do we do with it? »

That’s an excellent question, and I’m really glad you asked.

The idea here is that, at some point, the team in charge had to fork an open source project, maybe to apply a homemade fix or to add a missing feature. Eventually, the upstream open source project moved on, and now our team doesn’t know what to do: abandon their work and switch back to the main project, or keep maintaining their own copy while the main project grows?

Let’s be honest here. If you rely on open source software and don’t contribute your improvements (be they fixes or features) back, don’t expect your work to be future proof!

« Buy a new server / Buy another server / Try with a server from another brand »

I’d really like to have an opinion on this, because it seems important, as it is said 3 times. But unfortunately I have no freaking idea what criticism of open source the author meant.

Funniest of all, SAP does contribute to Open Source!

I had trouble understanding this at first. Because I checked: SAP is one of those companies that DO contribute! In the past years, SAP has open sourced 4 big internal projects (and a few hundred smaller ones), all freely available on GitHub!

There is also an OFFICIAL LIST of SAP open source contributions. SAP has no reason to contribute actively to those external projects if it is not using them, at least partly. According to Matt Asay, they even rank as the seventh biggest open source contributor on GitHub! Just behind Amazon and way ahead of Facebook or Mozilla!

I’m 98% sure (as you can see, that’s a really precise metric :p) that every SAP product uses at least one third party open source component. More than an intuition, it’s simply the fact that everyone uses open source at some level, be it frameworks, libs, or third party modules to ease integration with other software.

Last, but not least, there is a documentation page on the SAP website listing FOSS components and the licences that go with them. That doesn’t mean that all SAP software uses open source, but at least one product does.

In conclusion

Some might say that I’m overreacting, that it’s just a bad tweet from the French marketing team rather than a real will to undermine the open source philosophy.

Yet, I can’t shake the idea that all the marketing people in these companies MUST know that open source was, for a long time, a dirty word. This probably still echoes in the minds of people who are further removed from today’s practices, who have « always done it like this » and « don’t see a reason to change now ». Claiming that open source is unreliable will probably be most effective with top management.

And don’t be fooled, this tweet’s target is not developers or system administrators. This tweet is aimed at top management, who have to decide whether they go for open source or SAP solutions.

No, I don’t think that’s a mistake; on the contrary, I think it’s done on purpose and probably very effective.

Sources

The initial tweet (deleted)
SAP HANA, la plateforme de données qui rend libre ! (literally, « SAP HANA, the data platform that frees/opens ». I find the irony hilarious and I thank the post author for the pun intended)
Who really contributes to open source, on InfoWorld
Official list of SAP open source contributions
FOSS and licences for third party open source projects that come with SAP
An article about the famous « Linux is a cancer » from Steve Ballmer
Le danger Github from Carl Chenet [FR]
October update removing user folders (Ars Technica)
