Flux and GitOps: What, Why and How
In the previous post, we set up a Kubernetes cluster on our home server using Ansible. We installed Flux to manage app deployments and cluster configurations. But we didn’t discuss why we’re using Flux, what GitOps is, and how it fits together.
In this series we're using Flux, but the same concepts and principles apply to other GitOps tools like ArgoCD.
The Problem(s)
If you’ve worked in a DevOps environment (where the developers who write the code are also responsible for deploying it, supporting it, etc.), I’d bet money that your deployment process looks something like this:
Beautiful, almost nostalgic. But there are a few problems with it:
**What if the cluster wasn't in the state you expected it to be in?**

It's probably happened to you. You're doing a routine deployment, but the pipeline fails because the pods didn't reach a ready state. You check the deployment manifest, and it turns out your favourite colleague was testing stuff and added `replicas: 0` but forgot to take it back out (again). Or worse, the namespace you're deploying to no longer exists! 🔥

**Now that the pipeline has failed and the cluster is in a broken state, what do you do?**

You had a fancy pipeline to abstract away and automate the whole deployment process, but now that you need to dive in, you're stuck. Are you going to run `kubectl` commands against the cluster to see what's going on? Or run `kubectl delete deployment/<app>` to start fresh, and then re-run the pipeline? Hopefully you're not on a bridge call with 100 people watching your screen as you type `kubectl --help`. How exactly do you roll back, what permissions do you need to do so, and how do you then guarantee that the cluster is in the state you expect it to be in? (There's a sketch of this kind of firefighting session below.)

**Security?**

You've heard of SecOps and you wear the T-shirt with pride. You ignore your therapist when she says not to apply the Zero Trust Model to everyday life. I think part of you knows how insecure this process is but doesn't want to admit it. Not only does the CI/CD pipeline need write access to your cluster, but probably so does every developer on your team, because when the pipeline fails, the cluster will require manual intervention to troubleshoot and fix. This means any successful phishing attack, or even a lost or stolen laptop, could spell game over.

**Disaster recovery?**

It's Friday evening and you've been called into an urgent meeting. "What is our DR?!" your manager shouts. Confused, you wonder if he's been drinking. "We probably see different doctors, boss." Your cluster has disappeared. Most likely it was on Azure. You raise a platinum support ticket to get your money's worth, but you know you're on your own. Now you're tasked with a rebuild. You'll need to duplicate or update all the CI/CD pipelines and monitoring/alerting rules. Create new service connections. Copy over (hopefully existing) manifests and configurations. Re-deploy all your applications (who knows what versions were actually live? It's definitely not written anywhere). You find the following pipeline code and your impostor syndrome climaxes:

```bash
sed -ie "s/IM_THE_VERSION_REPLACE_ME_LOL/$actualVersionLol/g" deployment.yml
kubectl apply -f deployment.yml
```
So it seems the de facto approach to deployments is to somehow get some initial state into Kubernetes, and then make continuous alterations to that state. This makes it very hard to know what the current state of the cluster is, how to get to any given state, and who has access to the cluster.
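For the record, the manual firefighting session alluded to above tends to look something like this (a hypothetical example; `my-app` and the `production` namespace are placeholders):

```bash
# What's actually running right now?
kubectl --namespace production get deployments,pods

# Why didn't the pods reach a ready state?
kubectl --namespace production describe deployment my-app

# Roll back to the previous revision and hope for the best...
kubectl --namespace production rollout undo deployment/my-app

# ...or start fresh and re-run the pipeline
kubectl --namespace production delete deployment my-app
```

None of this is audited, and nothing stops the cluster from drifting even further away from whatever state your pipeline assumes.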
The Solution
Imagine if you could define the desired state of your cluster in a Git repository, and there was a tool that would monitor your cluster and make sure it matched the state in Git. It would mean you could make changes to your cluster with a `git commit`. Changes would be auditable and reversible; a rollback would be a `git revert`. Who has permission to make changes to the cluster is as simple as looking at the Git repository settings. This is what Flux offers us, and the process of driving change through Git is called GitOps.
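In practice, both a deployment and a rollback become ordinary Git operations. A minimal sketch (the file path and commit messages are placeholders):

```bash
# Deploy: change the desired state and push it
git add apps/my-app/release.yaml       # e.g. a bumped chart version
git commit -m "Bump my-app to v1.2.3"
git push                               # Flux notices the new commit and reconciles the cluster

# Rollback: undo the offending commit
git revert HEAD
git push                               # Flux reconciles the cluster back to the previous state
```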
This is what the same process above would look like using Flux and GitOps:
What does "Push code defining what should be in the cluster" look like in practice? Well, we could for example apply the following manifests:
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: prometheus-community-charts
  namespace: observability
spec:
  interval: 24h # (1)
  url: https://prometheus-community.github.io/helm-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
  namespace: observability
spec:
  interval: 30m
  chart:
    spec:
      chart: kube-prometheus-stack
      version: "61.x" # (2)
      sourceRef:
        kind: HelmRepository
        name: prometheus-community-charts
        namespace: observability
      interval: 12h
```
1. Flux will check this Helm repository every 24 hours for new versions.
2. If Flux detects a new version, e.g. `61.1`, it will automatically update the Helm release to use that version.

This is so powerful. You simply define what you want in your cluster, and Flux will make it so. By the end of this series we'll have several apps deployed, all of which will be kept automatically up to date, with absolutely no effort on our side.
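Once these manifests are committed and pushed, you can watch Flux act on them (assuming the `flux` CLI is installed; the namespace matches the manifests above):

```bash
# Has Flux fetched the Helm repository index?
flux get sources helm --namespace observability

# Has the release been installed, and is it ready?
flux get helmreleases --namespace observability
```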
Let’s see how this new process has solved the problems above:
**What state is the cluster in?**

Explore the `main` branch of your Git repository.

**How do you get back to a given state?**

`git revert`.

**Security?**

We've closed the cluster off from the outside world. Flux runs within the cluster, with read-only egress to the Git repository. Developers and service accounts no longer need access to the cluster itself.

**Reproducibility?**

The entire cluster state is in code in Git. You can install Flux on any other cluster and have it bootstrap from the Git repository to get back to the exact same state (see the sketch below).
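That disaster-recovery rebuild from earlier collapses into a single bootstrap command. A minimal sketch for a GitHub-hosted repository (the owner, repository, and path are placeholders for your own values):

```bash
flux bootstrap github \
  --owner=<github-user> \
  --repository=<repo-name> \
  --branch=main \
  --path=clusters/home-server \
  --personal
```

Flux installs itself into the fresh cluster, then reconciles everything defined under the given path until the cluster matches Git again.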
There are other benefits as well, like developers not needing to learn new tools or Kubernetes intricacies. Instead, they use an interface they’re already familiar with: Git.
Next Steps
In the next post we’re going to solve the problem of secrets management in our cluster.