Wayfinder is a toolkit for curating, testing, and deploying collections of platform capabilities (Voyages) as versioned OCI artifacts to a container registry. Wayfinder provides example configurations for deploying Voyages to a fleet of Kubernetes clusters in a GitOps manner.
Extensible tooling built with CUE is used to demonstrate a baseline process for managing the ongoing component upgrade cycle.
To get your bearings with Wayfinder, you can spin up a local development environment using Docker and KIND (Kubernetes in Docker) in under five minutes and take your first Voyage.
When operating Kubernetes in production at any scale, and especially as the number of clusters grows from tens to hundreds, it becomes essential to have an automated way to package and deploy managed collections of platform capabilities that have been verified to work well together. These platform capabilities form the basis of the shared services that product delivery teams rely on to deploy, operate, and monitor core customer-facing workloads.
Platform capabilities are often provided by third-party software (open-source and commercial) packaged as Helm charts, which are installed and managed by an infrastructure, DevOps, or platform team. New versions of platform components are released on a steady basis to patch vulnerabilities, fix bugs, or add new features. Every organization weighs the cost, benefit, and risk of upgrading platform components differently and will establish policies for performing required upgrades. The platform operations team must factor this ongoing maintenance into the schedule and cost of running the container platform.
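As an illustration of what such an upgrade policy might look like when automated, the sketch below classifies each available component release as a patch, minor, or major bump (assuming plain semantic versions such as "1.2.3" or "v1.2.3", with no pre-release suffixes) and maps each class to an action. This is not Wayfinder's mechanism: the tiers "auto", "review", and "plan", the `POLICY` table, and the helper names `classify_upgrade` and `plan_upgrades` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    # Field order makes comparisons tuple-wise: major, then minor, then patch.
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accepts "1.2.3" or "v1.2.3"; pre-release/build suffixes are not handled here.
        major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
        return cls(major, minor, patch)


def classify_upgrade(current: str, candidate: str) -> str:
    """Return 'none', 'patch', 'minor', or 'major' for a candidate release."""
    cur, new = Version.parse(current), Version.parse(candidate)
    if new <= cur:
        return "none"
    if new.major != cur.major:
        return "major"
    if new.minor != cur.minor:
        return "minor"
    return "patch"


# Hypothetical policy: apply patch releases automatically, review minor
# releases, and schedule a planned migration for major releases.
POLICY = {"none": "skip", "patch": "auto", "minor": "review", "major": "plan"}


def plan_upgrades(installed: dict[str, str], available: dict[str, str]) -> dict[str, str]:
    """Map each installed component to the action the policy prescribes."""
    plan = {}
    for component, current in installed.items():
        # Components with no newer release listed are treated as up to date.
        candidate = available.get(component, current)
        plan[component] = POLICY[classify_upgrade(current, candidate)]
    return plan


if __name__ == "__main__":
    # Illustrative component names and versions, not real release data.
    installed = {"certs": "v1.13.2", "ingress": "4.8.3", "metrics": "51.2.0"}
    available = {"certs": "v1.13.3", "ingress": "4.9.0", "metrics": "55.0.0"}
    print(plan_upgrades(installed, available))
```

Running the example prints `{'certs': 'auto', 'ingress': 'review', 'metrics': 'plan'}`. Keeping the policy in a plain lookup table means an organization could tighten or relax it (for example, requiring review even for patch releases) without touching the version logic.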
Most teams operating at scale follow a GitOps model, in which the desired state of the platform is represented in source control and a tool such as Flux CD or ArgoCD reconciles the current state to the desired state. While GitOps practices and tools are generally considered a necessity, they do not answer higher-order questions such as:
The above non-exhaustive questions illustrate the inherent complexity of running production Kubernetes. To scale, platform operators need not only efficient tools and automation but also an overall lifecycle process for managing Kubernetes.
What changes are required to solve this problem and achieve the project goals?
What alternatives did you consider? Describe the evaluation criteria for how you chose the proposed solution.