How Karpenter Feature Gates Helped on Black Friday
TechOps Examples
Hey, it's Govardhana MK
Along with a use case deep dive, we identify the remote job opportunities, top news, tools, and articles in the TechOps industry.
Before we begin... a big thank you to today's sponsor NOTOPS
Simplify Cloud-Native with NotOps.io
Overwhelmed by Kubernetes and cloud-native complexities? NotOps.io is here to transform the way you manage cloud infrastructure.
Battle-Tested Cloud-Native Tooling
Leverage decades of expertise. We've curated the perfect mix of tools and practices to deliver unmatched stability, security, and scalability.
Secure by Design
Security isn't an afterthought; it's built into every layer of NotOps.io. With regular updates, your infrastructure stays patched and protected, effortlessly.
Why NotOps.io?
From stability to security to speed, organizations are experiencing measurable results from day one with NotOps.io.
IN TODAY'S EDITION
Use Case
How Karpenter Feature Gates Helped on Black Friday
Top News
Remote Jobs
XM is hiring an AWS Cloud Engineer
Remote Location: Worldwide
Givelify is hiring a Platform Engineer
Remote Location: USA, Canada
Resources
Why struggle with file uploads? Pinata's File API is your fix
Simplify your development workflow with Pinata's File API. Add file uploads and retrieval to your app in minutes, without the need for complicated configurations. Pinata provides simple file management so you can focus on creating great features.
TOOL OF THE DAY
Pixie - Instant Kubernetes Native Application Observability tool to view the high-level state of your cluster (service maps, cluster resources, application traffic).
USE CASE
How Karpenter Feature Gates Helped on Black Friday
My Black Friday experience this year was with an e-commerce platform that had the typical objective: scale quickly under heavy traffic while keeping costs low.
Karpenter is a Kubernetes cluster autoscaler that manages capacity dynamically by provisioning nodes tailored to your workload requirements.
Today I would like to talk about Karpenter feature gates and how they helped.
These were the challenges staring at me:
Traffic could skyrocket at any moment, requiring immediate node scaling without delays.
We heavily relied on Spot Instances. These instances could be reclaimed at any time, creating potential downtime.
Over time, some nodes in the cluster could drift from their intended configurations, resulting in wasted resources.
High workloads could stress some nodes to failure, requiring quick detection and repairs.
To tackle these challenges, I enabled three powerful Karpenter Feature Gates:
SpotToSpotConsolidation: Lets consolidation replace Spot nodes with cheaper Spot capacity, moving workloads onto more cost-effective instances.
Drift: Automatically detects nodes whose configuration no longer matches the intended spec and replaces them to maintain efficiency.
NodeRepair: Automatically detects unhealthy nodes and repairs or replaces them without manual intervention.
Implementation:
1. Enable Feature Gates
Helm Chart Configuration to update and deploy:
settings:
  featureGates:
    # Chart values use lowerCamelCase keys; the gates available depend on the Karpenter version
    spotToSpotConsolidation: true
    drift: true
    nodeRepair: true
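To check that the gates reached the controller, it can help to know how the chart passes them along. As far as the upstream chart goes, the values are rendered into a single FEATURE_GATES environment variable on the Karpenter controller container. The fragment below is only a sketch of what that rendered env entry might look like. The exact set of gates in the string depends on the chart version, so a given release may not include all three.

```yaml
# Illustrative fragment of the Karpenter controller Deployment (not a full manifest).
# Assumption: the installed chart version exposes all three gates.
env:
  - name: FEATURE_GATES
    value: "SpotToSpotConsolidation=true,Drift=true,NodeRepair=true"
```

If a gate is missing from the rendered value, the installed chart version most likely does not know that gate.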
2. Configure Provisioner
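apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: techops-provisioner
spec:
  requirements:
    # Request Spot capacity only
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot"]
    # Instance types are constrained through a requirement
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m5.large", "m5.xlarge"]
  # ttlSecondsAfterEmpty: 30 is left out here: v1alpha5 treats it as mutually
  # exclusive with consolidation, which also removes empty nodes
  consolidation:
    enabled: true
This configuration ensures:
Only Spot capacity is requested, for cost savings.
Instances are limited to types that match workload requirements.
Empty and underutilized nodes are removed by consolidation. The 30-second ttlSecondsAfterEmpty setting can only be used when consolidation is turned off.
Consolidation is switched on via the consolidation.enabled parameter, and the SpotToSpotConsolidation feature gate extends it to replacing Spot nodes with other Spot nodes.
Note: The Drift and NodeRepair feature gates were already enabled through the Helm deployment. They work automatically in the background and don't need to be added to the Provisioner file.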
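Newer Karpenter releases replace the Provisioner API with NodePool. For readers on those versions, here is a rough equivalent of the setup above under karpenter.sh/v1. Treat it as a sketch under stated assumptions: the name techops-nodepool and the EC2NodeClass called default are placeholders, and the EC2NodeClass is assumed to exist already.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: techops-nodepool            # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default               # assumption: this EC2NodeClass already exists
      requirements:
        # Request Spot capacity only
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        # Restrict to the instance types used in this post
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge"]
  disruption:
    # Consider nodes that are empty or underutilized for consolidation
    consolidationPolicy: WhenEmptyOrUnderutilized
    # Wait 30s after the last pod change on a node before consolidating it
    consolidateAfter: 30s
```

With only two instance types listed, consolidation has few cheaper Spot alternatives to choose from. Widening the instance-type requirement should give Spot-to-Spot replacement more candidates.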
3. Monitoring and Observability
SpotToSpotConsolidation Logs:
{"level":"info","msg":"Migrating workload from spot node techops1 to more stable node techops2"}
Drift Detection Logs:
{"level":"info","msg":"Drift detected on node techops1. Marking for termination and replacement."}
NodeRepair Logs:
{"level":"info","msg":"Node repair initiated for unhealthy node techops1"}
{"level":"info","msg":"Node replaced successfully"}
Final Results:
30% cost savings
Lean, healthy infrastructure
Zero downtime throughout the event
90% of the engineers I knew who were hyping Kubernetes in 2021 are now trying to escape managing it.
The other 10% went on to make careers giving talks about how to manage it.
Govardhana Miriyala Kannaiah (@govardhana_mk)
11:24 AM • Nov 29, 2024