TechOps Examples

How Karpenter Feature Gates Helped on Black Friday

Govardhana M K
December 02, 2024

Hey — It's Govardhana MK 👋

Along with a use case deep dive, we round up remote job opportunities, top news, tools, and articles from the TechOps industry.

👋 Before we begin... a big thank you to today's sponsor NOTOPS

🚀 Simplify Cloud-Native with NotOps.io 🌐

Overwhelmed by Kubernetes and cloud-native complexities? NotOps.io is here to transform the way you manage cloud infrastructure.

🌟 Battle-Tested Cloud-Native Tooling
Leverage decades of expertise. We’ve curated the perfect mix of tools and practices to deliver unmatched stability, security, and scalability.

🔒 Secure by Design
Security isn’t an afterthought; it’s built into every layer of NotOps.io. With regular updates, your infrastructure stays patched and protected, effortlessly.

👀 Why NotOps.io?
From stability to security to speed, organizations are experiencing measurable results from day one with NotOps.io.

IN TODAY'S EDITION

🧠 Use Case

How Karpenter Feature Gates Helped on Black Friday

🚀 Top News

AWS announced EKS Auto Mode, a new feature that simplifies Kubernetes Operations

👀 Remote Jobs

XM is hiring an AWS Cloud Engineer
Remote Location: Worldwide

Givelify is hiring a Platform Engineer
Remote Location: USA, Canada

📚️ Resources

Building an Enterprise CI/CD Pipeline with Jenkins, Docker, Trivy, and GKE

Mastering Argo CD Image Updater with Helm: A Complete Configuration Guide

How to Achieve 60% AWS Cost Optimization with Functions and Tags?

Why struggle with file uploads? Pinata’s File API is your fix

Simplify your development workflow with Pinata’s File API. Add file uploads and retrieval to your app in minutes, without the need for complicated configurations. Pinata provides simple file management so you can focus on creating great features.

🛠️ TOOL OF THE DAY

Pixie - A Kubernetes-native application observability tool that gives you an instant view of your cluster's high-level state (service maps, cluster resources, application traffic).

🧠 USE CASE

How Karpenter Feature Gates Helped on Black Friday

This year, my Black Friday experience was with an e-commerce platform and the usual objective: scale quickly under heavy traffic while keeping costs low.

Karpenter is an open-source Kubernetes node autoscaler that provisions right-sized nodes for pending pods and removes them when they are no longer needed.

Today I want to talk about Karpenter feature gates and how they helped us get through the event.

These were the challenges staring at me:

Traffic could skyrocket at any moment, requiring immediate node scaling without delays.
We heavily relied on Spot Instances. These instances could be reclaimed at any time, creating potential downtime.
Over time, some nodes in the cluster could drift from their intended configurations, resulting in wasted resources.
High workloads could stress some nodes to failure, requiring quick detection and repairs.

To tackle these challenges, I enabled three powerful Karpenter Feature Gates:

SpotToSpotConsolidation:
Lets consolidation replace a Spot node with a cheaper Spot node. Without the gate, consolidation only deletes Spot nodes. (Reacting to Spot interruption notices is handled separately by Karpenter's interruption handling.)
Drift:
Automatically detects nodes whose configuration no longer matches the spec they were launched from and replaces them.
NodeRepair:
Automatically detects unhealthy nodes and replaces them without manual intervention.
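
To make the Drift idea concrete, here is a minimal Python sketch of hash-based drift detection. It is a simplified model, not Karpenter's implementation: the `spec_hash` and `find_drifted` helpers, the node records, and the field names are all hypothetical. The idea is that each node remembers a hash of the spec it was launched from, and any node whose stored hash differs from the hash of the current spec gets flagged for replacement.

```python
import hashlib
import json


def spec_hash(spec: dict) -> str:
    """Return a stable hash of a node spec (hypothetical helper)."""
    # sort_keys makes the hash independent of dict insertion order
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drifted(nodes: list[dict], current_spec: dict) -> list[str]:
    """Names of nodes whose launch-time hash no longer matches the current spec."""
    wanted = spec_hash(current_spec)
    return [n["name"] for n in nodes if n["launch_hash"] != wanted]


# Hypothetical specs: the AMI changed after techops1 was launched
old_spec = {"ami": "ami-old", "capacity_type": "spot"}
new_spec = {"ami": "ami-new", "capacity_type": "spot"}

nodes = [
    {"name": "techops1", "launch_hash": spec_hash(old_spec)},
    {"name": "techops2", "launch_hash": spec_hash(new_spec)},
]

drifted = find_drifted(nodes, new_spec)
print(drifted)
```

Only the node launched from the old spec is reported, which mirrors the "detect, then replace" flow described above.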

Implementation:

1. Enable Feature Gates

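Helm chart values to update and deploy (the chart expects lower camel case keys):

settings:
  featureGates:
    spotToSpotConsolidation: true
    nodeRepair: true
    # Only needed before Karpenter v1.0; from v1.0 on, drift detection is always enabled
    drift: true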
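
If you would rather pass the gates on the command line than edit a values file, the following Python sketch builds the equivalent `helm upgrade` command. Treat it as an illustration under assumptions: `helm_set_flags` is a hypothetical helper, the release name `karpenter`, the `kube-system` namespace, and the lower camel case key names are assumptions to check against your own install and your chart's values.yaml, and the script only prints the command so you can review it before running it.

```python
import shlex


def helm_set_flags(gates: dict[str, bool]) -> list[str]:
    """Turn a feature-gate mapping into `--set` arguments for helm."""
    flags = []
    for name, enabled in sorted(gates.items()):
        # Helm parses the literals true/false as booleans
        value = "true" if enabled else "false"
        flags += ["--set", f"settings.featureGates.{name}={value}"]
    return flags


# Key names are an assumption; check them against your chart's values.yaml
gates = {"spotToSpotConsolidation": True, "nodeRepair": True}

command = [
    "helm", "upgrade", "--install", "karpenter",
    "oci://public.ecr.aws/karpenter/karpenter",
    "--namespace", "kube-system",  # assumed namespace
    "--reuse-values",
] + helm_set_flags(gates)

# Print the command for review instead of running it
print(shlex.join(command))
```

The mapping can hold any gate your chart version exposes, so the same helper works if you need to switch a gate off again after the event.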

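2. Configure NodePool

Karpenter releases that ship these feature gates use the NodePool API (the older Provisioner kind is no longer served), so the manifest below is written as a NodePool. The EC2NodeClass name is a placeholder.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: techops-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default  # assumed name; point this at your own EC2NodeClass
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["spot"]
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["m5.large", "m5.xlarge"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

This configuration ensures:

Only Spot Instances are provisioned, for cost savings.
Instance types are limited to m5.large and m5.xlarge to match workload requirements.
Empty or underutilized nodes become consolidation candidates after 30 seconds.
With the SpotToSpotConsolidation gate on, consolidation can also replace a Spot node with a cheaper Spot node.

Note: The feature gates are set through Helm, not in this manifest. Once enabled, Drift and NodeRepair work automatically in the background and don't need any NodePool settings.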

3. Monitoring and Observability

SpotToSpotConsolidation Logs:

{"level":"info","msg":"Migrating workload from spot node techops1 to more stable node techops2"}

Drift Detection Logs:

{"level":"info","msg":"Drift detected on node techops1. Marking for termination and replacement."}

NodeRepair Logs:

{"level":"info","msg":"Node repair initiated for unhealthy node techops1"}
{"level":"info","msg":"Node replaced successfully"}
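
To keep an eye on how often each feature fires, you can tally log lines like the ones above. The Python sketch below does that for the sample lines from this post. The keyword-to-feature mapping is hypothetical and based only on those sample messages, so adjust it to whatever your controller actually logs.

```python
import json
from collections import Counter
from typing import Optional

# Sample lines copied from the log excerpts above
LOG_LINES = [
    '{"level":"info","msg":"Migrating workload from spot node techops1 to more stable node techops2"}',
    '{"level":"info","msg":"Drift detected on node techops1. Marking for termination and replacement."}',
    '{"level":"info","msg":"Node repair initiated for unhealthy node techops1"}',
    '{"level":"info","msg":"Node replaced successfully"}',
]

# Hypothetical keyword-to-feature mapping, based only on the messages above
KEYWORDS = {
    "Migrating workload": "SpotToSpotConsolidation",
    "Drift detected": "Drift",
    "Node repair": "NodeRepair",
}


def classify(line: str) -> Optional[str]:
    """Return the feature a log line belongs to, or None if unrecognized."""
    try:
        msg = str(json.loads(line).get("msg", ""))
    except (ValueError, AttributeError):
        return None  # not a JSON object
    for keyword, feature in KEYWORDS.items():
        if keyword in msg:
            return feature
    return None


# Count only the lines that matched a known feature
counts = Counter(f for f in map(classify, LOG_LINES) if f is not None)
print(dict(counts))
```

Lines that match no keyword, such as the final "Node replaced successfully" message, are skipped rather than guessed at. In practice you would feed the function lines streamed from the Karpenter controller's logs instead of a hard-coded list.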

Final Results:

30% cost savings
Lean, healthy infrastructure
Zero downtime throughout the event

90% of the engineers I knew who were hyping Kubernetes in 2021 are now trying to escape managing it.
The other 10% went on to make careers giving talks about how to manage it.
— Govardhana Miriyala Kannaiah (@govardhana_mk)
11:24 AM • Nov 29, 2024

You may also like:

KEDA vs Karpenter - Which One to Choose

Looking to promote your company, product, service, or event to 23,000+ TechOps Professionals? Let's work together.