Kubernetes Scaling Strategies


TechOps Examples

Hey, it's Govardhana MK 👋

🎈🎉 We are celebrating our 150th edition today! The journey has been truly remarkable, and I couldn't thank you enough for your continuous support, which gives me the energy to expand the offerings. Many more centuries to come...

Along with a use case deep dive, we round up remote job opportunities, top news, tools, and articles in the TechOps industry.

👋 Before we begin... a big thank you to today's sponsor, SUPERHUMAN AI.

Find out why 1M+ professionals read Superhuman AI daily.

In 2 years you will be working for AI

Or an AI will be working for you

Here's how you can future-proof yourself:

  1. Join the Superhuman AI newsletter – read by 1M+ people at top companies

  2. Master AI tools, tutorials, and news in just 3 minutes a day

  3. Become 10X more productive using AI

Join 1,000,000+ pros at companies like Google, Meta, and Amazon who are using AI to get ahead.

IN TODAY'S EDITION

🧠 Use Case
  • Kubernetes Scaling Strategies

🚀 Top News

👀 Remote Jobs

📚 Resources

📢 Reddit Threads

🛠️ TOOL OF THE DAY

Git Productivity Toolkit - A collection of scripts that extend Git with various sub-commands to make life easier.

🧠 USE CASE

Kubernetes Scaling Strategies

Whether you need to handle traffic spikes, optimize resource usage, or control costs, choosing the right scaling strategy can make or break your cluster's performance. Let's look at the prominent ones.

1. Manual Scaling with kubectl scale

This is useful when you have predictable workloads or just need to increase/decrease replicas quickly.

You manually adjust the replica count for a Deployment or StatefulSet using kubectl:

kubectl scale deployment techops-app --replicas=5
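
To confirm the change took effect, the replica count can be read straight back (same deployment name as above):

kubectl get deployment techops-app
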
Heads Up:
  • This method doesn't auto-adjust to traffic changes.

  • If you forget to scale down, you might waste resources and money.

  • No protection against overloaded pods; they could be running at max CPU with no automatic scale-up.

2. Horizontal Pod Autoscaler (HPA)

HPA automates scaling by adjusting the number of pod replicas based on CPU, memory, or custom metrics.

How It Works:
  • HPA queries the metrics server for CPU/memory utilization.

  • If usage exceeds the threshold, HPA calculates a new replica count (see the formula below).

  • Updates the Deployment/ReplicaSet with the new replica number.
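
Concretely, the control loop follows the proportional rule from the Kubernetes HPA documentation:

desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

For example, 4 pods averaging 100% CPU against a 50% target scale to ceil(4 × 100 / 50) = 8 replicas.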

Example HPA for a deployment:

kubectl autoscale deployment techops-app --cpu-percent=50 --min=2 --max=10

This means that when average CPU usage exceeds 50%, the deployment scales up, staying between 2 and 10 replicas.
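
The one-liner is handy for experiments, but the same policy is usually kept in version control as a manifest. A minimal declarative equivalent using the autoscaling/v2 API, with the same names and thresholds as above:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: techops-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: techops-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50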

Heads Up:
  • HPA only works with CPU, memory, or custom metrics; it can't react to queue lengths or requests per second unless a custom metrics adapter exposes them.

  • Requires the Metrics Server to be running. Install it if missing (command below).

  • Sync period matters: HPA does not react instantly to spikes, since it checks metrics every 15 seconds by default.
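
If the Metrics Server is missing, the upstream manifest published by the metrics-server project installs it:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml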

3. Vertical Pod Autoscaler (VPA)

HPA scales horizontally by adding pods. But what if you want to optimize resource allocation per pod? That's where VPA helps.

Instead of increasing pod count, VPA adjusts CPU/memory requests for existing pods.

How It Works:
  • VPA reads pod usage metrics over time.

  • Provides resource recommendations.

  • Can automatically apply the new requests (and proportional limits), which requires a pod restart.

Example VPA for a Deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: techops-app
  updatePolicy:
    updateMode: "Auto"
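
Once the recommender has observed the workload for a while, its suggestion can be read back with kubectl (assuming the VPA components are installed in the cluster):

kubectl describe vpa my-app-vpa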

Example output:

Recommendation:
  Target:
    Cpu:  300m
    Memory:  512Mi

This means VPA recommends setting the CPU request to 300m and the memory request to 512Mi.

Heads Up:
  • VPA restarts pods when updating resource requests, which may disrupt running applications.

  • Not ideal for high-availability workloads that can't afford downtime.

  • Works well for batch jobs or long-running workloads, but not great for latency-sensitive apps.

4. Kubernetes Event-driven Autoscaler (KEDA)

What if your scaling decisions need to be based on external events (e.g., Kafka messages, RabbitMQ queues, Prometheus alerts)? HPA and VPA won't help here, but KEDA will.

KEDA enables event-driven scaling by feeding metrics from external sources into HPA.

How It Works:
  • Event sources (Kafka, RabbitMQ, etc.) emit metrics.

  • KEDA reads metrics and provides them to Kubernetes.

  • Kubernetes triggers HPA to scale accordingly.
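
KEDA itself must be running in the cluster before any ScaledObject can work; one common route is the official Helm chart:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace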

Example Scaling Based on RabbitMQ Queue Length:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer
spec:
  scaleTargetRef:
    name: techops-app
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        # the RabbitMQ scaler also needs a connection string; placeholder value shown
        host: amqp://user:password@rabbitmq.default.svc.cluster.local:5672/
        queueName: techops-queue
        queueLength: "10"

Now, Kubernetes will scale the consumer up whenever the queue backlog grows beyond 10 messages per replica, and back down as the queue drains.

Heads Up:
  • KEDA requires external event sources; it is not the tool for plain CPU/memory-based scaling.

  • You need to define proper thresholds. Otherwise, your system might scale too aggressively.

  • Works well with HPA but doesn't replace it: KEDA feeds metrics into HPA, which actually performs the scaling (you can see this in the cluster, as shown below).
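
For each ScaledObject, KEDA creates and manages an HPA object behind the scenes, named keda-hpa-<scaledobject-name>, which you can inspect directly:

kubectl get hpa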

For many workloads, a hybrid approach works best.

HPA + VPA → prevents overprovisioning and avoids resource starvation (just don't point both at the same CPU/memory metrics, or they will fight each other).

HPA + KEDA → reduces latency and scales quickly on external events.

HPA + VPA + KEDA → cuts costs while handling both load spikes and steady growth.

Looking to promote your company, product, service, or event to 40,000+ Cloud Native Professionals? Let's work together.