Kubernetes Scaling Strategies
TechOps Examples
Hey, it's Govardhana MK
We are celebrating our 150th edition today! The journey has been truly remarkable, and I couldn't thank you enough for your continuous support, which gives me the energy to expand the offerings. Many more centuries to come...
Along with a use case deep dive, we cover remote job opportunities, top news, tools, and articles from the TechOps industry.
IN TODAY'S EDITION
Use Case
Kubernetes Scaling Strategies
Top News
Remote Jobs
Trumid is hiring a Senior Platform Support Engineer
Remote Location: Worldwide
Reflektive Labs is hiring a DevOps Engineer
Remote Location: India
Resources
Reddit Threads
TOOL OF THE DAY
Git Productivity Toolkit - A collection of scripts that extend Git with various sub-commands to make life easier.
USE CASE
Kubernetes Scaling Strategies
Whether you need to handle traffic spikes, optimize resource usage, or control costs, choosing the right scaling strategy can make or break your cluster's performance. Let's look at the prominent ones.
1. Manual Scaling with kubectl scale
This is useful when you have predictable workloads or just need to increase/decrease replicas quickly.
You manually adjust the replica count for a Deployment or StatefulSet using kubectl:
kubectl scale deployment techops-app --replicas=5
Heads Up:
This method doesn't auto-adjust for traffic changes.
If you forget to scale down, you might waste resources and money.
No protection against pod overload: pods could be running at max CPU with no automatic scale-up.
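The same manual scaling can also be expressed declaratively by setting spec.replicas in the Deployment manifest, so the chosen count lives in version control. A minimal sketch, reusing the techops-app name from the example above (the labels and image are illustrative assumptions):

```yaml
# Declarative equivalent of `kubectl scale deployment techops-app --replicas=5`.
# Apply with: kubectl apply -f deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: techops-app
spec:
  replicas: 5                # manually chosen replica count
  selector:
    matchLabels:
      app: techops-app
  template:
    metadata:
      labels:
        app: techops-app
    spec:
      containers:
      - name: techops-app
        image: techops/app:latest   # hypothetical image
```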
2. Horizontal Pod Autoscaler (HPA)
HPA automates scaling by adjusting the number of pod replicas based on CPU, memory, or custom metrics.
How It Works:
HPA queries the metrics server for CPU/memory utilization.
If usage exceeds the target, HPA calculates a new replica count.
HPA updates the Deployment/ReplicaSet with the new replica count.

Example HPA for a deployment:
kubectl autoscale deployment techops-app --cpu-percent=50 --min=2 --max=10
This means that when average CPU usage exceeds 50%, the deployment scales up, staying between 2 and 10 replicas.
Heads Up:
HPA only works with CPU, memory, or custom metrics; out of the box it can't react to queue lengths or requests per second without an extra metrics adapter.
Requires Metrics Server to be running. Install it if missing.
Sync period matters: HPA does not react instantly to spikes, and it checks metrics every 15 seconds by default.
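The same autoscaler can also be written as a manifest using the autoscaling/v2 API, which additionally supports memory and custom metrics. A sketch mirroring the kubectl autoscale command above (the HPA name is an assumption):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: techops-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: techops-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # scale up when average CPU exceeds 50%
```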
3. Vertical Pod Autoscaler (VPA)
HPA scales horizontally by adding pods. But what if you want to optimize resource allocation per pod? That's where VPA helps.
Instead of increasing pod count, VPA adjusts CPU/memory requests for existing pods.
How It Works:
VPA reads pod usage metrics over time.
Provides resource recommendations.
Can automatically apply new requests and limits (which requires a pod restart).

Example VPA for a Deployment:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: techops-app
  updatePolicy:
    updateMode: "Auto"
Example output:
Recommendation:
  Target:
    Cpu: 300m
    Memory: 512Mi
This means VPA suggests raising the CPU request to 300m and the memory request to 512Mi.
Heads Up:
VPA restarts pods when updating resource requests, which may disrupt running applications.
Not ideal for high-availability workloads that can't afford downtime.
Works well for batch jobs or long-running workloads, but not great for latency-sensitive apps.
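If restarts are a concern, VPA can run in recommendation-only mode: set updateMode to "Off", and it computes recommendations without ever evicting pods, which you can then read with kubectl describe vpa. A sketch based on the manifest above:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: techops-app
  updatePolicy:
    updateMode: "Off"   # only compute recommendations; never restart pods
```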
4. Kubernetes Event Driven Autoscaler (KEDA)
What if your scaling decisions need to be based on external events (e.g., Kafka messages, RabbitMQ queues, Prometheus alerts)? HPA and VPA won't help here, but KEDA will.
KEDA enables event driven scaling by feeding metrics from external sources into HPA.
How It Works:
Event sources (Kafka, RabbitMQ, etc.) emit metrics.
KEDA reads metrics and provides them to Kubernetes.
Kubernetes triggers HPA to scale accordingly.

Example Scaling Based on RabbitMQ Queue Length:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer
spec:
  scaleTargetRef:
    name: techops-app
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: rabbitmq
    metadata:
      queueName: techops-queue
      queueLength: "10"
Now, Kubernetes will scale up pods when the queue length exceeds 10 messages.
Heads Up:
KEDA requires external event sources; it's not the tool for plain CPU/memory-based scaling. (Note that the RabbitMQ trigger also needs connection details, via a host setting or a TriggerAuthentication.)
You need to define proper thresholds. Otherwise, your system might scale too aggressively.
Works well with HPA but doesn't replace it. KEDA feeds metrics into HPA, which actually performs the scaling.
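KEDA ships scalers for many other sources as well. For instance, a Prometheus trigger can scale on requests per second; the sketch below assumes a Prometheus endpoint, metric name, and threshold purely for illustration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: techops-app-prometheus
spec:
  scaleTargetRef:
    name: techops-app
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090  # assumed endpoint
      query: sum(rate(http_requests_total{app="techops-app"}[2m]))  # assumed metric
      threshold: "100"   # target requests/sec per replica
```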
For many workloads, a hybrid approach works best.
HPA + VPA → Prevents overprovisioning and avoids resource starvation.
HPA + KEDA → Reduces latency and scales instantly on events.
HPA + VPA + KEDA → Cuts costs while handling both load spikes and steady growth.