Kubernetes Node Not Ready - How To Fix It
Good day. It's Wednesday, Aug. 28, and in this issue, we're covering:
Kubernetes Node Not Ready - How To Fix It?
Google Cloud Run now supports GPUs to host your LLMs
Streamline Local Development with Dev Containers and Testcontainers
The Ultimate Docker Cheat Sheet
Talos Kubernetes on Proxmox using OpenTofu
End-to-End DevOps Project: Building, Deploying, and Monitoring a Full-Stack Application
You share. We listen. As always, send us feedback at [email protected]
Before moving ahead... some great news ✋
We partnered with 1440 to bring you this FREE offering.
All your news. None of the bias.
Be the smartest person in the room by reading 1440! Dive into 1440, where 3.5 million readers find their daily, fact-based news fix. We navigate through 100+ sources to deliver a comprehensive roundup from every corner of the internet – politics, global events, business, and culture, all in a quick, 5-minute newsletter. It's completely free and devoid of bias or political influence, ensuring you get the facts straight.
Use Case
Kubernetes Node Not Ready - How To Fix It?
It is common to see a mix of node statuses in a Kubernetes cluster, especially when troubleshooting. Sometimes, nodes are marked as NotReady due to various issues.
Typically it looks like:
techops_examples@master:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 51m v1.31.0
node-worker-1 NotReady worker 49m v1.31.0
node-worker-2 Ready worker 47m v1.31.0
Behind the scenes:
The kubelet on each node is responsible for reporting the node's status to the control plane, specifically to the node-lifecycle-controller. The control plane then assesses this data (or the absence of it) to determine the node’s state.
The node’s kubelet sends information about various checks it performs, including:
Whether the network for the container runtime is functional.
If the CSI (Container Storage Interface) provider on the node is fully initialized.
The completeness of the container runtime status checks.
The operational state of the container runtime itself.
The functionality of the pod lifecycle event generator.
Whether the node is in the process of shutting down.
The availability of sufficient CPU, memory, or pod capacity on the node.
This information is then relayed to the node-lifecycle-controller, which uses it to assign the node one of the following statuses:
True: All checks have passed, indicating the node is operational and healthy.
False: One or more checks have failed, showing the node has issues and isn’t functioning correctly.
Unknown: The kubelet hasn’t communicated with the control plane within the expected timeframe, leaving the node's status unclear.
When the status is marked as Unknown, it usually indicates that the node has lost contact with the control plane, possibly due to network problems, kubelet crashes, or other communication failures.
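You can pull the Ready condition straight from the node object to see which of these values the controller has recorded, using the example worker from the cluster above:

# Show the Ready condition, including its status, reason, and last heartbeat
kubectl get node node-worker-1 -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'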
Diagnosis:
1. Node Status Check:
Run → kubectl get nodes and watch out for the status 'NotReady':
techops_examples@master:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 51m v1.31.0
node-worker-1 NotReady worker 49m v1.31.0
node-worker-2 Ready worker 47m v1.31.0
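If a node is flapping between Ready and NotReady, it can help to watch the list live; the -w flag streams status changes as they happen:

# Watch node status transitions in real time
kubectl get nodes -w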
2. Node Details and Conditions Check:
To dive deeper into why a node might be NotReady, use the kubectl describe command to get detailed information on the node's conditions, such as:
MemoryPressure: Node is low on memory.
DiskPressure: Node is running out of disk space.
PIDPressure: Node has too many processes running.
techops_examples@master:~$ kubectl describe node node-worker-1
Name: node-worker-1
Roles: worker
Labels: kubernetes.io/hostname=node-worker-1
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
CreationTimestamp: 2024-08-28T09:25:10Z
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False 2024-08-28T10:14:52Z 2024-08-28T09:26:35Z KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False 2024-08-28T10:14:52Z 2024-08-28T09:26:35Z KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False 2024-08-28T10:14:52Z 2024-08-28T09:26:35Z KubeletHasSufficientPID kubelet has sufficient PID available
Ready False 2024-08-28T10:14:52Z 2024-08-28T09:27:45Z KubeletNotReady PLEG is not healthy: pleg was last seen active 5m58.89150698s ago; threshold is 3m
This output shows the node's current conditions and highlights the specific reason (PLEG is not healthy) for the NotReady status, allowing you to take appropriate action.
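Since PLEG (the pod lifecycle event generator) health depends on the container runtime responding in time, a useful next step is scanning the kubelet's own logs on the affected node. A minimal sketch, assuming a systemd host where the kubelet logs to the journal:

# On node-worker-1: look for PLEG and runtime errors in recent kubelet logs
sudo journalctl -u kubelet --since "15 min ago" --no-pager | grep -iE 'pleg|runtime'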
3. Network Misconfiguration Check:
Run → ping <node-IP> to check connectivity between the nodes. If there's packet loss, it indicates a possible network issue that might be causing the node's NotReady status.
techops_examples@master:~$ ping 10.0.0.67
PING 10.0.0.67 (10.0.0.67) 56(84) bytes of data.
--- 10.0.0.67 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3054ms
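ICMP alone doesn't prove the control plane can reach the kubelet's API, so it's worth testing the port directly as well. A sketch, assuming netcat is installed; 10250 is the kubelet's default port:

# From the control plane: verify the kubelet port on the worker is reachable
nc -vz 10.0.0.67 10250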
4. Kubelet Issue Check:
Run → systemctl status kubelet on the node to verify that the kubelet service is running properly. If the kubelet is down, it may be the reason for the node's NotReady status.
techops_examples@node-worker-1:~$ systemctl status kubelet
kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2024-08-28 09:25:10 UTC; 1h 29min ago
Main PID: 2345 (kubelet)
Tasks: 13 (limit: 4915)
Memory: 150.1M
CPU: 8min 27.345s
CGroup: /system.slice/kubelet.service
└─2345 /usr/bin/kubelet
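If the kubelet is healthy (as above) but the node still reports NotReady, the container runtime is the next suspect. A quick check, assuming containerd is the runtime on this node:

# On node-worker-1: confirm the container runtime service is up
sudo systemctl status containerd
# List containers through the CRI to confirm the kubelet can reach the runtime
sudo crictl ps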
5. Kube-proxy Issue Check:
Run → kubectl get pods -n kube-system -o wide | grep kube-proxy to check the status of the kube-proxy pods on the node. If the kube-proxy pod is in a crash loop or not running, it could cause network issues leading to the NotReady status.
techops_examples@master:~$ kubectl get pods -n kube-system -o wide | grep kube-proxy
kube-proxy-5b7c8dfd9f-lk1bp 1/1 Running 0 1h 10.0.0.67 node-worker-1
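Even when the pod shows Running, its logs can reveal intermittent sync failures; the pod name below comes from the example output above:

# Tail recent kube-proxy logs for the worker's pod
kubectl logs kube-proxy-5b7c8dfd9f-lk1bp -n kube-system --tail=50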
How To Fix:
1. Resolve Lack of Resources:
Increase Resources: Scale up the node or optimize pod resource requests and limits.
Monitor & Clean: Use top or htop to monitor usage, stop non-Kubernetes processes, and check for hardware issues.
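To see pressure from the Kubernetes side before resizing anything, metrics-server (if installed) gives a quick view, and requests/limits can be adjusted in place; the deployment name below is a hypothetical placeholder:

# Cluster-side view of node resource usage (requires metrics-server)
kubectl top nodes
# Adjust requests/limits on a workload; "my-app" is a placeholder name
kubectl set resources deployment my-app --requests=cpu=100m,memory=128Mi --limits=cpu=500m,memory=512Mi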
2. Resolve Kubelet Issues:
Check Status: Run systemctl status kubelet and act on the result:
active (running): Kubelet is fine; the issue might be elsewhere.
active (exited): Restart with sudo systemctl restart kubelet.
inactive (dead): Check logs with sudo cat /var/log/kubelet.log to diagnose.
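A restart sequence that works on most systemd hosts, plus a journal fallback, since many distros send kubelet logs to journald rather than /var/log/kubelet.log:

# Reload unit files in case the kubelet config changed, then restart
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# If /var/log/kubelet.log doesn't exist, check the journal instead
sudo journalctl -u kubelet -n 100 --no-pager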
3. Resolve Kube-proxy Issues:
Check Logs: Use kubectl logs <kube-proxy-pod-name> -n kube-system to review logs.
DaemonSet: Ensure the kube-proxy DaemonSet is configured correctly. If needed, delete the kube-proxy pod to force a restart (see the sketch below).
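Because kube-proxy runs as a DaemonSet, deleting its pod is safe; the controller recreates it immediately. Pod name taken from the earlier example:

# Confirm the DaemonSet is healthy, then force a fresh pod on the node
kubectl get daemonset kube-proxy -n kube-system
kubectl delete pod kube-proxy-5b7c8dfd9f-lk1bp -n kube-system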
4. Checking Connectivity:
Network Setup: Verify the network configuration and ensure the necessary ports are open.
Test Connections: Use ping <node-IP> and traceroute <node-IP> to check network connectivity.
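The same test is worth running from the worker's side to confirm it can reach the API server. A sketch; 6443 is the default API server port, and the control-plane address is a placeholder:

# On the worker: confirm the API server is reachable
nc -vz <control-plane-IP> 6443
traceroute <control-plane-IP>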
I believe the next time you see "NotReady," you'll know the reason and where to begin checking!
P.S. If you think someone you know may like this newsletter, share it with them to join here
Tool Of The Day
k8sGPT - a tool for scanning your Kubernetes clusters, diagnosing, and triaging issues in simple English.
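A quick taste, assuming the CLI is installed and pointed at your cluster (the --explain flag additionally requires an AI backend to be configured):

# Scan the cluster and explain any findings in plain English
k8sgpt analyze --explain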
Did someone forward this email to you? Sign up here
Interested in reaching smart techies?
Our newsletter puts your products and services in front of the right people - engineering leaders and senior engineers - who make important tech decisions and big purchases.