Three probe types, one common source of production incidents. Here's what each one does and how to configure them without shooting yourself in the foot.
Kubernetes health probes are one of those features that seem simple until something goes wrong in production. A pod gets killed in a restart loop. Traffic hits a service that isn't ready. A slow-starting application gets terminated before it finishes initialising. In most cases, the root cause is a misconfigured probe: too aggressive, too lenient, or confused with another probe type.
There are three probe types, and understanding what each one is for matters more than knowing the configuration syntax.
A liveness probe answers one question: is this container still alive and functioning, or is it stuck in a state it can't recover from? If the liveness probe fails, Kubernetes kills the container and starts a new one. This is the probe you use for deadlock detection — for situations where the process is still running but is no longer capable of doing useful work. The key thing to understand is that liveness is not for slow startup or transient errors. If a container fails a liveness probe during startup because it hasn't finished initialising, Kubernetes will restart it — and if the restart doesn't help, you end up in a CrashLoopBackOff. This is one of the most common misconfiguration mistakes.
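As a concrete sketch, a liveness probe against a hypothetical /healthz endpoint might look like this (the path, port, and timings are illustrative, not defaults):

```yaml
livenessProbe:
  httpGet:
    path: /healthz       # assumed endpoint; should report real health, not just "process exists"
    port: 8080           # assumed container port
  periodSeconds: 10      # check every 10 seconds
  failureThreshold: 3    # restart after 3 consecutive failures, roughly 30s of being stuck
```

Note the deliberate absence of a generous initialDelaySeconds: slow startup is the startup probe's job, as covered below.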
A readiness probe answers a different question: is this container ready to receive traffic? If the readiness probe fails, Kubernetes removes the pod from the service endpoints. No requests are routed to it. The container keeps running — it just isn't serving traffic. Readiness is the right tool for transient unavailability: a dependency is down, a cache is warming up, the container is handling a burst and needs a moment before it can take more requests. Unlike liveness, a failed readiness probe doesn't cause a restart. It's a temporary signal, not a death sentence.
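The configuration looks almost identical; the difference is entirely in what Kubernetes does with a failure. A minimal sketch, assuming a hypothetical /ready endpoint:

```yaml
readinessProbe:
  httpGet:
    path: /ready         # assumed endpoint; should verify dependencies and warm-up, not just liveness
    port: 8080
  periodSeconds: 5       # re-check frequently so the pod rejoins the service quickly
  failureThreshold: 2    # drop the pod from endpoints after ~10s of failed checks
```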
A startup probe is the newest of the three, and it exists specifically to solve the slow-start problem that liveness probes create. If you have a container that takes 60 or 90 seconds to initialise — a JVM application, a service loading a large model, anything with substantial startup work — you can't set a liveness probe with a generous initialDelaySeconds without also making your deadlock detection slow for the life of the pod. While a startup probe is present, the liveness and readiness probes are disabled; once it succeeds, they take over. This separation gives you fast deadlock detection during normal operation without penalising slow-starting containers.
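The arithmetic is worth spelling out: the startup budget is failureThreshold × periodSeconds. A sketch that gives a container up to 150 seconds to initialise (30 attempts, 5 seconds apart, both values illustrative) before it gets restarted:

```yaml
startupProbe:
  httpGet:
    path: /healthz       # commonly the same endpoint the liveness probe will use
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # 30 × 5s = up to 150s of startup time before a restart
```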
In practice, the most robust configuration uses all three. Set a startup probe with a high failureThreshold and a reasonable periodSeconds — enough time for your slowest valid startup. Set a liveness probe with a low failureThreshold and a short periodSeconds, because once the application is running, a genuine deadlock should be detected quickly. Set a readiness probe that checks whether the application can actually handle requests — not just that the process is alive, but that the dependencies are reachable and the service is warm.
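Put together, a pod spec fragment along these lines (the paths, port, image, and timings are all assumptions to adapt, not recommendations to copy):

```yaml
containers:
  - name: app
    image: example/app:1.0     # placeholder image
    ports:
      - containerPort: 8080
    startupProbe:              # generous: covers the slowest valid startup
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      failureThreshold: 30     # up to 150s to initialise
    livenessProbe:             # strict: once running, catch a deadlock fast
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:            # checks real serving ability, not just process liveness
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 2
```

Because the startup probe gates the other two, the strict liveness settings never fire during initialisation, and the generous startup budget costs nothing once the pod is up.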
The other common mistake is using an HTTP endpoint that always returns 200 regardless of application state. A probe that can't fail provides no signal. Your liveness endpoint should actually check whether the application is functional. Your readiness endpoint should check whether it's ready to serve. These are different questions and they deserve different implementations.
Probes configured correctly are invisible — your pods start cleanly, bad containers get replaced quickly, and traffic never hits a service that isn't ready. Probes configured incorrectly are a reliable source of production incidents. The investment in getting them right is small compared to the cost of debugging a CrashLoopBackOff at 2am.