Three probe types, one common source of production incidents. Here's what each one does and how to configure them without shooting yourself in the foot.
Kubernetes health probes are one of those features that seem simple until something goes wrong in production. A pod gets killed in a restart loop. Traffic hits a service that isn't ready. A slow-starting application gets terminated before it finishes initialising. In most cases, the root cause is a misconfigured probe: too aggressive, too lenient, or confused with another probe type.
There are three probe types, and understanding what each one is for matters more than knowing the configuration syntax.
A liveness probe answers one question: is this container still alive and functioning, or is it stuck in a state it can't recover from? If the liveness probe fails, Kubernetes kills the container and starts a new one. This is the probe you use for deadlock detection — for situations where the process is still running but is no longer capable of doing useful work. The key thing to understand is that liveness is not for slow startup or transient errors. If a container fails a liveness probe during startup because it hasn't finished initialising, Kubernetes will restart it — and if the restart doesn't help, you end up in a CrashLoopBackOff. This is one of the most common misconfiguration mistakes.
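As a concrete sketch, a liveness probe against a hypothetical /healthz endpoint might look like this (the path, port, and timings are illustrative, not defaults):

```yaml
livenessProbe:
  httpGet:
    path: /healthz       # assumed endpoint; should report real health, not just "process exists"
    port: 8080           # assumed container port
  periodSeconds: 10      # check every 10 seconds
  failureThreshold: 3    # restart after 3 consecutive failures, roughly 30s of being stuck
```

Note the deliberate absence of a generous initialDelaySeconds: slow startup is the startup probe's job, as covered below.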
A readiness probe answers a different question: is this container ready to receive traffic? If the readiness probe fails, Kubernetes removes the pod from the service endpoints. No requests are routed to it. The container keeps running — it just isn't serving traffic. Readiness is the right tool for transient unavailability: a dependency is down, a cache is warming up, the container is handling a burst and needs a moment before it can take more requests. Unlike liveness, a failed readiness probe doesn't cause a restart. It's a temporary signal, not a death sentence.
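The configuration looks almost identical; the difference is entirely in what Kubernetes does with a failure. A minimal sketch, assuming a hypothetical /ready endpoint:

```yaml
readinessProbe:
  httpGet:
    path: /ready         # assumed endpoint; should verify dependencies and warm-up, not just liveness
    port: 8080
  periodSeconds: 5       # re-check frequently so the pod rejoins the service quickly
  failureThreshold: 2    # drop the pod from endpoints after ~10s of failed checks
```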
A startup probe is the newest of the three, and it exists specifically to solve the slow-start problem that liveness probes create. If you have a container that takes 60 or 90 seconds to initialise — a JVM application, a service loading a large model, anything with substantial startup work — you can't set a liveness probe with a generous initialDelaySeconds without also making your deadlock detection slow for the life of the pod. While a startup probe is present, the liveness and readiness probes are disabled; once it succeeds, they take over. This separation gives you fast deadlock detection during normal operation without penalising slow-starting containers.
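The arithmetic is worth spelling out: the startup budget is failureThreshold × periodSeconds. A sketch that gives a container up to 150 seconds to initialise (30 attempts, 5 seconds apart, both values illustrative) before it gets restarted:

```yaml
startupProbe:
  httpGet:
    path: /healthz       # commonly the same endpoint the liveness probe will use
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # 30 × 5s = up to 150s of startup time before a restart
```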
In practice, the most robust configuration uses all three. Set a startup probe with a high failureThreshold and a reasonable periodSeconds — enough time for your slowest valid startup. Set a liveness probe with a low failureThreshold and a short periodSeconds, because once the application is running, a genuine deadlock should be detected quickly. Set a readiness probe that checks whether the application can actually handle requests — not just that the process is alive, but that the dependencies are reachable and the service is warm.
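Put together, a pod spec fragment along these lines (the paths, port, image, and timings are all assumptions to adapt, not recommendations to copy):

```yaml
containers:
  - name: app
    image: example/app:1.0     # placeholder image
    ports:
      - containerPort: 8080
    startupProbe:              # generous: covers the slowest valid startup
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      failureThreshold: 30     # up to 150s to initialise
    livenessProbe:             # strict: once running, catch a deadlock fast
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:            # checks real serving ability, not just process liveness
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 2
```

Because the startup probe gates the other two, the strict liveness settings never fire during initialisation, and the generous startup budget costs nothing once the pod is up.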
The other common mistake is using an HTTP endpoint that always returns 200 regardless of application state. A probe that can't fail provides no signal. Your liveness endpoint should actually check whether the application is functional. Your readiness endpoint should check whether it's ready to serve. These are different questions and they deserve different implementations.
Probes configured correctly are invisible — your pods start cleanly, bad containers get replaced quickly, and traffic never hits a service that isn't ready. Probes configured incorrectly are a reliable source of production incidents. The investment in getting them right is small compared to the cost of debugging a CrashLoopBackOff at 2am.