Deploy Error Decoder

CrashLoopBackOff — what it means and how to fix it

Quick answer: CrashLoopBackOff means your container starts, exits, and Kubernetes keeps restarting it with an ever-growing delay between attempts. The fault lies in the container or its configuration, not in Kubernetes itself. Read the previous instance's logs with kubectl logs <pod> --previous, check the exit code and events in kubectl describe pod <pod>, then fix the root cause: usually bad config, a failing probe, a missing dependency, or an out-of-memory kill.

What you'll see

The pod never reaches Running for long, and the restart count climbs:

$ kubectl get pods
NAME                     READY   STATUS             RESTARTS   AGE
api-7d9f8c6b4-x2k9p      0/1     CrashLoopBackOff   6          4m

"BackOff" means Kubernetes waits longer between each restart (10s, 20s, 40s… up to 5 min). It's a symptom: the real failure is whatever makes the container exit, and your job is to find out why it exited.
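A minimal bash sketch of that schedule, assuming the kubelet's documented defaults: a 10s initial delay that doubles after each crash and is capped at 300s. (The kubelet also resets the timer once a container has run for 10 minutes without problems.) Real timings drift a little, so treat the numbers as approximate; backoff_schedule is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Approximate CrashLoopBackOff delays for the first N restarts.
# Assumes the kubelet defaults: 10s initial delay, doubling per crash, capped at 300s.
backoff_schedule() {
  local restarts=$1 delay=10 cap=300 i=1
  while [ "$i" -le "$restarts" ]; do
    echo "restart $i: wait ${delay}s"
    delay=$((delay * 2))
    # Never wait longer than the cap (5 minutes).
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
}

backoff_schedule 7
```

With 7 restarts the last two lines are "restart 6: wait 300s" and "restart 7: wait 300s": from the sixth restart on, every retry waits the full five minutes, which is why a crash-looping pod can look idle for long stretches.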

Why a container crash-loops

The app exits non-zero on startup

A missing env var, an unreachable database, or an unhandled exception kills the process the moment it boots.

A liveness probe is failing

If the app is slow to start and the liveness probe has no startup grace period, the probe fails and the kubelet kills a perfectly healthy container before it finishes booting.

OOMKilled — out of memory

The container exceeded its memory limit and was killed. In kubectl describe pod, Last State shows Reason: OOMKilled and Exit Code: 137.

Missing config or secret

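The app expects a setting or file that isn't there: an env var that was never wired in, an optional ConfigMap/Secret key that is absent, or a mounted file at a different path than the app reads. It aborts immediately. (If a required ConfigMap or Secret key referenced in env is missing, the pod usually shows CreateContainerConfigError instead, because the container never starts.)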
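Missing-config crashes are much easier to read in kubectl logs --previous when the app checks its own config before doing anything else. A minimal bash entrypoint sketch; DATABASE_URL, API_KEY, /etc/app/config.yaml and the helper names are hypothetical placeholders for your app's own settings:

```bash
#!/usr/bin/env bash
# Hypothetical entrypoint helpers: fail fast, with a readable message, when config is missing.
# DATABASE_URL, API_KEY and /etc/app/config.yaml are placeholder names.

# Return non-zero if any named environment variable is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "FATAL: required env var $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Return non-zero if an expected mounted file is absent or unreadable.
require_file() {
  if [ ! -r "$1" ]; then
    echo "FATAL: expected config file $1 is missing or unreadable" >&2
    return 1
  fi
}

main() {
  require_env DATABASE_URL API_KEY || exit 1
  require_file /etc/app/config.yaml || exit 1
  exec "$@"   # replace the shell with the real start command
}

# A real entrypoint would end with: main "$@"
```

When a variable is missing, the crashed instance's log then holds one FATAL line naming it, instead of a stack trace from deep inside the app.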

Diagnose it in three steps

1. Read the logs of the crashed instance

The container running now may have only just restarted and have no useful logs yet. Use --previous to see the output of the instance that just died:

kubectl logs <pod> --previous
kubectl logs <pod> --previous --tail=50

2. Check the exit code and events

kubectl describe pod <pod>
# Look at: Last State, Reason, Exit Code, and the Events list.
# 137 = SIGKILL (128 + 9), usually OOMKilled. 143 = SIGTERM (128 + 15). 1 or 2 = app error. 127 = command not found.

3. Reproduce without the loop

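Start a throwaway pod from the same image with a shell in place of the normal entrypoint, so nothing restarts while you poke around. This needs a shell in the image, and the pod does not inherit the Deployment's env vars, ConfigMaps, or Secrets, so pass whatever the app needs with --env:

kubectl run debug --rm -it --restart=Never --image=YOUR_IMAGE --command -- sh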
# then run your start command by hand and watch it fail
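Steps 1 and 2 can be folded into one helper. A minimal sketch, assuming a single-container pod (container index 0) in the current namespace; it reads the same Last State fields that describe shows through kubectl's jsonpath output, then prints the tail of the previous instance's logs. crash_summary is a hypothetical name:

```bash
#!/usr/bin/env bash
# Summarize why a pod's container last died: exit code, reason, and recent logs.
# Assumes one container per pod (index 0) and the current kubectl namespace.
crash_summary() {
  local pod=$1 state='{.status.containerStatuses[0].lastState.terminated'
  local code reason
  code=$(kubectl get pod "$pod" -o jsonpath="${state}.exitCode}")
  reason=$(kubectl get pod "$pod" -o jsonpath="${state}.reason}")
  echo "last exit: code=${code:-none} reason=${reason:-none}"
  echo "--- previous instance, last 20 log lines ---"
  kubectl logs "$pod" --previous --tail=20
}

# Usage: crash_summary api-7d9f8c6b4-x2k9p
```

If the first line reports code=137 reason=OOMKilled, skip the logs and go straight to the memory limit.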

The real fix

Separate "is it alive" from "is it ready"

Many crash loops are self-inflicted: a liveness probe that starts checking too early kills an app that is simply slow to boot. Gate traffic with a readiness probe, and hold liveness off until the app is up, ideally with a startupProbe (or a generous initialDelaySeconds):

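# Fragment: goes under one entry in spec.template.spec.containers
startupProbe:
  httpGet: { path: /healthz, port: 3000 }
  failureThreshold: 30
  periodSeconds: 5        # up to 30 × 5 = 150s to boot; liveness/readiness wait until this passes
livenessProbe:
  httpGet: { path: /healthz, port: 3000 }
  periodSeconds: 10       # repeated failures restart the container
readinessProbe:
  httpGet: { path: /healthz, port: 3000 }
  periodSeconds: 5        # failures only remove the pod from Service endpoints; no restart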
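To check how long an existing Deployment actually gives its app to boot, you can read the probe fields back and do the same arithmetic. A sketch assuming a single-container Deployment; it falls back to the Kubernetes defaults (initialDelaySeconds 0, failureThreshold 3, periodSeconds 10) for unset fields and ignores timeoutSeconds, so the figure is approximate. startup_budget and probe_field are hypothetical names:

```bash
#!/usr/bin/env bash
# Approximate boot window granted by a Deployment's startupProbe:
#   initialDelaySeconds + failureThreshold * periodSeconds
# Assumes one container (index 0); unset fields fall back to Kubernetes defaults.
probe_field() {
  # $1 = deployment name, $2 = path under startupProbe (empty for the whole probe)
  kubectl get deploy "$1" -o jsonpath="{.spec.template.spec.containers[0].startupProbe$2}"
}

startup_budget() {
  local deploy=$1 delay threshold period
  if [ -z "$(probe_field "$deploy" "")" ]; then
    echo "$deploy: no startupProbe configured"
    return 0
  fi
  delay=$(probe_field "$deploy" .initialDelaySeconds)
  threshold=$(probe_field "$deploy" .failureThreshold)
  period=$(probe_field "$deploy" .periodSeconds)
  delay=${delay:-0} threshold=${threshold:-3} period=${period:-10}
  echo "$deploy: ~$((delay + threshold * period))s to boot" \
    "($delay + $threshold x $period)"
}
```

For the startupProbe above it would report about 150s; if your app routinely needs longer than that to boot, raise failureThreshold rather than loosening the liveness probe.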

If it's OOMKilled, raise the container's memory limit (resources.limits.memory) or fix the leak. If it's a config error, the answer is usually in the --previous logs from step 1: crash loops are loud once you read the right container instance.

How Infraveil handles this

Supervise services without the Kubernetes restart-loop tax

If you're fighting CrashLoopBackOff, it's worth asking whether you need Kubernetes' complexity at all. Infraveil is a backend operations control plane that runs on your own servers: it starts your services, health-checks them, and restarts what genuinely fails, without probe tuning, backoff math, or YAML archaeology. A failed deploy is caught at the gate and rolled back instead of being restarted forever.

Health checks and auto-restart, with a clear startup window so slow boots aren't killed
Failed deploys are gated and rolled back, not restarted in an endless loop
Every crash, restart, and recovery in one log view, on servers you control

Frequently asked questions

What does CrashLoopBackOff mean?

Your container starts, exits, and Kubernetes restarts it repeatedly with a growing delay between attempts. The state describes the restart pattern; the actual fault is whatever makes the container exit.

How do I see why the pod is crashing?

Run kubectl logs <pod> --previous to read the crashed container's output, and kubectl describe pod <pod> to see the exit code and events.

What does exit code 137 mean?

137 means the process was killed by SIGKILL (128 + 9). Most often it's OOMKilled: the container hit its memory limit. Confirm with the Reason field in kubectl describe pod, then raise the limit or fix the memory leak.
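A small bash helper that applies the 128 + signal rule to any exit code. Only the signal decoding is a fixed convention; the hints for codes below 128 are rough heuristics for typical apps, and decode_exit_code is a hypothetical name:

```bash
#!/usr/bin/env bash
# Turn a container exit code into a likely cause.
# Codes above 128 follow the "128 + signal number" convention; lower codes are heuristics.
decode_exit_code() {
  local code=$1 sig
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l N prints the signal name for number N (e.g. 9 -> KILL).
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    if [ "$sig" -eq 9 ]; then
      echo "check Reason in kubectl describe pod; OOMKilled is the usual culprit"
    fi
    return 0
  fi
  case "$code" in
    0)   echo "exited cleanly, but the pod's restart policy starts it again" ;;
    127) echo "command not found; check the image's command and entrypoint" ;;
    *)   echo "application error; read kubectl logs <pod> --previous" ;;
  esac
}

decode_exit_code 137
```

For 137 this prints "killed by signal 9 (SIGKILL)" followed by the OOMKilled hint; for 143 it prints "killed by signal 15 (SIGTERM)", the signal the kubelet sends first when it stops a container.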

Can a health check cause CrashLoopBackOff?

Yes. A liveness probe that starts too early or has too short a timeout will kill an app that's merely slow to boot. Use a startupProbe or a longer initial delay, and gate traffic with a readiness probe instead.