Deploy Error Decoder

SIGTERM & graceful shutdown — stop dropping requests on deploy

Quick answer: SIGTERM is the polite "please stop" signal your process receives on every deploy, restart, or scale-down. Catch it, stop accepting new connections, let in-flight requests finish, close your database and queue connections, then exit. If you ignore it, the platform waits a grace period and sends SIGKILL — killing the process mid-request and dropping live traffic.

What's actually happening on deploy

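When an orchestrator (Kubernetes, systemd, Docker, your PaaS) stops your app, it doesn't pull the plug. It sends SIGTERM and starts a countdown whose length depends on the platform: 30 seconds by default on Kubernetes, 10 seconds for docker stop, 90 seconds for systemd. Two things can happen: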

You handle it

You stop taking new requests, finish the ones in flight, close connections, and exit 0. No user notices the deploy.

You ignore it

The countdown expires, the platform sends SIGKILL (exit code 137), and any request still running is severed mid-flight.

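An app that drains and then exits on its own reports exit code 0. That's the goal. Exit code 143 (128 + 15) means the process was ended by SIGTERM itself: better than 137, since nothing was force-killed, but in Node it usually means no handler was installed and nothing was drained.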
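To make the exit-code arithmetic concrete, here is a minimal sketch that turns a reported exit code back into a signal name. The helper names (signalName, describeExitCode) are made up for illustration, and the 128 + N rule is a shell and container-runtime convention, so an app that deliberately exits with a code above 128 would be misread.

```javascript
const os = require('node:os');

// Look up a signal number in Node's own table, e.g. 15 -> 'SIGTERM'.
function signalName(signo) {
  const entry = Object.entries(os.constants.signals).find(([, n]) => n === signo);
  return entry ? entry[0] : `signal ${signo}`;
}

// By convention, "killed by signal N" is reported as exit code 128 + N.
function describeExitCode(code) {
  if (code === 0) return 'clean exit';
  if (code > 128 && code < 256) return `terminated by ${signalName(code - 128)}`;
  return `exited with error code ${code}`;
}

console.log(describeExitCode(143)); // terminated by SIGTERM
console.log(describeExitCode(137)); // terminated by SIGKILL
```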

What a graceful shutdown must do

Stop accepting new work

Close the listening socket so this instance takes no new connections. The load balancer stops routing to it once its health check or endpoint list catches up.

Drain in-flight requests

Let requests already being processed finish before you exit — don't cut them off.

Close connections & flush

End database pools, message-queue consumers, and flush logs/metrics so nothing is left half-written.

Have a timeout fallback

If draining hangs, force-exit before the platform's SIGKILL does it for you, so you control the outcome.

A correct handler (Node / Express)

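// app, db, and queue are your Express app, database pool, and queue client
const server = app.listen(process.env.PORT || 3000);
let shuttingDown = false;

function shutdown(signal) {
  if (shuttingDown) return;         // ignore repeated signals
  shuttingDown = true;
  console.log(`${signal} received, draining`);
  server.close(async () => {        // 1. stop accepting, wait for in-flight
    try {
      await db.end();               // 2. close DB pool
      await queue.close();          // 3. close consumers
      process.exit(0);              // 4. clean exit (code 0)
    } catch (err) {
      console.error('shutdown failed', err);
      process.exit(1);              // cleanup failed: exit non-zero
    }
  });
  // 5. fallback: don't hang forever (keep this shorter than the grace period)
  setTimeout(() => process.exit(1), 10_000).unref();
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT',  () => shutdown('SIGINT'));   // Ctrl-C in dev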

The same shape applies everywhere: trap the signal, stop intake, drain, close, exit — with a hard timeout so a stuck request can't keep the process alive past the grace period.
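The "drain, but with a hard timeout" part can also be written as a small promise-based helper, which may be easier to reuse across services. This is only a sketch: drainWithDeadline and the step functions in the commented usage (closeServer, closeDb) are hypothetical names, not part of any library.

```javascript
// Runs cleanup steps one after another, but gives up once deadlineMs has elapsed.
// Resolves to 'drained' or 'timed out'; the caller decides the exit code.
async function drainWithDeadline(steps, deadlineMs) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timed out'), deadlineMs);
  });
  const work = (async () => {
    for (const step of steps) await step(); // sequential: stop intake, drain, close
    return 'drained';
  })();
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // don't let the deadline timer keep the process alive
  }
}

// Hypothetical wiring:
// process.on('SIGTERM', async () => {
//   const outcome = await drainWithDeadline([closeServer, closeDb], 10_000);
//   process.exit(outcome === 'drained' ? 0 : 1);
// });
```

A timed-out step is not cancelled, it is only abandoned, so the caller should still exit right away once the helper reports 'timed out'.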

Common mistakes

Why "I handle SIGTERM" still drops requests

  • A shell running as PID 1 doesn't forward signals. If your container starts the app via a shell (sh -c "node ..."), the shell is PID 1 and typically does not pass SIGTERM on to your app. Use the exec form of CMD/ENTRYPOINT, or an init like tini.
  • The event loop is blocked. A long synchronous task keeps your handler from running until it finishes, which may be after SIGKILL arrives.
  • Keep-alive connections aren't closed. On Node versions before 19, server.close() waits for idle keep-alive sockets to time out. Call server.closeIdleConnections() (Node 18.2+) after server.close(), or set a short keepAliveTimeout.
  • No readiness flip. Mark the instance "not ready" first so the load balancer drains it before you stop the server.
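The last two points can be combined in one place. The sketch below uses only Node's built-in http module; the /readyz path, the state object, and the probeStatus, createServer, and drain names are assumptions chosen for illustration, and the 5-second probe delay is a placeholder you would tune to your load balancer's health-check interval.

```javascript
const http = require('node:http');

// Readiness flag shared by the probe route and the shutdown path.
const state = { ready: true };

// 200 while serving, 503 once draining has started.
function probeStatus() {
  return state.ready ? 200 : 503;
}

function createServer() {
  return http.createServer((req, res) => {
    if (req.url === '/readyz') {
      res.writeHead(probeStatus());
      res.end();
      return;
    }
    // While draining, ask keep-alive clients to reconnect elsewhere.
    if (!state.ready) res.setHeader('Connection', 'close');
    res.end('ok');
  });
}

// Flip readiness first, wait for the load balancer to notice, then stop the server.
function drain(server, { probeDelayMs = 5000, onClosed = () => process.exit(0) } = {}) {
  state.ready = false;
  return setTimeout(() => {
    server.close(onClosed); // stop accepting, wait for in-flight requests
    // Node 18.2+: drop idle keep-alive sockets so close() can complete promptly.
    if (typeof server.closeIdleConnections === 'function') server.closeIdleConnections();
  }, probeDelayMs);
}

// Hypothetical wiring:
// const server = createServer().listen(3000);
// process.on('SIGTERM', () => drain(server));
```

The probe delay counts against the platform's grace period, so it plus the drain time still has to fit inside the countdown.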

How Infraveil handles this

Zero-drop deploys, handled by the agent

Infraveil is a backend operations control plane that runs on your own servers. Its agent owns the start/stop lifecycle: on every deploy it stops sending traffic to the old instance, lets it drain, and only then brings it down — so requests aren't severed mid-flight. The whole rollout pauses for your approval and is recorded, with one-click rollback if the new version misbehaves.

  • Connections drain on the old instance before it's stopped, so there are no SIGKILL surprises
  • Health-gated cutover means traffic only moves to a ready instance
  • Every deploy and restart is approval-gated and recorded with one-click rollback

Frequently asked questions

What is SIGTERM?

SIGTERM (signal 15) is the standard "terminate gracefully" request sent to a process on deploy, restart, or scale-down. Unlike SIGKILL, it can be caught and handled, giving your app a chance to shut down cleanly.

What's the difference between SIGTERM and SIGKILL?

SIGTERM politely asks the process to stop and can be handled. SIGKILL (signal 9) cannot be caught or ignored — the kernel kills the process immediately. Platforms send SIGTERM first, then SIGKILL if you don't exit in time.

Why do requests get dropped on deploy?

Because the app was stopped while still serving traffic. Either it never handled SIGTERM, so it was terminated on the spot or left for the platform's SIGKILL, or its handler exited without draining in-flight requests.

What is exit code 143?

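143 = 128 + 15, meaning the process was terminated by SIGTERM (signal 15). It is a better sign than 137, which means the process was force-killed with SIGKILL, but on its own it doesn't prove requests were drained: a process with no handler also ends with 143. Some runtimes report 143 even after running their shutdown hooks; a Node handler that drains and calls process.exit(0) reports 0.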