What is Little's Law?

Little's Law says the average number of requests being handled at once (concurrency, L) equals the arrival rate (λ) times the average time each request takes (W): L = λ × W. So if you serve 200 requests per second and each takes 100 milliseconds (0.1 seconds), you have about 200 × 0.1 = 20 requests in flight at any moment. That concurrency is what your servers must be able to handle simultaneously. It is the number that actually sizes your backend, more than raw requests per second.
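As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name `required_concurrency` and its parameters are made up for this example and are not part of the calculator.

```python
def required_concurrency(requests_per_second: float, avg_response_ms: float) -> float:
    """Little's Law, L = lambda * W, with W given in milliseconds."""
    # Convert milliseconds to seconds so the units match requests per second.
    return requests_per_second * avg_response_ms / 1000


# The example from the text: 200 requests/second at 100 ms each.
print(required_concurrency(200, 100))  # 20.0
```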

Why provision for less than 100% utilization?

Because traffic is bursty and latency rises sharply as a system approaches saturation: queues form, response times spike, and a small surge tips a fully loaded system over. Sizing to run at around 60-70% of capacity leaves headroom to absorb bursts, or the loss of an instance, without falling over. The calculator applies your chosen target utilization so the recommended capacity has that margin built in.
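One plausible way to apply a target utilization is to divide the required concurrency by it, so that normal peak load lands at the target rather than at 100%. The sketch below assumes that formula; the exact calculation the tool uses is not spelled out here, and `capacity_with_headroom` is a hypothetical name.

```python
def capacity_with_headroom(required_concurrency: float, target_utilization: float) -> float:
    """Concurrency to provision so the required load sits at the target utilization."""
    # Assumption: headroom is applied by simple division, e.g. 0.65 for 65%.
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1], e.g. 0.65 for 65%")
    return required_concurrency / target_utilization


# 20 requests in flight, sized to run at 65% of capacity.
capacity = capacity_with_headroom(20, 0.65)
print(f"provision for about {capacity:.1f} concurrent requests")  # about 30.8
```

Under this assumption, a lower target utilization always means more provisioned capacity for the same load: the same 20 in-flight requests at a 50% target would call for capacity of 40.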

How does concurrency map to workers or instances?

It depends on your concurrency model. A synchronous worker (one request at a time) handles one unit of concurrency, so you need roughly as many workers as your required concurrency. An async runtime or a threaded server handles many concurrent requests per process, so you need far fewer processes. Set the per-worker concurrency to match your stack, and the tool converts required concurrency into workers and then instances.
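The conversion described above can be sketched as two round-up divisions. This is an illustration under assumed names (`workers_and_instances` and its parameters are hypothetical), not the tool's actual code.

```python
import math


def workers_and_instances(
    required_concurrency: float,
    concurrency_per_worker: int,
    workers_per_instance: int,
) -> tuple[int, int]:
    """Convert required concurrency into whole workers, then whole instances."""
    # Round up at each step: a fraction of a worker or instance cannot be provisioned.
    workers = math.ceil(required_concurrency / concurrency_per_worker)
    instances = math.ceil(workers / workers_per_instance)
    return workers, instances


# Same load (31 concurrent requests, 4 workers per instance), two concurrency models.
print(workers_and_instances(31, 1, 4))   # synchronous workers: (31, 8)
print(workers_and_instances(31, 10, 4))  # async/threaded, 10 per worker: (4, 1)
```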

Free tool · Runs in your browser

How much capacity do you need?

Requests per second doesn’t size your backend — concurrency does. Enter your peak traffic and response time and get the concurrency required (Little’s Law), the workers and instances to provision, and the headroom to survive a burst.

Peak requests / second

Average response time (milliseconds)

Concurrency per worker (1 = synchronous; higher for async/threaded)

Workers per instance

Target utilization (run at this % of capacity, leaving the rest as headroom)

The number that actually sizes your backend

Teams provision by requests per second and then get surprised when a perfectly adequate-looking server falls over. What exhausts a server is not the request rate; it’s how many requests are in flight at once, which is the rate multiplied by how long each one takes. A slow endpoint at modest traffic can need far more capacity than a fast one at high traffic. Little’s Law makes that concrete, and headroom turns a fragile, fully loaded estimate into one that survives a Monday-morning spike.
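To make the slow-versus-fast contrast concrete, here is a small Python sketch with two made-up endpoints. The traffic and latency figures are hypothetical, chosen only to show the effect, and `in_flight` is an illustrative name.

```python
def in_flight(requests_per_second: float, avg_response_ms: float) -> float:
    """Average requests in flight, by Little's Law (rate x time in seconds)."""
    return requests_per_second * avg_response_ms / 1000


# Hypothetical endpoints, chosen only to show the contrast.
slow_report = in_flight(requests_per_second=50, avg_response_ms=2000)
fast_lookup = in_flight(requests_per_second=1000, avg_response_ms=20)

print(slow_report)  # 100.0
print(fast_lookup)  # 20.0
```

The slow endpoint receives a twentieth of the traffic yet holds five times as many requests in flight.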

Sizing is the easy, static part. Knowing your real concurrency and latency as they shift, and adding or shedding capacity before users feel it, is the ongoing version — the kind of live capacity awareness a control plane gives you over the infrastructure you own.


Size it. Then watch it move.

Infraveil tracks the real load on your backend across the hosts you own — concurrency, latency, saturation — so capacity decisions are based on what is actually happening, not a back-of-envelope guess.
