neurohab

— about the project

We got tired of waiting for cold starts.

last updated · march 2026

neurohab started as an internal tool. We were building a research product that needed to call the same fine-tuned 13B model thousands of times an hour, with predictable tail latency. Existing inference platforms either cold-started our weights every few minutes, queued us behind larger tenants, or required us to operate the GPU fleet ourselves. None of those were good options for a small team trying to ship.

So we built a thin runtime: a single-tenant engine that pins your model to dedicated GPUs and keeps it warm. The control plane is small on purpose. Observability is built in because we needed it for ourselves first.
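To make "stays warm" concrete, here is a toy sketch of the idea, nothing like the real engine: weights are loaded once when the process starts and every request hits a model that is already resident, so the expensive load never lands on the request path. load_model, the model name, and the endpoint are all hypothetical stand-ins.

```python
# A minimal sketch of warm, single-tenant serving, not neurohab's runtime:
# pay the load cost once at startup, keep the model resident, serve from it.

import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_model(name: str):
    """Stand-in for loading fine-tuned weights onto a dedicated GPU.

    In a real engine this is the expensive step; doing it once per
    process, instead of on a cold start, is what keeps the tail flat.
    """
    time.sleep(0.1)  # simulate the one-time load cost
    return lambda prompt: f"echo from {name}: {prompt}"


# Paid once, at startup. Never on the request path.
MODEL = load_model("my-finetuned-13b")


class InferHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        prompt = json.loads(body or b"{}").get("prompt", "")
        out = MODEL(prompt)  # warm path: no load, no queue behind other tenants
        payload = json.dumps({"output": out}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), InferHandler).serve_forever()
```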

who's behind it

Two engineers — one with a background in distributed systems, one in ML infrastructure. Previously at research labs in the US and Japan. Operating out of Tokyo and Berlin. We are not raising; we are not hiring yet.

where we are

Cohort 04 is open. We onboard tenants slowly because we still hand-tune the autoscaler thresholds for each model family; by Cohort 06 we expect that tuning to be generic. The hosted control plane will stay closed for now. The runtime itself will be source-available later this year, under a non-compete license; the details are still being worked out.

contact

hello@neurohab.io — for invites, billing, anything else.
security@neurohab.io — for vulnerability reports, GPG key on request.