Ship Today, Scale Tomorrow #1: Different Servers for Different Jobs

Working with early-stage CTOs, I see the same two traps repeatedly: over-engineering that steals time today, and decisions that choke growth tomorrow. This series shares lessons from the field.


“We already have a Python server for the API, so we figured we’d let it do the AI model training too.”

This explained why their API server had grown from a tiny VM into three always-on GPU servers. API requests failed while competing with training jobs for resources; training runs failed while competing with each other. All to handle periodic bursts of heavy workloads.

It's easy to think "this is the server we have" - you already provisioned it, so adding more work to it feels free. But one of the great advantages of the cloud is that you can spin up different machines for different purposes and pay only for what you use.

We moved training to a batch system that spun up GPU machines only when needed, and as many as needed. The API dropped back to one tiny server. No conflicts, lower costs, reliable performance.
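As a concrete sketch of what "submit to a batch system" can look like, here is a minimal example using AWS Batch via boto3. The queue name, job definition, and training script are hypothetical placeholders, not the actual setup from this story; any managed batch service (GCP Batch, Kubernetes Jobs, etc.) follows the same shape.

```python
# Sketch: package one training run as a batch job instead of running it
# on the API server. Queue/definition names below are assumptions.

def build_training_job(run_id: str, dataset: str, gpus: int = 1) -> dict:
    """Build a submit_job payload for a single training run."""
    return {
        "jobName": f"train-{run_id}",
        "jobQueue": "gpu-spot-queue",          # hypothetical queue; scales to zero when idle
        "jobDefinition": "model-training:1",   # hypothetical container with the training code
        "containerOverrides": {
            "command": ["python", "train.py", "--dataset", dataset],
            "resourceRequirements": [
                {"type": "GPU", "value": str(gpus)},
            ],
        },
    }

# In production you would submit it with:
#   import boto3
#   boto3.client("batch").submit_job(**build_training_job("run-42", "s3://bucket/data"))
```

The key property is that nothing here touches the API server: GPU machines exist only while a job runs, and concurrent training runs get separate machines instead of fighting over one.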

Back-office work (batch jobs, training, reports) comes in bursts and tolerates delays. Front-office work (like user-facing APIs) needs speed and consistency. When they mix, they interfere with each other. When you’re small, one server is probably fine. When you start feeling resource contention, it’s time to split.
