Platform

PostgreSQL for SMB workloads — the patterns that actually matter

Most SMBs don't need sharding, replication, or Kubernetes. They need good indexes, vacuum tuning, and a connection pooler. Here's what we configure on every deployment and what we don't touch until the traffic forces us to.

21 May 20269 min readKrypton Forge Labs

PostgreSQL is the default for nearly every project we ship. The Odoo 18 backends, the n8n workflow staging tables, the Paraslace ERP database, the analytics jobs — all Postgres. And across maybe two dozen SMB deployments, we have learned something useful: the database is almost never the bottleneck at SMB scale, but bad defaults make it one anyway.

This is the checklist we run on every new Postgres deployment. It is not a guide to running at 100,000 QPS. It is a guide to not breaking at 200 QPS because you skipped three settings.

The three things that matter before anything else

Indexes. The most boring and highest-leverage thing. Every SMB database we inherit has the same problem: primary key indexes exist, foreign key indexes don't. The ORM created the PK index automatically. Nobody thought to index the customer_id column on the orders table until the dashboard started timing out.

Our rule: index every foreign key. Index every column you filter or join on. Check pg_stat_user_indexes to find indexes the planner has never used and drop them. The overhead of an unused index on write throughput is small; the overhead on the developer wondering "what does that index do" six months later is larger. But for a database with under 100 GB of data, too many indexes is almost never the problem. Too few is.

Autovacuum. The most common SMB Postgres failure mode we see: the database slows to a crawl, the disk fills up, and nobody knows why. The answer is always autovacuum falling behind on a write-heavy table. The defaults were tuned for 2005-era hardware.

We set these three on every deployment:

# postgresql.conf
autovacuum_max_workers = 3
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.01

The last one matters most. The default scale factor of 0.2 means Postgres won't vacuum a table until 20% of its rows are dead. On a table with 10 million rows, that's 2 million dead rows before anything happens. Set it to 0.01 and autovacuum runs 20x more often but each run is dramatically smaller. The total work is the same. The latency impact is not.

Connection pooling. Every ORM opens more connections than you think. Every SMB app that "just connects directly" hits the max_connections ceiling within a month of real usage. We run PgBouncer in transaction-pooling mode on every deployment. The overhead is negligible — a few megabytes of RAM and some config lines. The alternative is a production outage at 11 PM because the nightly batch job opened 200 connections alongside the 50 the app was using.

# pgbouncer.ini
[databases]
main = host=localhost port=5432 dbname=main

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = scram-sha-256
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200

The app connects to port 6432, PgBouncer maintains a pool of 20 actual Postgres connections, and 200 clients can share them. This single binary has prevented more production fires than every other piece of infrastructure we run.

What we don't do until we have to

Replication. None of our SMB clients need it at launch. One server, one Postgres instance, nightly pg_dump backups. Replication adds complexity, cost, and failure modes that are not justified by the uptime requirements of a 35-person textile manufacturer or a small SaaS in its first year.

Sharding. If you're reading this and thinking about sharding your SMB database, you are solving a problem you do not have. Partitioning by date range on a large log table, maybe. Sharding, no.

Kubernetes. Postgres on Kubernetes works. It also requires an operator, persistent volume management, backup integration, and someone who understands the entire stack. For a team of one to three engineers, a single VPS with Postgres installed via apt is faster, cheaper, and dramatically easier to debug.

The query tuning that earns its keep

EXPLAIN ANALYZE is the only query tuning tool that tells the truth. Not EXPLAIN. EXPLAIN ANALYZE. It actually runs the query and reports real timings.

EXPLAIN (ANALYZE, BUFFERS) SELECT
  o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > NOW() - INTERVAL '30 days'
ORDER BY o.created_at DESC;

The BUFFERS option tells you whether the query hit disk or memory. If shared hit is near zero and read is high, your working set doesn't fit in shared_buffers or the OS page cache. That's actionable.

The slow query log is free monitoring. Enable it:

log_min_duration_statement = 500
log_statement = 'none'

Every week, sort the log by duration, pick the top three queries, and explain-analyze them. This one habit finds more performance problems than any monitoring dashboard we've ever bought.

The backup strategy that works

For SMB databases, the backup strategy is pg_dump, compressed, daily, uploaded off-server. That's it.

#!/bin/bash
pg_dump -Fc main > /backups/main_$(date +%Y%m%d).dump
# Then rclone or scp to an off-server location

WAL archiving and point-in-time recovery are better. They are also more complex, require more disk, and fail silently in ways that pg_dump does not. For databases under 50 GB, pg_dump is the right tradeoff.

The honest limits

We have not run Postgres at FAANG scale. Our largest deployment is about 200 GB of data and a few hundred queries per second. Everything in this post has been tested at that scale and below. If you are running 20 TB and 10,000 QPS, you have problems we haven't had yet. But if you are running an SMB or an early-stage SaaS on a single VPS, the checklist above will keep your database boring. Boring is good.

Postgres is not the bottleneck. Bad defaults, missing indexes, and connection exhaustion are. Fix those three and Postgres will quietly do its job for years.