A Postgres lake beats your data warehouse, for the things that matter.

Most enterprise data problems are operational, not analytical. Postgres handles them. Add the warehouse only when a specific workload proves it has to.

James Finnegan, Co-founder · written from the build

There's a moment in most enterprise architecture conversations where someone says "we'll need a warehouse." It usually happens around minute thirty. By minute forty-five, someone's named Snowflake or BigQuery and someone else is sketching a Fivetran pipeline on the whiteboard.

Most of the time, this is the wrong move. Or rather: it's the wrong first move.

The default position we've come around to: Postgres as the substrate for everything operational. Customer data, transaction history, audit ledgers, AI-generated content, agent state, job queues, configuration. Postgres holds it all. Other systems (ClickHouse, Redis, Meilisearch) earn their way in only when a specific workload proves Postgres can't serve it.

It almost always serves it. And when it doesn't, you know exactly why, instead of guessing.

What Postgres-as-lake actually means

It's not a clever new product. It's a pattern. Three rules:

Raw JSONB before transformation. Every external payload (every API response, every webhook, every uploaded file) gets stored as raw JSONB in an immutable raw layer before any transformation runs. This is the load-bearing decision. When the schema needs to evolve six months later, you don't have to re-fetch from the source API and burn credits twice. The raw payload is already there. You just write a new transformer and replay it over what you stored.

Append-only audit ledgers. Compliance entries, state transitions, agent actions, and scoring changes are all immutable. INSERT-only, never UPDATE, never DELETE, and enforced with table privileges or a trigger rather than by convention. Postgres handles this fine, with partitioning once the table grows large.

Row-level security on every multi-tenant table. If you're building anything that serves more than one customer, RLS isn't optional. You write the policies once, and every query (every ORM call, every analyst's ad-hoc SQL, every poorly-written report someone spins up two years from now) is automatically filtered, as long as it runs under a role that is subject to the policies. Superusers, roles with BYPASSRLS, and by default the table owner are not. The cost of not doing this and discovering a tenant leak is much higher than the cost of writing the policies.

That's it. That's the lake. It runs on standard Postgres, and on Aurora it scales further than most teams will ever push it.
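The three rules can be sketched as a handful of Postgres statements. This is a minimal sketch, not the schema from any real build: the table names (raw_events, audit_ledger), the app.tenant_id setting, and the helper functions are all assumptions made for illustration. It is wrapped in Python so the statements can be run through any DB-API cursor, for example one from psycopg.

```python
"""Sketch of the three lake rules as Postgres DDL. All names are hypothetical."""

# Rule 1: raw payloads land here untouched, before any transformer runs.
RAW_LAYER = """
CREATE TABLE raw_events (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id   uuid        NOT NULL,
    source      text        NOT NULL,
    received_at timestamptz NOT NULL DEFAULT now(),
    payload     jsonb       NOT NULL
)
"""

# Rule 2: the ledger is a plain table; immutability comes from the trigger below.
AUDIT_LEDGER = """
CREATE TABLE audit_ledger (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id   uuid        NOT NULL,
    recorded_at timestamptz NOT NULL DEFAULT now(),
    action      text        NOT NULL,
    detail      jsonb       NOT NULL
)
"""

REJECT_MUTATION = """
CREATE FUNCTION reject_mutation() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'append-only table: % not allowed', TG_OP;
END;
$$
"""


def append_only_trigger(table):
    """Statement that makes UPDATE and DELETE on `table` fail."""
    return (
        f"CREATE TRIGGER {table}_append_only "
        f"BEFORE UPDATE OR DELETE ON {table} "
        "FOR EACH ROW EXECUTE FUNCTION reject_mutation()"
    )


def tenant_policy(table):
    """Rule 3: statements that restrict `table` to the session's tenant."""
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        f"CREATE POLICY tenant_isolation ON {table} "
        "USING (tenant_id = current_setting('app.tenant_id')::uuid)",
    ]


def migration_statements():
    """All statements, in the order they need to run."""
    statements = [RAW_LAYER, AUDIT_LEDGER, REJECT_MUTATION]
    for table in ("raw_events", "audit_ledger"):
        statements.append(append_only_trigger(table))
        statements.extend(tenant_policy(table))
    return statements


def apply_migration(cursor):
    """Run the migration on a DB-API cursor (the driver is the caller's choice)."""
    for statement in migration_statements():
        cursor.execute(statement)
```

The application would set app.tenant_id at the start of each session or transaction so the policy has something to compare against. A row-level trigger does not catch TRUNCATE, so revoking that privilege from the application role is a separate step.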

When you actually need a warehouse

Two cases.

The analytical workload is breaking transactional latency. Heavy aggregation queries can push your P95 transactional latency past where the dashboard feels snappy. When that happens, an analytical column store fed by change-data-capture from Postgres earns its keep. Analytics queries hit the column store. Transactional writes still go to Postgres. The warehouse becomes a cache for analytics, not a source of truth.

The data has to leave the operational system. If finance or BI or a downstream partner needs the data in a different shape, on a different schedule, in a different region, sure, ETL it into a warehouse. But you're doing this for them, not for the application. The application keeps reading from Postgres.

That's the whole list. Most teams don't actually have either of those problems. They have a warehouse anyway, because the architecture diagram had a warehouse on it from the first slide.
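Whether the first case applies is something you can measure rather than guess. A minimal sketch, assuming you can collect transactional latency samples in milliseconds both with and without the analytical queries running. The function names and the 1.5x tolerance are illustrative assumptions, not thresholds from any real system.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic, to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def analytics_is_hurting(baseline_ms, under_load_ms, tolerance=1.5):
    """True when P95 under analytical load exceeds tolerance x the baseline P95."""
    return p95(under_load_ms) > tolerance * p95(baseline_ms)
```

If the check comes back false, the warehouse has not yet earned its place for that workload.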

Why this matters more for AI systems

Because AI generates more data than your old apps did. Every agent run produces inputs, intermediate state, tool calls, tool results, final outputs, cost telemetry, latency telemetry. A real autonomous system writes thousands of structured records per day per agent. Multiply by a small fleet and you're at millions of rows a month. Ninety percent of what you do with them is operational lookup, not aggregation.
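The arithmetic behind "millions of rows a month" is easy to check. The figures below are assumptions picked to match "thousands per day per agent" and "a small fleet", not measurements.

```python
def rows_per_month(records_per_agent_per_day, agents, days=30):
    """Structured records written per month by a fleet of agents."""
    return records_per_agent_per_day * agents * days


# Assumed: 2,000 records per agent per day, 20 agents, 30-day month.
monthly = rows_per_month(2_000, 20)
print(f"{monthly:,} rows per month")  # 1,200,000 rows per month
```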

Operational lookup is what Postgres is for. Indexes, JSONB containment queries, partial indexes on hot subsets, materialized views for the few aggregations that matter. There's almost never a point where the right answer is "rip out Postgres and put a warehouse in." The right answer is almost always "tune that index," or "add a read replica," or "spin up a small ClickHouse for this specific analytical pattern."
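Those four tools look roughly like this in practice. This is a sketch under assumptions: the agent_runs table and its columns (payload, status, started_at, agent_id, cost_usd) are hypothetical, and the %s placeholders assume a psycopg-style driver.

```python
import json

# GIN index for containment (@>) lookups, plus a partial index on a hot subset.
INDEXES = [
    "CREATE INDEX agent_runs_payload_gin "
    "ON agent_runs USING gin (payload jsonb_path_ops)",
    "CREATE INDEX agent_runs_failed_idx "
    "ON agent_runs (started_at) WHERE status = 'failed'",
]

# One of "the few aggregations that matter", precomputed.
DAILY_COST_VIEW = """
CREATE MATERIALIZED VIEW daily_agent_cost AS
SELECT agent_id,
       date_trunc('day', started_at) AS day,
       sum(cost_usd)                 AS cost_usd
FROM agent_runs
GROUP BY agent_id, date_trunc('day', started_at)
"""


def containment_lookup(match, limit=50):
    """Build a query for runs whose payload contains every key/value in `match`."""
    sql = (
        "SELECT id, started_at, payload FROM agent_runs "
        "WHERE payload @> %s::jsonb "
        "ORDER BY started_at DESC LIMIT %s"
    )
    return sql, (json.dumps(match, sort_keys=True), limit)


# Usage: sql, params = containment_lookup({"tool": "search"}); cursor.execute(sql, params)
```

The materialized view only changes when REFRESH MATERIALIZED VIEW runs, so it suits aggregations that can tolerate being slightly stale.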

If you've followed the pgvector conversation over the last year, the same logic applies a level deeper. Vector search inside Postgres is now competitive with most dedicated vector databases for the workloads most teams have. The stack collapses. The escape hatch stays open. And the AI features your team builds around pgvector (semantic search, retrieval, recommendations) share governance, backups, and access policies with the rest of the operational system, instead of hiding in a side database with its own rules.
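A sketch of what sharing governance means in practice: the embeddings live in an ordinary table, so the same tenant policy applies to semantic search as to everything else. The documents table, the three-dimensional toy vector column, and the helper names are assumptions for illustration. The vector type and the <=> cosine-distance operator come from the pgvector extension.

```python
SETUP = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE documents (
        id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        tenant_id uuid      NOT NULL,
        body      text      NOT NULL,
        embedding vector(3) NOT NULL  -- toy dimension; real models use far more
    )
    """,
    # Same row-level security as any other multi-tenant table.
    "ALTER TABLE documents ENABLE ROW LEVEL SECURITY",
    "CREATE POLICY tenant_isolation ON documents "
    "USING (tenant_id = current_setting('app.tenant_id')::uuid)",
]


def vector_literal(values):
    """Format numbers in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def nearest_documents(query_embedding, limit=5):
    """Build a nearest-neighbour query ordered by cosine distance."""
    sql = (
        "SELECT id, body FROM documents "
        "ORDER BY embedding <=> %s::vector LIMIT %s"
    )
    return sql, (vector_literal(query_embedding), limit)
```

Because the policy sits on the table, a retrieval query issued for one tenant cannot return another tenant's documents, with no filter written into the search code itself.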

The escape hatch matters

The other thing about Postgres-as-lake: the data is never trapped. Standard Postgres, standard SQL, standard backup tools. No proprietary warehouse, no vendor-specific dialect, no contract that holds the data hostage. Operational systems that lock data into a vendor's analytical engine end up paying twice: once for the platform, and once for the eventual migration. Operational systems built on Postgres pay once.

For the things that matter (the things your team queries every day to do their actual jobs) that math compounds in your favor.

— James

