concept

DuckDB vs ClickHouse vs Postgres for Analytics

created 2026-05-27 pattern · database · analytics · decision-framework · duckdb · clickhouse · postgres

DuckDB vs ClickHouse vs Postgres for Analytics

TL;DR

Three answers to “I need to do analytics.” Most teams pick the wrong one — usually ClickHouse, because it sounds impressive. In 2026, the right default for new analytical workloads is roughly: Postgres if you already have one and the data fits, DuckDB if you don’t, ClickHouse only if you’ve measured and proven you need it. If your data fits on one disk in 2026, you probably don’t need a cluster.

The three engines at a glance

AxisPostgresDuckDBClickHouse
ShapeRow-oriented OLTP, OLAP-capableColumnar OLAP, embeddedColumnar OLAP, distributed
DeploymentServerLibrary inside your processServer cluster
ConcurrencyMany writers, many readersOne writer, many readersMany writers, many readers
Scale ceiling~10s of TB on big iron~1 TB single machinePB across cluster
Ingest rate10s of k/s100s of k/sMillions/s
Query latency10–500 mssub-ms to seconds (no network)10–100 ms (network hop)
Ops complexityFamiliarNone (it’s a library)Real (Keeper, replicas, sharding)
LicensePostgreSQLMITApache 2.0
Sweet spotTransactional + moderate analyticsEmbedded analytics, ETL, ad-hocHigh-volume event streams, observability

Postgres for analytics — more often than you think

Most Postgres-already-deployed projects don’t need anything else. Things people don’t realize Postgres can handle:

  • Hundreds of millions of rows for most queries, if you use BRIN indexes on time-series and partitioning on hot tables
  • Materialized views for pre-aggregated rollups (refresh nightly or on-demand)
  • Logical replication to a read-replica dedicated to analytics, so analytics doesn’t fight OLTP
  • Extensions for the gnarly cases: pg_stat_statements, pg_partman, timescaledb (real time-series), citus (sharding when you genuinely need it)

If you’re already running Postgres and someone asks for an “analytics dashboard,” try Postgres first with a materialized view and BRIN index. You may be one CTE away from done.

When Postgres stops being enough

  • You’re scanning >10–50 GB per query routinely (columnar wins big here)
  • You’re ingesting > ~50k rows/s sustained
  • Your queries are mostly aggregates over wide tables → row-oriented storage costs you 10–100× on I/O
  • Mixed OLTP + heavy analytics on the same table is causing lock/vacuum pressure

DuckDB — the embedded analytics dark horse

DuckDB is “SQLite for analytics.” Single file on disk, runs inside your application process, no server, no port, no replication. Columnar storage, vectorized execution, full SQL.

Where DuckDB wins

  • Embedded analytics in apps — your Go/Python/TS service does its own dashboards locally; no separate data warehouse to deploy and pay for
  • ETL and ad-hoc data work — query Parquet/CSV/JSON files directly from disk or S3 without loading them first
  • Single-machine, single-user analytical tools — admin panels, internal reporting, scraped data analysis
  • Notebooks and analyst workflows — pandas/Polars folks reach for DuckDB because it’s just SQL on dataframes with no infra
  • Edge analytics — runs in WASM, can sit in a Cloudflare Worker or a desktop app

Killer features

  • Query Parquet/CSV directly: SELECT * FROM 'data.parquet' WHERE … — no load step
  • Reads partitioned Hive layouts from S3 with pruning
  • Bundles with your app (single binary deploy)
  • Sub-millisecond latency because there’s no network hop

When DuckDB stops being enough

  • More than one process needs to write concurrently (it’s single-writer)
  • You need replication/HA for the analytical store itself
  • You’re past ~1 TB on a single machine
  • You need real-time multi-user concurrent reads at high QPS

ClickHouse — when you actually have the data

ClickHouse is real-time analytical infrastructure. Designed for “we ingest a billion events a day and want sub-second queries over arbitrary slices of last week.”

Where ClickHouse wins

  • Observability and log aggregation — replacing Elastic for traces/metrics/logs
  • Web analytics — its original use case at Yandex
  • Event streams / clickstreams — millions of inserts per second, materialized views aggregating in real time
  • Multi-tenant SaaS analytics — when many tenants query a shared big table
  • Sharded scale — horizontal scaling on commodity hardware

Killer features

  • MergeTree engine family for write-optimized columnar storage
  • Async inserts that batch on the server side (so you can write naïvely from many clients)
  • Materialized views that refresh incrementally on insert
  • ReplicatedMergeTree for HA via Keeper/ZooKeeper
  • Distributed table engine for transparent sharding

What ClickHouse costs you

  • An operational footprint: Keeper, replicas, sharding strategy, schema migrations, version upgrades
  • Eventual consistency between replicas (small window, but it’s there)
  • Update/delete is expensive — designed append-mostly
  • Joins are weaker than Postgres — you denormalize and live with it
  • Schema design is a real skill (sort key, partition key, codec choice all matter for perf)

The decision tree

Do you already run Postgres?
├── Yes
│   └── Does the analytics query you need run in <2s on it
│       with one materialized view + a BRIN index?
│       ├── Yes → 🟢 Stay on Postgres. Done.
│       └── No  → continue ↓

└── Are you building a single-process app/tool/dashboard?
    ├── Yes → 🟢 DuckDB. Single file, no infra, ship the binary.

    └── Are you ingesting >100k rows/s OR data >1TB
        OR many concurrent writers OR multi-tenant analytics?
        ├── Yes → 🟢 ClickHouse. You're now in its territory.

        └── No → 🟢 DuckDB still, or a Postgres read-replica.

Concrete sizing examples

ScenarioRight answer
200 MB cadastre dataset, admin tool, single userDuckDB (this is katastar)
50M rows of user events/day, real-time dashboardsClickHouse
100k rows/day audit log, queried weeklyPostgres — materialized view, done
5 GB CSV from a vendor, one-off analysisDuckDB — query the CSV directly
Multi-tenant SaaS, each tenant has 1B eventsClickHouse
Internal BI dashboard over your Postgres OLTP DBPostgres read-replica, maybe + dbt
Vercel-hosted Next.js app needing embedded analyticsDuckDB (also runs in serverless)
Replacing Elastic for log search at scaleClickHouse

The case study: katastar

Katastar uses DuckDB. ~157 MB of Croatian cadastre data, single-user admin tool, bind-mounted file inside a Go binary, deployed as one container on Coolify.

ClickHouse would have been overkill on every axis:

  • Ops: a cluster (or even a single ClickHouse server with Keeper) for what fits in 157 MB
  • Latency: localhost network hop vs no hop at all
  • Build: separate service, separate Dockerfile, separate deploy lifecycle
  • Concurrency: katastar has one user — single-writer DuckDB is fine

Postgres would have been fine but redundant — there’s nothing else in katastar that needs OLTP, and DuckDB queries the same data with better analytical performance and zero extra infrastructure. The decision was right.

If katastar grew to (a) hundreds of concurrent users querying live, (b) live cadastre updates at high QPS, or (c) cross-municipality joins on billions of rows, then we’d revisit. None of those are remotely true.

⚠️ The common mistakes

🚫 ClickHouse because “we’ll have lots of data eventually”

The most expensive mistake. You’re building infrastructure for a load you don’t have, paying ops cost forever for hypothetical scale. Build for today + 6 months. Re-architect when you measure the need.

🚫 Postgres past its breaking point because “we already have it”

The opposite mistake. You bolt analytics onto your OLTP database, your nightly aggregation takes 4 hours, your dashboard query locks your users table, and you’ve now made both products slower. Move analytics to a read-replica or to DuckDB/ClickHouse when the symptoms appear.

🚫 DuckDB shoved into a multi-user web app

It’s a library, not a database server. If two processes need to write the same DuckDB file, you’ll hit lock errors. Use it for embedded analytics, ETL, single-user tools — not as the backing store for a multi-tenant SaaS.

🚫 Reaching for Snowflake/BigQuery on day one

For a brand-new project: a managed warehouse adds latency, costs more per query than self-hosting at small scale, and locks you in. They’re great at scale; they’re wasteful at prototype size.

✅ Picker checklist

Before reaching for ClickHouse, can you say “yes” to most of these?

  • You’ve measured the actual data size, not the rumored future size
  • You’ve measured the actual ingest rate, not the marketing-hoped one
  • You’ve tried the workload on Postgres with at least one materialized view
  • You have at least one team member who’s run ClickHouse in prod before
  • You’re prepared to design MergeTree sort keys / partition keys deliberately
  • You’ve budgeted for Keeper/replica ops, not just the server itself

If you’re answering “no” to most of these, DuckDB or Postgres, not ClickHouse.

  • when-rag-when-not — same shape of decision, different domain
  • context-windows-and-tokens — “scale-first” mistakes recur across AI infra too
  • (coming) When to reach for managed warehouses (Snowflake / BigQuery / Redshift)