We just crossed 1,000,000,000 rows in a single customer's events table.
It did not go smoothly.
For the past two months in private beta, we pushed one real production workload to a scale where nice architectural diagrams stop mattering and physics starts enforcing rules. At that point, small inefficiencies compound, metadata balloons, caches thrash, and comfortable assumptions collapse. We knew it would happen at some point, of course. But there's a difference between anticipating scale and living through it.
Today that table holds:
- 1B rows
- 1.6TB of raw data
- 33 columns per row (30 VARCHAR with low to medium cardinality, 1 TIMESTAMP, 2 very-large JSON)
- Near-realtime ingestion
- Continuous production traffic
Queries like this run in about a second:
SELECT event, COUNT(*)
FROM events
GROUP BY 1
ORDER BY 2 DESC;
So does this:
SELECT uuid
FROM events
ORDER BY timestamp DESC
LIMIT 10;
And even this (mixed filters, JSON extraction, and a GROUP BY):
SELECT
  date_trunc('day', timestamp),
  bigjson_extract(properties, '$.my_attr'),
  COUNT(*)
FROM events
WHERE event = 'My Event'
  AND timestamp > NOW() - interval 1 month
GROUP BY 1, 2
ORDER BY 1, 2;
Most of the time, it's sub-second.
Getting there broke almost everything.
When Scale Gets Real
At 10M rows, everything looks fine. At 100M, you start noticing patterns. At 1B, every shortcut you ever took comes back to collect interest.
We experimented with:
- Partitioning strategies that looked great on paper and terrible in practice
- Different lakehouse sort orders
- Multiple Parquet compression algorithms (see the sketch after this list)
- Compaction heuristics to fight the small-file problem
- Reimplementing a BIGJSON field to avoid naïve JSON extraction costs
- Different caching strategies and eviction configurations
- Scaling underlying machines and memory envelopes
- Contributing upstream improvements to DuckDB and DuckLake (e.g. #668)
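To give a flavor of the compression experiments, here's a minimal sketch in plain DuckDB rather than our actual pipeline: write the same slice of the table with two codecs and compare the footprint the Parquet metadata reports. The file names and sample size are illustrative.

-- Write the same slice of events with two different Parquet codecs.
-- (LIMIT without ORDER BY is an arbitrary slice; good enough for a rough comparison.)
COPY (SELECT * FROM events LIMIT 10000000)
  TO 'events_zstd.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);
COPY (SELECT * FROM events LIMIT 10000000)
  TO 'events_snappy.parquet' (FORMAT PARQUET, COMPRESSION SNAPPY);

-- Compare the on-disk footprint straight from the Parquet metadata.
SELECT file_name,
       SUM(total_compressed_size)   AS compressed_bytes,
       SUM(total_uncompressed_size) AS uncompressed_bytes
FROM parquet_metadata('events_*.parquet')
GROUP BY file_name;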
In a lakehouse architecture, nothing is isolated: storage layout, compression, sort order, caching, and vectorized execution are tightly coupled. Every layer mattered, and every "small" choice had a way of surfacing later as either a bill, a latency spike, or a new operational headache.
You don't feel that at 10 million rows; you absolutely feel it at a billion.
The Compression × Sort Order Surprise
One of the most counterintuitive lessons was how tightly compression and sort order are bound together.
Most teams think of compression as a storage concern (something that affects cost per GB). In a lakehouse, compression is a performance primitive.
Sort order influences:
- How well Parquet encodes values
- How effective run-length and dictionary encoding become
- How much data (and how many files) must be scanned for common filters
- How vectorized execution behaves in DuckDB
- How much I/O your system actually performs
Better sort order → better compression patterns → less I/O → faster queries → lower cost.
At scale, this isn't marginal. It's structural.
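One way to see the coupling for yourself, again as a sketch rather than our production setup: write the same sample of rows twice, once in arrival order and once sorted, with everything else held constant, then let the Parquet metadata tell you what happened. The sample size, seed, sort key, and file names below are illustrative.

-- One reproducible sample so both files contain identical rows.
CREATE TEMP TABLE events_sample AS
  SELECT * FROM events USING SAMPLE 1 PERCENT (system, 42);

-- Same rows, same codec; the only variable is the sort order.
COPY events_sample
  TO 'sample_arrival_order.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);
COPY (SELECT * FROM events_sample ORDER BY event, timestamp)
  TO 'sample_sorted.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);

-- Sorting clusters the low-cardinality VARCHARs, so dictionary and
-- run-length encoding get longer runs and the compressed size typically drops.
SELECT file_name, SUM(total_compressed_size) AS compressed_bytes
FROM parquet_metadata('sample_*.parquet')
GROUP BY file_name;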
We ended up tuning sort strategies not just for filtering efficiency, but for compression behavior under realistic mixed workloads: time-based scans, GROUP BY event, JSON extraction on properties, and recent-first queries.
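Recent-first queries are a good example of why the read side is just as sensitive as the write side. When row groups are laid out in timestamp order, a recency filter can be satisfied largely from row-group min/max statistics, so most of the file is never read. A quick way to sanity-check that, once more as a sketch (the file name is hypothetical and assumes a file written in timestamp order), is to profile the filtered scan and compare it against an unsorted copy:

-- Profile the filtered read. With timestamp-ordered row groups, DuckDB's Parquet
-- reader can prune row groups whose min/max stats fall outside the filter,
-- so the scan should touch far less data than on an unsorted file.
EXPLAIN ANALYZE
SELECT date_trunc('day', timestamp), COUNT(*)
FROM read_parquet('sample_time_sorted.parquet')
WHERE timestamp > NOW() - interval 1 month
GROUP BY 1;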
Every change cascaded. We found that optimizing for a constrained environment (bounded hardware) forces a better architecture than simply scaling up resources: starting with maximum capacity tends to mask fundamental flaws, while solving for a specific memory and CPU envelope forces you to fix them, and because those fixes address the underlying physics, they benefit the entire system.
Keeping Data Awake
The harder problem wasn't just performance; it was economics.
Running sub-second queries on a billion-row table is easy if you're willing to burn warehouse credits, overprovision clusters, and accept unpredictable bills.
But when queries cost money, people stop asking them.
And when people stop asking questions, data goes back to sleep.
This table isn't archived. It isn't cold storage. It isn't "optimized for a BI refresh every 6 hours."
It is active:
- Near-realtime ingestion
- Sub-second interactive queries
- Continuous production workload
No per-query billing, no per-seat pricing, and no "rows scanned" anxiety.
Because the goal isn't to sell compute; it's to keep intelligence awake.
A Quiet Milestone
One customer.
One table.
One billion rows.
1.6TB of raw data.
Sub-second queries.
Near-realtime ingestion.
Two months ago, this was a hypothesis. Today, it's production reality, and we're just getting started.