SEPTEMBER 30, 2025

6 MIN READ

SYLVAIN UTARD

NetQuack 4000x Faster

We rewrote the NetQuack DuckDB extension, replacing regex with character-by-character parsing. The result: up to 4000x faster, from 37 seconds down to 0.012 seconds.


When we started using DuckDB more heavily at Altertable, we quickly discovered the community-maintained NetQuack extension for URL parsing. While functional, its performance was... disappointing: extracting domains from 30,000 URLs took 37 seconds. For a database that prides itself on analytical performance, this felt wrong.
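For context, this is roughly the shape of the workload, shown here through DuckDB's C++ client API since that is where the extension code lives. This is a minimal sketch: the community install path and the inline table are assumptions for illustration, and extract_domain is one of the functions benchmarked below.

#include "duckdb.hpp"

int main() {
    duckdb::DuckDB db(nullptr);   // in-memory database
    duckdb::Connection con(db);

    // Assumed install path for the community-maintained extension
    con.Query("INSTALL netquack FROM community");
    con.Query("LOAD netquack");

    // The kind of call we were benchmarking, run here on a throwaway inline table
    auto result = con.Query(
        "SELECT extract_domain(url) AS domain "
        "FROM (VALUES ('https://shop.example.co.uk/cart')) AS t(url)");
    result->Print();   // should print example.co.uk
    return 0;
}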

After digging into the code, we found two obvious culprits: heavy reliance on std::regex for parsing, and database queries against the Public Suffix List to validate TLDs. Inspired by how ClickHouse handles URL parsing, we rewrote the entire extension. This was one of 17 upstream contributions we made in 90 days. The results? Improvements across every function, up to 4000x for the TLD-dependent ones.

What Was Wrong

The original NetQuack implementation had two major performance killers. First, every URL component extraction used std::regex:

// Original approach - slow regex parsing
std::regex url_regex("^(https?|ftp)://([^/]+)(.*)$");
std::smatch matches;
if (std::regex_match(url, matches, url_regex)) {
    return matches[2].str(); // Extract host
}

Second, TLD extraction required querying a database table populated from the Mozilla Public Suffix List:

// Find the longest matching public suffix
std::string public_suffix;
int public_suffix_index = -1;
for (size_t j = 0; j < parts.size(); j++) {
    // Build the candidate suffix
    std::string candidate;
    for (size_t k = j; k < parts.size(); k++) {
        candidate += (k == j ? "" : ".") + parts[k];
    }
    // Query the public suffix list
    auto query = "SELECT 1 FROM public_suffix_list WHERE suffix = '" + candidate + "'";
    auto query_result = con.Query(query);
    if (query_result->RowCount() > 0) {
        public_suffix = candidate;
        public_suffix_index = j;
        break;
    }
}

This meant every domain extraction triggered multiple database queries: for a hostname like shop.example.co.uk, the loop above issues a separate query for each candidate suffix (shop.example.co.uk, example.co.uk, co.uk, ...) until one matches. For analytical workloads processing thousands of URLs, this was a disaster.

The ClickHouse-Inspired Solution

We completely rewrote the extension using character-by-character parsing instead of regex:

#include <string_view>

using Pos = const char *; // cursor into the raw URL buffer

inline std::string_view getURLHost(const char *data, size_t size) {
    Pos pos = data;
    Pos end = data + size;

    // Skip protocol if present
    if (size >= 2 && *pos == '/' && *(pos + 1) == '/') {
        pos += 2;
    } else {
        // Parse scheme://host pattern
        for (++pos; pos < end; ++pos) {
            if (*pos == ':' && *(pos + 1) == '/' && *(pos + 2) == '/') {
                pos += 3;
                break;
            }
        }
    }

    // Extract host until delimiter
    Pos start_of_host = pos;
    for (; pos < end; ++pos) {
        switch (*pos) {
        case ':': case '/': case '?': case '#':
            return std::string_view(start_of_host, pos - start_of_host);
        }
    }
    return std::string_view(start_of_host, pos - start_of_host);
}

This approach uses std::string_view for zero-copy operations, processes one character at a time with simple switch statements, and avoids regex compilation overhead entirely.
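To see what that buys on a few concrete inputs, here is a small hypothetical driver around getURLHost; it is not part of the extension, just an illustration.

#include <cstring>
#include <iostream>
#include <string_view>

// Assumes getURLHost from above is visible in this translation unit.
int main() {
    const char *urls[] = {
        "https://shop.example.co.uk/cart?item=42",
        "//cdn.example.com/app.js",          // protocol-relative
        "ftp://files.example.org:2121/pub",  // port is excluded from the host
    };
    for (const char *url : urls) {
        std::string_view host = getURLHost(url, std::strlen(url));
        std::cout << host << "\n";  // shop.example.co.uk, cdn.example.com, files.example.org
    }
    return 0;
}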

The bigger win, though, was eliminating the database dependency entirely. We use gperf to generate a perfect hash function from Mozilla's Public Suffix List:

#!/usr/bin/env bash
# Download latest PSL
wget -O public_suffix_list.dat https://publicsuffix.org/list/public_suffix_list.dat
# Extract TLDs and generate gperf input
grep -v "^//" public_suffix_list.dat |
grep -v "^$" |
grep -v "^\!" |
grep -v "^\*" |
sort -u > src/utils/tld_lookup.gperf
# Generate perfect hash function
gperf src/utils/tld_lookup.gperf > src/utils/tld_lookup_generated.hpp

This creates a collision-free hash function compiled directly into the extension binary. Now TLD lookups are O(1) memory operations instead of database queries:

// O(1) TLD lookup with zero collisions
bool isValidTLD(const char *str, size_t len) {
    return TLDLookupHash::isValidTLD(str, len) != nullptr;
}

std::string getEffectiveTLD(const std::string &hostname) {
    // Try all possible suffixes, find the longest match
    std::string longest_tld;
    for (size_t pos = 0; pos < hostname.length(); ++pos) {
        if (hostname[pos] == '.') {
            std::string candidate = hostname.substr(pos + 1);
            if (isValidTLD(candidate.c_str(), candidate.length()) &&
                candidate.length() > longest_tld.length()) {
                longest_tld = candidate;
            }
        }
    }
    // No PSL match: fall back to the last dot-separated label
    return longest_tld.empty() ? getLastPart(hostname) : longest_tld;
}

The perfect hash table adds only ~500KB to the binary size, and it eliminates runtime PSL downloads, database table storage, query execution overhead, and memory allocations for temporary strings.

The Results

We benchmarked both versions against 30,000 URLs. The improvements were staggering:

Function              Before    After    Improvement
extract_schema        0.069s    0.010s   6.9x faster
extract_host          0.086s    0.008s   10.7x faster
extract_port          0.533s    0.009s   59.2x faster
extract_path          0.127s    0.008s   15.8x faster
extract_query_string  0.202s    0.004s   50.5x faster
extract_domain        37.503s   0.012s   3125.2x faster
extract_subdomain     35.831s   0.010s   3583.1x faster
extract_tld           36.003s   0.009s   4000.3x faster
extract_extension     0.329s    0.007s   47.0x faster

The TLD-dependent functions saw the most dramatic improvements because the rewrite eliminated both regex parsing and database queries for them. The architecture change was fundamental:

  • Before: URL Input → Regex Parsing → Database Query → Result
  • After: URL Input → Character Parsing → Perfect Hash Lookup → Result

The old implementation paid regex compilation and matching for every URL, plus a database query (and its disk I/O) for every candidate suffix. The new implementation does a single O(n) character scan per URL with O(1) perfect hash lookups in memory.
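The regex-versus-scanning half of that gap is easy to reproduce in isolation. The micro-benchmark below is our own sketch, not the project's benchmark script, and it even hoists the regex compilation out of the loop; exact numbers depend on the machine and standard library, but the character scan typically wins by an order of magnitude or more.

#include <chrono>
#include <iostream>
#include <regex>
#include <string>

// Assumes getURLHost from above; timings are illustrative only.
int main() {
    const std::string url = "https://shop.example.co.uk/cart?item=42";
    const int iterations = 100000;
    const std::regex url_regex("^(https?|ftp)://([^/]+)(.*)$");

    auto t0 = std::chrono::steady_clock::now();
    size_t regex_chars = 0;
    for (int i = 0; i < iterations; ++i) {
        std::smatch m;
        if (std::regex_match(url, m, url_regex)) regex_chars += m[2].length();
    }
    auto t1 = std::chrono::steady_clock::now();

    size_t scan_chars = 0;
    for (int i = 0; i < iterations; ++i) {
        scan_chars += getURLHost(url.data(), url.size()).size();
    }
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "regex: " << ms(t1 - t0).count() << " ms (" << regex_chars << " chars)\n"
              << "scan:  " << ms(t2 - t1).count() << " ms (" << scan_chars << " chars)\n";
    return 0;
}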

And despite the complete rewrite, all existing tests pass and edge cases are handled correctly.

Key Takeaways

Profile before optimizing. The 37-second domain extraction time immediately pointed to the real bottlenecks. Always measure first.

Eliminate I/O in hot paths. Database queries in URL parsing functions were the primary performance killer. Moving to compile-time data structures eliminated this entirely.

Character parsing beats regex for simple patterns. For structured data like URLs, character-by-character parsing is often faster and more predictable than regex.

Perfect hashing works great for static data. When you have a known, static dataset (like TLDs), perfect hash functions provide optimal lookup performance.

Zero-copy string operations matter. Using std::string_view eliminated unnecessary string copying, especially important for analytical workloads.
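As a toy comparison of that last point (our example, not extension code): slicing with std::string::substr allocates and copies for every component, while a std::string_view slice just records a pointer and a length into the caller's buffer.

#include <string>
#include <string_view>

// Allocating: copies the host characters into a brand-new std::string.
std::string host_copy(const std::string &url, size_t begin, size_t len) {
    return url.substr(begin, len);
}

// Zero-copy: refers into the caller's buffer; only valid while that buffer is alive.
std::string_view host_view(std::string_view url, size_t begin, size_t len) {
    return url.substr(begin, len);
}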

Try It Yourself

The optimized NetQuack extension is currently under review in Pull Request #13. While we wait for these changes to be reviewed and hopefully merged, you can:

  1. Test the optimized version by building from our branch:

    git clone https://github.com/hatamiarash7/duckdb-netquack.git
    cd duckdb-netquack
    git checkout altertable-ai/su/optim
    GEN=ninja make
  2. Run the benchmarks yourself to see the performance improvements:

    ./scripts/benchmark.sh
  3. Check out the source code on GitHub to see the detailed implementation.


This optimization journey reminded us why performance matters in analytical databases. When you're processing millions of URLs, the difference between 37 seconds and 0.012 seconds isn't just a nice-to-have—it's the difference between interactive analysis and waiting for coffee to brew.

Sometimes the best optimization is throwing out the old approach entirely and starting fresh with the right architectural principles. In this case, ClickHouse showed us the way.


