When we started using DuckDB more heavily at Altertable, we quickly discovered NetQuack, a community-maintained extension for URL parsing. While functional, its performance was... disappointing. Extracting domains from 30,000 URLs took 37 seconds. For a database that prides itself on analytical performance, this felt wrong.
After digging into the code, the culprits were obvious: heavy reliance on std::regex for parsing and database queries to validate TLDs against the Public Suffix List. Inspired by how ClickHouse handles URL parsing, we rewrote the entire extension. This was one of 17 upstream contributions we made in 90 days. The results? Up to 4000x performance improvements across all functions.
What Was Wrong
The original NetQuack implementation had two major performance killers. First, every URL component extraction used std::regex:
```cpp
// Original approach - slow regex parsing
std::regex url_regex("^(https?|ftp)://([^/]+)(.*)$");
std::smatch matches;
if (std::regex_match(url, matches, url_regex)) {
    return matches[2].str(); // Extract host
}
```
Second, TLD extraction required querying a database table populated from the Mozilla Public Suffix List:
```cpp
// Find the longest matching public suffix
std::string public_suffix;
int public_suffix_index = -1;
for (size_t j = 0; j < parts.size(); j++) {
    // Build the candidate suffix
    std::string candidate;
    for (size_t k = j; k < parts.size(); k++) {
        candidate += (k == j ? "" : ".") + parts[k];
    }

    // Query the public suffix list
    auto query = "SELECT 1 FROM public_suffix_list WHERE "
                 "suffix = '" + candidate + "'";
    auto query_result = con.Query(query);
    if (query_result->RowCount() > 0) {
        public_suffix = candidate;
        public_suffix_index = j;
        break;
    }
}
```
This meant every domain extraction triggered multiple database queries. For analytical workloads processing thousands of URLs, this was a disaster.
The ClickHouse-Inspired Solution
We completely rewrote the extension using character-by-character parsing instead of regex:
```cpp
inline std::string_view getURLHost(const char* data, size_t size) {
    Pos pos = data;
    Pos end = data + size;

    // Skip protocol if present
    if (size >= 2 && *pos == '/' && *(pos + 1) == '/') {
        pos += 2;
    } else {
        // Parse scheme://host pattern
        for (++pos; pos < end; ++pos) {
            if (*pos == ':' && *(pos + 1) == '/' && *(pos + 2) == '/') {
                pos += 3;
                break;
            }
        }
    }

    // Extract host until delimiter
    Pos start_of_host = pos;
    for (; pos < end; ++pos) {
        switch (*pos) {
        case ':':
        case '/':
        case '?':
        case '#':
            return std::string_view(start_of_host, pos - start_of_host);
        }
    }
    return std::string_view(start_of_host, pos - start_of_host);
}
```
This approach uses std::string_view for zero-copy operations, processes one character at a time with simple switch statements, and avoids regex compilation overhead entirely.
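To see what this buys in practice, here is a small driver (not part of the extension) that exercises a parser like `getURLHost`. It assumes the function above is compiled into the same translation unit, with `Pos` defined as `const char*`:

```cpp
#include <cstring>
#include <iostream>
#include <string_view>

using Pos = const char*; // assumed alias used by getURLHost above

// getURLHost from the snippet above is assumed to be defined here.

int main() {
    const char* urls[] = {
        "https://blog.altertable.ai/posts/duckdb",   // -> blog.altertable.ai
        "//cdn.example.com/asset.js?v=2",            // -> cdn.example.com
        "ftp://files.example.org:2121/pub",          // -> files.example.org
    };
    for (const char* url : urls) {
        // Returns a view into the original buffer: no allocation, no copy.
        std::string_view host = getURLHost(url, std::strlen(url));
        std::cout << host << '\n';
    }
    return 0;
}
```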
The bigger win, though, was eliminating the database dependency entirely. We used gperf to generate a perfect hash function from Mozilla's Public Suffix List:
```bash
#!/usr/bin/env bash

# Download latest PSL
wget -O public_suffix_list.dat https://publicsuffix.org/list/public_suffix_list.dat

# Extract TLDs and generate gperf input
grep -v "^//" public_suffix_list.dat |
    grep -v "^$" |
    grep -v "^\!" |
    grep -v "^\*" |
    sort -u > src/utils/tld_lookup.gperf

# Generate perfect hash function
gperf src/utils/tld_lookup.gperf > src/utils/tld_lookup_generated.hpp
```
This creates a collision-free hash function compiled directly into the extension binary. Now TLD lookups are O(1) memory operations instead of database queries:
```cpp
// O(1) TLD lookup with zero collisions
bool isValidTLD(const char* str, size_t len) {
    return TLDLookupHash::isValidTLD(str, len) != nullptr;
}

std::string getEffectiveTLD(const std::string& hostname) {
    // Try all possible suffixes, find longest match
    std::string longest_tld;
    for (size_t pos = 0; pos < hostname.length(); ++pos) {
        if (hostname[pos] == '.') {
            std::string candidate = hostname.substr(pos + 1);
            if (isValidTLD(candidate.c_str(), candidate.length()) &&
                candidate.length() > longest_tld.length()) {
                longest_tld = candidate;
            }
        }
    }
    // Fall back to the last label when nothing in the PSL matches
    return longest_tld.empty() ? getLastPart(hostname) : longest_tld;
}
```

The perfect hash table adds only ~500KB to the binary size but eliminates runtime PSL downloads, database table storage, query execution overhead, and memory allocations for temporary strings.
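To make the longest-match behavior concrete, here is a self-contained toy version (not the extension's code) that swaps the gperf table for a hard-coded set of three suffixes; the walk itself mirrors `getEffectiveTLD` above:

```cpp
#include <iostream>
#include <string>
#include <unordered_set>

// Stand-in for the gperf-generated table: a tiny, hard-coded suffix set.
static const std::unordered_set<std::string> kDemoSuffixes = {"uk", "co.uk", "ai"};

static bool isValidTLDDemo(const std::string& s) {
    return kDemoSuffixes.count(s) > 0;
}

// Same longest-suffix walk as getEffectiveTLD above, minus the fallback.
static std::string effectiveTLDDemo(const std::string& hostname) {
    std::string longest;
    for (size_t pos = 0; pos < hostname.length(); ++pos) {
        if (hostname[pos] == '.') {
            std::string candidate = hostname.substr(pos + 1);
            if (isValidTLDDemo(candidate) && candidate.length() > longest.length()) {
                longest = candidate;
            }
        }
    }
    return longest;
}

int main() {
    std::cout << effectiveTLDDemo("www.example.co.uk") << '\n';  // co.uk
    std::cout << effectiveTLDDemo("blog.altertable.ai") << '\n'; // ai
    return 0;
}
```

The real extension performs the same walk, only against the full PSL table compiled into the binary.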
The Results
We benchmarked both versions against 30,000 URLs. The improvements were staggering:
| Function | Before | After | Improvement |
|---|---|---|---|
| extract_schema | 0.069s | 0.010s | 6.9x faster |
| extract_host | 0.086s | 0.008s | 10.7x faster |
| extract_port | 0.533s | 0.009s | 59.2x faster |
| extract_path | 0.127s | 0.008s | 15.8x faster |
| extract_query_string | 0.202s | 0.004s | 50.5x faster |
| extract_domain | 37.503s | 0.012s | 3125.2x faster |
| extract_subdomain | 35.831s | 0.010s | 3583.1x faster |
| extract_tld | 36.003s | 0.009s | 4000.3x faster |
| extract_extension | 0.329s | 0.007s | 47.0x faster |
The TLD-dependent functions saw the most dramatic improvements because they eliminated both regex parsing AND database queries. The architecture change was fundamental:
- Before: URL Input → Regex Parsing → Database Query → Result
- After: URL Input → Character Parsing → Perfect Hash Lookup → Result
The old implementation compiled and ran a regex for every URL and, for the domain functions, issued a database query per candidate suffix, with all the associated I/O. The new implementation makes a single O(n) pass over each URL's characters and resolves TLDs with O(1) perfect hash lookups in memory.
And despite the complete rewrite, all existing tests pass and edge cases are handled correctly.
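If you want to sanity-check the parsing cost outside DuckDB, a minimal timing harness along these lines works; it assumes `getURLHost` from above is available in the same translation unit (the numbers in the table come from the extension's own benchmark script, not this sketch):

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

// Minimal timing harness: measures a parsing function over many URLs.
// Assumes getURLHost from the article is defined in this translation unit.
int main() {
    std::vector<std::string> urls(
        30000, "https://blog.altertable.ai/posts/duckdb-url-parsing?ref=rss");

    auto start = std::chrono::steady_clock::now();
    size_t total_host_bytes = 0; // accumulate so the loop isn't optimized away
    for (const auto& url : urls) {
        std::string_view host = getURLHost(url.data(), url.size());
        total_host_bytes += host.size();
    }
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);

    std::cout << "hosts parsed: " << urls.size()
              << ", bytes: " << total_host_bytes
              << ", time: " << elapsed.count() << " us\n";
    return 0;
}
```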
Key Takeaways
Profile before optimizing. The 37-second domain extraction time immediately pointed to the real bottlenecks. Always measure first.
Eliminate I/O in hot paths. Database queries in URL parsing functions were the primary performance killer. Moving to compile-time data structures eliminated this entirely.
Character parsing beats regex for simple patterns. For structured data like URLs, character-by-character parsing is often faster and more predictable than regex.
Perfect hashing works great for static data. When you have a known, static dataset (like TLDs), perfect hash functions provide optimal lookup performance.
Zero-copy string operations matter. Using std::string_view eliminated unnecessary string copying, especially important for analytical workloads.
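As a quick illustration of that last point (again, not code from the extension): slicing with std::string_view reuses the existing buffer, while std::string::substr allocates a copy:

```cpp
#include <iostream>
#include <string>
#include <string_view>

int main() {
    std::string url = "https://blog.altertable.ai/posts";

    // Allocates a new std::string holding a copy of the host bytes.
    std::string host_copy = url.substr(8, 18);

    // Points into the existing buffer: no allocation, no copy.
    std::string_view host_view(url.data() + 8, 18);

    std::cout << host_copy << '\n' << host_view << '\n'; // both print blog.altertable.ai
    return 0;
}
```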
Try It Yourself
The optimized NetQuack extension is currently under review in Pull Request #13. While we wait for the maintainer to review and hopefully merge these changes, you can:
- Test the optimized version by building from our branch:

  ```bash
  git clone https://github.com/hatamiarash7/duckdb-netquack.git
  cd duckdb-netquack
  git checkout altertable-ai/su/optim
  GEN=ninja make
  ```

- Run the benchmarks yourself to see the performance improvements:

  ```bash
  ./scripts/benchmark.sh
  ```

- Check out the source code on GitHub to see the detailed implementation.
This optimization journey reminded us why performance matters in analytical databases. When you're processing millions of URLs, the difference between 37 seconds and 0.012 seconds isn't just a nice-to-have—it's the difference between interactive analysis and waiting for coffee to brew.
Sometimes the best optimization is throwing out the old approach entirely and starting fresh with the right architectural principles. In this case, ClickHouse showed us the way.