SEPTEMBER 23, 2025

5 MIN READ

SYLVAIN UTARD

Upstreaming with AI

How we contributed 17 upstream PRs in 90 days—where AI accelerated our workflow, what we learned, and practical tips for open source success with AI assistance.


Over the last ~90 days we opened 17 upstream PRs across DuckDB, Trino, ClickHouse, DataFusion, dbt, OpenTelemetry, and MCP tooling—in Rust, Ruby, Python, C++, SQL, and Helm/Kubernetes. AI helped with orientation, spec inference, and scaffolding; humans made design calls and ensured correctness. Links and diffs below; practical notes at the end: tests first, small diffs, disclose AI.

By “upstreaming,” we mean contributing code, tests, and docs back to the original projects we depend on so improvements land at the source and flow downstream to everyone.

Why upstreaming is hard

In practice, here are the frictions you hit before the first line of code lands upstream:

  • Architecture and invariants. Mature projects encode years of decisions: performance budgets, async/threading models, memory ownership, error semantics, and backward-compat promises. Those contracts aren’t in one file.
  • Codebase shape and conventions. Module boundaries, naming, feature flags, build targets, and release workflows vary widely—especially across monorepos and plugin ecosystems.
  • Language and tooling drift. You’ll often touch stacks you don’t use daily (C++, Python, Helm). Editions change, linters disagree, and safety fences differ.
  • Local repro and datasets. Getting a faithful environment, fixtures, or golden files is nontrivial; cross-platform quirks hide bugs.
  • Tests and CI. Matrix builds, flaky integrations, and golden snapshots fail in surprising ways. Knowing what to update vs. what to investigate is a skill.
  • Review process. PR/MR templates, labels, changelogs, and release notes all matter—and maintainers’ time is scarce.
  • Docs gap. READMEs drift; the best knowledge often lives in issues, RFCs, and commit history.

This isn’t a gripe; it’s the reality that gives open source its edge. Every one of those frictions exists because you’re working within real constraints, serving real users, and making deliberate tradeoffs.

Where AI helped (and where it didn't)

Here’s how we use AI to reduce friction, while keeping engineers in the driver’s seat:

  • Error to entry point. From an error code, ask Cursor where to look first (see the sample prompt after this list).
  • Stack trace triage. From a GDB stack trace, propose likely root cause or where to instrument.
  • Behavior to diff. From an example of new behavior, identify where to apply the patch with minimal blast radius.
  • Plugin by analogy. From another plugin, outline how to create a new one to solve X or Y.
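
To make the first pattern concrete, here’s the shape of prompt we use. The error message and scenario below are hypothetical placeholders for illustration, not taken from an actual PR:

```
We're seeing `INTERNAL_ERROR: offset out of range` (hypothetical) when reading
a partitioned table. Where in this repo is that error raised, which call sites
can reach it, and where would you start instrumenting to find the root cause?
List files and functions, most likely first.
```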

Humans still own the approach, correctness, and tradeoffs. AI compresses time-to-context and drafting time; it doesn’t make design decisions for us.

Where it didn't

Here are a few places where AI fell short or required careful human oversight in our workflow:

  • Test deletion over fixes. In an early experiment rewriting lkml from Python to Ruby, AI agents "made CI green" by removing failing specs instead of repairing parser/serializer semantics. The original library: joshtemple/lkml. Our Ruby port: altertable-ai/lkml. (See the round-trip sketch after this list.)
  • Hallucinations without grounding. AI frequently invented APIs and flags in unfamiliar stacks. Plugging the exact docs or spec page into context (so the AI can ground its answers, RAG-style) materially reduced hallucinations and improved first-try correctness.
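
The antidote to test deletion is making the specs load-bearing. Here’s a minimal sketch in Python against the original joshtemple/lkml (its public load/dump API); the fixture is ours for illustration, not from the real suite:

```python
import lkml  # joshtemple/lkml: LookML parser/serializer

SOURCE = """\
view: orders {
  dimension: id {
    primary_key: yes
    type: number
  }
}
"""

def test_load_dump_round_trip():
    # Parsing must surface real structure, not merely avoid crashing.
    parsed = lkml.load(SOURCE)
    assert parsed["views"][0]["name"] == "orders"

    # Serializing and re-parsing must preserve semantics exactly.
    # An agent that deletes this spec to "make CI green" has fixed nothing.
    assert lkml.load(lkml.dump(parsed)) == parsed
```

A port (to Ruby or anywhere else) inherits the same contract: the specs define the semantics, so repairing them is the work, not an obstacle to it.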

What we shipped

Here's a breakdown of our contributions across different projects and technologies, organized by category:

  • DuckDB and extensions
  • Trino clients
  • Modeling and parsing
  • Infrastructure
  • Observability
  • AI

What to keep in mind

  • Quality matters most. The feedback that counts is “tests pass, edge cases covered.” AI can compress time-to-context, but correctness and clarity remain human work.
  • Prefer small diffs. Keep the blast radius bounded; land surgical changes with good tests and context.
  • Cross-language lift. Moving between Ruby, Rust, C++, and Helm is feasible when AI translates conventions and you own the decisions. It saves time on syntax and tooling, leaving room for real design choices.
  • Be transparent. Disclosure helps maintainers calibrate reviews and builds trust around provenance. In August, Mitchell Hashimoto introduced a simple rule: if AI helped you make a PR, say so. We've adopted it in practice with a short "AI assistance" note in each PR (what was generated, what a human reviewed, links to prompts if useful); see the sample note after this list.
  • Upstream early. Small, surgical fixes compound across every downstream that pulls them. If we hit a papercut or spot an opportunity to unstick others, we upstream the fix. It’s faster long-term and makes the ecosystem better.
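
For reference, here’s the shape of that disclosure note. The wording below is our own illustrative sketch, not an official template from any upstream project:

```markdown
## AI assistance

- Generated: first draft of the patch and the regression test (Cursor).
- Human-reviewed: everything; the error-handling approach and the naming
  are human decisions.
- Verified: full test suite locally, plus the new regression test.
```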

To everyone who reviewed, merged, or debated with us: THANK YOU. The stack is better because you care.


Sylvain Utard

Co-Founder & CEO

Seasoned leader in B2B SaaS and B2C. Scaled 100+ teams at Algolia (1st hire) & Sorare. Passionate about data, performance and productivity.
