JULY 1, 2025

6 MIN READ

SYLVAIN UTARD

Semantic Layers in 2025

We tested four semantic layer approaches: Looker, Omni, Cube.dev, and dbt MetricFlow. Here's what we learned about each.


What is a semantic layer and why should you care?

Picture the raw tables in your data warehouse: cryptic column names (like user_id, plan_code, ts_event) and a gauntlet of SQL joins needed just to answer “What was weekly retention for Pro customers?”. Without a semantic layer, you often tackle this by writing complex SQL or creating dedicated summary tables (essentially baking business logic into the data's physical state). A semantic layer, by contrast, is a thin, declarative map that turns those columns into high-level business concepts:

  • Typed & formatted dimensions: for example, created_at is recognized as a timestamp; charts automatically pick appropriate time grains and calendar settings, and label axes accordingly.
  • Reusable metrics: for example, total_revenue is defined once (as a sum in USD), and every analysis or dashboard uses that single source of truth for revenue.
  • Explicit relationships: for example, orders → users is defined as a relationship, so queries can automatically generate the correct JOINs every time.

The payoff: you (and AI agents) can explore data at a conceptual level. No more repetitive SQL; you get cleaner lineage and faster, more trustworthy insights.

Example

For example, below is a toy semantic model definition (in LookML-style syntax) illustrating one time dimension and one measure:

dimension_group: signup_date {
  type: time
  timeframes: [date, week, month]
  sql: ${TABLE}.created_at ;;
}

measure: total_revenue {
  type: sum
  sql: ${TABLE}.amount ;;
  value_format_name: usd
}

With that defined, any tool that understands the semantic layer can instantly plot Total Revenue by Signup Month with axes correctly labeled, currency formatted, and drill-down options intact.
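To make the mechanics concrete, here is a deliberately simplified Python sketch of what a semantic-layer query planner does: it expands declarative metric and dimension definitions into SQL. The orders table and column names are hypothetical, and real planners also handle joins, filters, timezones, and formatting:

```python
# Toy registry mirroring the definitions above (hypothetical table/columns).
DIMENSIONS = {"signup_date": {"sql": "created_at", "type": "time"}}
MEASURES = {"total_revenue": {"sql": "amount", "agg": "SUM"}}

def compile_query(measure: str, dimension: str, grain: str, table: str = "orders") -> str:
    """Expand a (measure, time dimension, grain) request into SQL."""
    m, d = MEASURES[measure], DIMENSIONS[dimension]
    return (
        f"SELECT DATE_TRUNC('{grain}', {d['sql']}) AS {dimension}_{grain}, "
        f"{m['agg']}({m['sql']}) AS {measure} "
        f"FROM {table} GROUP BY 1 ORDER BY 1"
    )

print(compile_query("total_revenue", "signup_date", "month"))
# → SELECT DATE_TRUNC('month', created_at) AS signup_date_month,
#   SUM(amount) AS total_revenue FROM orders GROUP BY 1 ORDER BY 1
```

Every chart, dashboard, or agent request funnels through the same expansion, which is what keeps definitions consistent.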

Why AI-Native Agents Need a Semantic Layer

When we talk about AI agents in data, we're really talking about small pieces of software that read, reason, and act on our behalf. They can spot patterns at lightspeed, but only if they can trust the meaning of every table, metric, and event they touch. That trust comes from a shared semantic layer sitting between raw storage and every agent-powered workflow.

What the semantic layer gives you, and why agents need it:

  • One canonical vocabulary. Every metric (LTV), entity (workspace vs. org), and relationship lives in a single, version-controlled catalog. This prevents agents from inventing slightly different definitions that break cross-team comparisons.
  • Reusable business logic. Transformations such as “net revenue” or cohort rules are written once and inherited everywhere. This lets agents chain reasoning steps without re-implementing SQL or asking humans to reconcile logic each time.
  • Context & lineage. Each column or measure carries rich metadata: owner, source system, calculation history, privacy flags. This gives agents the backdrop they need to explain why a recommendation is safe, compliant, and interpretable.

Without a semantic layer, AI agents are clever interns rifling through a chaotic file cabinet; with one, they’re seasoned analysts working from a living playbook. If we want insights to find us before we ask, we have to give the agents a language, and that language is the semantic layer.

Four approaches, hands-on

Below we summarize our hands-on experience with four different semantic layer solutions. We spun up each product on the same database to keep the comparison consistent. Here's how they fared:

Looker (Google Cloud)

Looker is a full-stack BI platform built around LookML, its Git-versioned modeling language for defining data models.

view: order_items {
  sql_table_name: analytics.order_items ;;

  dimension: order_id {
    primary_key: yes
    sql: ${TABLE}.order_id ;;
  }

  measure: total_revenue {
    type: sum
    sql: ${TABLE}.price ;;
    value_format_name: usd
  }
}

Why we liked it

  • Warehouse-native execution: queries are pushed down to your existing data warehouse, with no separate compute layer to manage.
  • One-stop shop: modeling, visualization, scheduling, and embedding are all available in one platform.
  • AI-assisted modeling & querying: Gemini-powered LookML and visualization assistants help build models and enable natural language queries (NLQ).

Where it strained

  • Proprietary LookML.
  • High cost for large teams (per-seat pricing can climb quickly as adoption spreads).
  • Limited reuse outside Looker.

Omni

Omni is a newer analytics workspace that blends a notebook-style SQL experience with a governed semantic layer.

fields:
  - name: created_date
    sql: ${TABLE}.created_at
    data_type: date
  - name: total_revenue
    sql: ${TABLE}.price
    aggregate_type: sum

Why we liked it

  • Rapid iteration: blend of spreadsheet & SQL in one UI, then round-trip definitions to dbt.
  • Tight dbt integration.
  • Built-in AI assistants that respect the defined layer.

Where it strained

  • Pricing not publicly disclosed. Expect an opaque SaaS subscription, on top of your warehouse usage.
  • Mixed learning curve. Some users appreciate the familiar Excel-style interface, others find the blend of workbooks and data models initially confusing.

Cube.dev

Cube.dev is an open-source, headless semantic layer that exposes metrics via REST, GraphQL, and SQL APIs, with optional pre-aggregations for speed.

cube(`Orders`, {
  sql: `SELECT * FROM public.orders`,

  measures: {
    count: { type: `count` },
    totalRevenue: { sql: `amount`, type: `sum` },
  },

  dimensions: {
    id: { sql: `id`, type: `number`, primaryKey: true },
    createdAt: { sql: `created_at`, type: `time` },
  },

  preAggregations: {
    dailyRollup: {
      type: `rollup`,
      measures: [Orders.totalRevenue, Orders.count],
      timeDimension: Orders.createdAt,
      granularity: `day`,
    },
  },
});
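Because Cube is headless, consumption happens over its APIs rather than in a bundled UI. Below is a hedged Python sketch of the JSON payload a client might POST to Cube's REST `/cubejs-api/v1/load` endpoint; the host and auth token are placeholders, and the measures/timeDimensions shape follows Cube's documented query format:

```python
import json

def build_cube_query():
    # Query the measures and time dimension declared in the cube above.
    return {
        "measures": ["Orders.totalRevenue", "Orders.count"],
        "timeDimensions": [{
            "dimension": "Orders.createdAt",
            "granularity": "day",
            "dateRange": "last 30 days",
        }],
    }

payload = json.dumps({"query": build_cube_query()})
# POST `payload` to https://<your-cube-host>/cubejs-api/v1/load with an
# Authorization header carrying a signed JWT, e.g. via requests.post(...).
```

The same query object works unchanged over Cube's GraphQL and SQL interfaces, which is the point of the headless design.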

Why we liked it

  • Truly headless: a single semantic model can serve many front-ends.
  • Fine-grained performance tuning (caching, pre-aggregations, etc.).
  • “Semantic-layer-sync” for existing BI tools.

Where it strained

  • Developer-centric learning curve.
  • When hosted, pricing is usage-based, relying on their own CCU (Cube Consumption Unit) model.
  • Designing effective pre-aggregations takes real expertise and iteration.

dbt + MetricFlow

dbt, the transformation framework many teams already use, now includes MetricFlow, letting you declare semantic models in YAML right alongside your SQL models.

semantic_models:
  - name: order_items
    model: ref('order_items')
    defaults:
      agg_time_dimension: created_at
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: created_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: total_revenue
        expr: price
        agg: sum
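One nuance worth knowing: in MetricFlow, a measure becomes queryable only after it is exposed as a metric. A minimal companion definition for the semantic model above (names assumed to match) looks like:

```yaml
metrics:
  - name: total_revenue
    description: Sum of order item prices.
    type: simple
    type_params:
      measure: total_revenue
```

Once defined, the metric can be queried through the dbt Semantic Layer APIs or the `mf query` CLI, grouped by any declared time grain.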

Why we liked it

  • Lives in Git: metric definitions are version-controlled (with code reviews, CI, and environments) just like the rest of your dbt project.
  • Warehouse-native execution; no additional server needed (metrics are computed in your existing data warehouse).
  • Open-source version suits self-hosting.

Where it strained

  • Complex compiled SQL: large, multi-metric queries can explode into thousands of lines of SQL, taxing warehouses and making debugging tricky.
  • No built-in visualization.

How this shapes our roadmap

At Altertable we believe the future isn't another dashboard: it's an always-on insight engine. Semantic layers are a key ingredient, but only if they're:

  • Open & headless so insights travel everywhere users work.
  • AI-ready with rich types, units, and relations that agents can reason about.
  • Infra-lite because teams shouldn't babysit yet another server.

Each tool above nails part of that trifecta, but none checks all three boxes yet, which is why we're building, testing, and learning out loud.


Sylvain Utard

Co-Founder & CEO

Seasoned leader in B2B SaaS and B2C. Scaled 100+ teams at Algolia (1st hire) & Sorare. Passionate about data, performance and productivity.
