Semantic Layer

The semantic layer is how you teach Marivo what your data means. You declare datasources, entities, dimensions, and metrics in Python, and agents address them by semantic ref (a qualified name like sales.revenue) instead of raw table and column names.

Three rules hold across every object:

  • Python declarations are the contract. Decorated functions and builder calls are the source of truth for names, definitions, and shapes.
  • Ibis expressions are the execution language. Decorator bodies return ibis expressions, never raw SQL strings.
  • SQL text is metadata only. When you need SQL (for parity checking), it lives in provenance=ms.from_sql(sql=..., dialect=...), never in an executable body.

You work through two namespaces:

```python
import marivo.datasource as md  # connections (md.duckdb, md.ref, ...)
import marivo.semantic as ms    # meaning (ms.entity, ms.metric, ...)
```

Every object lives under a domain and is addressed by a qualified ref:

  • Domain-level objects: <domain>.<object> — e.g. sales.revenue, sales.orders.
  • Entity-scoped objects (dimensions and measures): <domain>.<entity>.<field> — e.g. sales.orders.region.
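
The two ref shapes can be told apart by counting the dot-separated parts. The sketch below is plain Python and not Marivo's resolver; `ParsedRef` and `split_ref` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class ParsedRef(NamedTuple):
    """Hypothetical container for the parts of a qualified ref."""

    domain: str
    entity: Optional[str]  # None for domain-level objects
    name: str


def split_ref(ref_id: str) -> ParsedRef:
    """Split '<domain>.<object>' or '<domain>.<entity>.<field>' into parts."""
    parts = ref_id.split(".")
    if len(parts) == 2:
        domain, name = parts
        return ParsedRef(domain, None, name)
    if len(parts) == 3:
        domain, entity, name = parts
        return ParsedRef(domain, entity, name)
    raise ValueError(f"expected 2 or 3 dot-separated parts, got {ref_id!r}")


metric = split_ref("sales.revenue")        # domain-level object
field = split_ref("sales.orders.region")   # entity-scoped field
```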

A project is a folder of declaration files. Datasources are declared once under models/datasources/; semantics live under models/semantic/<domain>/, with a _domain.py per domain:

```
your-project/
  marivo.toml
  models/
    datasources/
      warehouse.py    # md.duckdb(name="warehouse", ...)
    semantic/
      sales/
        _domain.py    # ms.domain(name="sales") + entities, metrics, ...
```

Every semantic object is identified by a semantic ref — a typed, immutable handle that carries both the qualified name and the kind of object it refers to. Refs are the same type at authoring time and in the analysis loop: an ms.entity(...) call, a catalog.get(...).ref lookup, and an analysis intent parameter all use the same SemanticRef family.

All refs share two read-only attributes:

| Attribute | Type | Meaning |
|---|---|---|
| .id | str | Qualified semantic id (e.g. "sales.revenue"). |
| .kind | SemanticKind | Object kind, one of the eight values below. |

str(ref) returns .id, so refs can be used wherever a string id is expected. Equality and hashing are by (type, id), so two refs of the same subclass and id are interchangeable.
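
To make the identity rules concrete, here is a minimal sketch of a ref-like class with the same behaviour: `str()` returns the id, and equality and hashing use `(type, id)`. The `Toy*` classes are illustrative stand-ins, not Marivo's implementation.

```python
class ToyRef:
    """Illustrative stand-in for a semantic ref; not Marivo's class."""

    kind = "unknown"

    def __init__(self, id: str) -> None:
        self._id = id

    @property
    def id(self) -> str:
        return self._id  # read-only: no setter is defined

    def __str__(self) -> str:
        return self._id

    def __eq__(self, other: object) -> bool:
        # Same concrete subclass AND same id; the type check runs first.
        return type(self) is type(other) and self._id == other._id

    def __hash__(self) -> int:
        return hash((type(self), self._id))


class ToyMetricRef(ToyRef):
    kind = "metric"


class ToyMeasureRef(ToyRef):
    kind = "measure"


a = ToyMetricRef("sales.revenue")
b = ToyMetricRef("sales.revenue")
c = ToyMeasureRef("sales.revenue")  # same id, different subclass
```

In this sketch `a` and `b` collapse to one entry in a set or dict, while `c` stays distinct because its subclass differs.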

Each SemanticKind value has a concrete ref subclass:

| Kind | Subclass | Returned by | Callable? |
|---|---|---|---|
| domain | DomainRef | ms.domain(...) | No |
| datasource | DatasourceRef | md.ref(...) | No |
| entity | EntityRef | ms.entity(...) | No |
| dimension | DimensionRef | @ms.dimension | Yes, in metric bodies |
| time_dimension | TimeDimensionRef | @ms.time_dimension | Yes, in metric bodies |
| measure | MeasureRef | @ms.measure | Yes, in metric bodies |
| metric | MetricRef | ms.aggregate(...), @ms.metric, ms.ratio(...), … | No |
| relationship | RelationshipRef | ms.relationship(...) | No |

Callable field refs (DimensionRef, MeasureRef, TimeDimensionRef) resolve to an ibis expression when called inside a metric body. All other refs raise a teaching error if accidentally called — they are identity tokens, not decorators.

Because authoring refs and catalog refs are the same type family, you can pass an authoring ref directly to an analysis intent without wrapping it:

```python
revenue = ms.aggregate(name="revenue", measure=amount, agg="sum")

# revenue is a MetricRef, so it can be passed directly to observe:
frame = session.observe(revenue, timescope={...})
```

A catalog lookup, catalog.get("sales.revenue").ref, returns the same MetricRef subclass, so there is no .ref.ref chain or type mismatch between authoring and analysis.

Two helper functions work with refs:

  • mv.make_ref(id, kind): constructs the per-kind subclass for a given kind (used internally by the catalog).
  • as_ref_id(value): extracts the .id string from a SemanticRef, SemanticObject, or plain str. It is string-tolerant, so raw ids pass through unchanged.
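
The "string-tolerant" behaviour can be pictured with a small stand-in. This is a sketch of the idea only: `toy_as_ref_id` and `ToyObject` are hypothetical, and the assumption that the id is read from an `.id` attribute is ours, not a statement about how as_ref_id is implemented.

```python
class ToyObject:
    """Hypothetical stand-in for anything that exposes a string `.id`."""

    def __init__(self, id: str) -> None:
        self.id = id


def toy_as_ref_id(value) -> str:
    """Return an id string from a plain str or an object with `.id`."""
    if isinstance(value, str):
        return value  # raw ids pass through unchanged
    ref_id = getattr(value, "id", None)
    if isinstance(ref_id, str):
        return ref_id
    raise TypeError(f"cannot extract a semantic id from {value!r}")


ids = [
    toy_as_ref_id("sales.revenue"),
    toy_as_ref_id(ToyObject("sales.orders.region")),
]
```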

Every semantic object accepts an optional ai_context dict. This is where business meaning and guardrails live — the context an agent reads before it uses the object. All keys are optional, but unknown keys are rejected.

| Field | Type | Required | Default | Meaning |
|---|---|---|---|---|
| business_definition | str | No | None | What the object means in business terms, in a sentence or two. |
| guardrails | list[str] | No | [] | Rules an agent must respect: required filters, exclusions, scope limits. |
| synonyms | list[str] | No | [] | Alternate names so agents can resolve natural-language references. |
| examples | list[str] | No | [] | Example questions or phrasings this object answers. |
| instructions | str | No | None | Direct guidance on how (and how not) to use the object. |
| owner_notes | str | No | None | Notes from the human owner: provenance, caveats, known issues. |

```python
ai_context={
    "business_definition": "Gross order amount before refunds.",
    "guardrails": ["Validate refund exclusions before using as net revenue."],
    "synonyms": ["sales", "gmv"],
    "examples": ["What was revenue by region last week?"],
}
```
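
Because unknown keys are rejected, a typo such as "guardrail" fails instead of being silently ignored. The sketch below shows that kind of check in plain Python; `check_ai_context` is a hypothetical helper, and the exact error Marivo raises is not specified here.

```python
ALLOWED_KEYS = {
    "business_definition",
    "guardrails",
    "synonyms",
    "examples",
    "instructions",
    "owner_notes",
}


def check_ai_context(ctx: dict) -> dict:
    """Return ctx unchanged, or raise if it contains keys outside the table."""
    unknown = sorted(set(ctx) - ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unknown ai_context keys: {unknown}")
    return ctx


ok = check_ai_context({"business_definition": "Gross order amount before refunds."})

try:
    check_ai_context({"guardrail": ["typo: should be 'guardrails'"]})
    rejected = False
except ValueError:
    rejected = True
```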

Datasources are declared in models/datasources/*.py with a typed helper per backend. The helper registers the connection; it does not return a value. Semantic files refer to a datasource by name through md.ref("warehouse").

```python
# models/datasources/warehouse.py
import marivo.datasource as md

md.duckdb(
    name="warehouse",
    path="warehouse.duckdb",
    ai_context={
        "business_definition": "Local DuckDB warehouse for sales analysis.",
        "guardrails": ["Use only for development or approved local analysis."],
    },
)
```

Every helper (md.duckdb, md.mysql, md.postgres, md.trino, md.clickhouse) shares these parameters:

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | Yes | | Global datasource name (letters, digits, _, -). Used by md.ref(name). |
| description | str | No | None | Short human-readable summary. |
| ai_context | AiContext | No | None | Agent-facing context (see above). |
| extra | dict | No | None | Rare JSON-safe ibis keyword arguments the typed helper does not model. |
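
The character rule for name can be checked before a file is loaded. This is a minimal sketch assuming the rule means ASCII letters, digits, underscore, and hyphen with at least one character; the precise rule Marivo enforces may differ, and `is_valid_datasource_name` is a hypothetical helper.

```python
import re

# Assumption: ASCII letters, digits, "_" and "-" only, at least one character.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_datasource_name(name: str) -> bool:
    """Return True if every character of `name` is in the allowed set."""
    return NAME_PATTERN.fullmatch(name) is not None


checks = {
    "warehouse": is_valid_datasource_name("warehouse"),
    "lake-01_raw": is_valid_datasource_name("lake-01_raw"),
    "my warehouse": is_valid_datasource_name("my warehouse"),  # space not allowed
}
```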

Backend-specific parameters:

| Helper | Required | Optional |
|---|---|---|
| md.duckdb | (none) | path (default ":memory:"), read_only (default False) |
| md.mysql | host, database | port (3306), autocommit, user_env, password_env |
| md.postgres | host, database | port (5432), schema, autocommit, user_env, password_env |
| md.trino | host, catalog | port (8080), schema, source, timezone, http_scheme, client_tags, session_properties, user_env, auth_env |
| md.clickhouse | host | port (9000 / 9440 secure), database, secure, settings, user_env, password_env |

```python
# models/datasources/lake.py
import marivo.datasource as md

md.trino(
    name="lake",
    host="trino.example.internal",
    catalog="hive",
    user_env="TRINO_USER",
    auth_env="TRINO_AUTH",
)
```

ms.domain(...) opens a namespace. Call it once per _domain.py. It returns a DomainRef you can pass as domain= to override the active domain for an object declared in a sibling file.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | Yes | | Domain namespace, e.g. "sales". Objects become <name>.<object>. |
| default | bool | No | True | When True, decorators in this file resolve to this domain unless domain= is passed. |
| ai_context | AiContext | No | None | Agent-facing context for the domain. |

```python
import marivo.semantic as ms

ms.domain(name="sales")
```

An entity is one physical source (a table or file) plus its primary key. It is the anchor that dimensions, measures, and metrics attach to.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | Yes | | Entity name. Becomes <domain>.<name>. |
| datasource | DatasourceRef or str | Yes | | md.ref("warehouse") or the global datasource name. |
| source | source builder | Yes | | ms.table(...), ms.parquet(...), or ms.csv(...). |
| primary_key | list[str] | No | None | Column names forming the primary key. |
| versioning | ms.snapshot or ms.validity | No | None | Snapshot or SCD2 validity versioning (see below). |
| domain | DomainRef | No | file default | Override the active domain. |
| ai_context | AiContext | No | None | Agent-facing context. |

```python
warehouse = md.ref("warehouse")

orders = ms.entity(
    name="orders",
    datasource=warehouse,
    source=ms.table("orders"),
    primary_key=["order_id"],
    ai_context={"business_definition": "One row per order."},
)
```

| Builder | Required | Optional | Use for |
|---|---|---|---|
| ms.table(name) | name | database | A table in the datasource (use database="schema" for Trino/MySQL). |
| ms.parquet(path) | path | hive_partitioning, columns | Parquet files (typically through DuckDB). |
| ms.csv(path) | path | header, delimiter, columns | CSV files (typically through DuckDB). |

For entities whose rows change over time, declare how to read the current state:

  • ms.snapshot(partition_field, grain="day", timezone=None, format=None) — daily partitioned snapshots; reads the latest partition.
  • ms.validity(valid_from, valid_to, interval, open_end, timezone=None) — SCD2 validity intervals. interval is "closed_open" ([from, to)) or "closed_closed"; open_end lists the sentinel values that mean “still current” (e.g. (None,) for SQL NULL, or ("9999-12-31",)).
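
The difference between the two interval modes shows up only on the boundary day. The sketch below is a plain-Python illustration of the membership test an SCD2 read implies; `covers` is a hypothetical function, and how Marivo applies validity when reading an entity is not shown here.

```python
from datetime import date


def covers(valid_from, valid_to, as_of, interval="closed_open", open_end=(None,)):
    """Return True if the row's validity interval contains `as_of`."""
    if as_of < valid_from:
        return False
    if valid_to in open_end:
        return True  # sentinel value: the row is still current
    if interval == "closed_open":
        return as_of < valid_to   # [from, to)
    if interval == "closed_closed":
        return as_of <= valid_to  # [from, to]
    raise ValueError(f"unknown interval mode: {interval!r}")


start, end = date(2026, 1, 1), date(2026, 2, 1)

on_boundary_open = covers(start, end, end, interval="closed_open")
on_boundary_closed = covers(start, end, end, interval="closed_closed")
still_current = covers(start, None, date(2030, 1, 1), open_end=(None,))
```

With "closed_open", a row that ends on 2026-02-01 no longer applies on that day, so the successor row that starts on the same date is the only match.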

A dimension is a categorical attribute you group or filter by. It is a decorator whose body returns a single ibis expression over the entity table.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | No | function name | Dimension name. Becomes <domain>.<entity>.<name>. |
| entity | EntityRef or str | Yes | | The owning entity. |
| domain | DomainRef | No | file default | Override the active domain. |
| ai_context | AiContext | No | None | Agent-facing context. |

```python
@ms.dimension(
    entity=orders,
    name="region",
    ai_context={"business_definition": "Sales reporting region."},
)
def region(table):
    return table.region
```

A time dimension is a special dimension that carries grain and parsing metadata. Only time dimensions can serve as the time axis for session.observe.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | No | function name | Dimension name. |
| entity | EntityRef or str | Yes | | The owning entity. |
| granularity | grain literal | Yes | | year, quarter, month, week, day, hour, minute, or second: the finest grain at which queries are meaningful. |
| parse | parse variant | No | None | How the source column becomes a time value (see below). Omit for native temporal columns; the parse variant is then inferred at analysis time. |
| is_default | bool | No | False | Marks the default time axis when the entity has several. observe uses it when time_dimension= is omitted. |
| domain | DomainRef | No | file default | Override the active domain. |
| ai_context | AiContext | No | None | Agent-facing context. |

The parse= value declares the physical encoding of the column. When omitted, the parse variant is inferred from the column’s ibis dtype at analysis time (native date, datetime, and timestamp columns do not need an explicit parse). For string or integer columns, provide ms.strptime(format) or ms.hour_prefix(prefix). The variant must be compatible with granularity (e.g. an hour grain needs a time-bearing format).

| Builder | Source column is… | Key parameters |
|---|---|---|
| (omit parse) | a native temporal column | (none) |
| ms.datetime() | a native datetime | timezone (IANA), sample_interval |
| ms.timestamp() | a native timestamp | timezone (IANA), sample_interval |
| ms.strptime(format) | a string/integer to parse | timezone, sample_interval |
| ms.hour_prefix(prefix) | an hour-only partition | sample_interval; prefix is the day-grain time-dimension id that supplies the date |

timezone defaults to the datasource engine timezone; set it (e.g. "UTC") only when the column’s wall-clock meaning differs. sample_interval like (5, "minute") marks a periodically-sampled axis used by semi-additive folds.

```python
# Day partition stored as the string "20260131"
@ms.time_dimension(
    entity=orders,
    name="log_date",
    granularity="day",
    parse=ms.strptime("%Y%m%d"),
    is_default=True,
)
def log_date(table):
    return table.dt


# Native UTC timestamp, usable for sub-day buckets
@ms.time_dimension(
    entity=orders,
    name="event_ts",
    granularity="minute",
    parse=ms.timestamp(timezone="UTC"),
)
def event_ts(table):
    return table.event_ts
```
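
To see why the format must match the grain, it helps to parse the same kind of value with Python's own `datetime.strptime`. This is an illustration only: it assumes ms.strptime formats use the same `%` directives, which the "%Y%m%d" example suggests but this page does not spell out.

```python
from datetime import datetime

# A day partition stored as a string, as in the log_date example.
day = datetime.strptime("20260131", "%Y%m%d")

# An integer column would need to be rendered as text first.
day_from_int = datetime.strptime(str(20260131), "%Y%m%d")

# An hour grain needs a time-bearing format; "%Y%m%d" alone carries no hour.
hour = datetime.strptime("2026013117", "%Y%m%d%H")
```

The day-only format always yields midnight, so rows from different hours of the same day would be indistinguishable at an hour grain.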

A measure is a row-level quantitative expression you intend to aggregate (e.g. an amount or quantity). Like a dimension, it is a decorator returning one ibis expression — but it carries additivity and an optional unit.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | No | function name | Measure name. Becomes <domain>.<entity>.<name>. |
| entity | EntityRef or str | Yes | | The owning entity. |
| additivity | additivity value | Yes | | "additive", "non_additive", or ms.semi_additive(...). |
| unit | str | No | None | UCUM unit token: "USD", "CNY", "%", "ms", "{order}". |
| domain | DomainRef | No | file default | Override the active domain. |
| ai_context | AiContext | No | None | Agent-facing context. |

```python
@ms.measure(entity=orders, additivity="additive", unit="CNY")
def amount(table):
    return table.amount
```
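
The additivity label matters because it tells an agent whether per-group results can be summed back into a total. The following plain-Python sketch, with made-up rows, contrasts a quantity that adds up across groups with one that does not; it illustrates the concept and is not Marivo code.

```python
rows = [
    {"region": "north", "customer": "a", "amount": 10},
    {"region": "north", "customer": "b", "amount": 5},
    {"region": "south", "customer": "a", "amount": 7},
]
regions = sorted({r["region"] for r in rows})

# Additive: per-region sums add back up to the overall sum.
total_amount = sum(r["amount"] for r in rows)
amount_by_region = {
    g: sum(r["amount"] for r in rows if r["region"] == g) for g in regions
}
amount_adds_up = sum(amount_by_region.values()) == total_amount

# Non-additive: customer "a" buys in both regions, so per-region
# distinct counts overstate the overall distinct count.
total_customers = len({r["customer"] for r in rows})
customers_by_region = {
    g: len({r["customer"] for r in rows if r["region"] == g}) for g in regions
}
customers_add_up = sum(customers_by_region.values()) == total_customers
```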

A metric is the trusted, analysis-ready number an agent starts from. Marivo has several authoring shapes — pick by how the number is computed.

Simple metric from a measure — ms.aggregate

Aggregates a measure. No body; additivity is inherited from the measure.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | Yes | | Metric name. |
| measure | MeasureRef or str | Yes | | The measure to aggregate. |
| agg | aggregation | Yes | | "sum", "mean", "count", "count_distinct", "min", "max", … |
| fold | fold | No | None | Time-fold override for semi-additive measures. |
| unit | str | No | inherited | Override the unit derived from the measure. |
| domain / ai_context | | No | | As elsewhere. |

```python
revenue = ms.aggregate(name="revenue", measure=amount, agg="sum")
```

Use the @ms.metric decorator when the number is an expression rather than a single aggregated measure. The body returns one ibis aggregation, and you declare additivity directly.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | No | function name | Metric name. |
| entities | list[EntityRef or str] | Yes | | Entities the body reads. |
| additivity | additivity value | Yes | | "additive", "non_additive", or ms.semi_additive(...). |
| root_entity | EntityRef or str | No | the single entity | Required when entities has more than one. |
| fanout_policy | "block" or "aggregate_then_join" | No | "block" | How to handle join fan-out across entities. |
| unit | str | No | None | UCUM unit token. |
| provenance | SqlProvenance | No | None | ms.from_sql(sql=..., dialect=...) for parity checking. |
| domain / ai_context | | No | | As elsewhere. |

```python
@ms.metric(
    entities=[orders],
    additivity="additive",
    name="revenue",
    provenance=ms.from_sql(
        sql="SELECT SUM(amount) AS revenue FROM orders",
        dialect="duckdb",
    ),
    ai_context={"business_definition": "Gross order amount before refunds."},
)
def revenue(table):
    return table.amount.sum()
```
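
Join fan-out, which fanout_policy guards against, is easy to reproduce with two small lists. The sketch below uses made-up data and plain Python; the reading of "aggregate_then_join" as "collapse the many side to one row per key before joining" is inferred from the option's name and is an assumption, not a description of Marivo's planner.

```python
from collections import Counter

orders = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 50},
]
items = [
    {"order_id": 1, "sku": "A"},
    {"order_id": 1, "sku": "B"},
    {"order_id": 2, "sku": "C"},
]

# Naive join: order 1 appears once per item, so its amount is counted twice.
joined = [
    {**o, **i} for o in orders for i in items if o["order_id"] == i["order_id"]
]
fanned_out_revenue = sum(r["amount"] for r in joined)

# Aggregate first, then join: one row per order survives the join.
items_per_order = Counter(i["order_id"] for i in items)
safe_rows = [{**o, "item_count": items_per_order[o["order_id"]]} for o in orders]
safe_revenue = sum(r["amount"] for r in safe_rows)
```

The naive join reports 250 instead of 150, which is the kind of silent inflation the default "block" policy refuses to produce.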

Derived metrics — ms.ratio / ms.weighted_average / ms.linear

Body-free metrics composed from other metrics. The computation comes entirely from the components.

| Builder | Required | Computes |
|---|---|---|
| ms.ratio(name, numerator, denominator) | both refs | numerator / denominator (e.g. average order value, rates) |
| ms.weighted_average(name, value, weight) | both refs | weighted average; decompose later splits mix vs rate |
| ms.linear(name, add, subtract) | add (≥2 terms total) | sum of add minus subtract (e.g. net = gross - refunds) |

Each also accepts unit, domain, and ai_context.

```python
net_revenue = ms.linear(name="net_revenue", add=[gross_revenue], subtract=[refunds])
aov = ms.ratio(name="aov", numerator=total_amount, denominator=orders_count)
```
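
The arithmetic behind these two shapes can be checked by hand. The sketch below uses made-up numbers and plain Python to show why a ratio is best kept as two components: dividing the summed numerator by the summed denominator gives a different answer from averaging per-group ratios. It illustrates the formulas in the table and is not Marivo's evaluation code.

```python
regions = {
    "north": {"total_amount": 300, "orders_count": 3},
    "south": {"total_amount": 100, "orders_count": 2},
}

# Linear: sum of `add` terms minus sum of `subtract` terms.
gross_revenue = sum(r["total_amount"] for r in regions.values())
refunds = 30  # made-up figure
net_revenue = gross_revenue - refunds

# Ratio: numerator / denominator, computed from the components.
orders_count = sum(r["orders_count"] for r in regions.values())
aov_overall = gross_revenue / orders_count

# Averaging per-region ratios ignores how many orders each region has.
aov_by_region = {
    name: r["total_amount"] / r["orders_count"] for name, r in regions.items()
}
mean_of_aovs = sum(aov_by_region.values()) / len(aov_by_region)
```
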
  • ms.semi_additive(over, fold) — for snapshot/status facts that are additive across most axes but folded over a time axis. over is the status time dimension; fold is "last", "first", "mean", "max", or ("quantile", 0.95).
  • ms.from_sql(sql, dialect) — attaches SQL as provenance only, enabling ms.parity_check(...). It is never executed as the metric body.
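
A semi-additive fold can be pictured with account balances: they add up across accounts on any one day, but summing them across days is meaningless. The sketch below is a plain-Python illustration with made-up snapshots; `fold_over_time` is a hypothetical helper, and the exact fold semantics Marivo applies (for example per-group versus global "last") are an assumption here.

```python
from collections import defaultdict

# (snapshot day, account, balance); made-up values
snapshots = [
    ("2026-01-01", "acct_a", 100),
    ("2026-01-02", "acct_a", 120),
    ("2026-01-01", "acct_b", 40),
    ("2026-01-02", "acct_b", 30),
]


def fold_over_time(rows, fold):
    """Sum across accounts per day, then fold the daily totals over time."""
    daily = defaultdict(int)
    for day, _account, balance in rows:
        daily[day] += balance  # additive across the non-time axis
    days = sorted(daily)  # ISO date strings sort chronologically
    if fold == "last":
        return daily[days[-1]]
    if fold == "first":
        return daily[days[0]]
    if fold == "max":
        return max(daily.values())
    if fold == "mean":
        return sum(daily.values()) / len(daily)
    raise ValueError(f"unsupported fold in this sketch: {fold!r}")


naive_sum = sum(balance for _day, _account, balance in snapshots)
closing_balance = fold_over_time(snapshots, "last")
average_balance = fold_over_time(snapshots, "mean")
```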

A relationship declares how two entities join, so metrics and dimensions can reach across them. Keys are dimension refs, not raw column names.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| name | str | Yes | | Relationship name. |
| from_entity | EntityRef or str | Yes | | Source entity. |
| to_entity | EntityRef or str | Yes | | Target entity. |
| keys | list[JoinKey] | Yes | | One or more ms.join_on(from_key, to_key) pairs. |
| domain / ai_context | | No | | As elsewhere. |

```python
ms.relationship(
    name="orders_to_customers",
    from_entity=orders,
    to_entity=customers,
    keys=[ms.join_on(order_customer_id, customer_id)],
)
```

Once declarations are in place, load the catalog and inspect it:

```python
import marivo.semantic as ms

catalog = ms.load()                          # SemanticCatalog
catalog.list().show()                        # everything, grouped
catalog.list(kind="metric").show()           # just metrics
revenue = catalog.get("sales.revenue")       # one object
region = catalog.get("sales.orders.region")  # also an object
```

Before any analysis, check readiness — the structural gate that keeps half-specified objects out of analysis:

```python
report = ms.readiness()
if report.status == "blocked":
    report.show()  # blockers, with the next step for each
```

Two more checks support authoring:

  • ms.richness() — advisory coverage/depth report; never blocks.
  • ms.parity_check("sales.revenue") — runs the metric against its provenance SQL and compares results. Requires provenance=ms.from_sql(...).
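
The idea behind a parity check is to compute the same number two ways and compare. The sketch below imitates that with Python's built-in sqlite3 and made-up rows: the provenance SQL runs against an in-memory table, and the result is compared with an independent computation. It is an analogy under stated assumptions, not how ms.parity_check is implemented, and it uses SQLite in place of the DuckDB dialect from the example.

```python
import math
import sqlite3

rows = [(1, 120.0), (2, 80.0), (3, 50.5)]  # (order_id, amount), made-up

# Side 1: run the provenance SQL text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
(sql_revenue,) = conn.execute(
    "SELECT SUM(amount) AS revenue FROM orders"
).fetchone()
conn.close()

# Side 2: the executable definition, here a plain Python sum standing in
# for the ibis expression table.amount.sum().
expr_revenue = sum(amount for _order_id, amount in rows)

# Compare with a tolerance, since float summation order can differ.
parity_ok = math.isclose(sql_revenue, expr_revenue, rel_tol=1e-9)
```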

For how readiness decides what is “ready,” see Readiness. For how analysis records what it concludes, see Evidence. Then continue to the Analysis Workflow.