
Analysis Workflow

Every Marivo analysis starts from a metric and runs as a write-run-read loop: an agent writes an intent (an operator call plus its fields), runs it, and reads a typed result before deciding the next step.

The pieces:

  • A session holds the guiding question, the semantic catalog, and the persisted results of every step.
  • An intent is one operator call — observe, compare, decompose, … — with its parameters. The parameters are the analysis specification.
  • A frame is the typed result of an intent. Frames are the boundary between steps: each operator consumes specific frame types and produces another.

```python
import marivo.analysis as mv
import marivo.semantic as ms

session = mv.session.get_or_create(name="revenue-investigation", question="Why did Q4 drop?")
catalog = session.catalog
revenue = catalog.get("sales.revenue")  # a metric object
region = catalog.get("sales.orders.region")  # a dimension object
```

get_or_create is idempotent. It attaches to the existing session with that name, or creates one if none exists, and makes it the current session. The session API is narrow: get_or_create, current(), list(), and delete(name).

Operators take catalog objects and a few shared value objects. You will reuse these across almost every intent:

| Value | Shape | Meaning |
|---|---|---|
| metric input | catalog.get("sales.revenue") | A catalog metric object or its SemanticRef subclass (e.g. MetricRef). Authoring refs from ms.aggregate(...) also work. Bare strings are rejected. |
| dimension input | catalog.get("sales.orders.region") | A catalog dimension object or its DimensionRef / TimeDimensionRef (see Semantic refs). Used for dimensions, where keys, and axis. |
| timescope | {"start": "2026-10-01", "end": "2027-01-01"} | Half-open time range: start inclusive, end exclusive. |
| grain | "day", "week", "month", "quarter", "year", "hour", … | Time bucket size. When present, the result is a time series or panel. |
| dimensions | [region, country] | Segment axes. In v1, all must resolve to the metric's entity. |
| where | {region: "US"} or {amount: {"op": ">", "value": 100}} | Pre-aggregation row filter (see the operators below). |
| AlignmentPolicy | mv.window_bucket() | How two windows are paired for compare and correlate. |
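Because timescope is half-open, {"start": "2026-10-01", "end": "2027-01-01"} covers October 1 through December 31, 2026. The next quarter can therefore start at "2027-01-01" without counting the boundary day twice. The sketch below states the membership rule in plain Python. in_timescope is a hypothetical helper for illustration, not part of Marivo.

```python
from datetime import date


def in_timescope(day: date, timescope: dict) -> bool:
    """Return True if `day` lies in the half-open range [start, end)."""
    start = date.fromisoformat(timescope["start"])  # inclusive
    end = date.fromisoformat(timescope["end"])  # exclusive
    return start <= day < end


q4 = {"start": "2026-10-01", "end": "2027-01-01"}
q1 = {"start": "2027-01-01", "end": "2027-04-01"}

# The shared boundary day belongs to exactly one of the two adjacent ranges.
boundary = date(2027, 1, 1)
print(in_timescope(boundary, q4), in_timescope(boundary, q1))  # False True
```

The between operator in where filters works differently: it is inclusive at both ends.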

The result’s semantic kind follows from grain and dimensions: scalar (neither), time_series (grain only), segmented (dimensions only), or panel (both).
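The rule is a pure function of the two parameters, so it can be written down directly. The sketch below is plain Python with strings standing in for catalog dimension objects. semantic_kind is a hypothetical helper, not Marivo's API, and treating an empty dimensions list like None is an assumption.

```python
def semantic_kind(grain=None, dimensions=None):
    """Predict the semantic kind of an observe result from grain and dimensions."""
    has_grain = grain is not None
    has_dimensions = bool(dimensions)  # assumption: None and [] both mean "no segments"
    if has_grain and has_dimensions:
        return "panel"
    if has_grain:
        return "time_series"
    if has_dimensions:
        return "segmented"
    return "scalar"


print(semantic_kind("month", ["region"]))  # panel
```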

In a where filter, keys are catalog dimensions. Each value is a scalar (==), a list (in), or a structured {"op": ..., "value": ...} form:

| Form | Meaning |
|---|---|
| "US" | == (equality) |
| ["US", "CA"] | in (membership) |
| {"op": "!=", "value": "US"} | not equal |
| {"op": ">", "value": 100} (>=, <, <= likewise) | numeric comparison |
| {"op": "between", "value": ["2026-07-01", "2026-09-30"]} | inclusive range (exactly two values) |
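The three value forms can be read as a per-row predicate. The sketch below evaluates a where dict against rows represented as plain dicts, with string keys standing in for catalog dimensions. matches is a hypothetical helper. Combining several keys with AND is an assumption, since the table does not say how multiple keys combine. Marivo applies the filter before aggregation in the backend, not in Python.

```python
import operator

# Structured operators from the table above; "between" is handled separately.
_COMPARISONS = {
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(row: dict, where: dict) -> bool:
    """Return True if `row` satisfies every condition in `where` (assumed AND)."""
    for key, condition in where.items():
        value = row[key]
        if isinstance(condition, dict):
            op, target = condition["op"], condition["value"]
            if op == "between":
                low, high = target  # exactly two values, both ends inclusive
                ok = low <= value <= high
            else:
                ok = _COMPARISONS[op](value, target)
        elif isinstance(condition, list):
            ok = value in condition  # membership
        else:
            ok = value == condition  # scalar equality
        if not ok:
            return False
    return True


rows = [
    {"region": "US", "amount": 150},
    {"region": "CA", "amount": 50},
    {"region": "MX", "amount": 200},
]
big_orders = [r["region"] for r in rows if matches(r, {"amount": {"op": ">", "value": 100}})]
print(big_orders)  # ['US', 'MX']
```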

observe → MetricFrame

The starting point for any analysis: materialize a metric over a time range, segments, or both.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| metric | metric object / ref | Yes | | The metric to materialize. |
| timescope | dict | No | None | Half-open {"start", "end"} window. |
| grain | grain | No | None | Time bucket; when present, the result is a time series or panel. |
| dimensions | list[ref] | No | None | Segment axes. |
| where | dict | No | None | Pre-aggregation row filter. |
| time_dimension | ref | No | entity default | Picks the time axis when the entity declares several. |
| expect_shape | shape | No | None | Guard; raises before any backend work if the predicted shape differs. |
```python
current = session.observe(
    revenue,
    timescope={"start": "2026-10-01", "end": "2027-01-01"},
    grain="month",
    dimensions=[region],
)
```

compare → DeltaFrame

Quantify the change between two observe results as current minus baseline. Both frames must share the same metric and semantic kind.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| current | MetricFrame | Yes | | Current-period frame. |
| baseline | MetricFrame | Yes | | Baseline-period frame. |
| alignment | AlignmentPolicy | No | window_bucket | How buckets and segments are paired. |
```python
baseline = session.observe(
    revenue,
    timescope={"start": "2025-10-01", "end": "2026-01-01"},
    grain="month",
    dimensions=[region],
)
delta = session.compare(current, baseline)
```

By default, compare pairs buckets with window_bucket. To override it, pass alignment= with mv.dow_aligned(), mv.holiday_aligned(), or mv.holiday_and_dow_aligned(). The calendar-backed kinds also take calendar=mv.CalendarRef(...).
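This page does not spell out the exact pairing rules of window_bucket. One plausible reading is that it pairs buckets by position within each window (October 2026 with October 2025, and so on) and segments by key. Under that assumption, the arithmetic behind a DeltaFrame reduces to the sketch below. compare_by_position and the dict-of-lists layout are hypothetical stand-ins for frames, and the strict same-segments check is a simplification.

```python
def compare_by_position(current, baseline):
    """Return current minus baseline, pairing buckets by position within each segment.

    Both arguments map a segment key to its bucket values in time order.
    """
    if current.keys() != baseline.keys():
        raise ValueError("frames must cover the same segments")
    delta = {}
    for segment, current_values in current.items():
        baseline_values = baseline[segment]
        if len(current_values) != len(baseline_values):
            raise ValueError(f"bucket count differs for segment {segment!r}")
        delta[segment] = [c - b for c, b in zip(current_values, baseline_values)]
    return delta


current = {"US": [120, 110, 90], "EU": [80, 85, 70]}  # Oct, Nov, Dec 2026
baseline = {"US": [100, 105, 110], "EU": [75, 80, 90]}  # Oct, Nov, Dec 2025
print(compare_by_position(current, baseline))  # {'US': [20, 5, -20], 'EU': [5, 5, -20]}
```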

decompose → AttributionFrame

Attribute a delta's movement across one segment axis to explain why it changed.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| frame | DeltaFrame | Yes | | The delta to explain. |
| axis | dimension | Yes | | The segment axis to attribute over. |
```python
attribution = session.decompose(delta, axis=region)
attribution.show()
```
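This page does not say which attribution method decompose uses. A common, simple one is additive: sum each segment's change and report it as a share of the total change. The sketch below implements that additive scheme in plain Python as an illustration only. additive_attribution is hypothetical and may not match Marivo's method.

```python
def additive_attribution(delta_by_segment):
    """Sum each segment's bucket deltas and express it as a share of the total change."""
    contributions = {seg: sum(values) for seg, values in delta_by_segment.items()}
    total = sum(contributions.values())
    return {
        seg: {
            "contribution": value,
            # With a nonzero total, shares sum to 1 (up to float rounding);
            # a segment moving against the total gets a negative share.
            "share": value / total if total else None,
        }
        for seg, value in contributions.items()
    }


print(additive_attribution({"US": [-30, -10], "EU": [-5, -5]}))
# {'US': {'contribution': -40, 'share': 0.8}, 'EU': {'contribution': -10, 'share': 0.2}}
```

Additive shares become hard to read when segments offset each other and the total change is close to zero. The sketch returns None for the degenerate case of an exactly zero total.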

correlate → AssociationResult

Measure the association between two metrics over aligned buckets.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| a, b | MetricFrame | Yes | | The two frames to associate. |
| measure_a, measure_b | str | No | frame measure | Numeric column on each frame. |
| alignment | AlignmentPolicy | No | window_bucket | Bucket pairing. |
| method | "pearson" | No | "pearson" | Correlation method (v1: Pearson, zero lag). |
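In v1, the method is Pearson at zero lag, meaning bucket i of one frame is compared only with bucket i of the other. For reference, the textbook coefficient on two already-aligned bucket series is sketched below. pearson is a plain-Python illustration, not Marivo's implementation, and it assumes alignment has already produced equal-length series. In Python 3.10 and later, statistics.correlation computes the same value.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, bucket-aligned series (zero lag)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two aligned series with at least 2 buckets")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```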

forecast → ForecastFrame

Project a time series or panel forward.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| history | MetricFrame (time_series or panel) | Yes | | Continuous history with no NaNs. |
| horizon | int | Yes | | Number of buckets to project (≥ 1). |
| model | "naive", "seasonal_naive", or "drift" | No | "seasonal_naive" | Forecast strategy. |
| seasonality_period | int | No | inferred from grain | Overrides the seasonal period (day=7, week=52, month=12, quarter=4). |
| interval_level | float | No | 0.95 | Confidence level of the prediction interval. |
| measure_column | str | No | frame measure | Column to forecast. |
```python
history = session.observe(revenue, timescope={"start": "2026-01-01", "end": "2026-04-01"}, grain="day")
projection = session.forecast(history, horizon=30)
```
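The three model names match standard baseline forecasters. The sketch below gives their textbook point forecasts on a plain list of bucket values. All function names are hypothetical, prediction intervals (interval_level) are left out, and Marivo's internals may differ. With daily grain the default period is 7, so the 90-day history above covers many full weeks.

```python
def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, period):
    """Repeat the last full season; forecast step h (0-based) reuses last_season[h % period]."""
    if len(history) < period:
        raise ValueError("history must cover at least one full season")
    last_season = history[-period:]
    return [last_season[h % period] for h in range(horizon)]


def drift(history, horizon):
    """Extend the straight line through the first and last observations."""
    if len(history) < 2:
        raise ValueError("drift needs at least two observations")
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * step for step in range(1, horizon + 1)]


def forecast_points(history, horizon, model="seasonal_naive", period=7):
    """Dispatch on the model name; period=7 mirrors the documented daily default."""
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    if model == "naive":
        return naive(history, horizon)
    if model == "seasonal_naive":
        return seasonal_naive(history, horizon, period)
    if model == "drift":
        return drift(history, horizon)
    raise ValueError(f"unknown model {model!r}")


print(forecast_points([10, 20, 30, 40, 11, 21, 31, 41], 6, period=4))  # [11, 21, 31, 41, 11, 21]
```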

assess_quality → QualityReport

Run quality checks (row counts, null ratios, time coverage, duplicate keys) over a MetricFrame. The report contains per-check rows, blocking issues, and recommended follow-ups.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| frame | MetricFrame | Yes | | The frame to inspect. |
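The listed checks are simple to state precisely. The sketch below runs them over rows represented as dicts. The function name, the row layout, and the expected_buckets argument are all hypothetical. The real QualityReport also carries blocking issues and follow-ups, which this sketch does not model.

```python
from collections import Counter


def quality_checks(rows, time_column, key_columns, measure_column, expected_buckets):
    """Row count, measure null ratio, duplicate keys, and missing time buckets."""
    row_count = len(rows)
    nulls = sum(1 for row in rows if row.get(measure_column) is None)
    key_counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    seen_buckets = {row[time_column] for row in rows}
    return {
        "row_count": row_count,
        "null_ratio": nulls / row_count if row_count else 0.0,
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        "missing_buckets": sorted(set(expected_buckets) - seen_buckets),
    }


rows = [
    {"month": "2026-10", "region": "US", "revenue": 100.0},
    {"month": "2026-11", "region": "US", "revenue": None},
    {"month": "2026-11", "region": "US", "revenue": 90.0},
]
report = quality_checks(rows, "month", ["month", "region"], "revenue", ["2026-10", "2026-11", "2026-12"])
print(report["duplicate_keys"], report["missing_buckets"])  # [('2026-11', 'US')] ['2026-12']
```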

hypothesis_test → HypothesisTestResult

A paired test of whether a metric's mean changed between two periods.

| Parameter | Type | Required | Default | Meaning |
|---|---|---|---|---|
| a, b | MetricFrame | Yes | | Current and baseline frames. |
| hypothesis | "mean_changed" | No | "mean_changed" | Test type (v1). |
| value_a, value_b | str | No | frame measure | Numeric column on each frame. |
| alignment | AlignmentPolicy | No | window_bucket | Pairing for the test. |
| sampling | SamplingPolicy | No | inferred | Pairing and minimum-sample rules. |
| alpha | float | No | 0.05 | Significance level. |
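A classic paired test of "mean changed" is the paired t-test on per-bucket differences. The sketch below computes its t statistic and degrees of freedom in plain Python. This page does not say which test statistic Marivo uses, so treat this as the textbook version. paired_t_statistic is hypothetical, and the inputs are assumed to be already aligned.

```python
import math
import statistics


def paired_t_statistic(current, baseline):
    """t statistic and degrees of freedom for H0: mean(current - baseline) == 0."""
    if len(current) != len(baseline):
        raise ValueError("paired samples must have the same length")
    diffs = [c - b for c, b in zip(current, baseline)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


t, df = paired_t_statistic([12, 14, 11, 15], [10, 11, 10, 12])
print(round(t, 2), df)  # 4.7 3
```

With alpha = 0.05 and 3 degrees of freedom, the two-sided critical value is about 3.182, so this example would reject "no change". For a p-value, scipy.stats.ttest_rel(current, baseline) performs the same paired test.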

Discovery: session.discover.* → CandidateSet


Discovery operators search a frame for noteworthy items and return a ranked CandidateSet. When a frame has several numeric columns, pass value="<column>" to choose one. threshold is the cutoff: lowering it yields more candidates.

| Helper | Source shape | Required | Key options |
|---|---|---|---|
| point_anomalies | MetricFrame time_series/panel | | value, threshold=3.0 |
| period_shifts | DeltaFrame time_series/panel | ≥ 4 buckets | value, threshold=2.0 |
| driver_axes | DeltaFrame | search_space | value, limit |
| interesting_slices | MetricFrame or DeltaFrame | | search_space, value, threshold=2.0, limit |
| interesting_windows | time_series/panel frame | | value, threshold=2.0 |
| cross_sectional_outliers | MetricFrame segmented/panel | | peer_scope, value, threshold=3.0 |
```python
series = session.observe(revenue, timescope={"start": "2026-01-01", "end": "2026-04-01"}, grain="day")
candidates = session.discover.point_anomalies(series, threshold=2.0)
candidates.show()
```
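One common reading of a point-anomaly threshold like 3.0 is a z-score cutoff: flag buckets that sit at least that many standard deviations from the series mean. This page does not confirm that Marivo scores anomalies this way. The sketch below is a plain z-score rule with ranking by |z|, for intuition only. point_anomalies here is a hypothetical stand-in, and using the population standard deviation is a design choice of the sketch.

```python
import statistics


def point_anomalies(values, threshold=3.0):
    """Return (index, z) pairs with |z| >= threshold, ranked by |z| descending."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # a constant series has no outliers under this rule
    scored = [(i, (v - mean) / sd) for i, v in enumerate(values)]
    flagged = [(i, z) for i, z in scored if abs(z) >= threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


daily = [10] * 9 + [40]
print(point_anomalies(daily, threshold=2.0))  # [(9, 3.0)]
```

Lowering the threshold from the default 3.0 to 2.0, as in the call above, admits more buckets as candidates.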

Transforms reshape a frame while preserving its family (MetricFrame → MetricFrame, DeltaFrame → DeltaFrame).

| Transform | Key parameters | Effect |
|---|---|---|
| filter | predicate (callable) | Keep rows where the predicate returns true. |
| slice | where (axis → value/list/range) | Keep rows matching exact axis values. |
| rollup | drop_axes | Drop axes and re-aggregate measures. |
| topk | by, limit, order | Keep the top N rows by a measure (order="decrease" by default). |
| bottomk | by, limit | Keep the bottom N rows. |
| rank | by, method, rank_column | Add a rank column ordered by a measure. |
| normalize | mode, baseline | index / share / pct_change / per_unit / z_score (MetricFrame only). |
| window | window | Restrict to a time window. |
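The sketch below makes topk, bottomk, and two of the normalize modes concrete in plain Python over rows represented as dicts. Several details are assumptions for illustration: the function names, the added share and pct_change column names, and treating baseline as a single number. The table above does not say how Marivo names output columns or what baseline accepts.

```python
def topk(rows, by, limit, order="decrease"):
    """Keep `limit` rows sorted by `by`, largest first when order == "decrease"."""
    return sorted(rows, key=lambda row: row[by], reverse=(order == "decrease"))[:limit]


def bottomk(rows, by, limit):
    """Keep the `limit` rows with the smallest `by` values."""
    return sorted(rows, key=lambda row: row[by])[:limit]


def normalize_share(rows, measure):
    """Add a `share` column: each row's fraction of the column total."""
    total = sum(row[measure] for row in rows)
    return [{**row, "share": row[measure] / total} for row in rows]


def normalize_pct_change(rows, measure, baseline):
    """Add a `pct_change` column relative to a single baseline value."""
    return [{**row, "pct_change": (row[measure] - baseline) / baseline} for row in rows]


rows = [
    {"region": "US", "revenue": 500},
    {"region": "EU", "revenue": 300},
    {"region": "APAC", "revenue": 200},
]
print([r["region"] for r in topk(rows, "revenue", 2)])  # ['US', 'EU']
```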

When a step needs something the built-in intents do not model, drop down to scratch frames, then promote the result back into the typed flow before continuing.

  • session.explore_ibis(builder, datasource=...) — run a custom ibis query → ExplorationResult.
  • session.from_pandas(df) — import external data → ExplorationResult.
  • session.promote_metric_frame(...) / promote_delta_frame(...) / promote_attribution_frame(...) — upgrade a scratch frame into a typed frame. Promotion never infers metadata; you supply metric, semantic_kind, measure_column, etc. (or a PromotionPolicy with semantic_anchors), and it fails closed when anything required is missing.
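"Fails closed" means promotion refuses rather than guesses. The sketch below is a hypothetical validator that shows the idea: it never infers a field from the data and raises as soon as something required is absent or inconsistent. check_promotion, REQUIRED_FIELDS, and the exact set of required fields are illustrative assumptions, not Marivo's promotion code.

```python
# Hypothetical: the fields this sketch treats as mandatory for a MetricFrame promotion.
REQUIRED_FIELDS = ("metric", "semantic_kind", "measure_column")
SEMANTIC_KINDS = {"scalar", "time_series", "segmented", "panel"}


def check_promotion(columns, **metadata):
    """Fail closed: raise unless every required field is supplied and consistent.

    Nothing is inferred from `columns`; they are only used to validate what the
    caller supplied.
    """
    missing = [name for name in REQUIRED_FIELDS if metadata.get(name) is None]
    if missing:
        raise ValueError(f"cannot promote: missing {', '.join(missing)}")
    if metadata["semantic_kind"] not in SEMANTIC_KINDS:
        raise ValueError(f"unknown semantic_kind {metadata['semantic_kind']!r}")
    if metadata["measure_column"] not in columns:
        raise ValueError(f"measure_column {metadata['measure_column']!r} is not a column")
    return dict(metadata)


ok = check_promotion(
    ["month", "revenue"],
    metric="sales.revenue",
    semantic_kind="time_series",
    measure_column="revenue",
)
```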

Every operator records evidence into the session, so conclusions stay auditable.

  • session.knowledge() — established facts, driver facts, open anomalies, and suggested follow-ups for the whole session.
  • session.evidence.findings(...), .propositions(...), .assessments(...), .proposition(id), .latest_assessment(id), .trace(id) — look up the evidence objects, and trace a proposition back to the findings that support it.

See Evidence for the full model.

Putting it together, an end-to-end run from session to attribution:

```python
import marivo.analysis as mv

session = mv.session.get_or_create(name="revenue-check", question="Why did Q4 drop?")
catalog = session.catalog
revenue = catalog.get("sales.revenue")
region = catalog.get("sales.orders.region")

# Q4 2026, monthly, by region.
current = session.observe(
    revenue,
    timescope={"start": "2026-10-01", "end": "2027-01-01"},
    grain="month",
    dimensions=[region],
)
# The same quarter one year earlier.
baseline = session.observe(
    revenue,
    timescope={"start": "2025-10-01", "end": "2026-01-01"},
    grain="month",
    dimensions=[region],
)
delta = session.compare(current, baseline)  # current minus baseline
attribution = session.decompose(delta, axis=region)
attribution.show()
```

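From the delta you can branch. session.discover.period_shifts(delta) finds when the delta moved, and session.forecast(current, horizon=3) projects the current series forward. Note that period_shifts needs at least 4 buckets, more than the three monthly buckets in this example, so observe a longer window or a finer grain before calling it.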

| Frame | Produced by |
|---|---|
| MetricFrame | observe (and promote_metric_frame) |
| DeltaFrame | compare |
| AttributionFrame | decompose |
| AssociationResult | correlate |
| ForecastFrame | forecast |
| QualityReport | assess_quality |
| HypothesisTestResult | hypothesis_test |
| CandidateSet | discover.* |
| ExplorationResult | from_pandas, explore_ibis |