Querying Metrics Tutorial¶

In this tutorial, you'll learn how to execute queries against the semantic layer using both the unified Project API and the SemanticQuery interface.

What You'll Learn: - Unified Project API (recommended) - simplest approach - Simple queries (total, no grouping) - Queries with one dimension - Queries with multiple dimensions - Filtered queries - Multiple metrics together

Option A: Unified Project API (Recommended)¶

The simplest way to query metrics is through the Project class:

from odibi import Project

# Load project - semantic layer inherits connections
project = Project.load("odibi.yaml")

# Query metrics - tables auto-loaded from connections
result = project.query("revenue BY region")
print(result.df)

The Project API: - Reads connections from your odibi.yaml - Resolves source: $pipeline.node or connection.path to full paths - Auto-loads Delta tables when queried - No manual context.register() calls needed

Example queries:

# Total revenue
result = project.query("revenue")

# Revenue by region
result = project.query("revenue BY region")

# Multiple metrics and dimensions
result = project.query("revenue, order_count BY region, month")

# With filter
result = project.query("revenue BY category WHERE region = 'North'")

Option B: Manual SemanticQuery Interface¶

For more control, use the SemanticQuery class directly.

Source Data¶

We'll use the star schema and semantic config from previous tutorials:

fact_orders (20 sample rows)¶

order_id	customer_sk	product_sk	date_sk	quantity	line_total	status
ORD001	1	1	20240115	1	1299.99	completed
ORD002	1	2	20240115	2	59.98	completed
ORD003	2	3	20240116	1	249.99	completed
ORD004	3	4	20240116	3	149.97	completed
ORD005	4	5	20240117	1	599.99	completed
ORD006	5	6	20240117	1	149.99	completed
ORD007	6	7	20240118	2	799.98	completed
ORD008	7	8	20240118	4	183.96	completed
ORD009	8	9	20240119	1	79.99	completed
ORD010	9	10	20240119	1	189.99	completed
ORD011	10	1	20240120	1	1299.99	completed
ORD012	11	2	20240120	5	149.95	completed
ORD013	12	3	20240121	2	499.98	completed
ORD014	1	4	20240121	1	49.99	completed
ORD015	2	5	20240122	1	599.99	pending
ORD016	3	6	20240122	2	299.98	completed
ORD017	4	7	20240123	1	399.99	completed
ORD018	5	8	20240123	3	137.97	completed
ORD019	6	9	20240124	2	159.98	completed
ORD020	7	10	20240124	1	189.99	completed

dim_customer (12 rows)¶

customer_sk	name	region	city
1	Alice Johnson	North	Chicago
2	Bob Smith	South	Houston
3	Carol White	North	Detroit
4	David Brown	East	New York
5	Emma Davis	West	Seattle
6	Frank Miller	South	Miami
7	Grace Lee	East	Boston
8	Henry Wilson	West	Portland
9	Ivy Chen	North	Minneapolis
10	Jack Taylor	South	Dallas
11	Karen Martinez	East	Philadelphia
12	Leo Anderson	West	Denver

dim_product (10 rows)¶

product_sk	name	category
1	Laptop Pro 15	Electronics
2	Wireless Mouse	Electronics
3	Office Chair	Furniture
4	USB-C Hub	Electronics
5	Standing Desk	Furniture
6	Mechanical Keyboard	Electronics
7	Monitor 27"	Electronics
8	Desk Lamp	Furniture
9	Webcam HD	Electronics
10	Filing Cabinet	Furniture

Step 1: Simple Query - Total (No Grouping)¶

The simplest query returns a single aggregated value.

Query¶

from odibi.semantics import SemanticQuery, parse_semantic_config
from odibi.context import EngineContext
from odibi.enums import EngineType
import yaml

# Load config
with open("semantic_config.yaml") as f:
    config = parse_semantic_config(yaml.safe_load(f))

# Create query interface
query = SemanticQuery(config)

# Setup context with data
context = EngineContext(df=None, engine_type=EngineType.PANDAS)
context.register("fact_orders", fact_orders_df)
context.register("dim_customer", dim_customer_df)
context.register("dim_product", dim_product_df)
context.register("dim_date", dim_date_df)

# Execute query
result = query.execute("revenue", context)

Query String¶

"revenue"

Generated SQL¶

SELECT SUM(line_total) AS revenue
FROM fact_orders
WHERE status = 'completed'

Result (1 row)¶

revenue
8,953.56

Accessing the Result¶

print(f"Total Revenue: ${result.df['revenue'].iloc[0]:,.2f}")
# Output: Total Revenue: $8,953.56

print(f"Row count: {result.row_count}")
# Output: Row count: 1

print(f"Execution time: {result.elapsed_ms:.2f}ms")
# Output: Execution time: 12.34ms

print(f"Generated SQL: {result.sql_generated}")
# Output: SELECT SUM(line_total) AS revenue FROM fact_orders WHERE status = 'completed'

Step 2: Query with One Dimension¶

Add a dimension to group the results.

Query String¶

"revenue BY region"

Generated SQL¶

SELECT 
    c.region,
    SUM(f.line_total) AS revenue
FROM fact_orders f
JOIN dim_customer c ON f.customer_sk = c.customer_sk
WHERE f.status = 'completed'
GROUP BY c.region

Python Code¶

result = query.execute("revenue BY region", context)
print(result.df)

Result (4 rows)¶

region	revenue
North	2,549.88
South	2,349.93
East	1,923.88
West	2,129.87

Step 3: Query with Multiple Dimensions¶

Add more dimensions for a cross-tabulation.

Query String¶

"revenue, order_count BY region, category"

Generated SQL¶

SELECT 
    c.region,
    p.category,
    SUM(f.line_total) AS revenue,
    COUNT(*) AS order_count
FROM fact_orders f
JOIN dim_customer c ON f.customer_sk = c.customer_sk
JOIN dim_product p ON f.product_sk = p.product_sk
WHERE f.status = 'completed'
GROUP BY c.region, p.category

Python Code¶

result = query.execute("revenue, order_count BY region, category", context)
print(result.df)

Result (8 rows - region × category)¶

region	category	revenue	order_count
North	Electronics	1,549.94	4
North	Furniture	999.94	3
South	Electronics	1,449.95	4
South	Furniture	899.98	3
East	Electronics	1,323.91	4
East	Furniture	599.97	3
West	Electronics	1,079.93	3
West	Furniture	1,049.94	4

Step 4: Query with Filter¶

Add a WHERE clause to filter results.

Query String¶

"revenue BY category WHERE region = 'North'"

Generated SQL¶

SELECT 
    p.category,
    SUM(f.line_total) AS revenue
FROM fact_orders f
JOIN dim_customer c ON f.customer_sk = c.customer_sk
JOIN dim_product p ON f.product_sk = p.product_sk
WHERE f.status = 'completed'
  AND c.region = 'North'
GROUP BY p.category

Python Code¶

result = query.execute("revenue BY category WHERE region = 'North'", context)
print(result.df)

Result (2 rows)¶

category	revenue
Electronics	1,549.94
Furniture	999.94

Multiple Filters¶

You can combine multiple filter conditions:

# Multiple conditions with AND
result = query.execute(
    "revenue BY category WHERE region = 'North' AND year = 2024",
    context
)

# IN clause
result = query.execute(
    "revenue BY region WHERE region IN ('North', 'South')",
    context
)

Step 5: Multiple Metrics¶

Query multiple metrics in a single call.

Query String¶

"revenue, order_count, avg_order_value BY region"

Python Code¶

result = query.execute("revenue, order_count, avg_order_value BY region", context)
print(result.df)

Result (4 rows)¶

region	revenue	order_count	avg_order_value
North	2,549.88	7	364.27
South	2,349.93	7	335.70
East	1,923.88	7	274.84
West	2,129.87	7	304.27

Complete Python Script¶

Here's a complete, runnable example:

from odibi.semantics import SemanticQuery, parse_semantic_config
from odibi.context import EngineContext
from odibi.enums import EngineType
import pandas as pd
import yaml

# ===========================================
# 1. Load the semantic config
# ===========================================
config_dict = {
    "metrics": [
        {
            "name": "revenue",
            "description": "Total revenue from completed orders",
            "expr": "SUM(line_total)",
            "source": "fact_orders",
            "filters": ["status = 'completed'"]
        },
        {
            "name": "order_count",
            "expr": "COUNT(*)",
            "source": "fact_orders",
            "filters": ["status = 'completed'"]
        },
        {
            "name": "avg_order_value",
            "expr": "AVG(line_total)",
            "source": "fact_orders",
            "filters": ["status = 'completed'"]
        }
    ],
    "dimensions": [
        {"name": "region", "source": "dim_customer", "column": "region"},
        {"name": "category", "source": "dim_product", "column": "category"},
        {"name": "month", "source": "dim_date", "column": "month_name"}
    ]
}

config = parse_semantic_config(config_dict)
query = SemanticQuery(config)

# ===========================================
# 2. Load data and create context
# ===========================================
fact_orders = pd.read_parquet("warehouse/fact_orders")
dim_customer = pd.read_parquet("warehouse/dim_customer")
dim_product = pd.read_parquet("warehouse/dim_product")
dim_date = pd.read_parquet("warehouse/dim_date")

context = EngineContext(df=None, engine_type=EngineType.PANDAS)
context.register("fact_orders", fact_orders)
context.register("dim_customer", dim_customer)
context.register("dim_product", dim_product)
context.register("dim_date", dim_date)

# ===========================================
# 3. Execute queries
# ===========================================

# Query 1: Total revenue
print("=" * 50)
print("Query 1: Total Revenue")
print("=" * 50)
result = query.execute("revenue", context)
print(f"Total Revenue: ${result.df['revenue'].iloc[0]:,.2f}")
print()

# Query 2: Revenue by region
print("=" * 50)
print("Query 2: Revenue by Region")
print("=" * 50)
result = query.execute("revenue BY region", context)
print(result.df.to_string(index=False))
print()

# Query 3: Multiple metrics by region
print("=" * 50)
print("Query 3: Multiple Metrics by Region")
print("=" * 50)
result = query.execute("revenue, order_count, avg_order_value BY region", context)
print(result.df.to_string(index=False))
print()

# Query 4: Revenue by region and category
print("=" * 50)
print("Query 4: Revenue by Region and Category")
print("=" * 50)
result = query.execute("revenue BY region, category", context)
print(result.df.to_string(index=False))
print()

# Query 5: Filtered query
print("=" * 50)
print("Query 5: North Region Only")
print("=" * 50)
result = query.execute("revenue, order_count BY category WHERE region = 'North'", context)
print(result.df.to_string(index=False))
print()

# ===========================================
# 4. Show execution details
# ===========================================
print("=" * 50)
print("Execution Details (last query)")
print("=" * 50)
print(f"Metrics: {result.metrics}")
print(f"Dimensions: {result.dimensions}")
print(f"Row count: {result.row_count}")
print(f"Execution time: {result.elapsed_ms:.2f}ms")

Output¶

==================================================
Query 1: Total Revenue
==================================================
Total Revenue: $8,953.56

==================================================
Query 2: Revenue by Region
==================================================
 region   revenue
  North  2549.88
  South  2349.93
   East  1923.88
   West  2129.87

==================================================
Query 3: Multiple Metrics by Region
==================================================
 region   revenue  order_count  avg_order_value
  North  2549.88            7           364.27
  South  2349.93            7           335.70
   East  1923.88            7           274.84
   West  2129.87            7           304.27

==================================================
Query 4: Revenue by Region and Category
==================================================
 region     category   revenue
  North  Electronics  1549.94
  North    Furniture   999.94
  South  Electronics  1449.95
  South    Furniture   899.98
   East  Electronics  1323.91
   East    Furniture   599.97
   West  Electronics  1079.93
   West    Furniture  1049.94

==================================================
Query 5: North Region Only
==================================================
    category   revenue  order_count
 Electronics  1549.94            4
   Furniture   999.94            3

==================================================
Execution Details (last query)
==================================================
Metrics: ['revenue', 'order_count']
Dimensions: ['category']
Row count: 2
Execution time: 8.45ms

Query Syntax Reference¶

Pattern	Example	Description
Single metric	`"revenue"`	Total, no grouping
Metric + dimension	`"revenue BY region"`	Grouped by one dimension
Multiple metrics	`"revenue, order_count"`	Multiple metrics together
Multiple dimensions	`"revenue BY region, category"`	Cross-tabulation
With filter	`"revenue BY region WHERE year = 2024"`	Filtered results
Complex filter	`"revenue BY region WHERE region IN ('North', 'South')"`	IN clause

Error Handling¶

# Invalid metric
try:
    result = query.execute("invalid_metric BY region", context)
except ValueError as e:
    print(f"Error: {e}")
    # Error: Unknown metric 'invalid_metric'. Available: ['revenue', 'order_count', 'avg_order_value']

# Invalid dimension
try:
    result = query.execute("revenue BY invalid_dimension", context)
except ValueError as e:
    print(f"Error: {e}")
    # Error: Unknown dimension 'invalid_dimension'. Available: ['region', 'category', 'month']

What You Learned¶

In this tutorial, you learned:

Simple queries return totals: "revenue"
One dimension groups results: "revenue BY region"
Multiple dimensions create cross-tabs: "revenue BY region, category"
Filters constrain results: "revenue BY region WHERE year = 2024"
Multiple metrics can be queried together: "revenue, order_count BY region"
QueryResult contains the DataFrame, metrics, dimensions, and execution info

Next Steps¶

Now let's learn how to pre-compute metrics for dashboard performance.

Next: Materializing Metrics

Previous	Up	Next
Defining Dimensions	Tutorials	Materializing Metrics

Reference¶

For complete documentation, see: Querying Reference

Querying Metrics Tutorial¶

Option A: Unified Project API (Recommended)¶

Option B: Manual SemanticQuery Interface¶

Source Data¶

fact_orders (20 sample rows)¶

dim_customer (12 rows)¶

dim_product (10 rows)¶

Step 1: Simple Query - Total (No Grouping)¶

Query¶

Query String¶

Generated SQL¶

Result (1 row)¶

Accessing the Result¶

Step 2: Query with One Dimension¶

Query String¶

Generated SQL¶

Python Code¶

Result (4 rows)¶

Step 3: Query with Multiple Dimensions¶

Query String¶

Generated SQL¶

Python Code¶

Result (8 rows - region × category)¶

Step 4: Query with Filter¶

Query String¶

Generated SQL¶

Python Code¶

Result (2 rows)¶

Multiple Filters¶

Step 5: Multiple Metrics¶

Query String¶

Python Code¶

Result (4 rows)¶

Complete Python Script¶

Output¶

Query Syntax Reference¶

Error Handling¶

What You Learned¶

Next Steps¶

Navigation¶

Reference¶