Querying Metrics Tutorial¶
In this tutorial, you'll learn how to execute queries against the semantic layer using both the unified Project API and the SemanticQuery interface.
What You'll Learn: - Unified Project API (recommended) - simplest approach - Simple queries (total, no grouping) - Queries with one dimension - Queries with multiple dimensions - Filtered queries - Multiple metrics together
Option A: Unified Project API (Recommended)¶
The simplest way to query metrics is through the Project class:
from odibi import Project
# Load project - semantic layer inherits connections
project = Project.load("odibi.yaml")
# Query metrics - tables auto-loaded from connections
result = project.query("revenue BY region")
print(result.df)
The Project API:
- Reads connections from your odibi.yaml
- Resolves source: $pipeline.node or connection.path to full paths
- Auto-loads Delta tables when queried
- No manual context.register() calls needed
Example queries:
# Total revenue
result = project.query("revenue")
# Revenue by region
result = project.query("revenue BY region")
# Multiple metrics and dimensions
result = project.query("revenue, order_count BY region, month")
# With filter
result = project.query("revenue BY category WHERE region = 'North'")
Option B: Manual SemanticQuery Interface¶
For more control, use the SemanticQuery class directly.
Source Data¶
We'll use the star schema and semantic config from previous tutorials:
fact_orders (20 sample rows)¶
| order_id | customer_sk | product_sk | date_sk | quantity | line_total | status |
|---|---|---|---|---|---|---|
| ORD001 | 1 | 1 | 20240115 | 1 | 1299.99 | completed |
| ORD002 | 1 | 2 | 20240115 | 2 | 59.98 | completed |
| ORD003 | 2 | 3 | 20240116 | 1 | 249.99 | completed |
| ORD004 | 3 | 4 | 20240116 | 3 | 149.97 | completed |
| ORD005 | 4 | 5 | 20240117 | 1 | 599.99 | completed |
| ORD006 | 5 | 6 | 20240117 | 1 | 149.99 | completed |
| ORD007 | 6 | 7 | 20240118 | 2 | 799.98 | completed |
| ORD008 | 7 | 8 | 20240118 | 4 | 183.96 | completed |
| ORD009 | 8 | 9 | 20240119 | 1 | 79.99 | completed |
| ORD010 | 9 | 10 | 20240119 | 1 | 189.99 | completed |
| ORD011 | 10 | 1 | 20240120 | 1 | 1299.99 | completed |
| ORD012 | 11 | 2 | 20240120 | 5 | 149.95 | completed |
| ORD013 | 12 | 3 | 20240121 | 2 | 499.98 | completed |
| ORD014 | 1 | 4 | 20240121 | 1 | 49.99 | completed |
| ORD015 | 2 | 5 | 20240122 | 1 | 599.99 | pending |
| ORD016 | 3 | 6 | 20240122 | 2 | 299.98 | completed |
| ORD017 | 4 | 7 | 20240123 | 1 | 399.99 | completed |
| ORD018 | 5 | 8 | 20240123 | 3 | 137.97 | completed |
| ORD019 | 6 | 9 | 20240124 | 2 | 159.98 | completed |
| ORD020 | 7 | 10 | 20240124 | 1 | 189.99 | completed |
dim_customer (12 rows)¶
| customer_sk | name | region | city |
|---|---|---|---|
| 1 | Alice Johnson | North | Chicago |
| 2 | Bob Smith | South | Houston |
| 3 | Carol White | North | Detroit |
| 4 | David Brown | East | New York |
| 5 | Emma Davis | West | Seattle |
| 6 | Frank Miller | South | Miami |
| 7 | Grace Lee | East | Boston |
| 8 | Henry Wilson | West | Portland |
| 9 | Ivy Chen | North | Minneapolis |
| 10 | Jack Taylor | South | Dallas |
| 11 | Karen Martinez | East | Philadelphia |
| 12 | Leo Anderson | West | Denver |
dim_product (10 rows)¶
| product_sk | name | category |
|---|---|---|
| 1 | Laptop Pro 15 | Electronics |
| 2 | Wireless Mouse | Electronics |
| 3 | Office Chair | Furniture |
| 4 | USB-C Hub | Electronics |
| 5 | Standing Desk | Furniture |
| 6 | Mechanical Keyboard | Electronics |
| 7 | Monitor 27" | Electronics |
| 8 | Desk Lamp | Furniture |
| 9 | Webcam HD | Electronics |
| 10 | Filing Cabinet | Furniture |
Step 1: Simple Query - Total (No Grouping)¶
The simplest query returns a single aggregated value.
Query¶
from odibi.semantics import SemanticQuery, parse_semantic_config
from odibi.context import EngineContext
from odibi.enums import EngineType
import yaml
# Load config
with open("semantic_config.yaml") as f:
config = parse_semantic_config(yaml.safe_load(f))
# Create query interface
query = SemanticQuery(config)
# Setup context with data
context = EngineContext(df=None, engine_type=EngineType.PANDAS)
context.register("fact_orders", fact_orders_df)
context.register("dim_customer", dim_customer_df)
context.register("dim_product", dim_product_df)
context.register("dim_date", dim_date_df)
# Execute query
result = query.execute("revenue", context)
Query String¶
Generated SQL¶
Result (1 row)¶
| revenue |
|---|
| 8,953.56 |
Accessing the Result¶
print(f"Total Revenue: ${result.df['revenue'].iloc[0]:,.2f}")
# Output: Total Revenue: $8,953.56
print(f"Row count: {result.row_count}")
# Output: Row count: 1
print(f"Execution time: {result.elapsed_ms:.2f}ms")
# Output: Execution time: 12.34ms
print(f"Generated SQL: {result.sql_generated}")
# Output: SELECT SUM(line_total) AS revenue FROM fact_orders WHERE status = 'completed'
Step 2: Query with One Dimension¶
Add a dimension to group the results.
Query String¶
Generated SQL¶
SELECT
c.region,
SUM(f.line_total) AS revenue
FROM fact_orders f
JOIN dim_customer c ON f.customer_sk = c.customer_sk
WHERE f.status = 'completed'
GROUP BY c.region
Python Code¶
Result (4 rows)¶
| region | revenue |
|---|---|
| North | 2,549.88 |
| South | 2,349.93 |
| East | 1,923.88 |
| West | 2,129.87 |
Step 3: Query with Multiple Dimensions¶
Add more dimensions for a cross-tabulation.
Query String¶
Generated SQL¶
SELECT
c.region,
p.category,
SUM(f.line_total) AS revenue,
COUNT(*) AS order_count
FROM fact_orders f
JOIN dim_customer c ON f.customer_sk = c.customer_sk
JOIN dim_product p ON f.product_sk = p.product_sk
WHERE f.status = 'completed'
GROUP BY c.region, p.category
Python Code¶
Result (8 rows - region × category)¶
| region | category | revenue | order_count |
|---|---|---|---|
| North | Electronics | 1,549.94 | 4 |
| North | Furniture | 999.94 | 3 |
| South | Electronics | 1,449.95 | 4 |
| South | Furniture | 899.98 | 3 |
| East | Electronics | 1,323.91 | 4 |
| East | Furniture | 599.97 | 3 |
| West | Electronics | 1,079.93 | 3 |
| West | Furniture | 1,049.94 | 4 |
Step 4: Query with Filter¶
Add a WHERE clause to filter results.
Query String¶
Generated SQL¶
SELECT
p.category,
SUM(f.line_total) AS revenue
FROM fact_orders f
JOIN dim_customer c ON f.customer_sk = c.customer_sk
JOIN dim_product p ON f.product_sk = p.product_sk
WHERE f.status = 'completed'
AND c.region = 'North'
GROUP BY p.category
Python Code¶
Result (2 rows)¶
| category | revenue |
|---|---|
| Electronics | 1,549.94 |
| Furniture | 999.94 |
Multiple Filters¶
You can combine multiple filter conditions:
# Multiple conditions with AND
result = query.execute(
"revenue BY category WHERE region = 'North' AND year = 2024",
context
)
# IN clause
result = query.execute(
"revenue BY region WHERE region IN ('North', 'South')",
context
)
Step 5: Multiple Metrics¶
Query multiple metrics in a single call.
Query String¶
Python Code¶
Result (4 rows)¶
| region | revenue | order_count | avg_order_value |
|---|---|---|---|
| North | 2,549.88 | 7 | 364.27 |
| South | 2,349.93 | 7 | 335.70 |
| East | 1,923.88 | 7 | 274.84 |
| West | 2,129.87 | 7 | 304.27 |
Complete Python Script¶
Here's a complete, runnable example:
from odibi.semantics import SemanticQuery, parse_semantic_config
from odibi.context import EngineContext
from odibi.enums import EngineType
import pandas as pd
import yaml
# ===========================================
# 1. Load the semantic config
# ===========================================
config_dict = {
"metrics": [
{
"name": "revenue",
"description": "Total revenue from completed orders",
"expr": "SUM(line_total)",
"source": "fact_orders",
"filters": ["status = 'completed'"]
},
{
"name": "order_count",
"expr": "COUNT(*)",
"source": "fact_orders",
"filters": ["status = 'completed'"]
},
{
"name": "avg_order_value",
"expr": "AVG(line_total)",
"source": "fact_orders",
"filters": ["status = 'completed'"]
}
],
"dimensions": [
{"name": "region", "source": "dim_customer", "column": "region"},
{"name": "category", "source": "dim_product", "column": "category"},
{"name": "month", "source": "dim_date", "column": "month_name"}
]
}
config = parse_semantic_config(config_dict)
query = SemanticQuery(config)
# ===========================================
# 2. Load data and create context
# ===========================================
fact_orders = pd.read_parquet("warehouse/fact_orders")
dim_customer = pd.read_parquet("warehouse/dim_customer")
dim_product = pd.read_parquet("warehouse/dim_product")
dim_date = pd.read_parquet("warehouse/dim_date")
context = EngineContext(df=None, engine_type=EngineType.PANDAS)
context.register("fact_orders", fact_orders)
context.register("dim_customer", dim_customer)
context.register("dim_product", dim_product)
context.register("dim_date", dim_date)
# ===========================================
# 3. Execute queries
# ===========================================
# Query 1: Total revenue
print("=" * 50)
print("Query 1: Total Revenue")
print("=" * 50)
result = query.execute("revenue", context)
print(f"Total Revenue: ${result.df['revenue'].iloc[0]:,.2f}")
print()
# Query 2: Revenue by region
print("=" * 50)
print("Query 2: Revenue by Region")
print("=" * 50)
result = query.execute("revenue BY region", context)
print(result.df.to_string(index=False))
print()
# Query 3: Multiple metrics by region
print("=" * 50)
print("Query 3: Multiple Metrics by Region")
print("=" * 50)
result = query.execute("revenue, order_count, avg_order_value BY region", context)
print(result.df.to_string(index=False))
print()
# Query 4: Revenue by region and category
print("=" * 50)
print("Query 4: Revenue by Region and Category")
print("=" * 50)
result = query.execute("revenue BY region, category", context)
print(result.df.to_string(index=False))
print()
# Query 5: Filtered query
print("=" * 50)
print("Query 5: North Region Only")
print("=" * 50)
result = query.execute("revenue, order_count BY category WHERE region = 'North'", context)
print(result.df.to_string(index=False))
print()
# ===========================================
# 4. Show execution details
# ===========================================
print("=" * 50)
print("Execution Details (last query)")
print("=" * 50)
print(f"Metrics: {result.metrics}")
print(f"Dimensions: {result.dimensions}")
print(f"Row count: {result.row_count}")
print(f"Execution time: {result.elapsed_ms:.2f}ms")
Output¶
==================================================
Query 1: Total Revenue
==================================================
Total Revenue: $8,953.56
==================================================
Query 2: Revenue by Region
==================================================
region revenue
North 2549.88
South 2349.93
East 1923.88
West 2129.87
==================================================
Query 3: Multiple Metrics by Region
==================================================
region revenue order_count avg_order_value
North 2549.88 7 364.27
South 2349.93 7 335.70
East 1923.88 7 274.84
West 2129.87 7 304.27
==================================================
Query 4: Revenue by Region and Category
==================================================
region category revenue
North Electronics 1549.94
North Furniture 999.94
South Electronics 1449.95
South Furniture 899.98
East Electronics 1323.91
East Furniture 599.97
West Electronics 1079.93
West Furniture 1049.94
==================================================
Query 5: North Region Only
==================================================
category revenue order_count
Electronics 1549.94 4
Furniture 999.94 3
==================================================
Execution Details (last query)
==================================================
Metrics: ['revenue', 'order_count']
Dimensions: ['category']
Row count: 2
Execution time: 8.45ms
Query Syntax Reference¶
| Pattern | Example | Description |
|---|---|---|
| Single metric | "revenue" |
Total, no grouping |
| Metric + dimension | "revenue BY region" |
Grouped by one dimension |
| Multiple metrics | "revenue, order_count" |
Multiple metrics together |
| Multiple dimensions | "revenue BY region, category" |
Cross-tabulation |
| With filter | "revenue BY region WHERE year = 2024" |
Filtered results |
| Complex filter | "revenue BY region WHERE region IN ('North', 'South')" |
IN clause |
Error Handling¶
# Invalid metric
try:
result = query.execute("invalid_metric BY region", context)
except ValueError as e:
print(f"Error: {e}")
# Error: Unknown metric 'invalid_metric'. Available: ['revenue', 'order_count', 'avg_order_value']
# Invalid dimension
try:
result = query.execute("revenue BY invalid_dimension", context)
except ValueError as e:
print(f"Error: {e}")
# Error: Unknown dimension 'invalid_dimension'. Available: ['region', 'category', 'month']
What You Learned¶
In this tutorial, you learned:
- Simple queries return totals:
"revenue" - One dimension groups results:
"revenue BY region" - Multiple dimensions create cross-tabs:
"revenue BY region, category" - Filters constrain results:
"revenue BY region WHERE year = 2024" - Multiple metrics can be queried together:
"revenue, order_count BY region" - QueryResult contains the DataFrame, metrics, dimensions, and execution info
Next Steps¶
Now let's learn how to pre-compute metrics for dashboard performance.
Next: Materializing Metrics
Navigation¶
| Previous | Up | Next |
|---|---|---|
| Defining Dimensions | Tutorials | Materializing Metrics |
Reference¶
For complete documentation, see: Querying Reference