Semantic Layer Runner¶
The Semantic Layer Runner executes your metric definitions as SQL Server views and generates execution stories for observability. This guide walks you through everything from basic concepts to production deployment.
What is the Semantic Layer Runner?¶
In a typical data warehouse, you have:
- Bronze layer - Raw data ingested from sources
- Silver layer - Cleaned and transformed data
- Gold layer - Business-ready fact and dimension tables
- Semantic layer - Pre-aggregated views for analytics and BI tools
The Semantic Layer Runner bridges the gap between your Gold layer tables and your BI tools (Power BI, Tableau, etc.) by:
- Generating SQL views from your metric definitions
- Executing those views against SQL Server (or Azure SQL)
- Creating execution stories that document what happened
- Generating lineage showing how data flows from sources to views
Think of it as an automated "view factory" that turns your YAML metric definitions into real database views.
Why Use the Semantic Layer Runner?¶
Without the Runner¶
You would manually: 1. Write SQL view definitions by hand 2. Execute them against SQL Server 3. Track which views succeeded or failed 4. Document the SQL for auditing 5. Update views when metrics change
With the Runner¶
You: 1. Define metrics in YAML (human-readable) 2. Run one command 3. Get views created, documented, and tracked automatically
Benefits: - Consistency - All views follow the same pattern - Auditability - Every execution is documented in stories - Maintainability - Change YAML, re-run, views update - Observability - Know exactly what happened and when
Prerequisites¶
Before using the Semantic Layer Runner, you need:
- Gold layer tables - Your fact tables with calculated metrics (e.g.,
fact_orders) - A SQL Server connection - Where views will be created
- A storage connection - Where stories and SQL files will be saved (e.g., Azure Data Lake)
Configuration¶
The Semantic Layer Runner is configured in your odibi.yaml project file. Here's a complete example:
project: SalesAnalytics
engine: spark
# Define your connections
connections:
# Where your gold layer data lives
gold:
type: delta
account_name: mydatalake
container: datalake
base_path: gold/sales
auth:
mode: account_key
account_key: ${AZURE_STORAGE_KEY}
# Where SQL views will be created
sql_server:
type: azure_sql
host: myserver.database.windows.net
database: analytics_db
port: 1433
auth:
mode: sql_login
username: ${SQL_USER}
password: ${SQL_PASSWORD}
# Where stories will be saved
stories:
type: azure_blob
account_name: mydatalake
container: datalake
base_path: stories
auth:
mode: account_key
account_key: ${AZURE_STORAGE_KEY}
# Story configuration
story:
connection: stories
path: stories
auto_generate: true
generate_lineage: true
# Semantic layer configuration
semantic:
# Which SQL Server connection to use for view creation
connection: sql_server
# Where to save generated SQL files (optional but recommended)
sql_output_path: gold/sales/views
# Define your views
views:
- name: vw_sales_daily
description: "Daily sales metrics by store"
source: fact_orders
db_schema: semantic
metrics:
- name: revenue
expr: "SUM(revenue)"
description: "Total revenue"
- name: order_count
expr: "COUNT(*)"
description: "Total number of orders"
- name: avg_order_value
expr: "AVG(order_value)"
description: "Average order value"
- name: total_sales
expr: "SUM(total_sales)"
description: "Total sales amount"
dimensions:
- name: date
column: Date
- name: store
column: store_id
grain: day
- name: vw_sales_monthly
description: "Monthly sales metrics by store"
source: fact_orders
db_schema: semantic
metrics:
- name: revenue
expr: "SUM(revenue)"
- name: order_count
expr: "COUNT(*)"
- name: total_sales
expr: "SUM(total_sales)"
dimensions:
- name: year_month
column: "FORMAT(Date, 'yyyy-MM')"
- name: store
column: store_id
grain: month
Configuration Reference¶
semantic.connection¶
The name of the SQL Server connection where views will be created.
semantic.sql_output_path¶
Optional path where generated SQL files will be saved. Useful for: - Version control of SQL definitions - Auditing what SQL was executed - Debugging view creation issues
semantic.views¶
A list of view definitions. Each view becomes a SQL Server view.
| Field | Required | Description |
|---|---|---|
name |
Yes | View name (will be created as db_schema.name) |
description |
No | Human-readable description |
source |
Yes | Source table name (your gold layer fact table) |
db_schema |
No | SQL Server schema (default: semantic) |
ensure_schema |
No | Auto-create schema if missing (default: true) |
metrics |
Yes | List of metrics to include |
dimensions |
Yes | List of dimensions to group by |
grain |
No | Aggregation grain: day, week, month, quarter, year |
filters |
No | WHERE clause filters |
Metric Definition¶
Each metric defines an aggregation:
metrics:
- name: revenue # Column name in the view
expr: "SUM(revenue)" # SQL aggregation expression
description: "Total revenue" # Optional documentation
Common expressions:
- SUM(column) - Total
- AVG(column) - Average
- COUNT(*) - Row count
- COUNT(DISTINCT column) - Unique count
- MAX(column) / MIN(column) - Maximum/Minimum
Dimension Definition¶
Each dimension defines a grouping column:
dimensions:
- name: date # Column name in the view
column: Date # Source column (can be an expression)
Examples:
dimensions:
# Simple column reference
- name: store_id
column: store_id
# Date formatting
- name: year_month
column: "FORMAT(Date, 'yyyy-MM')"
# Derived column
- name: is_weekend
column: "CASE WHEN DATEPART(dw, Date) IN (1,7) THEN 1 ELSE 0 END"
Running the Semantic Layer¶
Using Python¶
from odibi import Project
# Load your project configuration
project = Project.load("odibi.yaml")
# Run the semantic layer
# This will:
# 1. Generate SQL for each view
# 2. Execute the SQL against SQL Server
# 3. Save the SQL files
# 4. Generate an execution story
result = project.run_semantic_layer()
# Check results
print(f"Views created: {result['views_created']}")
print(f"Views failed: {result['views_failed']}")
print(f"Duration: {result['duration']:.2f}s")
# Story paths (if auto_generate is true)
if result['story_paths']:
print(f"Story JSON: {result['story_paths']['json']}")
print(f"Story HTML: {result['story_paths']['html']}")
Using SemanticLayerRunner Directly¶
For more control, use the SemanticLayerRunner class:
from odibi.semantics.runner import SemanticLayerRunner
from odibi.config import ProjectConfig
# Load configuration
config = ProjectConfig.from_yaml("odibi.yaml")
# Create runner
runner = SemanticLayerRunner(config)
# Run with options
result = runner.run(
generate_story=True, # Create execution story
generate_lineage=True, # Create combined lineage
)
# Access metadata
if runner.metadata:
for view in runner.metadata.views:
status = "✓" if view.status == "success" else "✗"
print(f"{status} {view.view_name}: {view.duration:.2f}s")
if view.error_message:
print(f" Error: {view.error_message}")
Custom SQL Executor¶
If you need custom SQL execution logic:
import pyodbc
def my_sql_executor(sql: str) -> None:
"""Custom SQL executor with logging."""
conn = pyodbc.connect(my_connection_string)
cursor = conn.cursor()
print(f"Executing: {sql[:100]}...")
cursor.execute(sql)
conn.commit()
cursor.close()
conn.close()
# Run with custom executor
result = runner.run(
execute_sql=my_sql_executor,
generate_story=True,
)
Custom File Writer¶
For custom storage backends:
def write_to_s3(path: str, content: str) -> None:
"""Write files to S3 instead of Azure."""
import boto3
s3 = boto3.client('s3')
s3.put_object(Bucket='my-bucket', Key=path, Body=content)
result = runner.run(
write_file=write_to_s3,
generate_story=True,
)
Generated SQL¶
The runner generates standard SQL Server view definitions. Here's an example of what gets created:
Input (YAML)¶
views:
- name: vw_sales_monthly
source: fact_orders
db_schema: semantic
metrics:
- name: revenue
expr: "SUM(revenue)"
- name: total_sales
expr: "SUM(total_sales)"
dimensions:
- name: year_month
column: "FORMAT(Date, 'yyyy-MM')"
- name: store_id
column: store_id
filters:
- "revenue > 0"
Output (SQL)¶
-- View: semantic.vw_sales_monthly
-- Generated by Odibi Semantic Layer Runner
-- Source: fact_orders
-- Generated at: 2026-01-02T10:30:00
-- Ensure schema exists
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'semantic')
BEGIN
EXEC('CREATE SCHEMA semantic')
END
GO
-- Create or replace view
CREATE OR ALTER VIEW semantic.vw_sales_monthly AS
SELECT
FORMAT(Date, 'yyyy-MM') AS year_month,
store_id AS store_id,
SUM(revenue) AS revenue,
SUM(total_sales) AS total_sales
FROM fact_orders
WHERE revenue > 0
GROUP BY FORMAT(Date, 'yyyy-MM'), store_id
GO
Execution Stories¶
Every run generates an execution story - a detailed record of what happened. Stories are saved as both JSON (for programmatic access) and HTML (for human viewing).
Story Location¶
Stories are saved to:
{story.path}/{semantic_name}/{date}/run_{time}.json
{story.path}/{semantic_name}/{date}/run_{time}.html
Example:
stories/Sales_semantic/2026-01-02/run_10-30-45.json
stories/Sales_semantic/2026-01-02/run_10-30-45.html
Story Contents¶
The JSON story includes:
{
"name": "Sales_semantic",
"started_at": "2026-01-02T10:30:45",
"completed_at": "2026-01-02T10:31:12",
"duration": 27.3,
"views_created": 5,
"views_failed": 0,
"views": [
{
"view_name": "vw_sales_daily",
"source_table": "fact_orders",
"status": "success",
"duration": 2.1,
"sql_generated": "CREATE OR ALTER VIEW semantic.vw_sales_daily AS ...",
"sql_file_path": "gold/sales/views/vw_sales_daily.sql",
"metrics_included": ["revenue", "order_count", "avg_order_value", "total_sales"],
"dimensions_included": ["date", "store"]
},
{
"view_name": "vw_sales_monthly",
"source_table": "fact_orders",
"status": "success",
"duration": 1.8,
"sql_generated": "...",
"sql_file_path": "gold/sales/views/vw_sales_monthly.sql",
"metrics_included": ["revenue", "order_count", "total_sales"],
"dimensions_included": ["year_month", "store"]
}
],
"sql_files_saved": [
"gold/sales/views/vw_sales_daily.sql",
"gold/sales/views/vw_sales_monthly.sql"
],
"graph_data": {
"nodes": [
{"id": "fact_orders", "type": "table", "layer": "gold"},
{"id": "vw_sales_daily", "type": "view", "layer": "semantic"},
{"id": "vw_sales_monthly", "type": "view", "layer": "semantic"}
],
"edges": [
{"from": "fact_orders", "to": "vw_sales_daily"},
{"from": "fact_orders", "to": "vw_sales_monthly"}
]
}
}
HTML Story¶
The HTML story provides a visual representation with:
- Summary - Overall status, duration, counts
- View Cards - Each view with status, metrics, dimensions
- SQL Details - Expandable section showing generated SQL
- Lineage Diagram - Visual graph of source → view relationships
Error Handling¶
When a view fails to create, the runner:
- Continues processing other views (doesn't stop on first error)
- Records the error in the story
- Returns failed views in the result
Handling Failures¶
result = runner.run()
if result['views_failed']:
print("Some views failed:")
for view in runner.metadata.views:
if view.status == "failed":
print(f" {view.view_name}: {view.error_message}")
Common Errors¶
| Error | Cause | Solution |
|---|---|---|
Invalid column name |
Source column doesn't exist | Check source table schema |
Invalid object name |
Source table doesn't exist | Ensure gold layer table exists |
Cannot create schema |
Permission denied | Grant schema creation rights |
Login failed |
Authentication error | Check connection credentials |
Best Practices¶
1. Use Descriptive Names¶
2. Document Your Metrics¶
metrics:
- name: revenue
expr: "SUM(revenue)"
description: "Total revenue - sum of all order values"
3. Separate Schemas by Purpose¶
views:
# Operational views for daily dashboards
- name: vw_sales_daily
db_schema: operational
# Executive views for monthly reports
- name: vw_sales_monthly
db_schema: executive
4. Save SQL Files for Auditing¶
5. Check Stories After Runs¶
Always review execution stories, especially in production:
result = runner.run()
if result['story_paths']:
print(f"Review story at: {result['story_paths']['html']}")
Integration with Pipelines¶
The typical workflow is:
- Bronze pipeline - Ingest raw data
- Silver pipeline - Clean and transform
- Gold pipeline - Build fact tables (e.g.,
fact_orders) - Semantic layer - Create views from facts
from odibi import Project
project = Project.load("odibi.yaml")
# Run all pipelines in order
project.run("bronze")
project.run("silver")
project.run("gold")
# Then create semantic views
project.run_semantic_layer()
Scheduling¶
For production, schedule the semantic layer after your gold pipeline:
# In Databricks or Airflow
def daily_etl():
project = Project.load("odibi.yaml")
# Run ETL
project.run("bronze")
project.run("silver")
project.run("gold")
# Update semantic views
result = project.run_semantic_layer()
# Alert on failures
if result['views_failed']:
send_alert(f"Semantic layer had {len(result['views_failed'])} failures")
Troubleshooting¶
Views Not Updating¶
Symptom: You changed the YAML but views show old data.
Cause: CREATE OR ALTER VIEW should update, but check:
1. Are you running against the right SQL Server?
2. Check the story for errors
3. Verify the SQL file content
Permission Denied¶
Symptom: Cannot create schema or Cannot create view
Solution:
-- Grant permissions to your service account
GRANT CREATE SCHEMA TO [your_user];
GRANT CREATE VIEW TO [your_user];
GRANT SELECT ON SCHEMA::dbo TO [your_user]; -- For source tables
Stories Not Saving¶
Symptom: Run completes but no story files
Check:
1. Is story.auto_generate: true set?
2. Does the story.connection have write access?
3. Check logs for storage errors
See Also¶
- Defining Metrics - Metric and dimension definitions
- Querying - Ad-hoc metric queries
- Materializing - Pre-computing aggregates
- Lineage Stitcher - Combined lineage generation
- Stories - Pipeline execution stories