Semantic Layer Runner¶

The Semantic Layer Runner executes your metric definitions as SQL Server views and generates execution stories for observability. This guide walks you through everything from basic concepts to production deployment.

What is the Semantic Layer Runner?¶

In a typical data warehouse, you have:

Bronze layer - Raw data ingested from sources
Silver layer - Cleaned and transformed data
Gold layer - Business-ready fact and dimension tables
Semantic layer - Pre-aggregated views for analytics and BI tools

The Semantic Layer Runner bridges the gap between your Gold layer tables and your BI tools (Power BI, Tableau, etc.) by:

Generating SQL views from your metric definitions
Executing those views against SQL Server (or Azure SQL)
Creating execution stories that document what happened
Generating lineage showing how data flows from sources to views

Think of it as an automated "view factory" that turns your YAML metric definitions into real database views.

Why Use the Semantic Layer Runner?¶

Without the Runner¶

You would manually: 1. Write SQL view definitions by hand 2. Execute them against SQL Server 3. Track which views succeeded or failed 4. Document the SQL for auditing 5. Update views when metrics change

With the Runner¶

You: 1. Define metrics in YAML (human-readable) 2. Run one command 3. Get views created, documented, and tracked automatically

Benefits: - Consistency - All views follow the same pattern - Auditability - Every execution is documented in stories - Maintainability - Change YAML, re-run, views update - Observability - Know exactly what happened and when

Prerequisites¶

Before using the Semantic Layer Runner, you need:

Gold layer tables - Your fact tables with calculated metrics (e.g., fact_orders)
A SQL Server connection - Where views will be created
A storage connection - Where stories and SQL files will be saved (e.g., Azure Data Lake)

Configuration¶

The Semantic Layer Runner is configured in your odibi.yaml project file. Here's a complete example:

project: SalesAnalytics
engine: spark

# Define your connections
connections:
  # Where your gold layer data lives
  gold:
    type: delta
    account_name: mydatalake
    container: datalake
    base_path: gold/sales
    auth:
      mode: account_key
      account_key: ${AZURE_STORAGE_KEY}

  # Where SQL views will be created
  sql_server:
    type: azure_sql
    host: myserver.database.windows.net
    database: analytics_db
    port: 1433
    auth:
      mode: sql_login
      username: ${SQL_USER}
      password: ${SQL_PASSWORD}

  # Where stories will be saved
  stories:
    type: azure_blob
    account_name: mydatalake
    container: datalake
    base_path: stories
    auth:
      mode: account_key
      account_key: ${AZURE_STORAGE_KEY}

# Story configuration
story:
  connection: stories
  path: stories
  auto_generate: true
  generate_lineage: true

# Semantic layer configuration
semantic:
  # Which SQL Server connection to use for view creation
  connection: sql_server

  # Where to save generated SQL files (optional but recommended)
  sql_output_path: gold/sales/views

  # Define your views
  views:
    - name: vw_sales_daily
      description: "Daily sales metrics by store"
      source: fact_orders
      db_schema: semantic
      metrics:
        - name: revenue
          expr: "SUM(revenue)"
          description: "Total revenue"
        - name: order_count
          expr: "COUNT(*)"
          description: "Total number of orders"
        - name: avg_order_value
          expr: "AVG(order_value)"
          description: "Average order value"
        - name: total_sales
          expr: "SUM(total_sales)"
          description: "Total sales amount"
      dimensions:
        - name: date
          column: Date
        - name: store
          column: store_id
      grain: day

    - name: vw_sales_monthly
      description: "Monthly sales metrics by store"
      source: fact_orders
      db_schema: semantic
      metrics:
        - name: revenue
          expr: "SUM(revenue)"
        - name: order_count
          expr: "COUNT(*)"
        - name: total_sales
          expr: "SUM(total_sales)"
      dimensions:
        - name: year_month
          column: "FORMAT(Date, 'yyyy-MM')"
        - name: store
          column: store_id
      grain: month

Configuration Reference¶

`semantic.connection`¶

The name of the SQL Server connection where views will be created.

semantic:
  connection: sql_server  # Must match a connection name above

`semantic.sql_output_path`¶

Optional path where generated SQL files will be saved. Useful for: - Version control of SQL definitions - Auditing what SQL was executed - Debugging view creation issues

semantic:
  sql_output_path: gold/sales/views

`semantic.views`¶

A list of view definitions. Each view becomes a SQL Server view.

Field	Required	Description
`name`	Yes	View name (will be created as `db_schema.name`)
`description`	No	Human-readable description
`source`	Yes	Source table name (your gold layer fact table)
`db_schema`	No	SQL Server schema (default: `semantic`)
`ensure_schema`	No	Auto-create schema if missing (default: `true`)
`metrics`	Yes	List of metrics to include
`dimensions`	Yes	List of dimensions to group by
`grain`	No	Aggregation grain: `day`, `week`, `month`, `quarter`, `year`
`filters`	No	WHERE clause filters

Metric Definition¶

Each metric defines an aggregation:

metrics:
  - name: revenue                # Column name in the view
    expr: "SUM(revenue)"         # SQL aggregation expression
    description: "Total revenue" # Optional documentation

Common expressions: - SUM(column) - Total - AVG(column) - Average - COUNT(*) - Row count - COUNT(DISTINCT column) - Unique count - MAX(column) / MIN(column) - Maximum/Minimum

Dimension Definition¶

Each dimension defines a grouping column:

dimensions:
  - name: date           # Column name in the view
    column: Date         # Source column (can be an expression)

Examples:

dimensions:
  # Simple column reference
  - name: store_id
    column: store_id

  # Date formatting
  - name: year_month
    column: "FORMAT(Date, 'yyyy-MM')"

  # Derived column
  - name: is_weekend
    column: "CASE WHEN DATEPART(dw, Date) IN (1,7) THEN 1 ELSE 0 END"

Running the Semantic Layer¶

Using Python¶

from odibi import Project

# Load your project configuration
project = Project.load("odibi.yaml")

# Run the semantic layer
# This will:
# 1. Generate SQL for each view
# 2. Execute the SQL against SQL Server
# 3. Save the SQL files
# 4. Generate an execution story
result = project.run_semantic_layer()

# Check results
print(f"Views created: {result['views_created']}")
print(f"Views failed: {result['views_failed']}")
print(f"Duration: {result['duration']:.2f}s")

# Story paths (if auto_generate is true)
if result['story_paths']:
    print(f"Story JSON: {result['story_paths']['json']}")
    print(f"Story HTML: {result['story_paths']['html']}")

Using SemanticLayerRunner Directly¶

For more control, use the SemanticLayerRunner class:

from odibi.semantics.runner import SemanticLayerRunner
from odibi.config import ProjectConfig

# Load configuration
config = ProjectConfig.from_yaml("odibi.yaml")

# Create runner
runner = SemanticLayerRunner(config)

# Run with options
result = runner.run(
    generate_story=True,      # Create execution story
    generate_lineage=True,    # Create combined lineage
)

# Access metadata
if runner.metadata:
    for view in runner.metadata.views:
        status = "✓" if view.status == "success" else "✗"
        print(f"{status} {view.view_name}: {view.duration:.2f}s")
        if view.error_message:
            print(f"  Error: {view.error_message}")

Custom SQL Executor¶

If you need custom SQL execution logic:

import pyodbc

def my_sql_executor(sql: str) -> None:
    """Custom SQL executor with logging."""
    conn = pyodbc.connect(my_connection_string)
    cursor = conn.cursor()
    print(f"Executing: {sql[:100]}...")
    cursor.execute(sql)
    conn.commit()
    cursor.close()
    conn.close()

# Run with custom executor
result = runner.run(
    execute_sql=my_sql_executor,
    generate_story=True,
)

Custom File Writer¶

For custom storage backends:

def write_to_s3(path: str, content: str) -> None:
    """Write files to S3 instead of Azure."""
    import boto3
    s3 = boto3.client('s3')
    s3.put_object(Bucket='my-bucket', Key=path, Body=content)

result = runner.run(
    write_file=write_to_s3,
    generate_story=True,
)

Generated SQL¶

The runner generates standard SQL Server view definitions. Here's an example of what gets created:

Input (YAML)¶

views:
  - name: vw_sales_monthly
    source: fact_orders
    db_schema: semantic
    metrics:
      - name: revenue
        expr: "SUM(revenue)"
      - name: total_sales
        expr: "SUM(total_sales)"
    dimensions:
      - name: year_month
        column: "FORMAT(Date, 'yyyy-MM')"
      - name: store_id
        column: store_id
    filters:
      - "revenue > 0"

Output (SQL)¶

-- View: semantic.vw_sales_monthly
-- Generated by Odibi Semantic Layer Runner
-- Source: fact_orders
-- Generated at: 2026-01-02T10:30:00

-- Ensure schema exists
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = 'semantic')
BEGIN
    EXEC('CREATE SCHEMA semantic')
END
GO

-- Create or replace view
CREATE OR ALTER VIEW semantic.vw_sales_monthly AS
SELECT
    FORMAT(Date, 'yyyy-MM') AS year_month,
    store_id AS store_id,
    SUM(revenue) AS revenue,
    SUM(total_sales) AS total_sales
FROM fact_orders
WHERE revenue > 0
GROUP BY FORMAT(Date, 'yyyy-MM'), store_id
GO

Execution Stories¶

Every run generates an execution story - a detailed record of what happened. Stories are saved as both JSON (for programmatic access) and HTML (for human viewing).

Story Location¶

Stories are saved to:

{story.path}/{semantic_name}/{date}/run_{time}.json
{story.path}/{semantic_name}/{date}/run_{time}.html

Example:

stories/Sales_semantic/2026-01-02/run_10-30-45.json
stories/Sales_semantic/2026-01-02/run_10-30-45.html

Story Contents¶

The JSON story includes:

{
  "name": "Sales_semantic",
  "started_at": "2026-01-02T10:30:45",
  "completed_at": "2026-01-02T10:31:12",
  "duration": 27.3,
  "views_created": 5,
  "views_failed": 0,
  "views": [
    {
      "view_name": "vw_sales_daily",
      "source_table": "fact_orders",
      "status": "success",
      "duration": 2.1,
      "sql_generated": "CREATE OR ALTER VIEW semantic.vw_sales_daily AS ...",
      "sql_file_path": "gold/sales/views/vw_sales_daily.sql",
      "metrics_included": ["revenue", "order_count", "avg_order_value", "total_sales"],
      "dimensions_included": ["date", "store"]
    },
    {
      "view_name": "vw_sales_monthly",
      "source_table": "fact_orders",
      "status": "success",
      "duration": 1.8,
      "sql_generated": "...",
      "sql_file_path": "gold/sales/views/vw_sales_monthly.sql",
      "metrics_included": ["revenue", "order_count", "total_sales"],
      "dimensions_included": ["year_month", "store"]
    }
  ],
  "sql_files_saved": [
    "gold/sales/views/vw_sales_daily.sql",
    "gold/sales/views/vw_sales_monthly.sql"
  ],
  "graph_data": {
    "nodes": [
      {"id": "fact_orders", "type": "table", "layer": "gold"},
      {"id": "vw_sales_daily", "type": "view", "layer": "semantic"},
      {"id": "vw_sales_monthly", "type": "view", "layer": "semantic"}
    ],
    "edges": [
      {"from": "fact_orders", "to": "vw_sales_daily"},
      {"from": "fact_orders", "to": "vw_sales_monthly"}
    ]
  }
}

HTML Story¶

The HTML story provides a visual representation with:

Summary - Overall status, duration, counts
View Cards - Each view with status, metrics, dimensions
SQL Details - Expandable section showing generated SQL
Lineage Diagram - Visual graph of source → view relationships

Error Handling¶

When a view fails to create, the runner:

Continues processing other views (doesn't stop on first error)
Records the error in the story
Returns failed views in the result

Handling Failures¶

result = runner.run()

if result['views_failed']:
    print("Some views failed:")
    for view in runner.metadata.views:
        if view.status == "failed":
            print(f"  {view.view_name}: {view.error_message}")

Common Errors¶

Error	Cause	Solution
`Invalid column name`	Source column doesn't exist	Check `source` table schema
`Invalid object name`	Source table doesn't exist	Ensure gold layer table exists
`Cannot create schema`	Permission denied	Grant schema creation rights
`Login failed`	Authentication error	Check connection credentials

Best Practices¶

1. Use Descriptive Names¶

views:
  # Good - clear purpose
  - name: vw_sales_daily_by_store

  # Bad - unclear
  - name: v1

2. Document Your Metrics¶

metrics:
  - name: revenue
    expr: "SUM(revenue)"
    description: "Total revenue - sum of all order values"

3. Separate Schemas by Purpose¶

views:
  # Operational views for daily dashboards
  - name: vw_sales_daily
    db_schema: operational

  # Executive views for monthly reports
  - name: vw_sales_monthly
    db_schema: executive

4. Save SQL Files for Auditing¶

semantic:
  sql_output_path: gold/views  # Always save SQL

5. Check Stories After Runs¶

Always review execution stories, especially in production:

result = runner.run()
if result['story_paths']:
    print(f"Review story at: {result['story_paths']['html']}")

Integration with Pipelines¶

The typical workflow is:

Bronze pipeline - Ingest raw data
Silver pipeline - Clean and transform
Gold pipeline - Build fact tables (e.g., fact_orders)
Semantic layer - Create views from facts

from odibi import Project

project = Project.load("odibi.yaml")

# Run all pipelines in order
project.run("bronze")
project.run("silver")
project.run("gold")

# Then create semantic views
project.run_semantic_layer()

Scheduling¶

For production, schedule the semantic layer after your gold pipeline:

# In Databricks or Airflow
def daily_etl():
    project = Project.load("odibi.yaml")

    # Run ETL
    project.run("bronze")
    project.run("silver")
    project.run("gold")

    # Update semantic views
    result = project.run_semantic_layer()

    # Alert on failures
    if result['views_failed']:
        send_alert(f"Semantic layer had {len(result['views_failed'])} failures")

Troubleshooting¶

Views Not Updating¶

Symptom: You changed the YAML but views show old data.

Cause: CREATE OR ALTER VIEW should update, but check: 1. Are you running against the right SQL Server? 2. Check the story for errors 3. Verify the SQL file content

Permission Denied¶

Symptom: Cannot create schema or Cannot create view

Solution:

-- Grant permissions to your service account
GRANT CREATE SCHEMA TO [your_user];
GRANT CREATE VIEW TO [your_user];
GRANT SELECT ON SCHEMA::dbo TO [your_user];  -- For source tables

Stories Not Saving¶

Symptom: Run completes but no story files

Check: 1. Is story.auto_generate: true set? 2. Does the story.connection have write access? 3. Check logs for storage errors