Execution Stories¶
Auto-generated pipeline execution documentation with rich metadata, sample data, and multiple output formats.
Overview¶
Odibi's Story system provides:

- **Execution timeline**: Complete record of pipeline runs with timestamps
- **Node-level metrics**: Duration, row counts, and schema changes per node
- **Sample data capture**: Input/output samples with automatic redaction
- **Multiple renderers**: HTML, Markdown, and JSON output formats
- **Themes**: Customizable styling for HTML reports
- **Retention policies**: Automatic cleanup of old stories
Configuration¶
Basic Story Setup¶
```yaml
story:
  connection: "local_data"
  path: "stories/"
  max_sample_rows: 10
  retention_days: 30
  retention_count: 100
  failure_sample_size: 100
  max_failure_samples: 500
  max_sampled_validations: 5
```
Story Config Options¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `connection` | string | Yes | - | Connection name for story output |
| `path` | string | Yes | - | Path for stories (relative to connection base_path) |
| `max_sample_rows` | int | No | 10 | Maximum rows to include in samples |
| `retention_days` | int | No | 30 | Days to keep stories before cleanup |
| `retention_count` | int | No | 100 | Maximum number of stories to retain |
| `failure_sample_size` | int | No | 100 | Rows to capture per validation failure |
| `max_failure_samples` | int | No | 500 | Total failed rows across all validations |
| `max_sampled_validations` | int | No | 5 | After this many validations, show only counts |
| `theme` | string | No | `default` | Built-in options: 'default', 'corporate', 'dark', 'minimal', or path to custom theme YAML file |
| `include_samples` | bool | No | `true` | Whether to include data samples |
Remote Storage¶
Stories can be written to remote storage (ADLS, S3) using fsspec:
```yaml
story:
  output_path: abfss://container@account.dfs.core.windows.net/stories/
  storage_options:
    account_key: "${STORAGE_ACCOUNT_KEY}"
```
Story Contents¶
Each story captures comprehensive execution metadata:
Execution Timeline¶
| Metric | Description |
|---|---|
| `started_at` | ISO timestamp when pipeline started |
| `completed_at` | ISO timestamp when pipeline finished |
| `duration` | Total execution time in seconds |
| `run_id` | Unique identifier for the run |
Node Results¶
For each node in the pipeline:
| Metric | Description |
|---|---|
| `node_name` | Name of the node |
| `operation` | Operation type (read, transform, write) |
| `status` | Execution status: success, failed, skipped |
| `duration` | Node execution time in seconds |
| `rows_in` | Input row count |
| `rows_out` | Output row count |
| `rows_change` | Row count difference |
| `rows_change_pct` | Percentage change in row count |
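Both delta fields are derived from the input/output counts. A quick sketch of the arithmetic implied by the field names (not Odibi's code; it reproduces the `-3.2` in the example story at the end of this page):

```python
def row_delta(rows_in: int, rows_out: int) -> tuple[int, float]:
    """Return (rows_change, rows_change_pct) for a node."""
    change = rows_out - rows_in
    pct = round(change / rows_in * 100, 1) if rows_in else 0.0
    return change, pct

print(row_delta(15_500, 15_000))  # (-500, -3.2)
```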
Sample Data¶
Sample data is captured with automatic redaction of sensitive values:
```yaml
sample_data:
  - order_id: 12345
    customer_email: "[REDACTED]"
    amount: 99.99
  - order_id: 12346
    customer_email: "[REDACTED]"
    amount: 149.99
```
Configure sample capture:
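```yaml
story:
  max_sample_rows: 10     # rows captured per node sample
  include_samples: true   # set to false to omit samples entirely
```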
Schema Changes¶
Stories track schema evolution:
| Field | Description |
|---|---|
| `schema_in` | Input column names |
| `schema_out` | Output column names |
| `columns_added` | New columns added |
| `columns_removed` | Columns removed |
| `columns_renamed` | Renamed columns |
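As an illustration, with values matching the transform node in the example at the end of this page (the exact serialization of these fields may differ):

```yaml
schema_in: ["order_id", "customer_id", "amount", "created_at"]
schema_out: ["order_id", "customer_id", "amount"]
columns_added: []
columns_removed: ["created_at"]
columns_renamed: {}
```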
Validation Results¶
Validation warnings and errors are captured:
```yaml
validation_warnings:
  - "Column 'email' has 5% null values"
  - "Date range extends beyond expected bounds"
```
Error details for failed nodes:
```yaml
error_type: ValueError
error_message: "Column 'order_id' contains duplicate values"
error_traceback: "Full Python traceback..."
error_traceback_cleaned: "Cleaned traceback (Spark/Java noise removed)"
```
Execution Steps¶
Stories capture the execution steps taken during node processing for debugging:
```yaml
execution_steps:
  - "Read from bronze_db"
  - "Applied pattern 'deduplicate'"
  - "Executed 2 pre-SQL statement(s)"
  - "Passed 3 contract checks"
```
Failed Rows Samples¶
When validations fail, stories capture sample rows that failed each validation:
```yaml
failed_rows_samples:
  not_null_customer_id:
    - { order_id: 123, customer_id: null, amount: 50.00 }
    - { order_id: 456, customer_id: null, amount: 75.00 }
  positive_amount:
    - { order_id: 789, customer_id: "C001", amount: -10.00 }
failed_rows_counts:
  not_null_customer_id: 150
  positive_amount: 25
```
Configure failure sample limits:
```yaml
story:
  failure_sample_size: 100     # Max rows per validation
  max_failure_samples: 500     # Total rows across all validations
  max_sampled_validations: 5   # After 5 validations, show only counts
```
Retry History¶
When retries occur, the full history is captured:
```yaml
retry_history:
  - attempt: 1
    success: false
    error: "Connection timeout"
    error_type: "TimeoutError"
    duration: 1.2
  - attempt: 2
    success: false
    error: "Connection timeout"
    error_type: "TimeoutError"
    duration: 2.4
  - attempt: 3
    success: true
    duration: 0.8
```
Delta Lake Info¶
For Delta Lake writes, version and operation metrics are captured:
```yaml
delta_info:
  version: 42
  operation: MERGE
  operation_metrics:
    numTargetRowsInserted: 150
    numTargetRowsUpdated: 25
```
Themes¶
Customize HTML story appearance with built-in or custom themes.
Built-in Themes¶
| Theme | Description |
|---|---|
| `default` | Clean, professional blue theme |
| `corporate` | Traditional business styling with serif headings |
| `dark` | Dark mode with high-contrast colors |
| `minimal` | Simple black and white, compact layout |
Using Themes¶
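Select a built-in theme by name via the `theme` option documented above:

```yaml
story:
  theme: dark
```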
Custom Theme File¶
Create a custom theme YAML file:
```yaml
# my_theme.yaml
name: company_brand
primary_color: "#003366"
success_color: "#2e7d32"
error_color: "#c62828"
warning_color: "#ff9900"
bg_color: "#ffffff"
text_color: "#333333"
font_family: "Arial, sans-serif"
heading_font: "Georgia, serif"
logo_url: "https://example.com/logo.png"
company_name: "Acme Corp"
footer_text: "Confidential - Internal Use Only"
```
Reference in config:
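```yaml
story:
  theme: "my_theme.yaml"
```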
Theme Options¶
| Option | Type | Description |
|---|---|---|
| `name` | string | Theme identifier |
| `primary_color` | hex | Main accent color |
| `success_color` | hex | Success status color |
| `error_color` | hex | Error status color |
| `warning_color` | hex | Warning status color |
| `bg_color` | hex | Background color |
| `text_color` | hex | Primary text color |
| `border_color` | hex | Border color |
| `code_bg` | hex | Code block background |
| `font_family` | string | Body font stack |
| `heading_font` | string | Heading font stack |
| `code_font` | string | Monospace font stack |
| `font_size` | string | Base font size |
| `max_width` | string | Container max width |
| `logo_url` | string | URL to company logo |
| `company_name` | string | Company name for branding |
| `footer_text` | string | Custom footer text |
| `custom_css` | string | Additional CSS rules |
Renderers¶
Stories can be rendered in multiple formats.
HTML Renderer¶
Default format with interactive, responsive design:
```python
from odibi.story.renderers import HTMLStoryRenderer, get_renderer
from odibi.story.themes import get_theme

# Using the factory
renderer = get_renderer("html")
html = renderer.render(metadata)

# With custom theme
theme = get_theme("dark")
renderer = HTMLStoryRenderer(theme=theme)
html = renderer.render(metadata)
```
Features:

- Collapsible node sections
- Status indicators with color coding
- Summary statistics dashboard
- Responsive layout
JSON Renderer¶
Machine-readable format for API integration:
```python
from odibi.story.renderers import JSONStoryRenderer

renderer = JSONStoryRenderer()
json_str = renderer.render(metadata)
```
Output structure:
```json
{
  "pipeline_name": "process_orders",
  "run_id": "20240130_101500",
  "started_at": "2024-01-30T10:15:00",
  "completed_at": "2024-01-30T10:15:45",
  "duration": 45.23,
  "total_nodes": 5,
  "completed_nodes": 4,
  "failed_nodes": 1,
  "skipped_nodes": 0,
  "success_rate": 80.0,
  "total_rows_processed": 15000,
  "nodes": [...]
}
```
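Because the output is plain JSON, downstream tooling can consume it directly. A minimal sketch that gates a CI step on the summary fields (the file path is hypothetical; field names are those shown above):

```python
import json
import sys

# Hypothetical path following the storage layout described below
with open("stories/process_orders/2024-01-30/run_10-15-00.json") as f:
    story = json.load(f)

print(f"{story['pipeline_name']}: {story['success_rate']:.0f}% success, "
      f"{story['failed_nodes']} failed, {story['total_rows_processed']:,} rows")

if story["failed_nodes"] > 0:
    sys.exit(1)  # fail the build when any node failed
```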
Markdown Renderer¶
GitHub-flavored markdown for documentation:
```python
from odibi.story.renderers import MarkdownStoryRenderer

renderer = MarkdownStoryRenderer()
md = renderer.render(metadata)
```
Renderer Factory¶
Use the factory function to get a renderer by format:
```python
from odibi.story.renderers import get_renderer

# Supported formats: "html", "markdown", "md", "json"
renderer = get_renderer("json")
output = renderer.render(metadata)
```
Retention¶
Stories are automatically cleaned up based on retention policies.
Retention Configuration¶
```yaml
story:
  retention_days: 30    # Delete stories older than 30 days
  retention_count: 100  # Keep at most 100 stories per pipeline
```
How Retention Works¶
- **Count-based**: When the story count exceeds `retention_count`, the oldest stories are deleted first
- **Time-based**: Stories older than `retention_days` are deleted
- **Both apply**: A story is deleted if it exceeds either limit
Storage Structure¶
Stories are organized by pipeline and date:
```
stories/
├── process_orders/
│   ├── 2024-01-30/
│   │   ├── run_10-15-00.html
│   │   ├── run_10-15-00.json
│   │   ├── run_14-30-00.html
│   │   └── run_14-30-00.json
│   └── 2024-01-31/
│       └── ...
└── process_customers/
    └── ...
```
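Since the date directories and run filenames sort chronologically, finding the most recent story is a lexicographic sort away (a sketch for local storage; `latest_story` is not an Odibi API):

```python
from pathlib import Path

def latest_story(root: str, pipeline: str, fmt: str = "html") -> Path | None:
    """Return the newest story file for a pipeline, or None."""
    runs = sorted(Path(root, pipeline).glob(f"*/run_*.{fmt}"))
    return runs[-1] if runs else None

print(latest_story("stories/", "process_orders"))
# e.g. stories/process_orders/2024-01-30/run_14-30-00.html
```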
Remote Storage Cleanup¶
Note: Automatic cleanup for remote storage (ADLS, S3) is not yet implemented. Monitor storage usage manually.
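Until that lands, a manual count-based sweep with fsspec is one stopgap. A sketch only, assuming adlfs path conventions and the date-named layout above; dry-run it (print instead of delete) before trusting it with real data:

```python
import fsspec

# Credentials elided; adlfs also supports other auth options
fs = fsspec.filesystem("abfss", account_name="account", account_key="...")

pipeline_root = "container/stories/process_orders"  # container-relative path
keep = 30  # number of date directories to retain

# Date directories (YYYY-MM-DD) sort chronologically by name
for old_dir in sorted(fs.ls(pipeline_root))[:-keep]:
    fs.rm(old_dir, recursive=True)
```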
Examples¶
Complete Story Configuration¶
```yaml
project: DataPipeline
engine: spark

story:
  output_path: stories/
  max_sample_rows: 10
  retention_days: 30
  retention_count: 100
  theme: corporate
  include_samples: true

pipelines:
  - pipeline: process_orders
    nodes:
      - name: read_orders
        read:
          connection: bronze
          path: orders/
      - name: transform_orders
        transform:
          operation: sql
          query: |
            SELECT order_id, customer_id, amount
            FROM {read_orders}
            WHERE amount > 0
      - name: write_orders
        write:
          connection: silver
          path: orders/
          mode: merge
```
Generated Story Output (JSON)¶
```json
{
  "pipeline_name": "process_orders",
  "pipeline_layer": "silver",
  "run_id": "20240130_101500",
  "started_at": "2024-01-30T10:15:00",
  "completed_at": "2024-01-30T10:15:45",
  "duration": 45.23,
  "total_nodes": 3,
  "completed_nodes": 3,
  "failed_nodes": 0,
  "skipped_nodes": 0,
  "success_rate": 100.0,
  "total_rows_processed": 15000,
  "project": "DataPipeline",
  "nodes": [
    {
      "node_name": "read_orders",
      "operation": "read",
      "status": "success",
      "duration": 5.12,
      "rows_out": 15500,
      "schema_out": ["order_id", "customer_id", "amount", "created_at"]
    },
    {
      "node_name": "transform_orders",
      "operation": "transform",
      "status": "success",
      "duration": 2.34,
      "rows_in": 15500,
      "rows_out": 15000,
      "rows_change": -500,
      "rows_change_pct": -3.2,
      "columns_removed": ["created_at"]
    },
    {
      "node_name": "write_orders",
      "operation": "write",
      "status": "success",
      "duration": 37.77,
      "rows_out": 15000,
      "delta_info": {
        "version": 42,
        "operation": "MERGE",
        "operation_metrics": {
          "numTargetRowsInserted": 500,
          "numTargetRowsUpdated": 14500
        }
      }
    }
  ]
}
```
Programmatic Story Generation¶
```python
from odibi.story.generator import StoryGenerator
from odibi.story.metadata import PipelineStoryMetadata
from odibi.story.themes import get_theme

# Create generator
generator = StoryGenerator(
    pipeline_name="process_orders",
    max_sample_rows=10,
    output_path="stories/",
    retention_days=30,
    retention_count=100,
)

# Generate story after pipeline execution
story_path = generator.generate(
    node_results=node_results,
    completed=["read_orders", "transform_orders", "write_orders"],
    failed=[],
    skipped=[],
    duration=45.23,
    start_time="2024-01-30T10:15:00",
    end_time="2024-01-30T10:15:45",
)

# Get summary for alerts
alert_summary = generator.get_alert_summary()
```
Documentation Stories¶
Generate stakeholder-ready documentation from pipeline config:
```python
from odibi.story.doc_story import DocStoryGenerator
from odibi.config import PipelineConfig

# Load pipeline config
pipeline_config = PipelineConfig.from_yaml("pipeline.yaml")

# Generate documentation
doc_generator = DocStoryGenerator(pipeline_config)
doc_path = doc_generator.generate(
    output_path="docs/pipeline_doc.html",
    format="html",
    include_flow_diagram=True,
)
```
Related¶
- Alerting - Stories linked in alert payloads
- Quality Gates - Gate results captured in stories
- Schema Tracking - Schema changes in stories
- YAML Schema Reference