The Odibi Playbook¶

Find your problem. Get the solution.

Most Common Flows¶

I need to...	Go here
Start from zero	Golden Path
Copy a working config	Canonical Examples
Load only new rows	Incremental Stateful
Track dimension history	SCD2 Pattern
Validate my data	Quality Gates

If You Only Read 3 Pages...¶

Golden Path — Zero to running in 10 minutes
Patterns Overview — Common solutions to common problems
YAML Schema — All configuration options

Find Your Problem¶

Bronze Layer: Ingestion¶

"Get data from sources into your lakehouse reliably."

Problem	Pattern	Docs
Load all files from a folder	Append-only	Pattern
Only process new files since last run	Rolling window	Pattern
Track exact high-water mark	Stateful HWM	Pattern
Fail if source is empty or stale	Contracts	YAML
Handle malformed records	Bad records path	YAML
Extract from SQL Server	JDBC read	Example
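
To make the "Stateful HWM" row concrete, here is a minimal plain-Python sketch of high-water-mark tracking. It is not Odibi's implementation: `load_new_rows`, the `updated_at` field, and the in-memory `state` dict are hypothetical names chosen for illustration.

```python
from datetime import datetime

def load_new_rows(rows, state, key="updated_at"):
    """Return rows newer than the stored high-water mark, then advance the mark."""
    hwm = state.get("hwm")  # None on the first run, so every row is loaded
    new_rows = [r for r in rows if hwm is None or r[key] > hwm]
    if new_rows:
        state["hwm"] = max(r[key] for r in new_rows)
    return new_rows

# Hypothetical source data and state store
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
]
state = {}
first = load_new_rows(source, state)   # first run picks up both rows
source.append({"id": 3, "updated_at": datetime(2024, 1, 3)})
second = load_new_rows(source, state)  # second run picks up only the new row
```

The strict `>` comparison means rows that share the exact mark timestamp are not reloaded, which is why a rolling window with lookback can be the safer choice when late-arriving rows are expected.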

Silver Layer: Transformation¶

"Clean, deduplicate, and model your data."

Problem	Pattern	Docs
Remove duplicates	Deduplicate transformer	YAML
Keep latest record per key	Dedupe with ordering	YAML
Track dimension changes over time	SCD2	Pattern
Upsert into target table	Merge	Pattern
Validate output data quality	Validation tests	Feature
Route bad rows for review	Quarantine	Feature
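
The "Keep latest record per key" row can be sketched in plain Python as follows. This only illustrates the idea of deduplicating with an ordering column; `keep_latest` and the column names are assumptions, not the signature of Odibi's deduplicate transformer.

```python
def keep_latest(rows, key, order_by):
    """Keep one row per key: the one with the largest order_by value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())

# Hypothetical input with two versions of customer 1
rows = [
    {"customer_id": 1, "version": 1, "name": "Ada"},
    {"customer_id": 2, "version": 1, "name": "Bob"},
    {"customer_id": 1, "version": 2, "name": "Ada L."},
]
deduped = keep_latest(rows, key="customer_id", order_by="version")
```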

Gold Layer: Analytics¶

"Build fact tables, aggregations, and semantic layers."

Problem	Pattern	Docs
Build fact table with SK lookups	Fact pattern	Pattern
Handle orphan records	Orphan handling	Pattern
Pre-aggregate metrics	Aggregation pattern	Pattern
Generate date dimension	Date dimension	Pattern
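
As a rough illustration of surrogate-key lookups with orphan handling, the sketch below maps each fact row's natural key to a dimension surrogate key. Assigning an "unknown member" key of -1 to orphans is one common convention and is assumed here; the Odibi patterns may offer other strategies. All names (`resolve_surrogate_keys`, `UNKNOWN_SK`, the columns) are hypothetical.

```python
UNKNOWN_SK = -1  # assumed placeholder key for fact rows with no matching dimension row

def resolve_surrogate_keys(fact_rows, dim_lookup, natural_key, sk_column):
    """Attach a surrogate key to each fact row and collect the orphans."""
    resolved, orphans = [], []
    for row in fact_rows:
        sk = dim_lookup.get(row[natural_key], UNKNOWN_SK)
        out = {**row, sk_column: sk}
        resolved.append(out)
        if sk == UNKNOWN_SK:
            orphans.append(out)
    return resolved, orphans

# Hypothetical dimension lookup: natural key -> surrogate key
customer_dim = {"C001": 10, "C002": 11}
orders = [
    {"order_id": 1, "customer_code": "C001"},
    {"order_id": 2, "customer_code": "C999"},  # no matching dimension row
]
facts, orphans = resolve_surrogate_keys(orders, customer_dim, "customer_code", "customer_sk")
```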

Decision Trees¶

Choose Your Engine¶

Data size?
├─► < 1GB → engine: pandas
├─► 1-10GB → engine: polars
└─► > 10GB or Delta Lake → engine: spark
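
The tree above can be written as a small helper. This is only a sketch of the rule of thumb; `choose_engine` and its parameters are hypothetical and not part of Odibi, and the handling of the exact 1 GB and 10 GB boundaries is an assumption.

```python
def choose_engine(size_gb, uses_delta_lake=False):
    """Map data size (and Delta Lake usage) to the engine suggested by the tree."""
    if uses_delta_lake or size_gb > 10:
        return "spark"
    if size_gb >= 1:
        return "polars"
    return "pandas"

small = choose_engine(0.5)
medium = choose_engine(5)
large = choose_engine(50)
delta = choose_engine(0.5, uses_delta_lake=True)
```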

Choose Your Incremental Mode¶

Source has timestamps?
├─► Yes → mode: stateful (exact HWM tracking)
└─► No
    ├─► Data arrives daily? → mode: rolling_window (lookback)
    └─► Unknown pattern? → write.skip_if_unchanged: true
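
The same decision, sketched as a function that returns the setting named in the tree. The function and parameter names are hypothetical; only the returned setting strings come from this page.

```python
def choose_incremental_mode(has_timestamps, arrives_daily=False):
    """Return the setting the tree suggests for a given source."""
    if has_timestamps:
        return "mode: stateful"
    if arrives_daily:
        return "mode: rolling_window"
    return "write.skip_if_unchanged: true"

with_ts = choose_incremental_mode(has_timestamps=True)
daily = choose_incremental_mode(has_timestamps=False, arrives_daily=True)
unknown = choose_incremental_mode(has_timestamps=False)
```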

Choose Your Validation Approach¶

When to check?
├─► Before processing (source quality) → contracts:
└─► After processing (output quality) → validation.tests:
    ├─► Need to stop pipeline? → gate.on_fail: abort
    └─► Soft warning OK? → gate.on_fail: warn_and_write
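
The difference between the two gate settings can be illustrated with a plain-Python sketch. It models the behaviour the names suggest (`abort` stops before writing, `warn_and_write` reports failures but still writes); the actual Odibi semantics may differ in detail, and `run_gate` and the test names are hypothetical.

```python
def run_gate(rows, tests, on_fail="abort"):
    """Run named row-level tests, then abort or warn depending on on_fail."""
    failed = [name for name, check in tests.items()
              if not all(check(row) for row in rows)]
    if failed and on_fail == "abort":
        raise ValueError(f"validation failed: {failed}")
    # warn_and_write: report the failures but still hand the rows to the writer
    return rows, failed

rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": -5}]
tests = {
    "id_not_null": lambda r: r["id"] is not None,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}
written, warnings = run_gate(rows, tests, on_fail="warn_and_write")
```

With `on_fail="abort"`, the same call raises `ValueError` and nothing is returned for writing.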

Choose Your SCD Type¶

Need historical state?
├─► No → scd_type: 1 (overwrite)
└─► Yes → scd_type: 2 (versioned)
    └─► Storage concerns? → Consider snapshots instead
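
To show what "overwrite" versus "versioned" means in practice, here is a minimal sketch of both behaviours on in-memory data. It is not Odibi's SCD implementation; the function names and the `valid_from`, `valid_to`, and `is_current` columns are assumptions chosen for illustration.

```python
def apply_scd1(dim, key, new_row):
    """Type 1: overwrite the existing row, keeping no history."""
    dim[new_row[key]] = new_row

def apply_scd2(history, key, new_row, effective_date):
    """Type 2: close the current version and append a new one."""
    for row in history:
        if row[key] == new_row[key] and row["is_current"]:
            if all(row[col] == value for col, value in new_row.items()):
                return  # nothing changed, so no new version is needed
            row["valid_to"] = effective_date
            row["is_current"] = False
    history.append({**new_row, "valid_from": effective_date,
                    "valid_to": None, "is_current": True})

# Type 1: only the latest city survives
dim = {}
apply_scd1(dim, "id", {"id": 1, "city": "Oslo"})
apply_scd1(dim, "id", {"id": 1, "city": "Bergen"})

# Type 2: both versions are kept, with validity ranges
history = []
apply_scd2(history, "id", {"id": 1, "city": "Oslo"}, "2024-01-01")
apply_scd2(history, "id", {"id": 1, "city": "Bergen"}, "2024-06-01")
apply_scd2(history, "id", {"id": 1, "city": "Bergen"}, "2024-07-01")  # no change
```

Type 2 history grows with every change, which is the storage trade-off the tree points at when it suggests snapshots as an alternative.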

Quick Links by Role¶

Data Engineer (Daily Work)¶

CLI Master Guide — Run, debug, diagnose
Cheatsheet — Quick syntax reference
Troubleshooting — Common errors and fixes

Data Engineer (Building Pipelines)¶

Canonical Examples — Copy-paste configs
Patterns Overview — Standard solutions
Writing Transformations — Custom logic

Data Engineer (Production)¶

Production Deployment — Going to prod
Alerting — Slack/email notifications
Performance Tuning — Optimize speed

CLI Quick Reference¶

Task	Command
Run pipeline	`odibi run config.yaml`
Run specific node	`odibi run config.yaml --node name`
Dry run (no writes)	`odibi run config.yaml --dry-run`
Validate config	`odibi validate config.yaml`
View DAG	`odibi graph config.yaml`
Check state	`odibi catalog state config.yaml`
Diagnose issues	`odibi doctor`
List stories	`odibi story list`

All 54 Transformers¶

Category	Transformers
Filtering	`filter_rows`, `distinct`, `sample`, `limit`
Columns	`derive_columns`, `select_columns`, `drop_columns`, `rename_columns`, `cast_columns`
Text	`clean_text`, `trim_whitespace`, `regex_replace`, `split_part`, `concat_columns`
Dates	`extract_date_parts`, `date_add`, `date_trunc`, `date_diff`, `convert_timezone`
Nulls	`fill_nulls`, `coalesce_columns`
Relational	`join`, `union`, `aggregate`, `pivot`, `unpivot`
Window	`window_calculation` (rank, sum, lag, lead)
JSON	`parse_json`, `normalize_json`, `explode_list_column`, `unpack_struct`
Keys	`generate_surrogate_key`, `hash_columns`
Patterns	`scd2`, `merge`, `deduplicate`, `dimension`, `fact`, `aggregation`

The table above is a selection. Full reference: YAML Schema - Transformers
