The Odibi Playbook

Find your problem. Get the solution.


Most Common Flows

| I need to... | Go here |
|---|---|
| Start from zero | Golden Path |
| Copy a working config | Canonical Examples |
| Load only new rows | Incremental Stateful |
| Track dimension history | SCD2 Pattern |
| Validate my data | Quality Gates |

If You Only Read 3 Pages...

  1. Golden Path — Zero to running in 10 minutes
  2. Patterns Overview — Common solutions to common problems
  3. YAML Schema — All configuration options

Find Your Problem

Bronze Layer: Ingestion

"Get data from sources into your lakehouse reliably."

| Problem | Pattern | Docs |
|---|---|---|
| Load all files from a folder | Append-only | Pattern |
| Only process new files since last run | Rolling window | Pattern |
| Track exact high-water mark | Stateful HWM | Pattern |
| Fail if source is empty or stale | Contracts | YAML |
| Handle malformed records | Bad records path | YAML |
| Extract from SQL Server | JDBC read | Example |

Silver Layer: Transformation

"Clean, deduplicate, and model your data."

| Problem | Pattern | Docs |
|---|---|---|
| Remove duplicates | Deduplicate transformer | YAML |
| Keep latest record per key | Dedupe with ordering | YAML |
| Track dimension changes over time | SCD2 | Pattern |
| Upsert into target table | Merge | Pattern |
| Validate output data quality | Validation tests | Feature |
| Route bad rows for review | Quarantine | Feature |

Gold Layer: Analytics

"Build fact tables, aggregations, and semantic layers."

| Problem | Pattern | Docs |
|---|---|---|
| Build fact table with SK lookups | Fact pattern | Pattern |
| Handle orphan records | Orphan handling | Pattern |
| Pre-aggregate metrics | Aggregation pattern | Pattern |
| Generate date dimension | Date dimension | Pattern |

Decision Trees

Choose Your Engine

Data size?
├─► < 1GB → engine: pandas
├─► 1-10GB → engine: polars
└─► > 10GB or Delta Lake → engine: spark
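
The tree above can be read as a small lookup. The sketch below is only an illustration of that reading: the function name `choose_engine` and the `uses_delta_lake` flag are hypothetical, and treating exactly 1 GB and exactly 10 GB as `polars` is an assumption, since the tree does not say which side the boundaries fall on.

```python
def choose_engine(size_gb: float, uses_delta_lake: bool = False) -> str:
    """Map a data size (in GB) to the engine suggested by the decision tree.

    Hypothetical helper, not part of Odibi. Boundary handling is assumed:
    values of exactly 1 GB and 10 GB fall into the polars band.
    """
    # "> 10GB or Delta Lake" branch
    if uses_delta_lake or size_gb > 10:
        return "spark"
    # "1-10GB" branch
    if size_gb >= 1:
        return "polars"
    # "< 1GB" branch
    return "pandas"


if __name__ == "__main__":
    for size in (0.2, 4, 25):
        print(f"{size} GB -> engine: {choose_engine(size)}")
```

The returned string is the value you would put after `engine:` in your config.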

Choose Your Incremental Mode

Source has timestamps?
├─► Yes → mode: stateful (exact HWM tracking)
└─► No
    ├─► Data arrives daily? → mode: rolling_window (lookback)
    └─► Unknown pattern? → write.skip_if_unchanged: true
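
To make "exact HWM tracking" concrete, here is a minimal sketch of the general high-water-mark technique. It is not Odibi's implementation: the function `load_new_rows`, the in-memory row dictionaries, and the `updated_at` column name are all assumptions for illustration. In a real pipeline the high-water mark would be persisted between runs rather than passed around by hand.

```python
from datetime import datetime
from typing import Optional


def load_new_rows(rows: list, ts_col: str, hwm: Optional[datetime]):
    """Return rows newer than the high-water mark, plus the updated mark.

    Hypothetical helper. `hwm` is None on the first run, in which case
    every row counts as new.
    """
    # Keep only rows strictly newer than the stored mark.
    new_rows = [r for r in rows if hwm is None or r[ts_col] > hwm]
    # Advance the mark to the newest timestamp seen; keep it if nothing is new.
    new_hwm = max((r[ts_col] for r in new_rows), default=hwm)
    return new_rows, new_hwm


if __name__ == "__main__":
    source = [{"id": i, "updated_at": datetime(2024, 1, i)} for i in (1, 2, 3)]
    batch, mark = load_new_rows(source, "updated_at", None)
    print(len(batch), mark)   # first run picks up everything
    batch, mark = load_new_rows(source, "updated_at", mark)
    print(len(batch), mark)   # second run finds nothing new
```

A rolling window differs in that it re-reads a fixed lookback period on every run instead of remembering an exact mark.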

Choose Your Validation Approach

When to check?
├─► Before processing (source quality) → contracts:
└─► After processing (output quality) → validation.tests:
    ├─► Need to stop pipeline? → gate.on_fail: abort
    └─► Soft warning OK? → gate.on_fail: warn_and_write
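
The difference between the two `gate.on_fail` values can be sketched as follows. This only illustrates the behavior the tree describes (stop the pipeline versus warn and still write); the names `apply_gate` and `GateAbort` and the row-level test functions are hypothetical and do not reflect how Odibi implements gates.

```python
import warnings
from typing import Callable


class GateAbort(Exception):
    """Raised when a gate is configured to stop the pipeline (hypothetical)."""


def apply_gate(rows: list, tests: list, on_fail: str) -> list:
    """Run every test on every row and react according to `on_fail`.

    Each test is a callable taking a row dict and returning True on pass.
    """
    failed = [r for r in rows if not all(test(r) for test in tests)]
    if not failed:
        return rows
    message = f"{len(failed)} row(s) failed validation"
    if on_fail == "abort":
        # Hard stop: nothing is written downstream.
        raise GateAbort(message)
    if on_fail == "warn_and_write":
        # Soft warning: report the problem but still hand rows to the writer.
        warnings.warn(message + "; writing anyway")
        return rows
    raise ValueError(f"unknown on_fail value: {on_fail!r}")


if __name__ == "__main__":
    def not_null_id(row: dict) -> bool:
        return row["id"] is not None

    data = [{"id": 1}, {"id": None}]
    print(len(apply_gate(data, [not_null_id], "warn_and_write")))
```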

Choose Your SCD Type

Need historical state?
├─► No → scd_type: 1 (overwrite)
└─► Yes → scd_type: 2 (versioned)
    └─► Storage concerns? → Consider snapshots instead
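
The two SCD types differ in what happens to the old row when an attribute changes. The sketch below shows the generic technique on plain Python dictionaries. The helper names and the bookkeeping columns `valid_from`, `valid_to`, and `is_current` are assumptions for illustration; Odibi's own column names and options may differ.

```python
from datetime import date

# Bookkeeping columns used by this sketch (assumed names).
META = ("valid_from", "valid_to", "is_current")


def apply_scd1(dim: list, key: str, incoming: dict) -> list:
    """Type 1: overwrite the matching row, keeping no history."""
    out = [row for row in dim if row[key] != incoming[key]]
    out.append(dict(incoming))
    return out


def apply_scd2(dim: list, key: str, incoming: dict, as_of: date) -> list:
    """Type 2: close the current version and append a new one."""
    out, needs_new_version = [], True
    for row in dim:
        if row[key] == incoming[key] and row["is_current"]:
            tracked = {k: v for k, v in row.items() if k not in META}
            if tracked == incoming:
                # Nothing changed: keep the current version as is.
                needs_new_version = False
                out.append(row)
            else:
                # Close out the old version instead of overwriting it.
                out.append({**row, "valid_to": as_of, "is_current": False})
        else:
            out.append(row)
    if needs_new_version:
        out.append({**incoming, "valid_from": as_of,
                    "valid_to": None, "is_current": True})
    return out


if __name__ == "__main__":
    dim = [{"id": 1, "city": "Oslo", "valid_from": date(2024, 1, 1),
            "valid_to": None, "is_current": True}]
    for row in apply_scd2(dim, "id", {"id": 1, "city": "Bergen"}, date(2025, 1, 1)):
        print(row)
```

Type 2 adds one row per change, which is the storage cost the last branch of the tree refers to.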

CLI Quick Reference

| Task | Command |
|---|---|
| Run pipeline | odibi run config.yaml |
| Run specific node | odibi run config.yaml --node name |
| Dry run (no writes) | odibi run config.yaml --dry-run |
| Validate config | odibi validate config.yaml |
| View DAG | odibi graph config.yaml |
| Check state | odibi catalog state config.yaml |
| Diagnose issues | odibi doctor |
| List stories | odibi story list |

All 54 Transformers

| Category | Transformers |
|---|---|
| Filtering | filter_rows, distinct, sample, limit |
| Columns | derive_columns, select_columns, drop_columns, rename_columns, cast_columns |
| Text | clean_text, trim_whitespace, regex_replace, split_part, concat_columns |
| Dates | extract_date_parts, date_add, date_trunc, date_diff, convert_timezone |
| Nulls | fill_nulls, coalesce_columns |
| Relational | join, union, aggregate, pivot, unpivot |
| Window | window_calculation (rank, sum, lag, lead) |
| JSON | parse_json, normalize_json, explode_list_column, unpack_struct |
| Keys | generate_surrogate_key, hash_columns |
| Patterns | scd2, merge, deduplicate, dimension, fact, aggregation |

Full reference: YAML Schema - Transformers


See Also