Recipes

TL;DR

Recipes are reusable node-level templates with variable substitution. Write a pattern once, reuse it everywhere. Zero boilerplate, full control.

Why Recipes?

In a typical data platform, you'll write the same patterns dozens of times:

Bronze: "Load CSV, normalize columns, dedup, write Parquet" — repeated per source table
Silver: "Dedup, validate, merge into target" — repeated per dimension/fact
Gold: "Group by grain, aggregate measures" — repeated per report

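Without recipes, each of these nodes can run to 15–30 lines of YAML. With recipes, a node can shrink to as few as 3–5 lines: its name, the recipe reference, and the values that differ.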

Before (without recipes)

nodes:
  - name: dedup_users
    transformer: deduplicate
    params:
      keys: [user_id]
      order_by: "updated_at DESC"
    write:
      connection: silver
      format: parquet
      mode: overwrite

  - name: dedup_orders
    transformer: deduplicate
    params:
      keys: [order_id]
      order_by: "created_at DESC"
    write:
      connection: silver
      format: parquet
      mode: overwrite

After (with recipes)

nodes:
  - name: dedup_users
    recipe: silver_dedup
    recipe_vars:
      keys: [user_id]
      order_by: "updated_at DESC"
    write:
      connection: silver
      format: parquet

  - name: dedup_orders
    recipe: silver_dedup
    recipe_vars:
      keys: [order_id]
      order_by: "created_at DESC"
    write:
      connection: silver
      format: parquet

How It Works

flowchart LR
    A[YAML with recipe:] --> B[Recipe Resolution]
    B --> C[Variable Substitution]
    C --> D[Deep Merge with Node Overrides]
    D --> E[Expanded Node Config]
    E --> F[Pydantic Validation]
    F --> G[Pipeline Execution]

Recipes are resolved as a YAML-to-YAML preprocessing step before Pydantic validation:

1. Load built-in recipes plus any inline recipes in your YAML
2. Substitute ${recipe.var_name} placeholders with your recipe_vars
3. Deep merge the recipe template with your node-level overrides (your overrides win)
4. Pass the expanded config to the normal pipeline engine

Because expansion finishes before the engine ever sees the config, recipes work with every engine (Pandas, Spark, Polars) and every pattern; they are purely a configuration convenience.

Quick Start

Using a Built-in Recipe

pipelines:
  - pipeline: silver_pipeline
    nodes:
      - name: dedup_customers
        recipe: silver_dedup
        recipe_vars:
          keys: [customer_id]
          order_by: "updated_at DESC"
        write:
          connection: silver_output
          format: parquet

Defining Your Own Recipe

recipes:
  my_clean_and_write:
    description: "Standard cleanup and write to silver"
    required_vars: [target_table]
    optional_vars:
      write_mode: overwrite
    template:
      transform:
        steps:
          - function: normalize_column_names
          - function: trim_whitespace
      write:
        connection: silver
        format: delta
        table: "${recipe.target_table}"
        mode: "${recipe.write_mode}"

pipelines:
  - pipeline: silver
    nodes:
      - name: clean_users
        recipe: my_clean_and_write
        recipe_vars:
          target_table: dim_users

      - name: clean_orders
        recipe: my_clean_and_write
        recipe_vars:
          target_table: fact_orders
          write_mode: append    # Override the default

Variable Substitution

Variables use the ${recipe.var_name} syntax, chosen to avoid collision with existing ${ENV_VAR}, ${vars.xxx}, and ${date:xxx} patterns.
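To make the collision argument concrete, here is a minimal sketch of a matcher that recognizes only the ${recipe.var_name} form. The regular expression and the helper name `recipe_vars_in` are assumptions for illustration, not Odibi's actual implementation.

```python
import re

# Hypothetical pattern: matches ${recipe.<identifier>} and nothing else.
RECIPE_VAR = re.compile(r"\$\{recipe\.([A-Za-z_][A-Za-z0-9_]*)\}")


def recipe_vars_in(text: str) -> list[str]:
    """Return the recipe variable names referenced in a string."""
    return RECIPE_VAR.findall(text)


samples = [
    "${recipe.target_table}",                  # recipe placeholder
    "${ENV_VAR}",                              # environment variable syntax
    "${vars.xxx}",                             # project variable syntax
    "${date:xxx}",                             # date expression syntax
    "silver_${recipe.table_name}_${ENV_VAR}",  # mixed usage in one string
]

# Map each sample to the recipe variables found in it.
found = {sample: recipe_vars_in(sample) for sample in samples}
```

Only the strings containing the `recipe.` prefix yield matches, so under this assumption the other placeholder families pass through recipe resolution untouched.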

Type Preservation

When a placeholder is the entire value, the original type is preserved:

# In recipe template:
params:
  keys: "${recipe.keys}"         # If keys is a list, stays a list
  limit: "${recipe.limit}"       # If limit is an int, stays an int

# In recipe_vars:
recipe_vars:
  keys: [user_id, email]         # → list preserved
  limit: 100                     # → int preserved

When a placeholder is embedded in a string, it's stringified:

# Template:
table: "silver_${recipe.table_name}"

# recipe_vars:
recipe_vars:
  table_name: customers    # → "silver_customers"
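The two substitution behaviours can be sketched in a few lines of Python. This is an illustrative sketch, not Odibi's code: the function name `substitute` and the regular expression are assumptions.

```python
import re
from typing import Any

# Hypothetical placeholder pattern for ${recipe.var_name}.
PLACEHOLDER = re.compile(r"\$\{recipe\.(\w+)\}")


def substitute(value: Any, variables: dict[str, Any]) -> Any:
    """Recursively replace recipe placeholders inside a template."""
    if isinstance(value, dict):
        return {k: substitute(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, variables) for v in value]
    if not isinstance(value, str):
        return value  # ints, bools, None: nothing to substitute
    whole = PLACEHOLDER.fullmatch(value)
    if whole:
        # Placeholder is the entire value: return the variable as-is,
        # so lists stay lists and ints stay ints.
        return variables[whole.group(1)]
    # Placeholder embedded in a longer string: stringify each variable.
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), value)


template = {
    "params": {"keys": "${recipe.keys}", "limit": "${recipe.limit}"},
    "table": "silver_${recipe.table_name}",
}
recipe_vars = {"keys": ["user_id", "email"], "limit": 100, "table_name": "customers"}
expanded = substitute(template, recipe_vars)
```

Here `expanded["params"]["keys"]` is still a list and `expanded["params"]["limit"]` is still an int, while `expanded["table"]` becomes the string `silver_customers`.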

Required vs Optional Variables

recipes:
  my_recipe:
    required_vars: [keys, target]           # Must be provided
    optional_vars:                            # Have defaults
      write_mode: overwrite
      format: parquet
    template:
      params:
        keys: "${recipe.keys}"
      write:
        format: "${recipe.format}"
        mode: "${recipe.write_mode}"
        table: "${recipe.target}"

Missing required variables produce clear error messages:

ValueError: Node 'my_node': Recipe 'my_recipe' requires variables ['target'].
Provided: ['keys']. Add the missing variables to recipe_vars.
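A minimal sketch of how such a check might work: required variables are verified first, then optional defaults are overlaid with whatever the node provides. The helper name `resolve_vars` is hypothetical; only the message format is taken from the example above.

```python
from typing import Any


def resolve_vars(
    node_name: str,
    recipe_name: str,
    recipe: dict[str, Any],
    provided: dict[str, Any],
) -> dict[str, Any]:
    """Check required variables, then apply optional defaults."""
    required = recipe.get("required_vars", [])
    missing = [name for name in required if name not in provided]
    if missing:
        raise ValueError(
            f"Node '{node_name}': Recipe '{recipe_name}' requires variables {missing}. "
            f"Provided: {sorted(provided)}. Add the missing variables to recipe_vars."
        )
    # Defaults first, so values from recipe_vars win on conflict.
    return {**recipe.get("optional_vars", {}), **provided}


my_recipe = {
    "required_vars": ["keys", "target"],
    "optional_vars": {"write_mode": "overwrite", "format": "parquet"},
}

# All required variables present; 'format' overrides its default.
resolved = resolve_vars(
    "my_node", "my_recipe", my_recipe,
    {"keys": ["id"], "target": "dim_users", "format": "delta"},
)
```

Calling `resolve_vars` with only `keys` raises a `ValueError` whose text follows the message shown above.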

Deep Merge Rules

When a node specifies both a recipe and additional fields, they are deep-merged:

| Scenario | Rule | Example |
| --- | --- | --- |
| `dict + dict` | Recursive merge (node wins on conflicts) | Node's `write.mode` overrides recipe's `write.mode`, but recipe's `write.format` is kept |
| `list` | Node replaces recipe list entirely | Node's `steps: [a, b]` replaces recipe's `steps: [x, y, z]` |
| `scalar` | Node wins | Node's `transformer: custom` overrides recipe's `transformer: dedup` |
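The three rules fit in one short recursive function. The sketch below is an assumption about how such a merge could be written; the name `deep_merge` is illustrative and not necessarily what Odibi uses internally.

```python
from typing import Any


def deep_merge(recipe: Any, node: Any) -> Any:
    """Merge a node override into a recipe template value."""
    if isinstance(recipe, dict) and isinstance(node, dict):
        merged = dict(recipe)  # start from the recipe's keys
        for key, node_value in node.items():
            if key in recipe:
                merged[key] = deep_merge(recipe[key], node_value)
            else:
                merged[key] = node_value  # key only the node defines
        return merged
    # Lists and scalars: the node's value replaces the recipe's entirely.
    return node


recipe_template = {
    "transformer": "dedup",
    "write": {"format": "parquet", "mode": "overwrite"},
    "transform": {"steps": ["x", "y", "z"]},
}
node_overrides = {
    "transformer": "custom",
    "write": {"mode": "append"},
    "transform": {"steps": ["a", "b"]},
}
merged = deep_merge(recipe_template, node_overrides)
```

In `merged`, `write.format` survives from the recipe, `write.mode` and `transformer` come from the node, and the `steps` list is the node's two-item list rather than a combination of both.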

Example: Overriding Part of a Recipe

- name: api_weather
  recipe: api_bronze_load
  recipe_vars:
    source_name: weather_api
  read:
    connection: weather_api_conn       # Added by node
    options:
      http:
        timeout_s: 60                  # Overrides recipe default (30)
      # retry config from recipe is KEPT (deep merge)

Recipe Inheritance

Recipes can extend other recipes using the extends field. The child recipe inherits the parent's template, required_vars, and optional_vars, with the child's definitions taking precedence.

recipes:
  base_silver:
    description: "Base silver pattern"
    optional_vars:
      write_mode: overwrite
    template:
      transform:
        steps:
          - function: normalize_column_names
          - function: trim_whitespace
      write:
        mode: "${recipe.write_mode}"

  silver_with_dedup:
    extends: base_silver
    description: "Silver + deduplication"
    required_vars: [keys, order_by]
    template:
      transformer: deduplicate
      params:
        keys: "${recipe.keys}"
        order_by: "${recipe.order_by}"

Using silver_with_dedup gives you the cleaning steps from base_silver plus the dedup transformer — without repeating the base template.

Multi-Level Inheritance

recipes:
  level_1:
    template:
      read: { format: csv }

  level_2:
    extends: level_1
    template:
      write: { mode: overwrite }

  level_3:
    extends: level_2
    template:
      transformer: noop

level_3 gets read.format, write.mode, and transformer — all merged from the chain.
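As a rough illustration, the chain can be flattened by resolving the parent first and merging the child's template on top. Both function names are hypothetical, the sketch covers templates only (not required_vars or optional_vars), and it assumes the chain has no cycles.

```python
from typing import Any


def deep_merge(parent: Any, child: Any) -> Any:
    """Dicts merge recursively; anything else is replaced by the child."""
    if isinstance(parent, dict) and isinstance(child, dict):
        merged = dict(parent)
        for key, value in child.items():
            merged[key] = deep_merge(parent[key], value) if key in parent else value
        return merged
    return child


def flatten_template(name: str, recipes: dict[str, dict]) -> dict:
    """Resolve a recipe's template by walking its extends chain."""
    recipe = recipes[name]
    own_template = recipe.get("template", {})
    parent_name = recipe.get("extends")
    if parent_name is None:
        return dict(own_template)
    # Parent is resolved first so the child's definitions win on conflict.
    return deep_merge(flatten_template(parent_name, recipes), own_template)


recipes = {
    "level_1": {"template": {"read": {"format": "csv"}}},
    "level_2": {"extends": "level_1", "template": {"write": {"mode": "overwrite"}}},
    "level_3": {"extends": "level_2", "template": {"transformer": "noop"}},
}
level_3_template = flatten_template("level_3", recipes)
```

The resulting `level_3_template` contains `read`, `write`, and `transformer`, each contributed by a different level of the chain.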

Circular Inheritance

Circular references (A extends B, B extends A) are detected and raise a clear error at resolution time.
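One common way to detect such cycles is to record each recipe visited while following extends and stop as soon as a name repeats. The sketch below assumes that approach; the function name and the error wording are illustrative, not Odibi's actual message.

```python
def inheritance_chain(name: str, recipes: dict[str, dict]) -> list[str]:
    """Return the extends chain from child to root, or raise on a cycle."""
    chain: list[str] = []
    current = name
    while current is not None:
        if current in chain:
            # Report the loop from its first occurrence back to itself.
            cycle = " -> ".join(chain[chain.index(current):] + [current])
            raise ValueError(f"Circular recipe inheritance: {cycle}")
        chain.append(current)
        current = recipes[current].get("extends")
    return chain


healthy = {"c": {"extends": "b"}, "b": {"extends": "a"}, "a": {}}
chain = inheritance_chain("c", healthy)

circular = {"A": {"extends": "B"}, "B": {"extends": "A"}}
try:
    inheritance_chain("A", circular)
    cycle_error = None
except ValueError as exc:
    cycle_error = str(exc)
```

Running the check before any template merging means a circular definition fails fast instead of recursing without end.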

Built-in Recipes

Odibi ships with 12 built-in recipes covering the most common medallion architecture patterns.

Bronze Layer

| Recipe | Description | Required Vars |
| --- | --- | --- |
| `api_bronze_load` | HTTP API → Parquet with metadata, retry, rate limiting | `source_name` |
| `csv_bronze_load` | CSV → Parquet with column normalization and dedup | `keys` |
| `database_bronze_load` | Database → Parquet with incremental HWM loading | `hwm_column` |
| `reference_snapshot` | Full-refresh reference data with normalization | (none) |

Silver Layer

| Recipe | Description | Required Vars |
| --- | --- | --- |
| `silver_clean` | Normalize names, trim whitespace | (none) |
| `silver_dedup` | Deterministic deduplication | `keys`, `order_by` |
| `silver_merge` | Dedup + upsert/merge into target | `keys`, `target` |
| `silver_scd2` | SCD Type 2 with standard columns | `keys`, `track_cols`, `effective_time_col`, `target` |
| `silver_validate_and_write` | Quality gates + quarantine routing | `tests` |
| `silver_full_conform` | Kitchen-sink: normalize, trim, dedup, merge | `keys`, `target` |

Gold Layer

| Recipe | Description | Required Vars |
| --- | --- | --- |
| `gold_aggregate` | Grain + measures aggregation with audit columns | `grain`, `measures` |
| `gold_pivot_report` | Pivot data for reporting output | `index`, `columns`, `values` |

Exploring Built-in Recipes

Use the CLI to discover and explore recipes:

# List all recipes
odibi list recipes

# Get full details for a recipe
odibi explain silver_scd2

# Get a copy-pasteable YAML template
odibi templates show silver_dedup

Inline Recipes

Define project-specific recipes directly in your YAML file under the recipes: key:

recipes:
  my_custom_recipe:
    description: "Company-specific pattern"
    required_vars: [source, target]
    template:
      read:
        connection: "${recipe.source}"
      transform:
        steps:
          - function: normalize_column_names
      write:
        connection: "${recipe.target}"
        format: delta
        mode: overwrite

pipelines:
  - pipeline: daily_load
    nodes:
      - name: load_products
        recipe: my_custom_recipe
        recipe_vars:
          source: erp_connection
          target: silver_connection

Override Priority

Inline recipes with the same name as a built-in will override the built-in. This lets you customize built-in behavior for your project.

Validation

CLI Validation

The odibi validate command checks recipe references:

odibi validate my_pipeline.yaml

It verifies:

Referenced recipes exist (built-in or inline)
All required variables are provided
Variable types are correct
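The first two checks can be approximated with a pass over every node that collects problems instead of stopping at the first one. This is a sketch under assumptions: `check_recipe_references` is a hypothetical helper, the unknown-recipe wording is invented for illustration, and type checking is left out because the source does not say how variable types are declared.

```python
from typing import Any


def check_recipe_references(
    config: dict[str, Any], builtin_recipes: dict[str, dict]
) -> list[str]:
    """Collect recipe-related problems across all nodes in a config."""
    # Inline recipes are known alongside (and override) built-ins.
    known = {**builtin_recipes, **config.get("recipes", {})}
    errors: list[str] = []
    for pipeline in config.get("pipelines", []):
        for node in pipeline.get("nodes", []):
            recipe_name = node.get("recipe")
            if recipe_name is None:
                continue  # plain node, nothing to check
            if recipe_name not in known:
                errors.append(f"Node '{node['name']}': unknown recipe '{recipe_name}'.")
                continue
            provided = node.get("recipe_vars", {})
            required = known[recipe_name].get("required_vars", [])
            missing = [v for v in required if v not in provided]
            if missing:
                errors.append(
                    f"Node '{node['name']}': Recipe '{recipe_name}' requires variables "
                    f"{missing}. Provided: {sorted(provided)}. "
                    "Add the missing variables to recipe_vars."
                )
    return errors


builtins = {"silver_dedup": {"required_vars": ["keys", "order_by"]}}
config = {
    "pipelines": [{
        "pipeline": "silver",
        "nodes": [
            {"name": "load_orders", "recipe": "silver_dedup",
             "recipe_vars": {"keys": ["order_id"]}},
            {"name": "load_items", "recipe": "does_not_exist"},
        ],
    }]
}
errors = check_recipe_references(config, builtins)
```

Returning a list rather than raising lets a validator report every broken node in one run.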

Error Messages

Recipes are designed to give business-analyst-friendly error messages:

✗ my_pipeline.yaml has errors:
  - recipes: Node 'load_orders': Recipe 'silver_dedup' requires variables
    ['order_by']. Provided: ['keys']. Add the missing variables to recipe_vars.

Complete Example

Here's an end-to-end pipeline that combines a custom inline recipe with several built-ins:

recipes:
  company_bronze:
    extends: csv_bronze_load
    description: "Standard CSV ingestion for our data warehouse"
    optional_vars:
      keys: [id]
    template:
      write:
        connection: bronze_lake
        format: delta

pipelines:
  - pipeline: daily_warehouse
    nodes:
      # Bronze: Ingest raw CSVs
      - name: ingest_customers
        recipe: company_bronze
        recipe_vars:
          keys: [customer_id]
        read:
          connection: sftp_source
          path: customers/*.csv

      - name: ingest_orders
        recipe: company_bronze
        recipe_vars:
          keys: [order_id]
        read:
          connection: sftp_source
          path: orders/*.csv

      # Silver: Clean and conform
      - name: conform_customers
        recipe: silver_full_conform
        recipe_vars:
          keys: [customer_id]
          target: silver_lake/dim_customers
        depends_on: [ingest_customers]
        read:
          connection: bronze_lake
          path: customers

      - name: conform_orders
        recipe: silver_merge
        recipe_vars:
          keys: [order_id]
          target: silver_lake/fact_orders
        depends_on: [ingest_orders]
        read:
          connection: bronze_lake
          path: orders

      # Gold: Aggregate for reporting
      - name: daily_revenue
        recipe: gold_aggregate
        recipe_vars:
          grain: [order_date, product_category]
          measures:
            revenue: { column: amount, function: sum }
            order_count: { column: order_id, function: count }
        depends_on: [conform_orders]
        read:
          connection: silver_lake
          path: fact_orders
        write:
          connection: gold_lake
          table: daily_revenue_by_category

Architecture Notes

Resolution Order

1. Built-in recipes loaded from odibi/recipes/builtins/*.yaml
2. Inline recipes loaded from your YAML's recipes: key (overrides built-ins)
3. Inheritance chains resolved (extends)
4. Node recipe references expanded (variable substitution + deep merge)
5. recipes: key removed from config (consumed)
6. Expanded config passed to Pydantic validation → pipeline execution
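The outer shape of this sequence can be sketched as a single function that builds the recipe registry, expands nodes, and drops the recipes: key. Everything here is hypothetical: `resolve_config` and `shallow_expand` are illustrative names, and `shallow_expand` is a deliberately simplified stand-in that skips inheritance, variable substitution, and deep merging.

```python
from typing import Any, Callable

Node = dict[str, Any]


def resolve_config(
    config: dict[str, Any],
    builtins: dict[str, dict],
    expand_node: Callable[[Node, dict[str, dict]], Node],
) -> dict[str, Any]:
    """Expand recipe references and return a config without 'recipes'."""
    # Built-ins first, then inline recipes, so inline definitions override.
    registry = {**builtins, **config.get("recipes", {})}
    # The recipes: key is consumed and not passed on to validation.
    resolved = {k: v for k, v in config.items() if k != "recipes"}
    resolved["pipelines"] = [
        {**pipeline, "nodes": [
            expand_node(node, registry) if "recipe" in node else node
            for node in pipeline.get("nodes", [])
        ]}
        for pipeline in config.get("pipelines", [])
    ]
    return resolved


def shallow_expand(node: Node, registry: dict[str, dict]) -> Node:
    """Stand-in expander: template keys, overlaid by the node's own keys."""
    template = registry[node["recipe"]]["template"]
    rest = {k: v for k, v in node.items() if k not in ("recipe", "recipe_vars")}
    return {**template, **rest}


builtins = {"silver_clean": {"template": {"transformer": "builtin_version"}}}
config = {
    "recipes": {"silver_clean": {"template": {"transformer": "inline_version"}}},
    "pipelines": [{"pipeline": "silver", "nodes": [
        {"name": "clean_users", "recipe": "silver_clean"},
        {"name": "plain_node", "transformer": "noop"},
    ]}],
}
resolved = resolve_config(config, builtins, shallow_expand)
```

The output no longer has a recipes: key, the first node carries the inline recipe's transformer rather than the built-in one, and the plain node passes through unchanged.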

Design Decisions

| Decision | Rationale |
| --- | --- |
| `${recipe.var}` syntax | Avoids collision with `${ENV_VAR}`, `${vars.x}`, `${date:x}` |
| YAML-to-YAML preprocessing | Works with all engines, no runtime overhead |
| Deep merge (not shallow) | Lets you override one nested field without losing others |
| Lists replace entirely | Prevents ambiguity about list ordering/dedup |
| Child wins on conflicts | Matches CSS/inheritance mental model |
| Built-ins are YAML files | Easy to read, copy, and customize |

Extending the System

Adding a new built-in recipe:

Add your recipe to the appropriate YAML file in odibi/recipes/builtins/
Follow the naming convention: {layer}_{action} (e.g., silver_validate)
Always include a description
Declare required_vars for anything the user MUST provide
Use optional_vars with sensible defaults for everything else

Creating a custom recipe YAML file (for teams):

# my_team_recipes.yaml — import via project config
recipes:
  team_standard_load:
    description: "Our team's standard load pattern"
    required_vars: [table_name]
    template:
      # ... your template