Skip to content

Gold Layer Tutorial

The Gold Layer is where business-ready datasets live. Fact tables, aggregations, and semantic metricsβ€”optimized for consumption.

Layer Philosophy

"Answers, not data."

Gold is consumption-optimized. BI tools, dashboards, and ML models read from Gold. Queries should be fast and intuitive.

Principle Why
Denormalized Fewer joins = faster queries
Pre-aggregated Common rollups pre-computed
Business-named Column names match business terms
SK-based Surrogate keys for dimension lookups

Quick Start: Fact Table

The most common Gold pattern is a fact table with dimension lookups:

# pipelines/gold/fact_orders.yaml
pipelines:
  - pipeline: gold_fact_orders
    layer: gold
    nodes:
      - name: fact_orders
        read:
          connection: silver
          table: orders
        pattern:
          type: fact
          params:
            grain: [order_id, line_item_id]
            dimensions:
              - source_column: customer_id
                dimension_table: dim_customer
                dimension_key: customer_id
                surrogate_key: customer_sk
              - source_column: product_id
                dimension_table: dim_product
                dimension_key: product_id
                surrogate_key: product_sk
              - source_column: order_date
                dimension_table: dim_date
                dimension_key: full_date
                surrogate_key: date_sk
            orphan_handling: unknown
        write:
          connection: gold
          table: fact_orders

Common Problems & Solutions

1. "How do I build a star schema fact table?"

Problem: Need to replace natural keys with surrogate keys from dimensions.

Solution: Use the fact pattern with dimension lookups.

nodes:
  - name: fact_orders
    read:
      connection: silver
      table: orders
    pattern:
      type: fact
      params:
        grain: [order_id]            # One row per order
        dimensions:
          - source_column: customer_id
            dimension_table: dim_customer
            dimension_key: customer_id
            surrogate_key: customer_sk
          - source_column: product_id
            dimension_table: dim_product
            dimension_key: product_id
            surrogate_key: product_sk
    write:
      connection: gold
      table: fact_orders

Result:

order_id | customer_sk | product_sk | order_total | order_date
1        | 42          | 15         | 150.00      | 2025-01-15
2        | 42          | 23         | 75.00       | 2025-01-16

See: Fact Pattern


2. "Orders reference customers that don't exist (orphans)"

Problem: Some orders have customer_id values not in dim_customer.

Solution: Configure orphan handling.

pattern:
  type: fact
  params:
    grain: [order_id]
    dimensions:
      - source_column: customer_id
        dimension_table: dim_customer
        dimension_key: customer_id
        surrogate_key: customer_sk
    orphan_handling: unknown         # Assign to unknown member

Options for orphan_handling: | Option | Behavior | |--------|----------| | unknown | Assign SK = 0 (unknown member) | | quarantine | Route to quarantine table | | error | Fail the pipeline | | null | Set SK = NULL |

See: Fact Pattern - Orphan Handling


3. "I need a date dimension"

Problem: Need a standard date dimension for time-based analysis.

Solution: Use the date dimension pattern.

nodes:
  - name: dim_date
    pattern:
      type: date_dimension
      params:
        start_date: "2020-01-01"
        end_date: "2030-12-31"
        columns:
          - date_sk               # Surrogate key (YYYYMMDD)
          - full_date             # DATE type
          - day_of_week           # Monday, Tuesday, ...
          - day_of_month          # 1-31
          - month_name            # January, February, ...
          - month_number          # 1-12
          - quarter               # Q1, Q2, Q3, Q4
          - year                  # 2024, 2025, ...
          - is_weekend            # true/false
          - fiscal_year           # Custom fiscal calendar
    write:
      connection: gold
      table: dim_date

See: Date Dimension Pattern


4. "I need pre-aggregated metrics"

Problem: Dashboards are slowβ€”need pre-computed rollups.

Solution: Use the aggregation pattern.

nodes:
  - name: daily_sales
    read:
      connection: gold
      table: fact_orders
    pattern:
      type: aggregation
      params:
        grain: [date_sk, product_sk]
        measures:
          - name: total_revenue
            expr: "SUM(order_total)"
          - name: order_count
            expr: "COUNT(*)"
          - name: avg_order_value
            expr: "AVG(order_total)"
        incremental: true            # Merge new days
    write:
      connection: gold
      table: agg_daily_sales

Result:

date_sk  | product_sk | total_revenue | order_count | avg_order_value
20250115 | 15         | 1500.00       | 10          | 150.00
20250115 | 23         | 750.00        | 10          | 75.00

See: Aggregation Pattern


5. "I want to define reusable metrics for BI"

Problem: Different dashboards calculate "revenue" differently.

Solution: Define semantic metrics.

semantic:
  metrics:
    - name: revenue
      expression: "SUM(order_total)"
      description: "Total order revenue"
      format: currency

    - name: order_count
      expression: "COUNT(DISTINCT order_id)"
      description: "Number of unique orders"
      format: integer

    - name: aov
      expression: "SUM(order_total) / COUNT(DISTINCT order_id)"
      description: "Average order value"
      format: currency
      depends_on: [revenue, order_count]

  dimensions:
    - name: customer_name
      column: dim_customer.name

    - name: product_category
      column: dim_product.category

    - name: order_month
      column: dim_date.month_name

See: Semantic Layer, Defining Metrics


6. "How do I materialize semantic metrics to tables?"

Problem: Want to query metrics from SQL, not just the API.

Solution: Materialize metrics to Gold tables.

nodes:
  - name: materialized_revenue
    semantic:
      materialize:
        metrics: [revenue, order_count, aov]
        dimensions: [product_category, order_month]
        target: gold.revenue_by_category_month

See: Materializing Metrics


7. "Reference data rarely changesβ€”skip if unchanged"

Problem: Date dimension regenerates every run unnecessarily.

Solution: Skip if content hash is unchanged.

nodes:
  - name: dim_date
    pattern:
      type: date_dimension
      params:
        start_date: "2020-01-01"
        end_date: "2030-12-31"
    write:
      connection: gold
      table: dim_date
      format: delta
      skip_if_unchanged: true        # Skip if content hash matches

How it works: - Before writing, Odibi computes a SHA256 hash of the DataFrame - Compares to hash stored in Delta table metadata - Skips write if hashes match (saves storage and compute)

See: Skip If Unchanged Pattern


8. "How do I validate fact table grain?"

Problem: Want to ensure no duplicate rows per grain key.

Solution: Add grain validation contract.

nodes:
  - name: fact_orders
    read:
      connection: silver
      table: orders
    contracts:
      - type: unique
        columns: [order_id, line_item_id]  # Grain columns
        on_fail: fail
    pattern:
      type: fact
      params:
        grain: [order_id, line_item_id]
    write:
      connection: gold
      table: fact_orders

Gold Layer Checklist

Before exposing to BI:

  • [ ] Star schema? Facts reference dimensions via surrogate keys
  • [ ] Grain validated? No duplicate rows per grain key
  • [ ] Orphans handled? Missing dimension members β†’ unknown or quarantine
  • [ ] Pre-aggregated? Common rollups materialized
  • [ ] Documented? Semantic layer defines metrics and dimensions

Star Schema Example

                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                   β”‚   dim_date      β”‚
                   β”‚  date_sk (PK)   β”‚
                   β”‚  full_date      β”‚
                   β”‚  month_name     β”‚
                   β”‚  year           β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  dim_customer   β”‚    β”‚   fact_orders    β”‚    β”‚  dim_product    β”‚
β”‚ customer_sk(PK) │◄───│ customer_sk(FK)  │───►│ product_sk (PK) β”‚
β”‚ customer_id     β”‚    β”‚ product_sk (FK)  β”‚    β”‚ product_id      β”‚
β”‚ name            β”‚    β”‚ date_sk (FK)     β”‚    β”‚ name            β”‚
β”‚ city            β”‚    β”‚ order_id         β”‚    β”‚ category        β”‚
β”‚ state           β”‚    β”‚ order_total      β”‚    β”‚ price           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚ quantity         β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Next Steps