
Unity Catalog Setup Guide

This guide covers connecting Odibi to Databricks Unity Catalog (UC), the governance layer that addresses tables through a 3-level namespace (catalog.schema.table) and centrally manages access control and storage credentials.


Why Unity Catalog?

| Without UC | With UC |
|---|---|
| Mount points (/mnt/...) or ADLS paths | Governed table names (catalog.schema.table) |
| Manual credential config per connection | Storage credentials managed centrally |
| Hive metastore (single default catalog) | Multiple catalogs, schemas, fine-grained ACLs |
| CREATE TABLE ... LOCATION for registration | Managed tables, auto-registered |

Key insight: UC handles storage and credentials for you. Odibi connections become lightweight pointers to a catalog.schema pair — no account keys, no mount paths.


Prerequisites

  • Databricks workspace with Unity Catalog enabled
  • A UC catalog and schema created (e.g., my_catalog.bronze, my_catalog.silver)
  • Odibi installed with Spark extras: pip install "odibi[spark]"

1. Connection Setup

Use the delta connection type with catalog and schema fields. This maps directly to UC's 3-level namespace:

# project.yaml
project: "my_uc_project"
engine: spark

connections:
  bronze:
    type: delta
    catalog: my_catalog
    schema: bronze

  silver:
    type: delta
    catalog: my_catalog
    schema: silver

  gold:
    type: delta
    catalog: my_catalog
    schema: gold

Each connection owns level 1 (catalog) and level 2 (schema) of the namespace. Your pipeline nodes supply level 3 (table).

How it works under the hood

For the silver connection above, DeltaCatalogConnection.get_path("dim_customers") returns my_catalog.silver.dim_customers. That fully qualified name is what gets passed to DeltaTable.forName(), spark.table(), and saveAsTable(), all of which accept UC 3-level names.
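
The resolution step can be pictured with a small stand-in class. This is an illustrative sketch, not Odibi's actual implementation: `CatalogSchemaPointer` is a hypothetical name, and only the `get_path(table)` behaviour is taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogSchemaPointer:
    """Hypothetical stand-in for a UC-style connection (not Odibi's class)."""

    catalog: str
    schema: str

    def get_path(self, table: str) -> str:
        # Join the connection's two levels with the node-supplied table name.
        return f"{self.catalog}.{self.schema}.{table}"


silver = CatalogSchemaPointer(catalog="my_catalog", schema="silver")
print(silver.get_path("dim_customers"))  # my_catalog.silver.dim_customers
```

Because the connection holds no paths or keys, pointing a pipeline at another catalog or schema only changes the two fields on the connection.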


2. Reading Data

Use table: with the table name only — the connection resolves the full 3-level path:

nodes:
  load_customers:
    read:
      connection: bronze
      format: delta
      table: raw_customers    # → my_catalog.bronze.raw_customers

Time Travel (works with UC)

nodes:
  debug_snapshot:
    read:
      connection: silver
      format: delta
      table: dim_customers
      time_travel:
        as_of_version: 5
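
For orientation, a version-pinned read corresponds roughly to a Delta `VERSION AS OF` query. The sketch below only builds that SQL string. The helper name `time_travel_query` is hypothetical, and whether Odibi issues SQL or sets a reader option internally is not stated in this guide.

```python
def time_travel_query(full_name: str, as_of_version: int) -> str:
    """Build a Delta time-travel SELECT for a 3-level table name."""
    if as_of_version < 0:
        raise ValueError("Delta table versions start at 0")
    return f"SELECT * FROM {full_name} VERSION AS OF {as_of_version}"


query = time_travel_query("my_catalog.silver.dim_customers", 5)
print(query)
```

In a Databricks notebook the resulting string could be passed to `spark.sql(query)` to inspect the same snapshot by hand.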

3. Writing Data

Same pattern — connection provides catalog + schema, table: provides the table name:

nodes:
  write_customers:
    write:
      connection: silver
      format: delta
      table: dim_customers    # → my_catalog.silver.dim_customers
      mode: overwrite

Upsert / Merge

nodes:
  upsert_orders:
    write:
      connection: silver
      format: delta
      table: fact_orders
      mode: upsert
      options:
        keys: [order_id]
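
As a mental model, an upsert keyed on `order_id` resembles a Delta `MERGE INTO` statement. The sketch below builds such a statement as a string. It is an assumption about the general shape of the operation, not a description of the SQL Odibi generates, and `merge_statement` and the `staged_orders` view name are hypothetical.

```python
from __future__ import annotations


def merge_statement(target: str, source_view: str, keys: list[str]) -> str:
    """Build a Delta MERGE that updates matching rows and inserts new ones."""
    if not keys:
        raise ValueError("at least one key column is required")
    # Match target (t) and source (s) rows on every key column.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s "
        f"ON {on_clause} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


sql = merge_statement("my_catalog.silver.fact_orders", "staged_orders", ["order_id"])
print(sql)
```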

Optimized Writes

All Delta write features work with UC managed tables:

nodes:
  write_sales:
    write:
      connection: gold
      format: delta
      table: fact_sales
      mode: append
      partition_by: [sale_year, sale_month]
      zorder_by: [customer_id, product_id]
      table_properties:
        "delta.autoOptimize.optimizeWrite": "true"
        "delta.autoOptimize.autoCompact": "true"

4. Patterns with UC

All Odibi patterns work with UC table names.

SCD2

nodes:
  scd2_customers:
    read:
      connection: bronze
      format: delta
      table: raw_customers

    transformer: scd2
    params:
      target: "my_catalog.silver.dim_customers"
      keys: [customer_id]
      track_cols: [name, address, tier]
      effective_time_col: updated_at

    # No write: block needed — SCD2 is self-contained

SCD2 target format

The target param in SCD2 needs the full 3-level name since it bypasses the connection resolution. Use "catalog.schema.table" format directly.
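
A quick pre-flight check can catch a bare table name before the pipeline runs. This is a minimal sketch with a hypothetical helper, `is_three_level_name`. It only accepts simple unquoted identifiers, so backtick-quoted names would need extra handling.

```python
import re

# Simple unquoted identifiers only; backtick-quoted names are out of scope here.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def is_three_level_name(name: str) -> bool:
    """Return True if name looks like catalog.schema.table."""
    parts = name.split(".")
    return len(parts) == 3 and all(_IDENTIFIER.match(p) for p in parts)


print(is_three_level_name("my_catalog.silver.dim_customers"))  # True
print(is_three_level_name("dim_customers"))                    # False
```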

Merge Transformer

nodes:
  merge_products:
    read:
      connection: bronze
      format: delta
      table: raw_products

    transformer: merge
    params:
      target: "my_catalog.silver.dim_products"
      keys: [product_id]
      strategy: upsert

Dimension Pattern

nodes:
  build_dim_customer:
    read:
      connection: bronze
      format: delta
      table: raw_customers

    pattern: dimension
    params:
      natural_key: customer_id
      surrogate_key: customer_sk
      scd_type: 2
      track_cols: [name, address, phone]
      target: "my_catalog.silver.dim_customers"

    write:
      connection: silver
      format: delta
      table: dim_customers
      mode: overwrite

5. Mixed Connections (UC + SQL Server)

A common real-world setup: read from SQL Server, write to UC.

connections:
  erp_source:
    type: sql_server
    server: erp-server.database.windows.net
    database: production
    auth:
      mode: aad_msi

  silver:
    type: delta
    catalog: my_catalog
    schema: silver

nodes:
  ingest_orders:
    read:
      connection: erp_source
      format: sql
      table: dbo.SalesOrders

    write:
      connection: silver
      format: delta
      table: raw_orders
      mode: append

6. Databricks Notebook Integration

On Databricks, pass the existing spark session — it already has UC configured:

from odibi import Pipeline

# Databricks provides a pre-configured spark session
# No need for ADLS keys, mount points, or credential setup
pipeline = Pipeline.from_yaml(
    "project.yaml",
    "pipeline.yaml",
    spark_session=spark,  # The Databricks-provided session
)

pipeline.run()

What You DON'T Need

| Feature | Why not needed with UC |
|---|---|
| register_table: | UC managed tables are auto-registered |
| ADLS connection for lakehouse | UC manages storage credentials |
| base_path: / mount points | UC abstracts storage location |
| configure_spark() calls | The Databricks session is pre-configured |

Migrating from Hive Metastore

If you have existing Odibi projects using spark_catalog:

# BEFORE (Hive metastore)
connections:
  silver:
    type: delta
    catalog: spark_catalog
    schema: silver_db

# AFTER (Unity Catalog)
connections:
  silver:
    type: delta
    catalog: my_catalog      # ← Your UC catalog name
    schema: silver            # ← Your UC schema name

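That's it: change two strings per connection. Nodes that reference tables through table: stay the same. Transformer target params hold full 3-level names (see "SCD2 target format" above), so update any that still point at the old catalog.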
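
Because transformer `target` params hold full 3-level names (see the SCD2 note earlier in this guide), they do not pick up a connection change automatically. The sketch below scans a parsed `nodes` mapping for targets that still start with the old catalog. It assumes the YAML has already been loaded into plain dicts shaped like the examples in this guide, and `stale_targets` is a hypothetical helper, not part of Odibi.

```python
from __future__ import annotations


def stale_targets(nodes: dict, old_catalog: str = "spark_catalog") -> list[tuple[str, str]]:
    """List (node_name, target) pairs whose target still starts with old_catalog."""
    prefix = old_catalog + "."
    found = []
    for node_name, node in nodes.items():
        # Nodes without a params block (plain read/write nodes) are skipped.
        target = (node.get("params") or {}).get("target")
        if isinstance(target, str) and target.startswith(prefix):
            found.append((node_name, target))
    return found


nodes = {
    "scd2_customers": {"params": {"target": "spark_catalog.silver_db.dim_customers"}},
    "load_customers": {"read": {"connection": "bronze", "table": "raw_customers"}},
}
print(stale_targets(nodes))
```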


Troubleshooting

"Table or view not found"

  • Verify the catalog and schema exist: SHOW SCHEMAS IN my_catalog
  • Check permissions: SHOW GRANTS ON SCHEMA my_catalog.silver
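
The first check can also be run from a notebook cell. This is a minimal sketch assuming the Databricks-provided `spark` session is passed in; `schema_exists` is a hypothetical helper. It reads the result by position because the column name of `SHOW SCHEMAS` output can differ between runtimes.

```python
def schema_exists(spark, catalog: str, schema: str) -> bool:
    """Check whether `schema` appears in SHOW SCHEMAS IN `catalog`."""
    rows = spark.sql(f"SHOW SCHEMAS IN {catalog}").collect()
    # The result has a single column; read it by position rather than by name.
    return any(row[0] == schema for row in rows)
```

In a notebook, `schema_exists(spark, "my_catalog", "silver")` returning False points at a missing schema or a catalog the current user cannot see.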

"Cannot create managed table with LOCATION"

  • You're using register_table:, which creates external tables. Remove it; UC managed tables don't need it.

SCD2 target not resolving

  • The target param in SCD2/Merge transformers needs the full 3-level name (catalog.schema.table), not just the table name. The connection isn't used for target resolution in these transformers.