Unity Catalog Setup Guide

This guide covers connecting Odibi to Databricks Unity Catalog (UC), which centrally manages data governance, access control, and storage credentials, and addresses every table through a 3-level namespace (catalog.schema.table).

Why Unity Catalog?

| Without UC | With UC |
| --- | --- |
| Mount points (`/mnt/...`) or ADLS paths | Governed table names (`catalog.schema.table`) |
| Manual credential config per connection | Storage credentials managed centrally |
| Hive metastore (single `spark_catalog` catalog) | Multiple catalogs, schemas, fine-grained ACLs |
| `CREATE TABLE ... LOCATION` for registration | Managed tables, auto-registered |

Key insight: UC handles storage and credentials for you. Odibi connections become lightweight pointers to a catalog.schema pair — no account keys, no mount paths.

Prerequisites

Databricks workspace with Unity Catalog enabled
A UC catalog and schema created (e.g., my_catalog.bronze, my_catalog.silver)
Odibi installed with Spark extras: pip install "odibi[spark]"

1. Connection Setup

Use the delta connection type with catalog and schema fields. This maps directly to UC's 3-level namespace:

# project.yaml
project: "my_uc_project"
engine: spark

connections:
  bronze:
    type: delta
    catalog: my_catalog
    schema: bronze

  silver:
    type: delta
    catalog: my_catalog
    schema: silver

  gold:
    type: delta
    catalog: my_catalog
    schema: gold

Each connection owns level 1 (catalog) and level 2 (schema) of the namespace. Your pipeline nodes supply level 3 (table).

How it works under the hood

For the silver connection above, DeltaCatalogConnection.get_path("dim_customers") returns my_catalog.silver.dim_customers. That fully qualified name is then passed to DeltaTable.forName(), spark.table(), and saveAsTable(), all of which accept three-level UC names.
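
The resolution step can be sketched in a few lines of Python. This is an illustration only: CatalogConnectionSketch is a hypothetical stand-in, not Odibi's DeltaCatalogConnection, and it assumes that get_path simply joins the three levels with dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogConnectionSketch:
    """Hypothetical stand-in for a catalog-backed Delta connection."""

    catalog: str  # level 1, owned by the connection
    schema: str   # level 2, owned by the connection

    def get_path(self, table: str) -> str:
        # Join the connection's two levels with the node-supplied table name.
        return f"{self.catalog}.{self.schema}.{table}"


silver = CatalogConnectionSketch(catalog="my_catalog", schema="silver")
print(silver.get_path("dim_customers"))  # my_catalog.silver.dim_customers
```

Because the connection carries only two strings, switching environments (for example, a dev catalog versus a prod catalog) means changing the connection, not the nodes.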

2. Reading Data

Use table: with the table name only — the connection resolves the full 3-level path:

nodes:
  load_customers:
    read:
      connection: bronze
      format: delta
      table: raw_customers    # → my_catalog.bronze.raw_customers

Time Travel (works with UC)

nodes:
  debug_snapshot:
    read:
      connection: silver
      format: delta
      table: dim_customers
      time_travel:
        as_of_version: 5
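
For orientation, a read like this corresponds roughly to a Delta `VERSION AS OF` query against the resolved table name. The helper below is a hypothetical sketch that only builds that SQL string. It is not how Odibi is known to implement time travel, and the function name is an assumption.

```python
def version_as_of_query(catalog: str, schema: str, table: str, version: int) -> str:
    """Build a Delta time-travel query for a three-level table name."""
    if version < 0:
        raise ValueError("Delta table versions start at 0")
    return f"SELECT * FROM {catalog}.{schema}.{table} VERSION AS OF {version}"


query = version_as_of_query("my_catalog", "silver", "dim_customers", 5)
print(query)
```

In a Databricks notebook the resulting string could be passed to spark.sql() to inspect the same snapshot by hand.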

3. Writing Data

Same pattern — connection provides catalog + schema, table: provides the table name:

nodes:
  write_customers:
    write:
      connection: silver
      format: delta
      table: dim_customers    # → my_catalog.silver.dim_customers
      mode: overwrite

Upsert / Merge

nodes:
  upsert_orders:
    write:
      connection: silver
      format: delta
      table: fact_orders
      mode: upsert
      options:
        keys: [order_id]
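
Conceptually, an upsert on keys matches incoming rows to existing rows and either updates or inserts them. The sketch below builds the Delta `MERGE INTO` statement that such a config roughly corresponds to. The function name and the source view name are hypothetical, and Odibi's actual merge logic may differ.

```python
from typing import Sequence


def build_merge_sql(target: str, source_view: str, keys: Sequence[str]) -> str:
    """Build a Delta MERGE statement that upserts source rows into target."""
    if not keys:
        raise ValueError("an upsert needs at least one key column")
    # Rows match when every key column is equal on both sides.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source_view} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


merge_sql = build_merge_sql("my_catalog.silver.fact_orders", "incoming_orders", ["order_id"])
print(merge_sql)
```

The key list drives the ON clause, so a composite key such as [order_id, line_no] simply produces two equality conditions joined with AND.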

Optimized Writes

All Delta write features work with UC managed tables:

nodes:
  write_sales:
    write:
      connection: gold
      format: delta
      table: fact_sales
      mode: append
      partition_by: [sale_year, sale_month]
      zorder_by: [customer_id, product_id]
      table_properties:
        "delta.autoOptimize.optimizeWrite": "true"
        "delta.autoOptimize.autoCompact": "true"
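
To make the zorder_by and table_properties options concrete, the sketch below builds the Delta SQL statements they roughly correspond to (`ALTER TABLE ... SET TBLPROPERTIES` and `OPTIMIZE ... ZORDER BY`). The helper name is hypothetical, and whether Odibi issues exactly these statements is an assumption.

```python
from typing import Dict, List, Sequence


def build_maintenance_sql(
    table: str,
    zorder_by: Sequence[str],
    table_properties: Dict[str, str],
) -> List[str]:
    """Return the Delta SQL statements implied by the write options."""
    statements = []
    if table_properties:
        # Each property becomes a quoted 'key' = 'value' pair.
        props = ", ".join(f"'{k}' = '{v}'" for k, v in table_properties.items())
        statements.append(f"ALTER TABLE {table} SET TBLPROPERTIES ({props})")
    if zorder_by:
        statements.append(f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_by)})")
    return statements


statements = build_maintenance_sql(
    "my_catalog.gold.fact_sales",
    zorder_by=["customer_id", "product_id"],
    table_properties={
        "delta.autoOptimize.optimizeWrite": "true",
        "delta.autoOptimize.autoCompact": "true",
    },
)
for stmt in statements:
    print(stmt)
```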

4. Patterns with UC

All Odibi patterns work with UC table names.

SCD2

nodes:
  scd2_customers:
    read:
      connection: bronze
      format: delta
      table: raw_customers

    transformer: scd2
    params:
      target: "my_catalog.silver.dim_customers"
      keys: [customer_id]
      track_cols: [name, address, tier]
      effective_time_col: updated_at

    # No write: block needed — SCD2 is self-contained

SCD2 target format

The target param in SCD2 needs the full 3-level name since it bypasses the connection resolution. Use "catalog.schema.table" format directly.
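
For readers new to the pattern, the following pure-Python sketch shows what a Type 2 slowly changing dimension does with keys, track_cols, and effective_time_col: when a tracked column changes, the open row is closed and a new current row is appended. The valid_from, valid_to, and is_current column names are hypothetical, and this is not Odibi's implementation, which operates on Delta tables rather than lists of dicts.

```python
def apply_scd2(history, incoming, keys, track_cols, effective_time_col):
    """Return a new history list after applying one incoming record."""
    key = tuple(incoming[k] for k in keys)
    effective = incoming[effective_time_col]
    result = [dict(row) for row in history]  # copy so the input is untouched

    # Find the currently open version for this business key, if any.
    current = next(
        (r for r in result if r["is_current"] and tuple(r[k] for k in keys) == key),
        None,
    )
    if current is not None:
        if all(current[c] == incoming[c] for c in track_cols):
            return result  # no tracked column changed: keep the open version
        # Close the open version at the moment the new one takes effect.
        current["valid_to"] = effective
        current["is_current"] = False

    new_row = {k: incoming[k] for k in keys}
    new_row.update({c: incoming[c] for c in track_cols})
    new_row.update({"valid_from": effective, "valid_to": None, "is_current": True})
    result.append(new_row)
    return result


history = [{
    "customer_id": 1, "name": "Ada", "address": "1 Main St", "tier": "gold",
    "valid_from": "2024-01-01", "valid_to": None, "is_current": True,
}]
incoming = {
    "customer_id": 1, "name": "Ada", "address": "9 Oak Ave", "tier": "gold",
    "updated_at": "2024-06-01",
}
updated = apply_scd2(
    history, incoming,
    keys=["customer_id"],
    track_cols=["name", "address", "tier"],
    effective_time_col="updated_at",
)
for row in updated:
    print(row)
```

Re-applying the same incoming record leaves the history unchanged, which is the property that makes reruns of an SCD2 node safe in this sketch.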

Merge Transformer

nodes:
  merge_products:
    read:
      connection: bronze
      format: delta
      table: raw_products

    transformer: merge
    params:
      target: "my_catalog.silver.dim_products"
      keys: [product_id]
      strategy: upsert

Dimension Pattern

nodes:
  build_dim_customer:
    read:
      connection: bronze
      format: delta
      table: raw_customers

    pattern: dimension
    params:
      natural_key: customer_id
      surrogate_key: customer_sk
      scd_type: 2
      track_cols: [name, address, phone]
      target: "my_catalog.silver.dim_customers"

    write:
      connection: silver
      format: delta
      table: dim_customers
      mode: overwrite

5. Mixed Connections (UC + SQL Server)

A common real-world setup: read from SQL Server, write to UC.

connections:
  erp_source:
    type: sql_server
    host: erp-server.database.windows.net
    database: production
    auth:
      mode: aad_msi

  silver:
    type: delta
    catalog: my_catalog
    schema: silver

nodes:
  ingest_orders:
    read:
      connection: erp_source
      format: sql
      table: dbo.SalesOrders

    write:
      connection: silver
      format: delta
      table: raw_orders
      mode: append

6. Databricks Notebook Integration

On Databricks, pass the existing spark session — it already has UC configured:

from odibi import Pipeline

# Databricks provides a pre-configured spark session
# No need for ADLS keys, mount points, or credential setup
pipeline = Pipeline.from_yaml(
    "project.yaml",
    "pipeline.yaml",
    spark_session=spark,  # The Databricks-provided session
)

pipeline.run()

What You DON'T Need

| Feature | Why not needed with UC |
| --- | --- |
| `register_table:` | UC managed tables are auto-registered |
| ADLS connection for lakehouse | UC manages storage credentials |
| `base_path:` / mount points | UC abstracts storage location |
| `configure_spark()` calls | The Databricks session is pre-configured |

Migrating from Hive Metastore

If you have existing Odibi projects that use spark_catalog, point each connection at your UC catalog and schema instead:

# BEFORE (Hive metastore)
connections:
  silver:
    type: delta
    catalog: spark_catalog
    schema: silver_db

# AFTER (Unity Catalog)
connections:
  silver:
    type: delta
    catalog: my_catalog      # ← Your UC catalog name
    schema: silver            # ← Your UC schema name

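That's it: change two strings per connection. Node-level read: and write: blocks stay the same because they reference tables by name only. The one exception is any transformer target: param, which hard-codes the full catalog.schema.table name and must be updated by hand.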
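
For projects with many connections, the same rewrite can be scripted. The sketch below assumes the connections: block has already been loaded into a plain dict (for example with a YAML parser). The function name and the schema_map argument are hypothetical and not part of Odibi.

```python
def migrate_connections(connections, uc_catalog, schema_map):
    """Return a copy of connections with spark_catalog entries moved to UC."""
    migrated = {}
    for name, conn in connections.items():
        conn = dict(conn)  # copy so the original config is untouched
        if conn.get("type") == "delta" and conn.get("catalog") == "spark_catalog":
            conn["catalog"] = uc_catalog
            # Rename the schema if a mapping exists; otherwise keep it.
            conn["schema"] = schema_map.get(conn["schema"], conn["schema"])
        migrated[name] = conn
    return migrated


before = {
    "silver": {"type": "delta", "catalog": "spark_catalog", "schema": "silver_db"},
    "erp_source": {"type": "sql_server", "database": "production"},
}
after = migrate_connections(before, "my_catalog", {"silver_db": "silver"})
print(after["silver"])
```

Non-Delta connections such as erp_source pass through unchanged, which matches the mixed-connection setup shown earlier.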

Troubleshooting

"Table or view not found"

Verify the catalog and schema exist: SHOW SCHEMAS IN my_catalog
Check permissions: SHOW GRANTS ON SCHEMA my_catalog.silver

"Cannot create managed table with LOCATION"

You're using register_table:, which creates external tables. Remove it; UC managed tables are registered automatically.

SCD2 target not resolving

The target param in SCD2/Merge transformers needs the full 3-level name (catalog.schema.table), not just the table name. The connection isn't used for target resolution in these transformers.
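
A quick pre-flight check can catch this before a pipeline run. The sketch below is a hypothetical helper, not part of Odibi, and it assumes plain unquoted identifiers (letters, digits, underscores); backtick-quoted names would need a more permissive pattern.

```python
import re

# One unquoted SQL identifier: a letter or underscore, then word characters.
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
_THREE_LEVEL = re.compile(rf"{_IDENT}\.{_IDENT}\.{_IDENT}")


def is_three_level_name(target: str) -> bool:
    """Return True when target looks like catalog.schema.table."""
    return _THREE_LEVEL.fullmatch(target) is not None


for candidate in ("my_catalog.silver.dim_customers", "silver.dim_customers", "dim_customers"):
    print(candidate, is_three_level_name(candidate))
```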