# Azure Connections Tutorial
This tutorial shows how to connect Odibi to Azure data services: Blob Storage, ADLS Gen2, and Azure SQL.
## Prerequisites

- Odibi installed (`pip install odibi`)
- Azure subscription with appropriate permissions
- Azure CLI installed and `az login` completed
## 1. Azure Blob Storage

### Account Key Authentication (Recommended for Production)

```yaml
connections:
  azure_landing:
    type: azure_blob
    account_name: mystorageaccount
    container: landing
    auth:
      mode: account_key
      account_key: "${AZURE_STORAGE_KEY}"
```
### Managed Identity (Databricks/Synapse)

```yaml
connections:
  azure_landing:
    type: azure_blob
    account_name: mystorageaccount
    container: landing
    auth:
      mode: aad_msi
```
### SAS Token

```yaml
connections:
  azure_landing:
    type: azure_blob
    account_name: mystorageaccount
    container: landing
    auth:
      mode: sas
      sas_token: "${AZURE_SAS_TOKEN}"
```
### Connection String

```yaml
connections:
  azure_landing:
    type: azure_blob
    connection_string: "${AZURE_STORAGE_CONNECTION_STRING}"
    container: landing
```
## 2. ADLS Gen2 (Hierarchical Namespace)

For Delta Lake on Azure, use ADLS Gen2:

```yaml
connections:
  adls_bronze:
    type: azure_blob
    account_name: mydatalake
    container: bronze
    is_adls_gen2: true  # Enable hierarchical namespace
    auth:
      mode: account_key
      account_key: "${AZURE_STORAGE_KEY}"
```
### Delta Lake on ADLS

```yaml
connections:
  delta_silver:
    type: delta
    base_path: abfss://silver@mydatalake.dfs.core.windows.net/
    # Auth inherited from Spark session config
```
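The `abfss://` base path follows a fixed pattern: `abfss://<container>@<account>.dfs.core.windows.net/<path>`. A small helper like the one below (hypothetical, not part of Odibi) can keep these paths consistent across layers:

```python
def abfss_path(container: str, account: str, path: str = "") -> str:
    """Build an ABFSS URI for ADLS Gen2 (illustrative helper, not an Odibi API)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

print(abfss_path("silver", "mydatalake"))
# abfss://silver@mydatalake.dfs.core.windows.net/
```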
## 3. Azure SQL Database

### SQL Authentication

```yaml
connections:
  azure_sql:
    type: sql_server
    host: myserver.database.windows.net
    database: mydb
    auth:
      mode: sql_login
      username: "${SQL_USER}"
      password: "${SQL_PASSWORD}"
```
### Azure AD Authentication

```yaml
connections:
  azure_sql:
    type: sql_server
    host: myserver.database.windows.net
    database: mydb
    auth:
      mode: aad_password
      tenant_id: "${AZURE_TENANT_ID}"
      client_id: "${AZURE_CLIENT_ID}"
      client_secret: "${AZURE_CLIENT_SECRET}"
```
### Managed Identity

```yaml
connections:
  azure_sql:
    type: sql_server
    host: myserver.database.windows.net
    database: mydb
    auth:
      mode: aad_msi
```
## 4. Complete Azure Project

Here's a full project using Azure services:

```yaml
# project.yaml
project: "azure_data_platform"
engine: spark

connections:
  # Landing zone (raw files)
  landing:
    type: azure_blob
    account_name: "${STORAGE_ACCOUNT}"
    container: landing
    auth:
      mode: key_vault
      key_vault: kv-data
      secret: storage-account-key

  # Bronze layer (Delta)
  bronze:
    type: delta
    base_path: abfss://bronze@${STORAGE_ACCOUNT}.dfs.core.windows.net/

  # Silver layer (Delta)
  silver:
    type: delta
    base_path: abfss://silver@${STORAGE_ACCOUNT}.dfs.core.windows.net/

  # Source database
  erp_sql:
    type: sql_server
    host: erp-server.database.windows.net
    database: erp_prod
    auth:
      mode: aad_msi

story:
  connection: landing
  path: _odibi/stories/

system:
  connection: landing
  path: _odibi/catalog/

pipelines:
  - pipeline: bronze_ingest
    layer: bronze
    nodes:
      # Ingest from SQL
      - name: customers_raw
        read:
          connection: erp_sql
          format: sql
          table: dbo.Customers
        write:
          connection: bronze
          table: raw_customers

      # Ingest from files
      - name: orders_raw
        read:
          connection: landing
          format: csv
          path: orders/*.csv
        write:
          connection: bronze
          table: raw_orders
```
## 5. Environment Variables

Store secrets in environment variables or Azure Key Vault:

```bash
# .env (local development - git-ignored!)
AZURE_TENANT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AZURE_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AZURE_CLIENT_SECRET=your-secret-here
STORAGE_ACCOUNT=mydatalake
SQL_USER=odibi_user
SQL_PASSWORD=secure-password
```
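How Odibi loads a `.env` file is not shown here; tools like python-dotenv handle it, and a minimal sketch of the idea looks like this (the `load_dotenv` helper below is illustrative, not an Odibi API):

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Load KEY=VALUE lines into the process environment (minimal sketch).

    Existing environment variables take precedence; comments and
    blank lines are skipped. Real projects typically use python-dotenv.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```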
### Using Azure Key Vault

```yaml
# Reference Key Vault secrets with ${kv:secret-name}
connections:
  azure_sql:
    type: sql_server
    host: myserver.database.windows.net
    database: mydb
    auth:
      mode: sql_login
      username: "${kv:sql-username}"
      password: "${kv:sql-password}"
```
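A `${kv:...}` reference names the secret to fetch from the vault at load time. As a rough sketch of what that substitution involves (the real resolution calls Azure Key Vault; the regex and stub dictionary below are purely illustrative):

```python
import re

KV_REF = re.compile(r"\$\{kv:([A-Za-z0-9-]+)\}")

def resolve_kv(value: str, secrets: dict) -> str:
    """Substitute ${kv:name} references from a secret store (a stub dict here)."""
    return KV_REF.sub(lambda m: secrets[m.group(1)], value)

secrets = {"sql-username": "odibi_user", "sql-password": "s3cret"}
print(resolve_kv("${kv:sql-username}", secrets))
# odibi_user
```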
## 6. Spark Configuration for Azure

When using Spark with Azure, configure the session:

```python
# conf/spark_azure.py
import os

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("OdibiAzure")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:2.4.0")
    .config("fs.azure.account.auth.type", "OAuth")
    .config("fs.azure.account.oauth.provider.type",
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    .config("fs.azure.account.oauth2.client.id", os.environ["AZURE_CLIENT_ID"])
    .config("fs.azure.account.oauth2.client.secret", os.environ["AZURE_CLIENT_SECRET"])
    .config("fs.azure.account.oauth2.client.endpoint",
            f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}/oauth2/token")
    .getOrCreate()
)
```
## 7. Running on Azure Databricks

### Cluster Configuration

Add to the cluster's Spark config:

```
fs.azure.account.auth.type OAuth
fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id {{secrets/odibi/client-id}}
fs.azure.account.oauth2.client.secret {{secrets/odibi/client-secret}}
fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/{{secrets/odibi/tenant-id}}/oauth2/token
```
### Unity Catalog

With Unity Catalog, use catalog-based connections:

```yaml
connections:
  bronze:
    type: delta
    catalog: main
    schema: bronze

  silver:
    type: delta
    catalog: main
    schema: silver
```
## 8. Running on Azure Synapse

```python
# Synapse notebook
%%pyspark
from odibi import PipelineManager

# Synapse auto-configures ADLS access via linked services
manager = PipelineManager.from_yaml("/synapse/project.yaml")
manager.run()
```
## Common Issues

### "403 Forbidden" on Blob Access

- Check that the service principal has the "Storage Blob Data Contributor" role
- Verify the container name is correct
- Check that the firewall allows your IP

### "Login failed" on Azure SQL

- Verify an Azure AD admin is set on the SQL server
- Check that the firewall allows Azure services
- For MSI, ensure the managed identity has `db_datareader`/`db_datawriter`
### Slow ADLS Performance

Enable Delta caching:
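On Databricks, reads from ADLS often improve with the Delta (disk) cache. A minimal sketch, applied in the cluster Spark config like the OAuth settings above (the setting is a Databricks runtime option; availability depends on your runtime and node type):

```
spark.databricks.io.cache.enabled true
```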
## Next Steps

- Spark Engine Tutorial - Spark-specific features
- Getting Started - Basic Odibi concepts
- Performance Tuning - Optimize large pipelines
## See Also

- `AzureBlobConnectionConfig` - Full Azure Blob options
- `SQLServerConnectionConfig` - Azure SQL options
- `DeltaConnectionConfig` - Delta Lake options