# Example 2: Incremental SQL Ingestion (Database → Raw with HWM)
Load data from a SQL Server database incrementally using high-water mark (HWM).
## When to Use
- Source table has millions of rows
- Source has an `updated_at` or `created_at` timestamp column
- You want to load only new/changed rows after the first full load
## How It Works
- First run: No state exists → Full load (all rows)
- Subsequent runs: State exists → Only rows where `updated_at > last_hwm`
- State is persisted to the `_system` catalog automatically
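The high-water-mark behavior described above can be sketched in a few lines of Python. This is an illustrative model, not odibi's actual implementation; the `build_predicate` and `advance_state` helpers are hypothetical stand-ins for the state handling backed by the `_system` catalog.

```python
from typing import Optional

def build_predicate(state: Optional[dict], column: str = "updated_at") -> str:
    """Return the filter for this run; an empty string means a full load."""
    if state is None:
        # First run: no state exists yet, so load all rows.
        return ""
    # Subsequent runs: only rows newer than the stored high-water mark.
    return f"{column} > '{state['hwm']}'"

def advance_state(rows: list, column: str = "updated_at") -> dict:
    """After a successful load, the new HWM is the max timestamp seen."""
    return {"hwm": max(r[column] for r in rows)}
```

The key property is that the HWM only advances after a successful write, so a failed run simply re-reads the same slice on retry.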
## Full Config
```yaml
# odibi.yaml
project: incremental_orders
engine: spark  # Use Spark for JDBC

connections:
  source_db:
    type: sql_server
    host: ${SQL_SERVER_HOST}
    database: production
    auth:
      mode: sql_login
      username: ${SQL_USER}
      password: ${SQL_PASSWORD}
  lake:
    type: local
    base_path: ./data/lake

story:
  connection: lake
  path: stories

system:
  connection: lake
  path: _system

pipelines:
  - pipeline: bronze_orders
    layer: bronze
    nodes:
      - name: ingest_orders
        read:
          connection: source_db
          format: jdbc
          table: dbo.orders
        incremental:
          mode: stateful
          column: updated_at
          # Optional: limit first load to recent data
          # first_run_lookback: "365d"
        write:
          connection: lake
          format: delta
          path: bronze/orders
          mode: append
```
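The commented-out `first_run_lookback: "365d"` option bounds how far back the initial full load reaches. As a rough sketch of how such a duration string could translate into a cutoff timestamp (the exact duration format odibi accepts is an assumption here, and `lookback_cutoff` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

def lookback_cutoff(spec: str, now: datetime) -> datetime:
    """Parse a duration like '365d', '12h', or '30m' into a cutoff time."""
    units = {"d": "days", "h": "hours", "m": "minutes"}
    value, unit = int(spec[:-1]), spec[-1]
    return now - timedelta(**{units[unit]: value})

# On the first run, rows with updated_at older than this cutoff would be skipped.
cutoff = lookback_cutoff("365d", datetime.now(timezone.utc))
```

This keeps the first load bounded on very large tables; later runs are driven by the stored HWM instead.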
## Environment Variables
Set these before running:
```bash
export SQL_SERVER_HOST=your-server.database.windows.net
export SQL_USER=reader
export SQL_PASSWORD=your-password
```
## Run
```bash
# First run: Full load
odibi run odibi.yaml

# Second run: Incremental (only new rows)
odibi run odibi.yaml
```
## Check State
Output:
## Schema Reference
| Key | Docs |
|---|---|
| `connections[].type: sql_server` | SQLServerConnectionConfig |
| `incremental.mode: stateful` | IncrementalConfig |
| `write.mode: append` | WriteConfig |
## Common Patterns
Choose `rolling_window` instead if:

- You don't need exact row-level tracking
- Source rows can be updated without their `updated_at` column changing
- You want simpler state management