Data Validation¶
Odibi provides a comprehensive validation framework to ensure data quality at every stage of your pipeline.
Validation Layers¶
| Layer | When it Runs | Purpose |
|---|---|---|
| Contracts | Before transform | Fail-fast checks on input data |
| Validation Tests | After transform | Row-level data quality checks |
| Quality Gates | After validation | Batch-level thresholds and pass rates |
| FK Validation | Post-pipeline | Referential integrity between tables |
Quick Links¶
Guides¶
- Contracts - Pre-transform fail-fast checks (always fail on violation)
- Validation Tests - Post-transform row-level checks (fail/warn/quarantine)
- Quality Gates - Batch-level thresholds and pass rates
- Quarantine - Capture and review invalid records
- FK Validation - Referential integrity between fact and dimension tables
Reference¶
- Contracts Schema - All contract types
- Validation Schema - Full validation configuration
Choosing the Right Validation¶
┌─────────────────────────────────────────────────────────────┐
│ INPUT DATA │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ CONTRACTS (Pre-Transform) │
│ • not_null on required columns │
│ • row_count min/max │
│ • freshness checks │
│ → ALWAYS FAILS on violation │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ TRANSFORMATION │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ VALIDATION TESTS (Post-Transform) │
│ • Range checks, format validation │
│ • Custom SQL conditions │
│ → Can WARN, QUARANTINE, or FAIL │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ QUALITY GATES (Batch-Level) │
│ • Pass rate thresholds (e.g., 95%) │
│ • Row count anomaly detection │
│ → Can ABORT, WARN, or WRITE_VALID_ONLY │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ OUTPUT DATA │
└─────────────────────────────────────────────────────────────┘
Example: Complete Validation Setup¶
nodes:
- name: process_orders
read:
connection: staging
path: orders
# 1. Contracts - Fail fast on bad input
contracts:
- type: not_null
columns: [order_id, customer_id]
- type: row_count
min: 100
# 2. Transformation
transform:
steps:
- sql: "SELECT * FROM df WHERE amount > 0"
# 3. Validation Tests - Check output quality
validation:
mode: warn
tests:
- type: range
column: amount
min: 0
max: 1000000
on_fail: quarantine
- type: unique
columns: [order_id]
on_fail: quarantine
quarantine:
connection: warehouse
path: quarantine/orders
# 4. Quality Gate - Batch-level threshold
gate:
require_pass_rate: 0.95
on_fail: abort
write:
connection: warehouse
path: fact_orders
See Also¶
- Getting Started: Validation - Tutorial walkthrough
- Fact Pattern - Orphan handling in fact tables