Quality Gates¶
Batch-level quality validation that evaluates the entire dataset before writing.
Overview¶
While validation tests run per-row, quality gates evaluate aggregate metrics:
- Overall pass rate - What percentage of rows passed all tests?
- Per-test thresholds - Different requirements for different tests
- Row count anomalies - Detect unexpected batch sizes
Configuration¶
Basic Gate Setup¶
nodes:
  - name: load_silver_customers
    read:
      connection: bronze
      path: customers
    validation:
      tests:
        - type: not_null
          columns: [customer_id]
        - type: unique
          columns: [customer_id]
      gate:
        require_pass_rate: 0.95  # 95% must pass
        on_fail: abort           # Stop if gate fails
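To make the meaning of require_pass_rate concrete, here is a minimal Python sketch of how an overall pass rate could be computed from per-row test results. It is an illustration only, not odibi's implementation: the function names `overall_pass_rate` and `gate_passes` are hypothetical, and treating an empty batch as passing is an assumption.

```python
from typing import Dict, List


def overall_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of rows that passed every test.

    `results` maps a test name to one boolean per row (True = passed).
    """
    columns = list(results.values())
    if not columns or not columns[0]:
        return 1.0  # assumption: an empty batch trivially passes
    # A row counts as passing only if it passed ALL tests.
    rows_passing_all = sum(all(row) for row in zip(*columns))
    return rows_passing_all / len(columns[0])


def gate_passes(results: Dict[str, List[bool]], require_pass_rate: float = 0.95) -> bool:
    """True when the overall pass rate meets the required minimum."""
    return overall_pass_rate(results) >= require_pass_rate


results = {
    "not_null": [True, True, True, False],
    "unique": [True, True, False, False],
}
print(overall_pass_rate(results))  # 0.5: only the first two rows pass both tests
print(gate_passes(results))        # False: 0.5 is below 0.95
```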
Gate Config Options¶
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| require_pass_rate | float | No | 0.95 | Minimum fraction of rows passing ALL tests (0.95 = 95%) |
| on_fail | string | No | "abort" | Action on failure |
| thresholds | list | No | [] | Per-test thresholds |
| row_count | object | No | null | Row count validation |
On-Fail Actions¶
| Action | Description |
|---|---|
| abort | Stop pipeline, write nothing (default) |
| warn_and_write | Log warning, write all rows anyway |
| write_valid_only | Write only rows that passed validation |
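The three actions can be read as a small dispatch over what gets written once a gate has failed. The sketch below is illustrative Python, not odibi's code: `GateAbort` and `apply_on_fail` are hypothetical names, and the function models only the failure branch described in the table.

```python
import logging
from typing import List, Sequence

logger = logging.getLogger("gate_sketch")


class GateAbort(Exception):
    """Hypothetical stand-in for the error raised when on_fail is abort."""


def apply_on_fail(rows: Sequence, passed: Sequence[bool], on_fail: str = "abort") -> List:
    """Return the rows to write after a gate has FAILED.

    `passed[i]` is True when row i passed validation.
    """
    if on_fail == "abort":
        # Stop the pipeline; nothing is written.
        raise GateAbort("quality gate failed; nothing written")
    if on_fail == "warn_and_write":
        logger.warning("quality gate failed; writing all rows anyway")
        return list(rows)
    if on_fail == "write_valid_only":
        return [row for row, ok in zip(rows, passed) if ok]
    raise ValueError(f"unknown on_fail action: {on_fail!r}")


rows = ["a", "b", "c"]
passed = [True, False, True]
print(apply_on_fail(rows, passed, "write_valid_only"))  # ['a', 'c']
```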
Per-Test Thresholds¶
Set different requirements for specific tests:
gate:
  require_pass_rate: 0.95  # Global: 95% must pass all tests
  thresholds:
    - test: not_null
      min_pass_rate: 0.99  # 99% for not_null (stricter)
    - test: unique
      min_pass_rate: 1.0   # 100% unique (no duplicates allowed)
    - test: email_format   # Named test
      min_pass_rate: 0.90  # 90% for email format (more lenient)
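As a rough illustration of how per-test thresholds differ from the global rate, the following Python sketch checks each test's own pass rate against its min_pass_rate. The helper name `failed_thresholds` and the shape of the `results` mapping are assumptions made for this example, not part of odibi.

```python
from typing import Dict, List, Tuple


def failed_thresholds(
    results: Dict[str, List[bool]],
    thresholds: List[dict],
) -> Dict[str, Tuple[float, float]]:
    """Return {test: (actual_rate, required_rate)} for tests below their minimum."""
    failures = {}
    for rule in thresholds:
        outcomes = results.get(rule["test"], [])
        if not outcomes:
            continue  # assumption: a test with no recorded rows is skipped
        rate = sum(outcomes) / len(outcomes)
        if rate < rule["min_pass_rate"]:
            failures[rule["test"]] = (rate, rule["min_pass_rate"])
    return failures


thresholds = [
    {"test": "not_null", "min_pass_rate": 0.99},
    {"test": "unique", "min_pass_rate": 1.0},
    {"test": "email_format", "min_pass_rate": 0.90},
]
results = {
    "not_null": [True] * 100,                     # 100% pass
    "unique": [True] * 99 + [False],              # 99% pass, below 100%
    "email_format": [True] * 92 + [False] * 8,    # 92% pass, above 90%
}
print(sorted(failed_thresholds(results, thresholds)))  # ['unique']
```

Note that the batch above would clear a 0.90 bar on every individual test yet still fail, because the unique threshold tolerates no duplicates.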
Row Count Validation¶
Detect anomalies in batch size:
gate:
  row_count:
    min: 100               # Fail if fewer than 100 rows
    max: 1000000           # Fail if more than 1M rows
    change_threshold: 0.5  # Fail if count changes >50% vs last run
Row Count Options¶
| Field | Type | Description |
|---|---|---|
| min | int | Minimum expected row count |
| max | int | Maximum expected row count |
| change_threshold | float | Max allowed change vs previous run (0.5 = 50%) |
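The row count options can be sketched as three independent checks. This Python example is a hedged illustration, not odibi's implementation: `row_count_problems` is a hypothetical helper, and it assumes the change is measured relative to the previous run's count, in either direction, and skipped when no previous count is available.

```python
from typing import List, Optional


def row_count_problems(
    count: int,
    min_rows: Optional[int] = None,
    max_rows: Optional[int] = None,
    change_threshold: Optional[float] = None,
    previous_count: Optional[int] = None,
) -> List[str]:
    """Return a list of human-readable row count violations (empty = OK)."""
    problems = []
    if min_rows is not None and count < min_rows:
        problems.append(f"{count} rows is below min {min_rows}")
    if max_rows is not None and count > max_rows:
        problems.append(f"{count} rows is above max {max_rows}")
    # Assumption: change is relative to the previous run; skipped on first run.
    if change_threshold is not None and previous_count:
        change = abs(count - previous_count) / previous_count
        if change > change_threshold:
            problems.append(
                f"row count changed {change:.0%} vs previous run "
                f"(limit {change_threshold:.0%})"
            )
    return problems


# 400 rows after a 1000-row run is a 60% drop, which exceeds the 50% limit.
print(row_count_problems(400, min_rows=100, max_rows=1_000_000,
                         change_threshold=0.5, previous_count=1000))
```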
Complete Example¶
nodes:
  - name: process_orders
    read:
      connection: bronze
      path: orders_raw
    validation:
      tests:
        # Critical fields
        - type: not_null
          name: required_fields
          columns: [order_id, customer_id, order_date]
        # Uniqueness
        - type: unique
          name: unique_orders
          columns: [order_id]
        # Business rules
        - type: range
          name: valid_amount
          column: amount
          min: 0
        - type: accepted_values
          name: valid_status
          column: status
          values: [pending, completed, cancelled]
      gate:
        # Global threshold
        require_pass_rate: 0.95
        # Per-test overrides
        thresholds:
          - test: required_fields
            min_pass_rate: 0.99
          - test: unique_orders
            min_pass_rate: 1.0
        # Row count checks
        row_count:
          min: 1000
          change_threshold: 0.3
        # What to do on failure
        on_fail: abort
    write:
      connection: silver
      path: orders
      format: delta
Combining Gates with Quarantine¶
Use both for comprehensive data quality:
validation:
  tests:
    - type: not_null
      columns: [customer_id]
      on_fail: quarantine  # Route failures to quarantine
    - type: unique
      columns: [customer_id]
      on_fail: fail        # Critical - must pass
  quarantine:
    connection: silver
    path: quarantine/customers
  gate:
    require_pass_rate: 0.95  # Still need 95% overall
    on_fail: abort
Flow:
1. Rows failing not_null are quarantined
2. Gate evaluates remaining rows
3. If <95% pass, pipeline aborts
4. Otherwise, valid rows are written
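The four steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not odibi's implementation: `run_with_quarantine`, `quarantine_check`, and `gate_check` are hypothetical names, and the gate is represented by a single per-row predicate applied to the rows left after quarantine.

```python
from typing import Callable, List, Tuple


def run_with_quarantine(
    rows: List[dict],
    quarantine_check: Callable[[dict], bool],
    gate_check: Callable[[dict], bool],
    require_pass_rate: float = 0.95,
) -> Tuple[List[dict], List[dict]]:
    """Return (rows_to_write, quarantined_rows), or raise if the gate fails."""
    # Step 1: rows failing the quarantine-routed test are set aside.
    quarantined = [r for r in rows if not quarantine_check(r)]
    remaining = [r for r in rows if quarantine_check(r)]
    # Step 2: the gate evaluates the remaining rows.
    passed = [gate_check(r) for r in remaining]
    rate = sum(passed) / len(passed) if passed else 1.0
    # Step 3: abort when the pass rate is below the requirement.
    if rate < require_pass_rate:
        raise RuntimeError(f"gate failed: {rate:.1%} < {require_pass_rate:.1%}")
    # Step 4: otherwise the valid rows are written.
    valid = [r for r, ok in zip(remaining, passed) if ok]
    return valid, quarantined


# 97 rows with an id (one of them otherwise invalid) plus 3 rows missing customer_id.
rows = [{"customer_id": i, "ok": i != 0} for i in range(97)]
rows += [{"customer_id": None, "ok": True} for _ in range(3)]

valid, quarantined = run_with_quarantine(
    rows,
    quarantine_check=lambda r: r["customer_id"] is not None,
    gate_check=lambda r: r["ok"],
)
print(len(valid), len(quarantined))  # 96 3
```

In this model the quarantined rows do not count against the gate, which matches step 2's wording that the gate evaluates the remaining rows.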
Gate Failure Alerts¶
Get notified when gates fail:
alerts:
  - type: slack
    url: "${SLACK_WEBHOOK_URL}"
    on_events:
      - on_gate_block
    metadata:
      throttle_minutes: 15
      channel: "#data-alerts"
Alert payload includes:
- Pass rate achieved vs required
- Number of failed rows
- Failure reasons
GateFailedError¶
When a gate fails with on_fail: abort, a GateFailedError is raised:
from odibi.exceptions import GateFailedError

try:
    pipeline.run()
except GateFailedError as e:
    print(f"Gate failed: {e.pass_rate:.1%} < {e.required_rate:.1%}")
    print(f"Reasons: {e.failure_reasons}")
Best Practices¶
- Start with high thresholds - Be strict initially, relax as needed
- Use per-test thresholds - Critical tests (uniqueness) should be 100%
- Monitor row count changes - Sudden changes often indicate problems
- Combine with quarantine - Don't lose failed data, route it for analysis
- Set up alerts - Know immediately when gates fail
Related¶
- Quarantine Tables - Route failed rows
- Alerting - Alert on gate failures
- YAML Schema Reference