Enhanced Alerting¶
Real-time notifications for pipeline events with throttling, event-specific payloads, @mentions, and support for Slack, Teams, and generic webhooks.
Overview¶
Odibi's alerting system provides:
- Multiple channels: Slack, Teams (Power Automate), generic webhooks
- Event-specific payloads: Contextual information for each event type
- Throttling: Prevent alert spam with rate limiting
- @Mentions: Tag users on alerts (Teams)
- Rich formatting: Adaptive Cards for Teams, Block Kit for Slack
Quick Start¶
alerts:
- type: teams
url: "${TEAMS_WEBHOOK_URL}"
on_events:
- on_failure
metadata:
mention_on_failure: "data-team@company.com"
Configuration¶
Alert Config Options¶
| Field | Type | Required | Description |
|---|---|---|---|
type |
string | Yes | Alert type: slack, teams, teams_workflow, webhook |
url |
string | Yes | Webhook URL |
on_events |
list | No | Events to trigger on (default: on_failure) |
metadata |
object | No | Extra settings (throttling, mentions, etc.) |
Note:
teams_workflowis an alias forteams. Both use Power Automate workflow format.
Event Types¶
| Event | Description | Triggered When |
|---|---|---|
on_start |
Pipeline started | Pipeline execution begins |
on_success |
Pipeline completed | Pipeline finishes without errors |
on_failure |
Pipeline failed | Pipeline encounters an error |
on_quarantine |
Rows quarantined | Validation routes bad rows to quarantine |
on_gate_block |
Gate blocked | Quality gate prevents pipeline from continuing |
on_threshold_breach |
Threshold exceeded | A metric exceeds its configured threshold |
Metadata Reference¶
The metadata field controls throttling, mentions, and channel routing.
| Key | Type | Default | Description |
|---|---|---|---|
throttle_minutes |
int | 15 |
Minimum minutes between repeated alerts for the same pipeline+event |
max_per_hour |
int | 10 |
Maximum alerts of the same type per hour |
channel |
string | — | Target channel override (Slack only) |
mention |
string or list | — | User email(s) to @mention on all events (Teams only) |
mention_on_failure |
string or list | — | User email(s) to @mention on failure events only (Teams only) |
All metadata keys are optional
If not specified, defaults are applied automatically.
@Mentions (Teams)¶
Tag users directly in Teams alerts to ensure they see critical notifications.
Single user:
Multiple users:
Mention on all events vs failures only:
metadata:
mention: "ops-team@company.com" # @mention on ALL events
mention_on_failure: "on-call@company.com" # @mention ONLY on failures
When a failure event occurs, both mention and mention_on_failure users are tagged.
Throttling¶
Prevent alert spam with time-based and rate-based throttling:
alerts:
- type: teams
url: "${TEAMS_WEBHOOK_URL}"
on_events:
- on_failure
- on_quarantine
metadata:
throttle_minutes: 15 # Min 15 minutes between same alerts
max_per_hour: 10 # Max 10 alerts per hour
Throttle Key¶
Throttling is applied per unique combination of:
- Pipeline name
- Event type
So a process_orders pipeline failing twice in 5 minutes sends one alert, but a different pipeline can still alert.
Event-Specific Payloads¶
Quarantine Events¶
Triggered when validation routes bad rows to a quarantine table.
Payload includes:
- Rows quarantined count
- Quarantine table path
- Failed test names
- Node name
Gate Block Events¶
Triggered when a quality gate prevents the pipeline from continuing.
Payload includes:
- Pass rate achieved
- Required pass rate
- Number of failed rows
- Failure reasons
Threshold Breach Events¶
Triggered when a metric exceeds its configured threshold.
Payload includes:
- Metric name
- Threshold value
- Actual value
- Node name
Complete Example¶
A comprehensive alerting setup with separate channels for different severity levels:
project: SalesAnalytics
engine: spark
alerts:
# Critical alerts - immediate notification with @mentions
- type: teams
url: "${TEAMS_WEBHOOK_URL_CRITICAL}"
on_events:
- on_failure
- on_gate_block
metadata:
throttle_minutes: 5
max_per_hour: 20
mention_on_failure:
- "data-lead@company.com"
- "on-call@company.com"
# Informational alerts - pipeline lifecycle
- type: teams
url: "${TEAMS_WEBHOOK_URL_INFO}"
on_events:
- on_start
- on_success
metadata:
throttle_minutes: 1
max_per_hour: 50
mention: "data-team@company.com"
# Data quality alerts - quarantine notifications
- type: teams
url: "${TEAMS_WEBHOOK_URL_QUALITY}"
on_events:
- on_quarantine
- on_threshold_breach
metadata:
throttle_minutes: 30
max_per_hour: 10
# External integration - send to monitoring system
- type: webhook
url: "${MONITORING_WEBHOOK_URL}"
on_events:
- on_failure
- on_success
metadata:
throttle_minutes: 0 # No throttling for monitoring
connections:
# ...
pipelines:
- pipeline: daily_sales
nodes:
- name: validate_transactions
validation:
tests:
- type: not_null
columns: [transaction_id, amount]
on_fail: quarantine
quarantine:
connection: silver
path: quarantine/transactions
gate:
require_pass_rate: 0.95
Teams Configuration¶
Power Automate Workflows¶
Teams alerts use Power Automate workflows (classic Incoming Webhooks were retired December 2025).
Setup:
- In Teams, go to your channel
- Click
...→ Workflows → Post to a channel when a webhook request is received - Configure the workflow and copy the webhook URL
- Use the URL in your alert config
Private Channels
Power Automate bots may not have access to private channels by default. Use a standard (public) channel or explicitly add the Workflows app to your private channel.
Adaptive Card Format¶
Teams alerts use Adaptive Cards with:
- Color-coded header (red for failures, green for success, orange for warnings)
- Fact set with key metrics (duration, rows processed, etc.)
- @mentions for tagged users
- Story path for debugging
Example card contents:
┌──────────────────────────────────────────┐
│ 🚫 Pipeline: daily_sales - FAILED │
│ Project: SalesAnalytics | Status: FAILED │
├──────────────────────────────────────────┤
│ ⏱ Duration: 45.23s │
│ 📅 Time: 2026-01-02T15:30:00Z │
│ 📊 Rows: 125,000 │
│ 📂 Story: stories/daily_sales/... │
│ │
│ 🔔 @alice @bob │
└──────────────────────────────────────────┘
Slack Configuration¶
Block Kit Format¶
Slack alerts use Block Kit for rich formatting:
alerts:
- type: slack
url: "${SLACK_WEBHOOK_URL}"
on_events:
- on_failure
metadata:
channel: "#data-alerts" # Override default channel
Example card:
┌──────────────────────────────────────────┐
│ 🚫 ODIBI: daily_sales - GATE_BLOCKED │
├──────────────────────────────────────────┤
│ Project: SalesAnalytics │
│ Status: GATE_BLOCKED │
│ Duration: 45.23s │
│ Pass Rate: 92.3% │
│ Required: 95.0% │
│ Rows Failed: 1,542 │
└──────────────────────────────────────────┘
Generic Webhooks¶
For custom integrations with monitoring systems, ticketing tools, or other services:
alerts:
- type: webhook
url: "https://api.example.com/webhooks/odibi"
on_events:
- on_failure
- on_quarantine
Webhook Payload¶
{
"pipeline": "daily_sales",
"status": "QUARANTINE",
"duration": 45.23,
"message": "150 rows quarantined in validate_transactions",
"timestamp": "2026-01-02T10:15:00Z",
"event_type": "on_quarantine",
"quarantine_details": {
"rows_quarantined": 150,
"quarantine_path": "silver/quarantine/transactions",
"failed_tests": ["not_null", "valid_amount"],
"node_name": "validate_transactions"
}
}
Programmatic Alerts¶
Send alerts programmatically from Python code:
from odibi.utils.alerting import (
send_alert,
send_quarantine_alert,
send_gate_block_alert,
)
from odibi.config import AlertConfig, AlertType
config = AlertConfig(
type=AlertType.TEAMS,
url="https://your-webhook-url...",
metadata={"mention_on_failure": ["alice@company.com"]},
)
# Generic alert
send_alert(
config=config,
message="Custom alert message",
context={
"pipeline": "my_pipeline",
"status": "WARNING",
"event_type": "on_failure",
},
)
# Quarantine alert
send_quarantine_alert(
config=config,
pipeline="daily_sales",
node_name="validate_transactions",
rows_quarantined=150,
quarantine_path="silver/quarantine/transactions",
failed_tests=["not_null", "valid_amount"],
)
# Gate block alert
send_gate_block_alert(
config=config,
pipeline="daily_sales",
node_name="validate_transactions",
pass_rate=0.92,
required_rate=0.95,
failed_rows=1542,
total_rows=20000,
failure_reasons=["Pass rate 92.0% < required 95.0%"],
)
Best Practices¶
- Use throttling - Prevent alert fatigue during cascading failures
- Separate channels - Critical alerts vs informational notifications
- Use @mentions sparingly - Reserve for truly critical events
- Test webhooks - Verify connectivity before production deployment
- Monitor alert volume - High volumes often indicate systemic issues
- Include context - Story paths help with debugging
Troubleshooting¶
Alerts not being sent¶
- Check that the webhook URL is correct and accessible
- Verify the event type matches your
on_eventsconfiguration - Check throttling settings - you may be rate-limited
Teams @mentions not working¶
- Ensure you're using the correct email address format
- The user must be a member of the Teams channel
- Power Automate workflow must have permission to mention users
Power Automate "Bot not in roster" error¶
The Workflows app doesn't have access to the channel. Solutions:
- Use a standard (public) channel instead of private
- Add the Workflows app to your private channel manually
Related¶
- Quarantine Tables - Quarantine event source
- Quality Gates - Gate block event source
- YAML Schema Reference