Simulation

Generate realistic synthetic data for any domain - manufacturing, operations, IoT, process control, business - directly in your pipeline YAML. No code needed.

Simulation is Odibi's built-in data generator. Define what you need in YAML, and the framework produces time-series, categorical, relational, and process data that behaves like the real thing. Swap to real sources later - your downstream pipeline stays unchanged.

Who built this

Odibi was built by a chemical engineer turned data engineer - the only data engineer on an analytics team that sits in operations, not IT. Simulation exists because I needed realistic process data and couldn't wait for IT to provision it. I know what a PID controller is, what a material balance looks like, and what happens when your test data doesn't behave like the real thing. Every pattern in this library comes from a real problem I've solved. This isn't just a data tool: it's an engineering tool built by someone who understands both the process side and the data side.

A taste of simulation
read:
  format: simulation
  options:
    simulation:
      scope: { start_time: "2026-01-01", timestep: "5m", row_count: 288, seed: 42 }
      entities: { count: 3, id_prefix: "sensor_" }
      columns:
        - name: temperature
          data_type: float
          generator: { type: random_walk, start: 72, min: 60, max: 90, volatility: 0.5 }
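
To give a feel for what the `random_walk` column above describes, here is a minimal conceptual sketch in plain Python. It is not Odibi's implementation: the function name `bounded_random_walk` is hypothetical, and treating `volatility` as the standard deviation of a Gaussian step is an assumption. It reuses the values from the YAML (start 72, bounds 60 to 90, volatility 0.5, 288 rows, seed 42); at a 5-minute timestep, 288 rows cover 24 hours.

```python
import random

def bounded_random_walk(start, low, high, volatility, steps, seed):
    """Return `steps` values of a Gaussian random walk clamped to [low, high].

    Hypothetical helper for illustration; Odibi's generator may differ.
    """
    rng = random.Random(seed)  # local RNG, so the seed alone determines the output
    values = []
    current = float(start)
    for _ in range(steps):
        values.append(current)
        # Assumption: volatility is the standard deviation of each step.
        step = rng.gauss(0.0, volatility)
        current = min(high, max(low, current + step))
    return values

series = bounded_random_walk(start=72, low=60, high=90, volatility=0.5, steps=288, seed=42)
repeat = bounded_random_walk(start=72, low=60, high=90, volatility=0.5, steps=288, seed=42)
print(len(series), series == repeat)  # 288 True
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator is what makes the two runs identical regardless of other random calls in the program.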

Key Capabilities

| Capability | What You Get |
| --- | --- |
| 13 Generator Types | range · random_walk · daily_profile · categorical · boolean · timestamp · sequential · constant · derived · uuid · email · ipv4 · geo |
| Stateful Functions | prev(), ema(), pid(), delay() - values that depend on history for dynamic process simulation |
| Cross-Entity References | One entity reacts to another: downstream sensor reads upstream output |
| Entity Overrides | Per-entity behavior variation - entity A runs hot, entity B runs cold |
| Scheduled Events | Maintenance windows, setpoint changes, recurring events, condition-based triggers, ramp transitions |
| Chaos Engineering | Outliers, duplicates, downtime gaps, null injection - realistic imperfections |
| Incremental Mode | Continuous data generation with HWM state - each run picks up where the last left off |
| Deterministic | Same seed = same output, every time |
| Multi-Engine | Same YAML works on Pandas, Spark, and Polars |
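
To make "values that depend on history" concrete, the sketch below shows what an exponential moving average and a fixed delay do to a series, in plain Python. The function names echo `ema()` and `delay()` from the table, but the signatures and parameters here (`alpha`, `steps`, `fill`) are assumptions for illustration, not Odibi's API.

```python
from collections import deque

def ema(values, alpha):
    """Exponential moving average: blend each new value with the previous output."""
    out = []
    smoothed = None
    for v in values:
        # The first row has no history, so it passes through unchanged.
        smoothed = v if smoothed is None else alpha * v + (1 - alpha) * smoothed
        out.append(smoothed)
    return out

def delay(values, steps, fill):
    """Shift a series by `steps` rows, padding the start with `fill`."""
    buffer = deque([fill] * steps)  # values still "in transit"
    out = []
    for v in values:
        buffer.append(v)
        out.append(buffer.popleft())
    return out

raw = [10.0, 20.0, 20.0, 20.0]
print(ema(raw, alpha=0.5))            # [10.0, 15.0, 17.5, 18.75]
print(delay(raw, steps=2, fill=0.0))  # [0.0, 0.0, 10.0, 20.0]
```

In both cases a row cannot be computed in isolation: it needs state carried forward from earlier rows, which is the property that separates stateful functions from the row-independent generators.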

Learning Path

Work through the simulation docs in order, or jump to what you need:

| Page | What You'll Learn |
| --- | --- |
| Why Odibi Simulation | What makes this different and why it matters |
| Getting Started | Your first simulation in 5 minutes |
| Core Concepts | Scope, entities, and columns - the three building blocks |
| Generators Reference | All 13 generator types with parameters and examples |
| Stateful Functions | prev(), ema(), pid(), delay() - history-dependent values |
| Advanced Features | Cross-entity references, overrides, scheduled events (recurring, condition-based, ramp), chaos |
| Incremental Mode | Continuous data generation across pipeline runs |
| Patterns & Recipes | Real-world scenarios: IoT fleets, batch reactors, order streams |
| Process Simulation | ChemE and process control: FOPTD, PID loops, reactor dynamics |
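
For readers new to the process-control terms in the last row, here is a small conceptual sketch of a PID controller driving a first-order-plus-dead-time (FOPTD) process, written in plain Python with a simple Euler step. Everything in it is an illustrative assumption: the function name, the process parameters (`gain`, `tau`, `dead_steps`), and the tuning values are hypothetical and are not taken from Odibi.

```python
from collections import deque

def simulate_pid_on_foptd(setpoint, steps, dt=1.0, gain=2.0, tau=10.0,
                          dead_steps=3, kp=0.3, ki=0.03, kd=0.0):
    """Closed loop: PID controller -> dead time -> first-order lag.

    Process model: tau * dy/dt + y = gain * u(t - dead time).
    """
    y = 0.0                                # process output (measured value)
    integral = 0.0                         # accumulated error for the I term
    prev_error = setpoint - y
    in_transit = deque([0.0] * dead_steps) # controller outputs not yet felt
    history = []
    for _ in range(steps):
        error = setpoint - y
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        prev_error = error
        # Dead time: the process reacts to the output from `dead_steps` ago.
        in_transit.append(u)
        u_delayed = in_transit.popleft()
        # Euler step of the first-order lag.
        y += dt / tau * (gain * u_delayed - y)
        history.append(y)
    return history

history = simulate_pid_on_foptd(setpoint=50.0, steps=300)
print(history[:3], round(history[-1], 2))
```

The first `dead_steps` outputs stay at zero because the controller's action has not reached the process yet; after that the integral term pulls the output toward the setpoint.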

When to Use Simulation

  • Build pipelines before source data exists - design transforms, test patterns, validate schema now
  • Test with safe, reproducible data - no PII, no compliance headaches
  • Stress test Delta Lake at scale - 1,000 entities × 10,000 rows = 10M rows from a single YAML node
  • Demo without exposing real data - realistic enough for stakeholders, safe enough for anywhere
  • Simulate manufacturing, operations, and IoT - sensors, PLCs, batch processes, alarms for a local data platform
  • Prototype analytics before production data arrives - build dashboards on synthetic facts and dimensions
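
The scale figure in the list above is simple arithmetic, and the same arithmetic helps when sizing any scope. The sketch below assumes that the row count applies per entity (as the "1,000 entities × 10,000 rows" figure implies) and that each row advances time by one timestep; the helper name `simulation_size` and the 5-minute timestep for the stress-test case are illustrative assumptions.

```python
from datetime import timedelta

def simulation_size(entity_count, rows_per_entity, timestep):
    """Return (total rows, simulated time span per entity)."""
    total_rows = entity_count * rows_per_entity
    span = rows_per_entity * timestep
    return total_rows, span

# The small example scope: 3 entities, 288 rows each, 5-minute timestep.
small_rows, small_span = simulation_size(3, 288, timedelta(minutes=5))
# The stress-test figure: 1,000 entities, 10,000 rows each.
big_rows, big_span = simulation_size(1_000, 10_000, timedelta(minutes=5))

print(small_rows, small_span)  # 864 1 day, 0:00:00
print(big_rows, big_span.days)  # 10000000 34
```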

Installation

Simulation is built into odibi core. No extra dependencies, no plugins. If you have odibi installed, you have simulation.

pip install odibi

Next Steps

Start with Getting Started to generate your first dataset in under 5 minutes, or browse the Generators Reference to see what's available.