
Simulation Generators Reference

Comprehensive reference for all 13 simulation generator types. Each generator produces a specific kind of synthetic data for realistic dataset simulation.

Why this matters

Most synthetic data tools generate uniform random noise. Real sensor data is autocorrelated, mean-reverting, and subject to shocks. If your test data doesn't pass the squint test - if an operator would look at the chart and say "that's not real" - your pipeline will surprise you in production. These generators, especially random_walk, produce data that behaves like real instruments, not dice rolls.


Generator Quick Reference

| Generator | Use Case | Data Types |
|---|---|---|
| range | Metrics, measurements, scores | int, float |
| random_walk | Process variables, stock prices, sensor drift | float |
| daily_profile | Occupancy, traffic, energy demand, shift patterns | int, float |
| categorical | Status codes, categories, enums | string, int |
| boolean | Flags, binary states | boolean |
| timestamp | Event times, auto-stepped | timestamp |
| sequential | Auto-increment IDs, counters | int |
| constant | Fixed values, metadata, templates | any |
| derived | Calculated fields, physics, business logic | any |
| uuid | Unique identifiers | string |
| email | Contact info | string |
| ipv4 | IP addresses | string |
| geo | Geographic coordinates | string |

range

Generate numeric values with statistical distributions.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| min | float | Yes | | Minimum value |
| max | float | Yes | | Maximum value |
| distribution | string | No | uniform | uniform or normal |
| mean | float | No | (min+max)/2 | Mean for normal distribution |
| std_dev | float | No | (max-min)/6 | Standard deviation for normal distribution |

Supported data types: int, float

Examples

Manufacturing — quality score:

name: quality_score
data_type: float
generator:
  type: range
  min: 85.0
  max: 100.0
  distribution: normal
  mean: 96.0
  std_dev: 2.5

IoT — battery percentage:

name: battery_pct
data_type: int
generator:
  type: range
  min: 0
  max: 100

Tip

Use distribution: normal with a tight std_dev for measurements that cluster around a target (e.g., fill weight, thickness). Use uniform for values that are equally likely across a range (e.g., random wait times).

Normal vs uniform - when to use which

Uniform (distribution: uniform, the default) means every value in the range is equally likely. Use this when there's no "target" the measurement clusters around.

  • Random wait times between events
  • Batch IDs or sequence numbers within a range
  • Anything where 20.0 is just as likely as 35.0

Normal (distribution: normal) means values cluster around a center point (the mean) and become less likely the further you get from it. Use this when there IS a target or typical value.

  • Fill weight on a packaging line (target: 500g, most bags are 498-502g)
  • Cycle time on a machine (typical: 31 sec, rarely below 28 or above 35)
  • Any measurement where quality control keeps things near a target

How std_dev controls the spread:

  • Tight std_dev (small relative to range) = most values packed near the mean. A std_dev of 1.0 on a mean of 96.0 means ~68% of values fall between 95.0-97.0.
  • Loose std_dev (large relative to range) = values spread out more evenly. Starts to look like uniform.
  • Rule of thumb: set std_dev = (max - min) / 6 for a bell curve that rarely hits the edges. Tighten it for more clustering.
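The clamped normal sampling described above can be sketched in a few lines. This is an illustrative helper (`sample_range` is a hypothetical name, not the library's actual implementation); it assumes out-of-range draws are clamped to the bounds, as real generators typically do:

```python
import random

def sample_range(min_v, max_v, distribution="uniform", mean=None, std_dev=None, rng=random):
    """Illustrative sketch of range-generator sampling (not the library's code)."""
    if distribution == "uniform":
        return rng.uniform(min_v, max_v)
    # Normal distribution: fall back to the documented defaults.
    mean = (min_v + max_v) / 2 if mean is None else mean
    std_dev = (max_v - min_v) / 6 if std_dev is None else std_dev
    value = rng.gauss(mean, std_dev)
    # Clamp so rare outliers never escape the configured range.
    return max(min_v, min(max_v, value))

rng = random.Random(42)
samples = [sample_range(85.0, 100.0, "normal", mean=96.0, std_dev=2.5, rng=rng)
           for _ in range(1000)]
```

With std_dev 2.5 on a mean of 96.0, nearly all samples land between 88.5 and 100 (the upper bound clips the top tail slightly).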

random_walk

Generate realistic time-series data where each value depends on the previous value. Uses an Ornstein-Uhlenbeck process with optional shocks. Ideal for simulating controlled process variables, financial data, and drifting sensor readings.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| start | float | Yes | | Initial value / static setpoint |
| min | float | Yes | | Hard lower bound (physical limit) |
| max | float | Yes | | Hard upper bound (physical limit) |
| volatility | float | No | 1.0 | Std deviation of step-to-step noise. Controls noise magnitude. Must be > 0. |
| mean_reversion | float | No | 0.0 | Pull strength toward setpoint (0 = pure random walk, 1 = snap back immediately). Simulates PID-like control. Range: 0.0–1.0. |
| mean_reversion_to | string | No | None | Dynamic setpoint. Column name to use as the reversion target instead of the static start value. See Dynamic Setpoint Tracking below. |
| trend | float | No | 0.0 | Drift per timestep. Positive = gradual increase, negative = decrease. Simulates fouling, degradation, or slow process drift. |
| precision | int | No | None | Round values to N decimal places. None = no rounding. Range: 0–10. |
| shock_rate | float | No | 0.0 | Probability of a sudden shock per timestep (0.0 = never, 1.0 = every step). Range: 0.0–1.0. |
| shock_magnitude | float | No | 10.0 | Maximum absolute size of a shock event. The actual shock is drawn uniformly from [0, shock_magnitude]. Must be > 0. |
| shock_bias | float | No | 0.0 | Directional tendency for shocks. +1.0 = always up, -1.0 = always down, 0.0 = either direction. Range: -1.0 to 1.0. |

Choosing parameter values - a plain-English guide

volatility - How jittery the signal is between readings. Think of it as instrument noise.

  • 0.1 - 0.5 = Gentle hum. A well-tuned pressure transmitter on a stable loop.
  • 1.0 - 3.0 = Moderate wobble. A flow meter on a line with some turbulence.
  • 5.0+ = Wild swings. A noisy thermocouple or an uncontrolled process.
  • Rule of thumb: start at 0.5, increase until the signal "looks right" for your process.

mean_reversion - How strongly the value gets pulled back to its setpoint. This is your virtual PID controller strength.

  • 0.0 = No control at all. The value wanders freely (pure random walk).
  • 0.01 - 0.05 = Gentle guidance. The process drifts but slowly returns - like a tank level with gravity drain.
  • 0.1 - 0.3 = Steady-state control. A well-tuned PID loop holding a process variable near setpoint.
  • 0.5+ = Tight control. The value snaps back almost immediately after any disturbance.

trend - A slow, persistent push in one direction. It fights against mean_reversion, creating a tug-of-war.

  • Scale it relative to your signal range. On a signal between 300-400, a trend of 0.001 is imperceptible over 100 rows, but 0.1 will visibly climb within an hour.
  • 0.001 = Barely noticeable drift. Catalyst slowly losing activity over days.
  • 0.01 - 0.05 = Noticeable over a shift. Heat exchanger fouling you can see in a daily report.
  • 0.1+ = Aggressive drift. Equipment degrading fast enough to trigger alarms.

shock_rate - How often a sudden spike hits. Think of random process upsets.

  • 0.0 = Never. Smooth operation.
  • 0.01 - 0.02 = Rare upsets. One spike every 50-100 readings.
  • 0.05 = Frequent disturbances. Unstable feed or unreliable upstream equipment.
  • Pair with mean_reversion > 0 so the process recovers after each shock. Shocks without recovery aren't realistic.

shock_magnitude - How big the spike is when it happens. The actual shock is drawn randomly from zero up to this value.

  • Scale it to your signal range. If your process runs 300-400, a shock_magnitude of 30 means a spike could push the value up to 30 units away from where it was.

shock_bias - Which direction shocks tend to go.

  • 0.0 = Equal chance of spiking up or down (symmetric disturbances).
  • +1.0 = Always spikes up. Exothermic runaways, pressure surges.
  • -1.0 = Always spikes down. Sudden cooling, pressure drops, flow interruptions.

precision - How many decimal places to round to. Matches real instrument resolution.

  • 0 = Whole numbers (like a digital counter).
  • 1 = One decimal place (typical for temperature displays: 72.3 degrees F).
  • 2 = Two decimal places (typical for pressure gauges: 14.72 psi).
  • None = Full floating-point precision (useful for intermediate calculations).

Supported data types: float

How it works: Each value = previous + noise + mean_reversion pull + trend. Values are clamped to [min, max]. Shocks perturb the internal state, so mean_reversion naturally pulls values back — producing realistic spike-and-recover patterns.
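The update rule above can be sketched as a single step function. This is an illustrative sketch of the documented formula (`walk_step` is a hypothetical name; in particular, the shock-direction handling is an assumption, not the library's exact code):

```python
import random

def walk_step(prev, start, min_v, max_v, volatility=1.0, mean_reversion=0.0,
              trend=0.0, shock_rate=0.0, shock_magnitude=10.0, shock_bias=0.0,
              rng=random):
    """One step: previous + noise + reversion pull + trend, plus an optional
    shock, clamped to [min_v, max_v]. Illustrative sketch only."""
    value = prev
    value += rng.gauss(0.0, volatility)        # step-to-step noise
    value += mean_reversion * (start - value)  # pull toward the setpoint
    value += trend                             # slow persistent drift
    if rng.random() < shock_rate:
        magnitude = rng.uniform(0.0, shock_magnitude)
        # Assumed bias handling: +1 makes shocks always up, -1 always down.
        direction = 1.0 if rng.uniform(-1.0, 1.0) < shock_bias else -1.0
        value += direction * magnitude
    return max(min_v, min(max_v, value))

rng = random.Random(7)
series = [350.0]
for _ in range(500):
    series.append(walk_step(series[-1], 350.0, 300.0, 400.0,
                            volatility=0.5, mean_reversion=0.1, rng=rng))
```

With volatility 0.5 and mean_reversion 0.1, the series hovers within a unit or two of 350 — a PID-controlled loop at steady state rather than a drunkard's walk.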

Examples

Manufacturing — reactor temperature with occasional upsets:

name: reactor_temp
data_type: float
generator:
  type: random_walk
  start: 350.0
  min: 300.0
  max: 400.0
  volatility: 0.5
  mean_reversion: 0.1
  trend: 0.001
  precision: 1
  shock_rate: 0.02
  shock_magnitude: 30.0
  shock_bias: 1.0

Business — daily stock price:

name: stock_price
data_type: float
generator:
  type: random_walk
  start: 150.0
  min: 50.0
  max: 500.0
  volatility: 2.5
  trend: 0.01
  precision: 2

Dynamic Setpoint Tracking

The mean_reversion_to parameter enables a walk to track a time-varying reference column instead of reverting to the static start value. This is essential for simulating real-world dependencies where one signal follows another.

How it works: At each timestep, the reversion target is read from the referenced column's current row value. The walk drifts toward that dynamic target with the configured mean_reversion strength. If the referenced column is not yet available (dependency ordering issue), it falls back to start.
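The target lookup and fallback can be sketched as follows (`reversion_target` is a hypothetical helper; `row` is assumed to be a dict of already-generated column values for the current row):

```python
def reversion_target(row, start, mean_reversion_to=None):
    """Pick the reversion target for the current row (illustrative sketch).

    Falls back to the static `start` when no dynamic column is configured
    or the referenced column has no value yet, mirroring the documented
    dependency-ordering fallback."""
    if mean_reversion_to is None:
        return start
    value = row.get(mean_reversion_to)
    return start if value is None else value
```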

Dependency Order

The referenced column must be defined earlier in the column list. Odibi evaluates columns in order — the target column must already have a value for the current row.

IoT — battery temperature tracking ambient:

columns:
  - name: ambient_temp_c
    data_type: float
    generator:
      type: random_walk
      start: 25.0
      min: 15.0
      max: 35.0
      volatility: 0.3
      mean_reversion: 0.05

  - name: battery_temp_c
    data_type: float
    generator:
      type: random_walk
      start: 28.0
      min: 20.0
      max: 45.0
      volatility: 0.4
      mean_reversion: 0.1
      mean_reversion_to: ambient_temp_c  # Tracks ambient, not static 28.0

Manufacturing — process variable following changing setpoint:

columns:
  - name: temp_setpoint_c
    data_type: float
    generator:
      type: random_walk
      start: 80.0
      min: 60.0
      max: 100.0
      volatility: 0.1
      mean_reversion: 0.02

  - name: actual_temp_c
    data_type: float
    generator:
      type: random_walk
      start: 80.0
      min: 55.0
      max: 105.0
      volatility: 0.5
      mean_reversion: 0.15
      mean_reversion_to: temp_setpoint_c  # PV tracks SP
      precision: 1

Tips

  • Use mean_reversion: 0.1 to simulate a PID-controlled process at steady state.
  • Use trend: 0.001 to simulate slow fouling or catalyst deactivation.
  • Use precision: 1 to match real instrument resolution (e.g., temperature to 0.1°F).
  • Use shock_rate: 0.02 with shock_bias: 1.0 to simulate occasional exothermic runaways.
  • A warning is issued if shock_rate > 0 without mean_reversion — shocks without recovery aren't realistic.
  • Works with incremental mode — the last value per entity is saved and restored on the next run.

daily_profile

Generate values that follow a repeating daily curve defined by anchor points. The engine interpolates between anchor points, adds noise, and clamps to [min, max]. Ideal for simulating any metric with a predictable intraday pattern.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| profile | dict | Yes | | Anchor points mapping HH:MM to target values |
| min | float | Yes | | Hard lower bound (physical limit) |
| max | float | Yes | | Hard upper bound (physical limit) |
| noise | float | No | 0.0 | Random noise amplitude (±noise added to interpolated value) |
| volatility | float | No | 0.0 | Day-to-day variation. Each day, anchor targets shift by a random amount (std_dev = volatility). 0 = same curve every day. |
| interpolation | string | No | linear | linear or step |
| precision | int | No | None | Round to N decimal places. 0 = integers. None = full float. Range: 0–10. |
| weekend_scale | float | No | None | Scale factor for weekends (0.0–1.0). None = no adjustment. |

Supported data types: int, float

How it works: At each timestamp, the engine finds the two surrounding anchor points and interpolates to get a base value. If volatility is set, each day's anchor targets are independently shifted by a random normal amount, making each day's curve slightly different while preserving the overall shape. Noise is added (±noise, uniform), then the result is clamped to [min, max] and rounded to the specified precision. On weekends (Saturday/Sunday), the interpolated value is multiplied by weekend_scale before noise is applied.
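The anchor interpolation can be sketched as a small helper (`profile_value` is a hypothetical name; the midnight wrap-around handling is an assumption about how the engine joins the last anchor of one day to the first anchor of the next):

```python
def profile_value(minute_of_day, profile, interpolation="linear"):
    """Interpolate a daily profile at a given minute (illustrative sketch).

    `profile` maps "HH:MM" anchor strings to target values; the curve is
    assumed to wrap around midnight."""
    anchors = sorted((int(t[:2]) * 60 + int(t[3:]), v) for t, v in profile.items())
    # Pad with day-shifted copies so lookups wrap across midnight.
    padded = ([(t - 1440, v) for t, v in anchors] + anchors
              + [(t + 1440, v) for t, v in anchors])
    for (t0, v0), (t1, v1) in zip(padded, padded[1:]):
        if t0 <= minute_of_day < t1:
            if interpolation == "step":
                return v0  # hold the last anchor's value until the next one
            frac = (minute_of_day - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    return anchors[-1][1]  # single-anchor profiles fall through to here
```

With a profile of `{"00:00": 0.0, "12:00": 100.0}`, the value at 06:00 interpolates to 50, climbs to 100 at noon, then descends back toward 0 at the next midnight.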

Examples

Facilities — building occupancy (integers):

name: occupancy
data_type: int
generator:
  type: daily_profile
  min: 0
  max: 25
  precision: 0
  noise: 1.5
  volatility: 3.0
  profile:
    "00:00": 1
    "06:00": 3
    "08:00": 19
    "12:00": 15
    "13:00": 22
    "17:00": 14
    "22:00": 2

IT — network traffic (float, weekend scaling):

name: bandwidth_mbps
data_type: float
generator:
  type: daily_profile
  min: 0.0
  max: 1000.0
  noise: 50.0
  precision: 1
  interpolation: linear
  weekend_scale: 0.3
  profile:
    "00:00": 50.0
    "06:00": 100.0
    "09:00": 800.0
    "12:00": 650.0
    "13:00": 750.0
    "17:00": 900.0
    "20:00": 400.0
    "23:00": 100.0

Manufacturing — shift-based power consumption (step interpolation):

name: power_kw
data_type: float
generator:
  type: daily_profile
  min: 0
  max: 500
  precision: 0
  interpolation: step
  profile:
    "00:00": 50
    "06:00": 350
    "14:00": 400
    "22:00": 50

Choosing between linear and step

Use linear for metrics that transition gradually (occupancy, temperature, traffic). Use step for metrics that change abruptly at specific times (shift changes, scheduled equipment start/stop).

Two levels of randomness

noise and volatility work at different time scales:

  • noise adds per-reading jitter. Every 5-minute reading gets a small random offset. This makes the line wiggly within a single day.
  • volatility adds per-day variation. Each day's anchor targets are shifted independently, so Monday might peak at 21 people and Tuesday at 17. This makes the overall shape different from day to day.

Use both together for the most realistic data: noise for instrument-level variation, volatility for real-world unpredictability.

Weekend scaling

weekend_scale multiplies the interpolated profile value before noise is applied. A value of 0.3 means weekend values are 30% of the weekday profile. Use 0.0 for buildings that are empty on weekends, or None (the default) for 24/7 operations where weekends look the same as weekdays.


categorical

Generate discrete values chosen from a predefined list.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| values | list | Yes | | List of possible values |
| weights | list[float] | No | uniform | Probability weights (must sum to 1.0) |

Supported data types: string, int, any

Examples

Manufacturing — machine status:

name: machine_status
data_type: string
generator:
  type: categorical
  values: [Running, Idle, Maintenance, Error]
  weights: [0.75, 0.12, 0.08, 0.05]

Business — customer tier:

name: customer_tier
data_type: string
generator:
  type: categorical
  values: [Bronze, Silver, Gold, Platinum]
  weights: [0.50, 0.30, 0.15, 0.05]
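The weighted draw behaves like Python's standard `random.choices` — a behavioral sketch of the machine-status example above, not the library's own code:

```python
import random

rng = random.Random(0)
values = ["Running", "Idle", "Maintenance", "Error"]
weights = [0.75, 0.12, 0.08, 0.05]

# random.choices performs exactly this kind of weighted categorical draw.
statuses = rng.choices(values, weights=weights, k=10_000)
share_running = statuses.count("Running") / len(statuses)
```

Over 10,000 draws, roughly 75% of statuses come out "Running", matching the first weight.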

boolean

Generate True/False values with configurable probability.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| true_probability | float | No | 0.5 | Probability of True (0.0–1.0) |

Supported data types: boolean

Examples

IoT — sensor online flag:

name: is_online
data_type: boolean
generator:
  type: boolean
  true_probability: 0.98

Business — email opted-in:

name: opted_in
data_type: boolean
generator:
  type: boolean
  true_probability: 0.65

timestamp

Generate auto-stepped timestamp values based on the simulation scope's timestep configuration.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| (none) | | | | Uses scope.timestep automatically |

Supported data types: timestamp

Format: ISO 8601 Zulu — 2026-01-01T00:00:00Z

Example

name: event_time
data_type: timestamp
generator:
  type: timestamp

Tip

The timestamp column advances automatically based on the timestep in your simulation scope (e.g., 1m, 5m, 1h). You only need one timestamp column per entity.


sequential

Generate auto-incrementing integer values.

By default, IDs are globally unique across entities — each entity gets a non-overlapping range. Entity 0 gets [start, start + rows), entity 1 gets [start + rows, start + 2*rows), etc. This prevents duplicate IDs when multiple entities share the same sequential column.

Set unique_across_entities: false to revert to per-entity sequences (all entities start from the same start value).
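The range arithmetic can be sketched as follows (`sequential_value` is a hypothetical helper; how `step` interacts with the per-entity ranges is an assumption based on the step-1 description above):

```python
def sequential_value(entity_index, row_index, rows_per_entity, start=1, step=1,
                     unique_across_entities=True):
    """ID for a given entity/row under the documented range scheme (sketch).

    With unique ranges, entity k owns rows [k*rows, (k+1)*rows) of one
    global sequence; otherwise every entity restarts at `start`."""
    if unique_across_entities:
        offset = entity_index * rows_per_entity
        return start + (offset + row_index) * step
    return start + row_index * step
```

For 100 rows per entity with the defaults, entity 0 produces 1..100 and entity 1 produces 101..200 — no collisions when both feed the same table.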

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| start | int | No | 1 | Starting value |
| step | int | No | 1 | Increment per row |
| unique_across_entities | bool | No | true | Each entity gets a non-overlapping ID range |

Supported data types: int

Examples

Business — order line numbers:

name: line_number
data_type: int
generator:
  type: sequential
  start: 1
  step: 1

Manufacturing — batch IDs (by 10s):

name: batch_id
data_type: int
generator:
  type: sequential
  start: 1000
  step: 10

Per-entity sequence (opt-in to old behavior):

name: local_seq
data_type: int
generator:
  type: sequential
  start: 1
  unique_across_entities: false

constant

Generate a fixed value for every row, with optional template variable support.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| value | any | Yes | | Constant value or template string |

Supported data types: any

Magic Variables

Templates can reference these runtime variables:

| Variable | Description |
|---|---|
| {entity_id} | Current entity name |
| {entity_index} | Entity index (0-based) |
| {timestamp} | Current row timestamp |
| {row_number} | Row index |

Examples

Metadata — source system tag:

name: source_system
data_type: string
generator:
  type: constant
  value: "MES_simulation"

Templated — entity-specific reference:

name: record_ref
data_type: string
generator:
  type: constant
  value: "{entity_id}_batch_{row_number}"
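Assuming templates follow Python `str.format` semantics (an assumption — the magic-variable names match, but the engine's renderer may differ), the substitution works like this:

```python
# Hypothetical rendering of the magic variables with str.format.
template = "{entity_id}_batch_{row_number}"
context = {
    "entity_id": "sensor_01",
    "entity_index": 0,
    "timestamp": "2026-01-01T00:00:00Z",
    "row_number": 42,
}
record_ref = template.format(**context)  # unused variables are simply ignored
```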

derived

Generate calculated columns from other columns using sandboxed Python expressions.

When should I use derived vs a simpler generator?

Use derived when the column's value depends on other columns or changes based on logic. If the column is independent, use a simpler generator:

  • Temperature that wanders on its own? Use random_walk.
  • Temperature converted from Celsius to Fahrenheit? Use derived (because it depends on the Celsius column).
  • Status that's always one of three values? Use categorical.
  • Alarm flag that's TRUE when temperature exceeds 95? Use derived (because it depends on temperature).

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| expression | string | Yes | | Sandboxed Python expression referencing column names. Supports context variables (_row_index, entity_id, _timestamp), safe math functions (abs, round, min, max, coalesce, safe_div), and stateful functions (prev, ema, pid, delay) |

Supported data types: any (depends on expression result)

Expression Syntax

Expressions use Python syntax and can reference any column defined earlier in the column list by name.

Arithmetic operators:

| Operator | Description | Example |
|---|---|---|
| + | Addition | price + tax |
| - | Subtraction | gross - tare |
| * | Multiplication | quantity * unit_price |
| / | Division | total / count |
| ** | Exponentiation | base ** 2 |
| // | Floor division | seconds // 60 |
| % | Modulo | batch_id % 10 |

Comparison operators:

| Operator | Description | Example |
|---|---|---|
| == | Equal | status == 'OK' |
| != | Not equal | grade != 'FAIL' |
| < | Less than | temp < 100 |
| > | Greater than | pressure > 50 |
| <= | Less or equal | score <= 100 |
| >= | Greater or equal | level >= threshold |

Logical operators:

| Operator | Description | Example |
|---|---|---|
| and | Logical AND | temp > 80 and pressure > 50 |
| or | Logical OR | status == 'ERROR' or status == 'FAULT' |
| not | Logical NOT | not is_active |

Conditionals:

value_if_true if condition else value_if_false

Safe Functions

| Function | Signature | Description |
|---|---|---|
| abs() | abs(x) | Absolute value |
| round() | round(x, n) | Round to n decimals |
| min() | min(a, b, ...) | Minimum value |
| max() | max(a, b, ...) | Maximum value |
| int() | int(x) | Convert to integer |
| float() | float(x) | Convert to float |
| str() | str(x) | Convert to string |
| bool() | bool(x) | Convert to boolean |
| coalesce() | coalesce(a, b, ...) | Return first non-None value |
| safe_div() | safe_div(a, b, default=None) | Division handling None and zero |
| safe_mul() | safe_mul(a, b, default=None) | Multiplication handling None |
| random() | random() | Random float in [0, 1) |

When to use safe functions

safe_div(a, b, default) - Use whenever you divide and the denominator could be zero or NULL. Without it, you get a runtime error.

# Without safe_div - crashes if total_units is 0
expression: "good_units / total_units * 100"

# With safe_div - returns 0 instead of crashing
expression: "safe_div(good_units, total_units, 0) * 100"

coalesce(a, b, ...) - Use when upstream columns might have NULLs (from null_rate). Returns the first non-None value.

# If primary_temp is NULL, fall back to backup_temp, then to 25.0
expression: "coalesce(primary_temp, backup_temp, 25.0)"

safe_mul(a, b, default) - Use when multiplying values that might be NULL. Returns the default instead of propagating None.
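The documented behavior of these three helpers can be sketched in plain Python — behavioral sketches matching the descriptions above, not the library's implementations:

```python
def safe_div(a, b, default=None):
    """Division that returns `default` when an operand is None or the
    denominator is zero (behavioral sketch of the documented helper)."""
    if a is None or b is None or b == 0:
        return default
    return a / b

def coalesce(*args):
    """First non-None argument, else None."""
    return next((a for a in args if a is not None), None)

def safe_mul(a, b, default=None):
    """Multiplication that returns `default` instead of propagating None."""
    if a is None or b is None:
        return default
    return a * b
```

So `safe_div(good_units, total_units, 0) * 100` yields 0 for an idle line with zero total units instead of raising a division error.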

Context Variables

These variables are automatically available in every derived expression:

| Variable | Type | Description |
|---|---|---|
| entity_id | string | Current entity name (e.g., "sensor_01") |
| _row_index | int | Current row index (0-based) |
| _timestamp | datetime | Current row timestamp as a Python datetime object |

# Example: battery draining over time
expression: "max(0, 100 - (_row_index * 0.07))"

# Example: entity-specific logic
expression: "'high' if entity_id == 'reactor_01' else 'normal'"

_timestamp enables time-of-day logic in derived expressions:

# Classify by shift
name: shift
generator:
  type: derived
  expression: "'day' if 6 <= _timestamp.hour < 18 else 'night'"

# Extract hour for grouping
name: hour_of_day
generator:
  type: derived
  expression: "_timestamp.hour"

Stateful Functions

These functions maintain state across rows within each entity, enabling time-series logic:

| Function | Signature | Description |
|---|---|---|
| prev() | prev(column_name, default=None) | Get previous row's value for a column |
| ema() | ema(column_name, alpha, default=None) | Exponential moving average (0 < alpha ≤ 1) |
| delay() | delay(column_name, steps, default=None) | Value from N timesteps ago (transport delay) |
| pid() | pid(pv, sp, Kp, Ki, Kd, dt, output_min, output_max, anti_windup) | PID controller with anti-windup |
| random() | random() | Random float in [0, 1) for noise injection |

For full documentation of these functions, see Stateful Functions. The complete list of safe functions appears in the Safe Functions table above.

Cross-Entity References

Derived expressions can reference columns from other entities using EntityName.column_name syntax:

expression: "Furnace01.temperature * 0.9 + ambient_offset"

The referenced entity must be defined in the same simulation. Odibi automatically resolves the dependency order. For details, see Advanced Features.

Security

Expressions are evaluated in a sandboxed namespace. The following are explicitly blocked:

  • No import statements
  • No file I/O (open, read, write)
  • No network access
  • No system calls
  • No access to __builtins__

Examples

Manufacturing — unit conversion:

name: temp_fahrenheit
data_type: float
generator:
  type: derived
  expression: "temp_celsius * 1.8 + 32"

Business — order total with tax:

name: order_total
data_type: float
generator:
  type: derived
  expression: "round(quantity * unit_price * 1.08, 2)"

IoT — alarm classification:

name: alarm_level
data_type: string
generator:
  type: derived
  expression: "'CRITICAL' if temp > 95 else ('WARNING' if temp > 80 else 'NORMAL')"

Manufacturing — efficiency with null safety:

name: oee
data_type: float
generator:
  type: derived
  expression: "safe_div(good_units, total_units, 0) * 100"

uuid

Generate unique identifiers.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| version | int | No | 4 | UUID version. Options: 4 (random) or 5 (deterministic/namespace-based). Only these values are supported |
| namespace | string | No | DNS | Namespace seed for UUID5 generation. Arbitrary string; default DNS uses the standard DNS namespace UUID |

Supported data types: string

Examples

Random UUID (version 4; reproducible when the simulation is seeded):

name: transaction_id
data_type: string
generator:
  type: uuid
  version: 4

Deterministic UUID (same input → same output):

name: device_id
data_type: string
generator:
  type: uuid
  version: 5
  namespace: "com.factory.devices"
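UUID5 determinism can be demonstrated with Python's standard uuid module. Note the exact mapping from the configured namespace string to a namespace UUID is an assumption — this sketch derives one via the DNS namespace, which may differ from the library's scheme:

```python
import uuid

# Derive a namespace UUID from the configured string (assumed mapping).
ns = uuid.uuid5(uuid.NAMESPACE_DNS, "com.factory.devices")

# UUID5 is deterministic: same namespace + name always yields the same ID.
device_a = uuid.uuid5(ns, "device_001")
device_a_again = uuid.uuid5(ns, "device_001")
device_b = uuid.uuid5(ns, "device_002")
```

This is what makes version 5 useful for stable device IDs across re-runs, while version 4 gives fresh random IDs each time.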

email

Generate email addresses.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| domain | string | No | example.com | Email domain |
| pattern | string | No | {entity}_{index} | Username template. Available placeholders: {entity}, {index}, {row} |

Supported data types: string

Examples

Business — customer emails:

name: customer_email
data_type: string
generator:
  type: email
  domain: acme-corp.com
  pattern: "user_{row}"

Output: user_5@acme-corp.com


ipv4

Generate IPv4 addresses, optionally within a specific subnet.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| subnet | string | No | None | CIDR subnet (e.g., 192.168.0.0/24) |

Supported data types: string

Examples

IoT — full range:

name: device_ip
data_type: string
generator:
  type: ipv4

Constrained to a private subnet:

name: server_ip
data_type: string
generator:
  type: ipv4
  subnet: "10.0.1.0/24"
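A subnet-constrained draw can be sketched with the standard ipaddress module (`random_ip` is a hypothetical helper; unlike a careful implementation, this sketch can also emit the network and broadcast addresses):

```python
import ipaddress
import random

def random_ip(subnet=None, rng=random):
    """Random IPv4 address, optionally inside a CIDR subnet (sketch)."""
    if subnet is None:
        # Any 32-bit value is a valid IPv4 address.
        return str(ipaddress.IPv4Address(rng.getrandbits(32)))
    net = ipaddress.ip_network(subnet)
    # Index into the network's addresses (includes network/broadcast).
    offset = rng.randrange(net.num_addresses)
    return str(net[offset])

rng = random.Random(1)
ip = random_ip("10.0.1.0/24", rng)
```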

geo

Generate geographic coordinates (latitude/longitude).

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| bbox | list[float] | Yes | | Bounding box: [min_lat, min_lon, max_lat, max_lon] |
| format | string | No | tuple | tuple or lat_lon_separate |

Supported data types: string

Examples

Manufacturing — factory fleet within a region:

name: truck_location
data_type: string
generator:
  type: geo
  bbox: [33.7, -84.5, 34.0, -84.2]  # Atlanta metro area
  format: tuple

Output: (33.8421, -84.3812)

IoT — sensor deployment zone:

name: sensor_location
data_type: string
generator:
  type: geo
  bbox: [51.4, -0.2, 51.6, 0.1]  # London area
  format: tuple
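A bounding-box draw is just two uniform samples (`random_point` is a hypothetical helper; the 4-decimal rounding mirrors the sample output above and is an assumption about the generator's precision):

```python
import random

def random_point(bbox, rng=random):
    """Uniform point inside [min_lat, min_lon, max_lat, max_lon] (sketch)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    lat = rng.uniform(min_lat, max_lat)
    lon = rng.uniform(min_lon, max_lon)
    return (round(lat, 4), round(lon, 4))

rng = random.Random(3)
point = random_point([33.7, -84.5, 34.0, -84.2], rng)  # Atlanta metro area
```

Note that uniform sampling in lat/lon is slightly denser toward the poles; for metro-scale boxes like these the distortion is negligible.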

Compatibility Matrix

All generators work across all engines, incremental mode, null injection, and entity overrides.

| Generator | Pandas | Spark | Polars | Incremental | Null Rate | Overrides |
|---|---|---|---|---|---|---|
| range | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| random_walk | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| daily_profile | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| categorical | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| boolean | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| timestamp | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| sequential | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| constant | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| derived | ✓ | ✓ | ✓ | ✓ | ⚠️ * | ✓ |
| uuid | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| email | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ipv4 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| geo | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

* Derived + null_rate: null_rate is applied after the expression is calculated. If upstream columns have nulls, use null-safe functions (coalesce, safe_div, safe_mul) in your expression to avoid errors.


See Also