Simulation Generators Reference
Quick reference for all 13 simulation generator types.
Generator Types
range
Purpose: Numeric values with statistical distributions
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| min | float | Yes | - | Minimum value |
| max | float | Yes | - | Maximum value |
| distribution | string | No | uniform | uniform or normal |
| mean | float | No | (min+max)/2 | Mean for normal distribution |
| std_dev | float | No | (max-min)/6 | Std deviation for normal |
Data types: int, float
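As an illustration of the parameter table, here is a minimal Python sketch of how one `range` value could be drawn. It assumes the documented defaults for `mean` and `std_dev`. It also assumes that normal draws falling outside the bounds are clamped to [min, max], because the table does not say how such draws are handled. `sample_range` is a hypothetical helper, not the library's API.

```python
import random
from typing import Optional


def sample_range(
    rng: random.Random,
    min_value: float,
    max_value: float,
    distribution: str = "uniform",
    mean: Optional[float] = None,
    std_dev: Optional[float] = None,
) -> float:
    """Draw one value (hypothetical helper mirroring the range parameters)."""
    if distribution == "uniform":
        return rng.uniform(min_value, max_value)
    if distribution == "normal":
        # Documented defaults: centre of the range, and (max - min) / 6 so that
        # +/- 3 standard deviations roughly spans [min, max].
        mu = (min_value + max_value) / 2 if mean is None else mean
        sigma = (max_value - min_value) / 6 if std_dev is None else std_dev
        # Assumption: out-of-range normal draws are clamped to the bounds.
        return min(max(rng.gauss(mu, sigma), min_value), max_value)
    raise ValueError(f"unsupported distribution: {distribution!r}")


rng = random.Random(42)
samples = [sample_range(rng, 20.0, 30.0, distribution="normal") for _ in range(1000)]
```

With the defaults, about 99.7% of normal draws already fall inside [min, max], so the clamp only trims rare outliers.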
Example:
random_walk
Purpose: Realistic time-series data where each value depends on the previous value. Ideal for simulating controlled process variables (temperatures, pressures, flow rates).
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| start | float | Yes | - | Initial value / setpoint |
| min | float | Yes | - | Hard lower bound (physical limit) |
| max | float | Yes | - | Hard upper bound (physical limit) |
| volatility | float | No | 1.0 | Standard deviation of step-to-step noise |
| mean_reversion | float | No | 0.0 | Pull toward start (0=none, 1=snap back) |
| trend | float | No | 0.0 | Drift per timestep (+/- for gradual shift) |
| precision | int | No | None | Round to N decimal places |
| shock_rate | float | No | 0.0 | Probability of sudden shock per timestep (0=never, 1=every step) |
| shock_magnitude | float | No | 10.0 | Maximum absolute size of a shock event |
| shock_bias | float | No | 0.0 | Directional tendency: +1=up only, -1=down only, 0=either direction |
Data types: float
How it works: Uses an Ornstein-Uhlenbeck process with optional shock events. Each value = previous + noise + mean_reversion pull + trend. Random shocks inject sudden jumps that the mean_reversion naturally recovers from. Values are clamped to [min, max].
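The update rule above can be sketched in Python as follows. This is an illustrative reading of the parameter table, not the library's implementation. The following details are all assumptions:

- the pull term `mean_reversion * (start - value)`;
- shock sizes drawn uniformly up to `shock_magnitude`;
- the mapping from `shock_bias` to the probability of an upward shock.

`random_walk` is a hypothetical function name.

```python
import random
from typing import List, Optional


def random_walk(
    rng: random.Random,
    steps: int,
    start: float,
    min_value: float,
    max_value: float,
    volatility: float = 1.0,
    mean_reversion: float = 0.0,
    trend: float = 0.0,
    precision: Optional[int] = None,
    shock_rate: float = 0.0,
    shock_magnitude: float = 10.0,
    shock_bias: float = 0.0,
) -> List[float]:
    """Hypothetical sketch of a mean-reverting walk with occasional shocks."""
    value = start
    out: List[float] = []
    for _ in range(steps):
        noise = rng.gauss(0.0, volatility)
        pull = mean_reversion * (start - value)  # 1.0 snaps straight back to start
        value = value + noise + pull + trend
        if rng.random() < shock_rate:
            # Assumed mapping: P(up) = (1 + shock_bias) / 2, so +1 is always up,
            # -1 is always down and 0 is either direction with equal odds.
            direction = 1.0 if rng.random() < (1.0 + shock_bias) / 2.0 else -1.0
            value += direction * rng.uniform(0.0, shock_magnitude)
        # The shock changes the internal state, so later steps revert from it.
        value = min(max(value, min_value), max_value)
        # Rounding applies to the emitted reading only, not to the internal state.
        out.append(round(value, precision) if precision is not None else value)
    return out


series = random_walk(
    random.Random(7), 500, start=350.0, min_value=300.0, max_value=400.0,
    volatility=0.5, mean_reversion=0.1, trend=0.001, precision=1,
    shock_rate=0.02, shock_magnitude=30.0, shock_bias=1.0,
)
```

Keeping the unrounded value as state means `precision` changes only what is reported, not how the walk evolves.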
Example:
```yaml
name: reactor_temp
data_type: float
generator:
  type: random_walk
  start: 350.0
  min: 300.0
  max: 400.0
  volatility: 0.5
  mean_reversion: 0.1
  trend: 0.001
  precision: 1
  shock_rate: 0.02
  shock_magnitude: 30.0
  shock_bias: 1.0
```
Tips:
- Use `mean_reversion: 0.1` to simulate a PID-controlled process at steady state
- Use `trend: 0.001` to simulate slow fouling or catalyst deactivation
- Use `precision: 1` to match real instrument resolution (e.g., temperature to 0.1°F)
- Use `shock_rate: 0.02` with `shock_bias: 1.0` to simulate occasional exothermic runaways in a reactor
- Shocks perturb the walk's internal state, so `mean_reversion` naturally pulls values back, producing realistic spike-and-recover patterns
- A warning is issued if `shock_rate > 0` without `mean_reversion`, because shocks without recovery aren't realistic
- Works with incremental mode: the last value per entity is saved and restored on the next run
daily_profile
Purpose: Time-of-day patterns with day-to-day variation. Values follow a repeating daily curve defined by anchor points.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| profile | dict | Yes | - | Anchor points mapping HH:MM to target values |
| min | float | Yes | - | Hard lower bound |
| max | float | Yes | - | Hard upper bound |
| noise | float | No | 0.0 | Per-reading jitter (±noise) |
| volatility | float | No | 0.0 | Day-to-day variation in anchor targets (std_dev) |
| interpolation | string | No | linear | linear or step |
| precision | int | No | None | Round to N decimal places. 0 = integers |
| weekend_scale | float | No | None | Scale factor for weekends (0.0 to 1.0) |
Data types: int, float
How it works: Interpolates between anchor points to get a base value. If volatility > 0, each day's anchors are independently shifted by a normally distributed amount whose standard deviation is volatility. Noise is then added, and the result is clamped to [min, max] and rounded to precision.
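A minimal Python sketch of the steps above follows. It makes two assumptions that the section does not state:

- After the last anchor, the curve heads toward the first anchor of the next day.
- `noise` is uniform jitter in [-noise, +noise].

`interpolate_profile`, `draw_day_offsets` and `daily_profile_value` are hypothetical names. `weekend_scale` is left out because its exact application isn't described.

```python
import random
from typing import Dict, Optional, Union


def _minutes(hhmm: str) -> int:
    """Convert an "HH:MM" anchor key to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def interpolate_profile(
    profile: Dict[str, float], minute_of_day: float, interpolation: str = "linear"
) -> float:
    """Base value at a time of day, wrapping from the last anchor to the first."""
    anchors = sorted((_minutes(k), v) for k, v in profile.items())
    # Assumption: the curve wraps past midnight toward the first anchor.
    extended = anchors + [(anchors[0][0] + 1440, anchors[0][1])]
    t = minute_of_day if minute_of_day >= anchors[0][0] else minute_of_day + 1440
    for (t0, v0), (t1, v1) in zip(extended, extended[1:]):
        if t0 <= t < t1:
            if interpolation == "step":
                return v0  # hold the previous anchor until the next one
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return anchors[-1][1]


def draw_day_offsets(
    rng: random.Random, profile: Dict[str, float], volatility: float
) -> Dict[str, float]:
    """Drawn once per simulated day: shift every anchor by N(0, volatility)."""
    return {k: rng.gauss(0.0, volatility) for k in profile}


def daily_profile_value(
    rng: random.Random,
    profile: Dict[str, float],
    minute_of_day: float,
    min_value: float,
    max_value: float,
    noise: float = 0.0,
    day_offsets: Optional[Dict[str, float]] = None,
    interpolation: str = "linear",
    precision: Optional[int] = None,
) -> Union[int, float]:
    shifted = profile if day_offsets is None else {
        k: v + day_offsets[k] for k, v in profile.items()
    }
    value = interpolate_profile(shifted, minute_of_day, interpolation)
    value += rng.uniform(-noise, noise)  # assumption: uniform per-reading jitter
    value = min(max(value, min_value), max_value)
    if precision is None:
        return value
    # precision 0 yields a Python int (round() without ndigits).
    return round(value) if precision == 0 else round(value, precision)
```

Drawing the offsets once per day and reusing them for every reading that day is what separates `volatility` (shape changes between days) from `noise` (jitter between readings).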
Example:
```yaml
name: occupancy
data_type: int
generator:
  type: daily_profile
  min: 0
  max: 25
  precision: 0
  noise: 1.5
  volatility: 3.0
  profile:
    "00:00": 1
    "08:00": 19
    "12:00": 15
    "13:00": 22
    "17:00": 14
    "22:00": 2
```
Tips:
- Use `precision: 0` for headcounts, counters, or anything that must be a whole number
- Use `noise` for reading-to-reading jitter and `volatility` for day-to-day variation in the overall shape
- Use `weekend_scale: 0.15` for office buildings that are nearly empty on weekends
- Use `interpolation: step` for shift-based patterns that change abruptly (e.g., factory power at shift change)
- Pairs well with `derived` columns (e.g., CO2 derived from occupancy)
categorical
Purpose: Discrete choice from predefined values
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| values | list | Yes | - | List of possible values |
| weights | list[float] | No | uniform | Probability weights (must sum to 1.0) |
Data types: string, int, any
Example:
```yaml
name: status
data_type: string
generator:
  type: categorical
  values: [Running, Idle, Error]
  weights: [0.8, 0.15, 0.05]
```
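Weighted categorical sampling can be illustrated with the standard library's `random.choices`. This is not necessarily the library's method. `sample_categorical` is a hypothetical helper. `random.choices` itself accepts arbitrary relative weights, so the sum-to-1.0 check mirrors the documented rule rather than anything the stdlib requires.

```python
import random
from collections import Counter
from typing import Any, List, Optional, Sequence


def sample_categorical(
    rng: random.Random,
    values: Sequence[Any],
    weights: Optional[Sequence[float]] = None,
    k: int = 1,
) -> List[Any]:
    """Draw k values; weights=None means every value is equally likely."""
    if weights is not None:
        if len(weights) != len(values):
            raise ValueError("weights must have one entry per value")
        if abs(sum(weights) - 1.0) > 1e-9:  # tolerate float rounding in the sum
            raise ValueError("weights must sum to 1.0")
    return rng.choices(values, weights=weights, k=k)


draws = sample_categorical(
    random.Random(123), ["Running", "Idle", "Error"], [0.8, 0.15, 0.05], k=10_000
)
counts = Counter(draws)  # roughly 80% Running, 15% Idle, 5% Error
```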
boolean
Purpose: True/False with configurable probability
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| true_probability | float | No | 0.5 | Probability of True (0.0-1.0) |
Data types: boolean
Example:
timestamp
Purpose: Auto-stepped timestamp column
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| (none) | - | - | - | Uses scope.timestep automatically |
Data types: timestamp
Format: ISO8601 Zulu (2026-01-01T00:00:00Z)
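The stepping and output format can be sketched in Python. The start time and step length are hypothetical stand-ins for whatever the simulation scope supplies as `scope.timestep`. `timestamp_column` is not a library function. Pass timezone-aware datetimes; `astimezone` treats naive ones as local time.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def timestamp_column(start: datetime, timestep: timedelta, rows: int) -> List[str]:
    """One ISO 8601 Zulu string per row: start, start + timestep, ..."""
    start_utc = start.astimezone(timezone.utc)
    return [
        (start_utc + i * timestep).strftime("%Y-%m-%dT%H:%M:%SZ") for i in range(rows)
    ]


stamps = timestamp_column(datetime(2026, 1, 1, tzinfo=timezone.utc), timedelta(minutes=5), 3)
```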
Example:
sequential
Purpose: Auto-incrementing integers (globally unique across entities by default)
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| start | int | No | 1 | Starting value |
| step | int | No | 1 | Increment step |
| unique_across_entities | bool | No | true | Each entity gets a non-overlapping ID range |
Data types: int
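One way to give each entity a non-overlapping ID range is to offset each entity's block by its index times the rows it generates. The following Python sketch assumes that scheme; the section does not say how the ranges are actually assigned. `sequential_ids` is a hypothetical helper.

```python
from typing import List


def sequential_ids(
    entity_index: int,
    rows_per_entity: int,
    start: int = 1,
    step: int = 1,
    unique_across_entities: bool = True,
) -> List[int]:
    """IDs for one entity; blocks never overlap when unique_across_entities is set."""
    # Assumption: entity k starts after the k previous entities' full blocks.
    offset = entity_index * rows_per_entity * step if unique_across_entities else 0
    base = start + offset
    return [base + i * step for i in range(rows_per_entity)]
```

For example, with three rows per entity, entity 0 gets 1, 2, 3 and entity 1 gets 4, 5, 6. With `unique_across_entities=False`, both restart at 1.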
Example:
constant
Purpose: Fixed values with template support
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| value | any | Yes | - | Constant value or template |
Data types: any
Magic variables:
- {entity_id} - Current entity name
- {entity_index} - Entity index (0-based)
- {timestamp} - Current row timestamp
- {row_number} - Row index
Example:
```yaml
name: source_system
data_type: string
generator:
  type: constant
  value: "simulation"

# Or with template:
name: entity_ref
data_type: string
generator:
  type: constant
  value: "{entity_id}_record_{row_number}"
```
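Template expansion with the magic variables behaves much like Python's `str.format`. The sketch below assumes that behavior and is not the library's renderer. `render_constant` is hypothetical. With plain `str.format`, literal braces in a constant would need doubling (`{{`).

```python
from typing import Any


def render_constant(
    value: Any, entity_id: str, entity_index: int, timestamp: str, row_number: int
) -> Any:
    """Expand the documented magic variables; non-string constants pass through."""
    if not isinstance(value, str):
        return value
    return value.format(
        entity_id=entity_id,
        entity_index=entity_index,
        timestamp=timestamp,
        row_number=row_number,
    )
```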
uuid
Purpose: Unique identifiers (UUID4 or UUID5)
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| version | int | No | 4 | UUID version. Options: 4 (random) or 5 (deterministic/namespace-based). Only these values are supported |
| namespace | string | No | DNS | Namespace seed for UUID5 generation. Arbitrary string; default DNS uses the standard DNS namespace UUID |
Data types: string
Example:
```yaml
# Random (deterministic with seed):
name: order_id
data_type: string
generator:
  type: uuid
  version: 4

# Fully deterministic:
name: device_id
data_type: string
generator:
  type: uuid
  version: 5
  namespace: "com.example.devices"
```
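Both versions can be reproduced with Python's standard `uuid` module, as in the sketch below. Several details here are assumptions, not documented behavior:

- Hashing an arbitrary namespace string into a namespace UUID via `uuid5(NAMESPACE_DNS, ...)`.
- The per-row name `"pump_01:0"`.
- Building seeded UUID4s from `getrandbits(128)`.

`uuid4_seeded` and `uuid5_for` are hypothetical helpers.

```python
import random
import uuid


def uuid4_seeded(rng: random.Random) -> str:
    """Version-4 UUID built from seeded random bits, so reruns repeat."""
    # The version argument sets the RFC 4122 variant and version bits.
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))


def uuid5_for(namespace: str, name: str) -> str:
    """Deterministic version-5 UUID: same namespace and name, same UUID."""
    if namespace == "DNS":
        ns = uuid.NAMESPACE_DNS
    else:
        # Assumption: an arbitrary namespace string is first hashed into a UUID.
        ns = uuid.uuid5(uuid.NAMESPACE_DNS, namespace)
    return str(uuid.uuid5(ns, name))


device_id = uuid5_for("com.example.devices", "pump_01:0")  # hypothetical name
```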
email
Purpose: Email addresses
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| domain | string | No | example.com | Email domain |
| pattern | string | No | {entity}_{index} | Username template. Available placeholders: {entity}, {index}, {row} |
Data types: string
Example:
Output: user.5@company.com
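The username template can be read as a format string filled with the three placeholders and joined to the domain. In the Python sketch below, `make_email` is hypothetical. The settings `pattern: "user.{index}"` and `domain: company.com` are one assumed configuration that yields the output shown above.

```python
def make_email(
    entity: str,
    index: int,
    row: int,
    domain: str = "example.com",
    pattern: str = "{entity}_{index}",
) -> str:
    """Fill the username template, then append the domain."""
    username = pattern.format(entity=entity, index=index, row=row)
    return f"{username}@{domain}"
```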
ipv4
Purpose: IPv4 addresses
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| subnet | string | No | None | CIDR subnet (e.g., 192.168.0.0/24) |
Data types: string
Example:
```yaml
# Full range:
name: client_ip
data_type: string
generator:
  type: ipv4

# Within subnet:
name: server_ip
data_type: string
generator:
  type: ipv4
  subnet: "10.0.0.0/8"
```
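Drawing an address inside a CIDR subnet is straightforward with Python's standard `ipaddress` module, as shown below. Two behaviors are assumptions rather than documented ones: skipping the network and broadcast addresses, and drawing the full range from all 2^32 values (reserved blocks included). `random_ipv4` is a hypothetical helper.

```python
import ipaddress
import random
from typing import Optional


def random_ipv4(rng: random.Random, subnet: Optional[str] = None) -> str:
    """Random IPv4 address, optionally restricted to a CIDR subnet."""
    if subnet is None:
        # Full 32-bit range, including reserved and private blocks.
        return str(ipaddress.IPv4Address(rng.getrandbits(32)))
    network = ipaddress.IPv4Network(subnet, strict=False)
    if network.num_addresses <= 2:
        offset = rng.randrange(network.num_addresses)  # /31 and /32 have no spare hosts
    else:
        # Assumption: skip the network and broadcast addresses.
        offset = rng.randrange(1, network.num_addresses - 1)
    return str(network.network_address + offset)
```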
geo
Purpose: Geographic coordinates (latitude/longitude)
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| bbox | list[float] | Yes | - | [min_lat, min_lon, max_lat, max_lon] |
| format | string | No | tuple | tuple or lat_lon_separate |
Data types: tuple, object
Example:
```yaml
name: location
data_type: string
generator:
  type: geo
  bbox: [37.0, -122.5, 37.8, -122.0]  # San Francisco Bay Area
  format: tuple
```
Output: (37.4532, -122.1823)
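A point inside the bounding box can be drawn by sampling latitude and longitude independently. The Python sketch below assumes uniform sampling and rounding to 4 decimal places, which matches the example output but is not documented. Uniform sampling in degrees is not uniform over the Earth's surface, though the difference is negligible for a box this small. `random_point` is a hypothetical helper.

```python
import random
from typing import Sequence, Tuple


def random_point(
    rng: random.Random, bbox: Sequence[float], precision: int = 4
) -> Tuple[float, float]:
    """(lat, lon) drawn uniformly inside [min_lat, min_lon, max_lat, max_lon]."""
    min_lat, min_lon, max_lat, max_lon = bbox
    lat = round(rng.uniform(min_lat, max_lat), precision)
    lon = round(rng.uniform(min_lon, max_lon), precision)
    return lat, lon
```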
derived
Purpose: Calculated columns from other columns
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| expression | string | Yes | - | Sandboxed Python expression referencing column names. See the full function table below for all available functions |
Data types: any (depends on expression result)
Supported operations:
- Arithmetic: +, -, *, /, **, //, %
- Comparison: ==, !=, <, >, <=, >=
- Logical: and, or, not
- Conditionals: value if condition else other
Safe functions:
| Category | Function | Signature | Description |
|---|---|---|---|
| Math | abs() | abs(x) | Absolute value |
| Math | round() | round(x, ndigits) | Round to N decimal places |
| Math | min() | min(a, b, ...) | Minimum value |
| Math | max() | max(a, b, ...) | Maximum value |
| Type | int() | int(x) | Cast to integer |
| Type | float() | float(x) | Cast to float |
| Type | str() | str(x) | Cast to string |
| Type | bool() | bool(x) | Cast to boolean |
| Null-safe | coalesce() | coalesce(a, b, ...) | First non-None value |
| Null-safe | safe_div() | safe_div(a, b, default=None) | Division handling None and zero |
| Null-safe | safe_mul() | safe_mul(a, b, default=None) | Multiplication handling None |
| Stateful | prev() | prev(column, default=None) | Previous row value (see Stateful Functions) |
| Stateful | ema() | ema(column, alpha, default=None) | Exponential moving average |
| Stateful | delay() | delay(column, steps, default=None) | Value from N timesteps ago (transport delay) |
| Stateful | pid() | pid(pv, sp, Kp, Ki, Kd, dt, output_min, output_max, anti_windup) | PID controller with anti-windup |
| Utility | random() | random() | Random float in [0, 1) |
Context variables available in expressions: _row_index, entity_id, _timestamp
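The stateful functions carry state from row to row. The Python sketch below shows their usual semantics and is not the library's implementation. `Ema`, `Delay` and `Pid` are hypothetical classes. The following details are assumptions:

- **EMA seeding:** the first sample seeds the EMA when no default is given.
- **prev():** behaves like `Delay(1)`.
- **Anti-windup:** uses conditional integration, one common technique that may differ from the library's.

```python
from collections import deque
from typing import Any, Deque, Optional


class Ema:
    """ema(column, alpha): s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""

    def __init__(self, alpha: float, default: Optional[float] = None):
        self.alpha = alpha
        self.state = default

    def update(self, x: float) -> float:
        # Assumption: with no default, the first sample seeds the average.
        if self.state is None:
            self.state = x
        else:
            self.state = self.alpha * x + (1 - self.alpha) * self.state
        return self.state


class Delay:
    """delay(column, steps): the value from `steps` rows earlier, else default."""

    def __init__(self, steps: int, default: Any = None):
        self.buffer: Deque[Any] = deque(maxlen=steps + 1)
        self.default = default

    def update(self, x: Any) -> Any:
        self.buffer.append(x)
        full = len(self.buffer) == self.buffer.maxlen
        return self.buffer[0] if full else self.default


class Pid:
    """Discrete PID with output clamping and conditional-integration anti-windup."""

    def __init__(self, kp, ki, kd, dt, output_min, output_max, anti_windup=True):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.output_min, self.output_max = output_min, output_max
        self.anti_windup = anti_windup
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, pv: float, sp: float) -> float:
        error = sp - pv
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        candidate = self.integral + error * self.dt
        output = self.kp * error + self.ki * candidate + self.kd * derivative
        clamped = min(max(output, self.output_min), self.output_max)
        # Keep the new integral only while the output is unsaturated.
        if not self.anti_windup or clamped == output:
            self.integral = candidate
        return clamped
```

Freezing the integral while the output sits at a limit stops the controller from "winding up" error that would later cause a large overshoot.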
Example:
```yaml
# Simple calculation:
name: temp_fahrenheit
data_type: float
generator:
  type: derived
  expression: "temp_celsius * 1.8 + 32"

# Conditional:
name: status
data_type: string
generator:
  type: derived
  expression: "'HOT' if temp_celsius > 80 else 'NORMAL'"

# Null-safe:
name: efficiency
data_type: float
generator:
  type: derived
  expression: "safe_div(output, input, 0)"
```
Security
Expressions run in a restricted namespace. No imports, file I/O, or system calls allowed.
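The null-safe helpers and the "restricted namespace" idea can be sketched in Python. This is not the library's sandbox. `coalesce`, `safe_div` and `safe_mul` follow the signatures in the function table, but their bodies are assumptions, and `evaluate` is a hypothetical helper. Emptying `__builtins__` blocks `import`, `open` and similar names. However, plain `eval` with restricted globals is known to be escapable, so this alone is not a security boundary.

```python
from typing import Any, Dict


def coalesce(*args: Any) -> Any:
    """First argument that is not None, else None."""
    for arg in args:
        if arg is not None:
            return arg
    return None


def safe_div(a: Any, b: Any, default: Any = None) -> Any:
    """a / b, or default when either side is None or b is zero."""
    if a is None or b is None or b == 0:
        return default
    return a / b


def safe_mul(a: Any, b: Any, default: Any = None) -> Any:
    """a * b, or default when either side is None."""
    if a is None or b is None:
        return default
    return a * b


SAFE_FUNCTIONS = {
    "abs": abs, "round": round, "min": min, "max": max,
    "int": int, "float": float, "str": str, "bool": bool,
    "coalesce": coalesce, "safe_div": safe_div, "safe_mul": safe_mul,
}


def evaluate(expression: str, row: Dict[str, Any]) -> Any:
    """Evaluate an expression against one row's column values."""
    # Column values are locals; only whitelisted functions are globals.
    return eval(expression, {"__builtins__": {}, **SAFE_FUNCTIONS}, dict(row))
```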
Compatibility Matrix
| Generator | Pandas | Spark | Polars | Incremental | Null Rate | Overrides |
|---|---|---|---|---|---|---|
| range | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| random_walk | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| daily_profile | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| categorical | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| boolean | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| timestamp | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| sequential | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| constant | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| uuid | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| email | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ipv4 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| geo | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| derived | ✅ | ✅ | ✅ | ✅ | ⚠️ * | ✅ |
Notes:
- \* Derived columns: null_rate is applied after the calculation. Use null-safe functions (coalesce, safe_div) if dependencies have nulls.
See Also
- Simulation Overview: Getting started with simulation
- Generators Guide: All 13 generator types with examples
- Stateful Functions: `prev()`, `ema()`, `pid()`, `delay()` deep dive
- Simulation Playbook: Pattern-based guide for building process simulations
- State Management: HWM persistence for incremental mode
- Configuration: General config concepts