Skip to content

Simulation Generators Reference

Quick reference for all 13 simulation generator types.

Generator Types

range

Purpose: Numeric values with statistical distributions

Parameters:

Parameter Type Required Default Description
min float Yes - Minimum value
max float Yes - Maximum value
distribution string No uniform uniform or normal
mean float No (min+max)/2 Mean for normal distribution
std_dev float No (max-min)/6 Std deviation for normal

Data types: int, float

Example:

name: temperature
data_type: float
generator:
  type: range
  min: 60.0
  max: 100.0
  distribution: normal


random_walk

Purpose: Realistic time-series data where each value depends on the previous value. Ideal for simulating controlled process variables (temperatures, pressures, flow rates).

Parameters:

Parameter Type Required Default Description
start float Yes - Initial value / setpoint
min float Yes - Hard lower bound (physical limit)
max float Yes - Hard upper bound (physical limit)
volatility float No 1.0 Std deviation of step-to-step noise
mean_reversion float No 0.0 Pull toward start (0=none, 1=snap back)
trend float No 0.0 Drift per timestep (+/- for gradual shift)
precision int No None Round to N decimal places
shock_rate float No 0.0 Probability of sudden shock per timestep (0=never, 1=every step)
shock_magnitude float No 10.0 Maximum absolute size of a shock event
shock_bias float No 0.0 Directional tendency: +1=up only, -1=down only, 0=either direction

Data types: float

How it works: Uses an Ornstein-Uhlenbeck process with optional shock events. Each value = previous + noise + mean_reversion pull + trend. Random shocks inject sudden jumps that the mean_reversion naturally recovers from. Values are clamped to [min, max].

Example:

name: reactor_temp
data_type: float
generator:
  type: random_walk
  start: 350.0
  min: 300.0
  max: 400.0
  volatility: 0.5
  mean_reversion: 0.1
  trend: 0.001
  precision: 1
  shock_rate: 0.02
  shock_magnitude: 30.0
  shock_bias: 1.0

Tips:

  • Use mean_reversion: 0.1 to simulate a PID-controlled process at steady state
  • Use trend: 0.001 to simulate slow fouling or catalyst deactivation
  • Use precision: 1 to match real instrument resolution (e.g., temperature to 0.1Β°F)
  • Use shock_rate: 0.02 with shock_bias: 1.0 to simulate occasional exothermic runaways in a reactor
  • Shocks perturb the walk's internal state, so mean_reversion naturally pulls values back β€” producing realistic spike-and-recover patterns
  • A warning is issued if shock_rate > 0 without mean_reversion β€” shocks without recovery aren't realistic
  • Works with incremental mode β€” the last value per entity is saved and restored on the next run

daily_profile

Purpose: Time-of-day patterns with day-to-day variation. Values follow a repeating daily curve defined by anchor points.

Parameters:

Parameter Type Required Default Description
profile dict Yes - Anchor points mapping HH:MM to target values
min float Yes - Hard lower bound
max float Yes - Hard upper bound
noise float No 0.0 Per-reading jitter (Β±noise)
volatility float No 0.0 Day-to-day variation in anchor targets (std_dev)
interpolation string No linear linear or step
precision int No None Round to N decimal places. 0 = integers
weekend_scale float No None Scale factor for weekends (0.0–1.0)

Data types: int, float

How it works: Interpolates between anchor points to get a base value. If volatility > 0, each day's anchors are independently shifted by a random normal amount. Noise is added, then clamped to [min, max] and rounded.

Example:

name: occupancy
data_type: int
generator:
  type: daily_profile
  min: 0
  max: 25
  precision: 0
  noise: 1.5
  volatility: 3.0
  profile:
    "00:00": 1
    "08:00": 19
    "12:00": 15
    "13:00": 22
    "17:00": 14
    "22:00": 2

Tips:

  • Use precision: 0 for headcounts, counters, or anything that must be a whole number
  • Use noise for reading-to-reading jitter, volatility for day-to-day variation in the overall shape
  • Use weekend_scale: 0.15 for office buildings that are nearly empty on weekends
  • Use interpolation: step for shift-based patterns that change abruptly (e.g., factory power at shift change)
  • Pairs well with derived columns β€” e.g., CO2 derived from occupancy

categorical

Purpose: Discrete choice from predefined values

Parameters:

Parameter Type Required Default Description
values list Yes - List of possible values
weights list[float] No uniform Probability weights (must sum to 1.0)

Data types: string, int, any

Example:

name: status
data_type: string
generator:
  type: categorical
  values: [Running, Idle, Error]
  weights: [0.8, 0.15, 0.05]


boolean

Purpose: True/False with configurable probability

Parameters:

Parameter Type Required Default Description
true_probability float No 0.5 Probability of True (0.0-1.0)

Data types: boolean

Example:

name: is_active
data_type: boolean
generator:
  type: boolean
  true_probability: 0.95


timestamp

Purpose: Auto-stepped timestamp column

Parameters:

Parameter Type Required Default Description
(none) - - - Uses scope.timestep automatically

Data types: timestamp

Format: ISO8601 Zulu (2026-01-01T00:00:00Z)

Example:

name: event_time
data_type: timestamp
generator:
  type: timestamp


sequential

Purpose: Auto-incrementing integers (globally unique across entities by default)

Parameters:

Parameter Type Required Default Description
start int No 1 Starting value
step int No 1 Increment step
unique_across_entities bool No true Each entity gets a non-overlapping ID range

Data types: int

Example:

name: record_id
data_type: int
generator:
  type: sequential
  start: 1000
  step: 10


constant

Purpose: Fixed values with template support

Parameters:

Parameter Type Required Default Description
value any Yes - Constant value or template

Data types: any

Magic variables: - {entity_id} - Current entity name - {entity_index} - Entity index (0-based) - {timestamp} - Current row timestamp - {row_number} - Row index

Example:

name: source_system
data_type: string
generator:
  type: constant
  value: "simulation"

# Or with template:
name: entity_ref
data_type: string
generator:
  type: constant
  value: "{entity_id}_record_{row_number}"


uuid

Purpose: Unique identifiers (UUID4 or UUID5)

Parameters:

Parameter Type Required Default Description
version int No 4 UUID version. Options: 4 (random) or 5 (deterministic/namespace-based). Only these values are supported
namespace string No DNS Namespace seed for UUID5 generation. Arbitrary string; default DNS uses the standard DNS namespace UUID

Data types: string

Example:

# Random (deterministic with seed):
name: order_id
data_type: string
generator:
  type: uuid
  version: 4

# Fully deterministic:
name: device_id
data_type: string
generator:
  type: uuid
  version: 5
  namespace: "com.example.devices"


email

Purpose: Email addresses

Parameters:

Parameter Type Required Default Description
domain string No example.com Email domain
pattern string No {entity}_{index} Username template. Available placeholders: {entity}, {index}, {row}. Default: {entity}_{index}

Data types: string

Example:

name: user_email
data_type: string
generator:
  type: email
  domain: company.com
  pattern: "user_{row}"

Output: user.5@company.com


ipv4

Purpose: IPv4 addresses

Parameters:

Parameter Type Required Default Description
subnet string No None CIDR subnet (e.g., 192.168.0.0/24)

Data types: string

Example:

# Full range:
name: client_ip
data_type: string
generator:
  type: ipv4

# Within subnet:
name: server_ip
data_type: string
generator:
  type: ipv4
  subnet: "10.0.0.0/8"


geo

Purpose: Geographic coordinates (latitude/longitude)

Parameters:

Parameter Type Required Default Description
bbox list[float] Yes - [min_lat, min_lon, max_lat, max_lon]
format string No tuple tuple or lat_lon_separate

Data types: tuple, object

Example:

name: location
data_type: string
generator:
  type: geo
  bbox: [37.0, -122.5, 37.8, -122.0]  # San Francisco Bay Area
  format: tuple

Output: (37.4532, -122.1823)


derived

Purpose: Calculated columns from other columns

Parameters:

Parameter Type Required Default Description
expression string Yes - Sandboxed Python expression referencing column names. See the full function table below for all available functions

Data types: any (depends on expression result)

Supported operations: - Arithmetic: +, -, *, /, **, //, % - Comparison: ==, !=, <, >, <=, >= - Logical: and, or, not - Conditionals: value if condition else other

Safe functions:

Category Function Signature Description
Math abs() abs(x) Absolute value
Math round() round(x, ndigits) Round to N decimal places
Math min() min(a, b, ...) Minimum value
Math max() max(a, b, ...) Maximum value
Type int() int(x) Cast to integer
Type float() float(x) Cast to float
Type str() str(x) Cast to string
Type bool() bool(x) Cast to boolean
Null-safe coalesce() coalesce(a, b, ...) First non-None value
Null-safe safe_div() safe_div(a, b, default=None) Division handling None and zero
Null-safe safe_mul() safe_mul(a, b, default=None) Multiplication handling None
Stateful prev() prev(column, default=None) Previous row value (see Stateful Functions)
Stateful ema() ema(column, alpha, default=None) Exponential moving average
Stateful delay() delay(column, steps, default=None) Value from N timesteps ago (transport delay)
Stateful pid() pid(pv, sp, Kp, Ki, Kd, dt, output_min, output_max, anti_windup) PID controller with anti-windup
Utility random() random() Random float in [0, 1)

Context variables available in expressions: _row_index, entity_id, _timestamp

Example:

# Simple calculation:
name: temp_fahrenheit
data_type: float
generator:
  type: derived
  expression: "temp_celsius * 1.8 + 32"

# Conditional:
name: status
data_type: string
generator:
  type: derived
  expression: "'HOT' if temp_celsius > 80 else 'NORMAL'"

# Null-safe:
name: efficiency
data_type: float
generator:
  type: derived
  expression: "safe_div(output, input, 0)"

Security

Expressions run in a restricted namespace. No imports, file I/O, or system calls allowed.


Compatibility Matrix

Generator Pandas Spark Polars Incremental Null Rate Overrides
range βœ… βœ… βœ… βœ… βœ… βœ…
random_walk βœ… βœ… βœ… βœ… βœ… βœ…
daily_profile βœ… βœ… βœ… βœ… βœ… βœ…
categorical βœ… βœ… βœ… βœ… βœ… βœ…
boolean βœ… βœ… βœ… βœ… βœ… βœ…
timestamp βœ… βœ… βœ… βœ… βœ… βœ…
sequential βœ… βœ… βœ… βœ… βœ… βœ…
constant βœ… βœ… βœ… βœ… βœ… βœ…
uuid βœ… βœ… βœ… βœ… βœ… βœ…
email βœ… βœ… βœ… βœ… βœ… βœ…
ipv4 βœ… βœ… βœ… βœ… βœ… βœ…
geo βœ… βœ… βœ… βœ… βœ… βœ…
derived βœ… βœ… βœ… βœ… ⚠️ * βœ…

Notes: - * Derived columns: null_rate applied after calculation. Use null-safe functions (coalesce, safe_div) if dependencies have nulls.

See Also