
🎓 ChemE × Data Engineering Course - START HERE

A self-paced course teaching Chemical Engineers data engineering through process control


What's Inside

16 Lessons · 35 YAML Examples · ~20-25 Hours

Every lesson includes:

  • Theory recap from the Seborg textbook
  • Runnable YAML examples (use odibi run to execute)
  • Data engineering concepts explained
  • Validation patterns and quarantine
  • Exercises with solution hints
  • Real plant operation connections
  • Progressive difficulty from beginner to expert

Part I time: ~5 hours (beginner-friendly)


🚀 Quick Start (5 Minutes)

Step 1: Install Odibi

pip install odibi
odibi --version

Step 2: Run Your First Example

# Navigate to course examples
cd examples/cheme_course

# Run first simulation
odibi run L00_setup/tank_data.yaml

# View the generated data
python -c "import pandas as pd; print(pd.read_parquet('data/tank_data.parquet').head())"

Step 3: Start Learning

Open L00: Setup and follow along.

You're learning! 🎉


📚 Your Learning Path

Part I: Foundations

L00: Setup & Basics (45 min)
🎯 Install Odibi, run first pipeline, understand data formats
📂 Examples: tank_data.yaml, tank_data_parquet.yaml, tank_realistic.yaml, multi_entity.yaml

L01: CV/MV/DV and Time Series (45 min)
🎯 Controlled/Manipulated/Disturbance variables
📂 Examples: mixing_tank_ph.yaml, heat_exchanger.yaml

L02: Degrees of Freedom + Balances (60 min)
🎯 DoF analysis, mass/energy balances
📂 Examples: tank_mass_balance.yaml, cstr_energy_balance.yaml

L03: First-Order Dynamics (45 min)
🎯 prev() and ema() stateful functions
📂 Examples: tank_temperature_fo.yaml, ema_filtering.yaml

L04: FOPTD Transfer Functions (45 min)
🎯 First-Order Plus Time Delay modeling
📂 Examples: tank_foptd.yaml, heat_exchanger_foptd.yaml, foptd_parameterized.yaml

L05: Second-Order Systems (60 min)
🎯 Damping ratio, overshoot, settling time
📂 Examples: pressure_underdamped.yaml, valve_actuator.yaml, damping_comparison.yaml


Part II: Feedback Control & System ID

L06: PID Basics (60 min)
🎯 P/I/D actions, pid() function, anti-windup
📂 Examples: tank_pi.yaml

L07: PID Tuning Methods (60 min)
🎯 Ziegler-Nichols, Cohen-Coon tuning
📂 Examples: ziegler_nichols.yaml, cohen_coon.yaml

L08: Disturbance Rejection (60 min)
🎯 Load rejection, feedwater disturbances
📂 Examples: load_disturbance.yaml, feedwater_disturbance.yaml

L09: System Identification (90 min)
🎯 Step test, pulse test for parameter estimation
📂 Examples: step_response.yaml, pulse_test.yaml


Part III: Advanced Control Strategies

L10: Interacting Control Loops (60 min)
🎯 MIMO systems, loop interactions
📂 Examples: dual_temperature.yaml, pressure_flow.yaml

L11: Cascade Control (60 min)
🎯 Primary/secondary loops, fast inner loops
📂 Examples: temperature_cascade.yaml, level_flow_cascade.yaml

L12: Feedforward Control (60 min)
🎯 Anticipatory control, ratio control
📂 Examples: simple_feedforward.yaml, ratio_control.yaml

L13: Nonlinear Systems (60 min)
🎯 Valve characteristics, pH neutralization
📂 Examples: valve_nonlinearity.yaml, ph_neutralization.yaml

L14: Model Predictive Control Intro (90 min)
🎯 Prediction, optimization, constraints
📂 Examples: mpc_basics.yaml, constrained_control.yaml


Capstone: Real-World Digital Twin

L15: CSTR Digital Twin (2-3 hours)
🎯 Complete reactor model with mass, energy, kinetics
📂 Examples: cstr_full_model.yaml, optimization.yaml


💪 What You'll Learn

Data Engineering Skills

  • Generate realistic plant time-series data
  • Use CSV, Parquet, and Delta Lake formats
  • Validate data quality with range checks
  • Implement quarantine patterns
  • Build reproducible simulations
  • Handle multi-entity pipelines
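
As a rough illustration of the range-check and quarantine items above, the plain-Python sketch below splits readings into accepted and quarantined rows. It does not use Odibi's validation API; the function name `split_by_range`, the column name `level_m`, and the 0.0 to 5.0 limits are hypothetical values chosen only for this example.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def split_by_range(
    rows: List[Row], column: str, low: float, high: float
) -> Tuple[List[Row], List[Row]]:
    """Return (valid, quarantined) rows using an inclusive range check."""
    valid: List[Row] = []
    quarantined: List[Row] = []
    for row in rows:
        value = row.get(column)
        # Missing or out-of-range readings are set aside instead of dropped,
        # so they can be inspected later.
        if value is None or not (low <= value <= high):
            quarantined.append(row)
        else:
            valid.append(row)
    return valid, quarantined


# Hypothetical tank-level readings in metres.
readings: List[Row] = [
    {"level_m": 1.2},
    {"level_m": -0.3},  # below the assumed lower limit
    {"level_m": 4.9},
    {"level_m": 7.5},   # above the assumed upper limit
    {"level_m": None},  # missing sensor value
]

valid, quarantined = split_by_range(readings, "level_m", low=0.0, high=5.0)
print(f"{len(valid)} valid, {len(quarantined)} quarantined")
```

Keeping the rejected rows in their own collection, rather than silently filtering them out, is the core of the quarantine idea: bad data stays available for diagnosis without contaminating downstream results.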

Process Control Skills

  • Map process control to data schemas (CV/MV/DV)
  • Implement mass and energy balances
  • Model first- and second-order dynamics
  • Design and tune PID controllers
  • Perform system identification
  • Build cascade and feedforward controllers
  • Handle nonlinear processes
  • Apply basic model predictive control
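
To make the first-order dynamics item above more concrete, here is a minimal plain-Python sketch of the textbook step response of a First-Order Plus Time Delay (FOPTD) model, the topic of L04. It is independent of Odibi, and the gain, time constant, and delay values are hypothetical.

```python
import math


def foptd_step_response(
    t: float, gain: float, tau: float, delay: float, step: float = 1.0
) -> float:
    """Output deviation at time t for an input step of size `step` at t = 0.

    y(t) = gain * step * (1 - exp(-(t - delay) / tau)) for t >= delay, else 0.
    """
    if t < delay:
        return 0.0  # nothing happens until the dead time has elapsed
    return gain * step * (1.0 - math.exp(-(t - delay) / tau))


# Hypothetical process parameters (illustration only).
K, TAU, THETA = 2.0, 10.0, 3.0

for t in (0.0, 3.0, 13.0, 53.0):
    print(f"t = {t:5.1f}  y = {foptd_step_response(t, K, TAU, THETA):.3f}")
```

One time constant after the delay (t = 13 here), the response has covered about 63.2% of the distance to its final value, which is the usual way to read a time constant off a step test.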

Chemical Engineering

  • CSTR modeling (kinetics, thermodynamics)
  • Heat exchanger dynamics
  • Tank level control
  • pH neutralization
  • Operating point optimization

Career Skills

  • Portfolio of 35+ working examples
  • Production-ready data pipelines
  • Bridge between ChemE and data roles

🎯 What You'll Be Able to Do

After completing this course, you can:

  • Build simulations generating millions of rows of realistic process data
  • Implement PID controllers from the Seborg textbook
  • Tune controllers using industry methods (Z-N, Cohen-Coon)
  • Model complex systems (CSTR, heat exchangers, cascades)
  • Explain to data engineers: "This is a controlled variable"
  • Explain to process engineers: "This is a Parquet file"
  • Build production-ready data pipelines
  • Create digital twins of chemical processes
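
For the Ziegler-Nichols tuning mentioned in the list above, the classic closed-loop (ultimate gain) rules can be sketched in a few lines of Python. This is the standard textbook table, not code taken from the course; the helper name `ziegler_nichols_pid` and the example values of the ultimate gain and period are hypothetical, and the course's ziegler_nichols.yaml example may use a different variant of the method.

```python
from typing import NamedTuple


class PidSettings(NamedTuple):
    kc: float     # controller gain
    tau_i: float  # integral time
    tau_d: float  # derivative time


def ziegler_nichols_pid(ku: float, pu: float) -> PidSettings:
    """Classic Ziegler-Nichols PID settings from ultimate gain and period.

    Kc = 0.6 * Ku, tau_I = Pu / 2, tau_D = Pu / 8
    """
    if ku <= 0 or pu <= 0:
        raise ValueError("ultimate gain and period must be positive")
    return PidSettings(kc=0.6 * ku, tau_i=pu / 2.0, tau_d=pu / 8.0)


# Hypothetical test result: sustained oscillation at Ku = 4.0 with period 20 s.
settings = ziegler_nichols_pid(ku=4.0, pu=20.0)

# Convert to parallel-form gains (Kp, Ki, Kd) if a controller expects them.
kp = settings.kc
ki = settings.kc / settings.tau_i
kd = settings.kc * settings.tau_d
print(settings, kp, ki, kd)
```

The conversion at the end assumes a parallel-form controller where Ki = Kc / tau_I and Kd = Kc * tau_D; check which form your controller uses before plugging the numbers in.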

📊 Course Statistics

| Metric | Value |
| --- | --- |
| Total Lessons | 16 (L00-L15) |
| YAML Examples | 35 |
| Course Time | 20-25 hours |
| Exercises | 40+ hands-on problems |
| Seborg Coverage | Chapters 1-20 |

🛠️ Course Structure

Where Everything Lives

YAML Examples:

examples/cheme_course/
├── L00_setup/           # 4 examples
├── L01_cv_mv_dv/        # 2 examples
├── L02_dof_balances/    # 2 examples
├── L03_first_order/     # 2 examples
├── L04_foptd/           # 3 examples
├── L05_second_order/    # 3 examples
├── L06_pid_basics/      # 1 example
├── L07_tuning/          # 2 examples
├── L08_disturbances/    # 2 examples
├── L09_system_id/       # 2 examples
├── L10_interacting_loops/ # 2 examples
├── L11_cascade/         # 2 examples
├── L12_feedforward/     # 2 examples
├── L13_nonlinearity/    # 2 examples
├── L14_mpc_lite/        # 2 examples
├── L15_cstr_digital_twin/ # 2 examples
└── README.md            # Quick reference

Lesson Documentation:

docs/learning/cheme_data_course/
├── START_HERE.md        # This file
├── index.md             # Course overview
├── lessons/
│   ├── L00_setup.md through L15_cstr_digital_twin.md
│   └── (16 lesson files with theory + exercises)
└── solutions/
    └── index.md         # Solutions hub


📖 Key Concepts & Patterns

Row Number Counter (Essential Pattern)

# Use this for step changes and time-dependent logic
- name: row_num
  data_type: int
  generator:
    type: derived
    expression: "prev('row_num', -1) + 1"

# Then create step changes:
- name: setpoint
  data_type: float
  generator:
    type: derived
    expression: "50.0 if row_num < 100 else 60.0"
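
To see what this pattern produces row by row, here is a plain-Python sketch of the same logic. It assumes that prev(name, default) returns the previous row's value of that column and falls back to the default on the first row; that reading is inferred from the snippet above rather than taken from Odibi's reference documentation. The function name `generate_rows` is hypothetical.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[int, float]]


def generate_rows(n_rows: int, step_at: int = 100) -> List[Row]:
    """Mimic the row counter and the step change in setpoint."""
    rows: List[Row] = []
    prev_row_num = -1  # default for the first row, as in prev('row_num', -1)
    for _ in range(n_rows):
        row_num = prev_row_num + 1  # counter starts at 0
        setpoint = 50.0 if row_num < step_at else 60.0
        rows.append({"row_num": row_num, "setpoint": setpoint})
        prev_row_num = row_num  # state carried into the next row
    return rows


rows = generate_rows(102)
print(rows[0], rows[99], rows[100])
```

Because the counter starts from a default of -1, the first row gets row_num 0, and the setpoint switches from 50.0 to 60.0 exactly at row 100.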

PID Controller

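# Argument template: replace the placeholders Kp, Ki, Kd, dt, min, and max
# with your own values.
# Note: YAML treats the inline # notes below as part of the folded string,
# so remove them if the expression parser does not accept comments.
- name: controller_output
  data_type: float
  generator:
    type: derived
    expression: >
      pid(
        process_variable,
        setpoint,
        Kp,    # Proportional gain
        Ki,    # Integral gain
        Kd,    # Derivative gain
        dt,    # Sample time (seconds)
        min,   # Output minimum
        max,   # Output maximum
        true   # Anti-windup enabled
      )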
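
For intuition about what a call like this computes, the sketch below is a minimal discrete PID controller in plain Python with output clamping and a simple conditional-integration anti-windup. It is an illustrative stand-in, not Odibi's pid() implementation; the class name `SimplePid`, its attributes, and the tuning values are all hypothetical.

```python
class SimplePid:
    """Parallel-form discrete PID with clamped output and anti-windup."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float,
                 out_min: float, out_max: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0     # accumulated error * dt
        self.prev_error = None  # no derivative on the first sample

    def update(self, process_variable: float, setpoint: float) -> float:
        error = setpoint - process_variable
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error

        candidate = self.integral + error * self.dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        output = min(max(raw, self.out_min), self.out_max)

        # Anti-windup: skip the integral update while the output is saturated
        # and the error would push it further into saturation.
        winding_up = (raw > self.out_max and error > 0) or (
            raw < self.out_min and error < 0
        )
        if not winding_up:
            self.integral = candidate
        return output


# Hypothetical tuning; a large, persistent error saturates the output.
pid = SimplePid(kp=2.0, ki=0.5, kd=0.0, dt=1.0, out_min=0.0, out_max=100.0)
outputs = [pid.update(process_variable=0.0, setpoint=60.0) for _ in range(50)]
print(outputs[-1], pid.integral)
```

Without the anti-windup check, the integral would keep growing for all 50 saturated samples and the controller would take a long time to come back off the limit once the error changes sign.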

EMA Smoothing

- name: smoothed_value
  data_type: float
  generator:
    type: derived
    expression: "ema('raw_value', alpha, default)"
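
The recursion behind an exponential moving average is short enough to sketch in plain Python. This version assumes that the default argument seeds the "previous" smoothed value on the first row; Odibi's ema() may initialise differently, so treat the sketch as an illustration of the general technique. The helper names `ema_series` and `alpha_from_time_constant` are hypothetical.

```python
from typing import Iterable, List


def ema_series(values: Iterable[float], alpha: float, default: float) -> List[float]:
    """y[k] = alpha * x[k] + (1 - alpha) * y[k-1], with y[-1] = default."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    prev = default
    for x in values:
        prev = alpha * x + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


def alpha_from_time_constant(tau: float, dt: float) -> float:
    """Backward-Euler equivalent of a first-order filter: alpha = dt / (tau + dt)."""
    return dt / (tau + dt)


print(ema_series([10.0, 10.0, 10.0], alpha=0.5, default=0.0))
print(alpha_from_time_constant(tau=9.0, dt=1.0))
```

The second helper shows why EMA filtering appears in the first-order dynamics lesson: with alpha = dt / (tau + dt), the EMA is a backward-Euler discretisation of a first-order lag with time constant tau.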

🎓 Teaching Philosophy

1. Hands-On First

Don't just read - run code, generate data, see results immediately.

2. ChemE Problems → Data Skills

Learn Parquet by simulating tanks. Learn validation by modeling reactors.

3. Incremental Complexity

L00: Simple CSV files → L15: Production digital twin pipelines

4. Real-World Focus

Every lesson connects to actual plant operations.

5. Portfolio Building

By the end, you have 35+ working examples for LinkedIn/interviews.


🤔 FAQ

Q: Do I need the Seborg textbook?
A: No! Lessons recap key concepts. But it helps for deeper theory.

Q: Can I skip lessons?
A: Start with L00-L03 to learn basics. Then pick topics you need.

Q: How long does the full course take?
A: 20-25 hours total. Part I (L00-L05) takes ~5 hours.

Q: Can I use this to teach others?
A: Absolutely! Share, improve, contribute back.

Q: What if I get stuck?
A: Check solutions, consult docs, or ask questions.

Q: Can I run these on Databricks?
A: Yes! All examples work on local Pandas or Databricks Spark.


🔗 Additional Resources

Course Materials:

  • Course Overview: Philosophy and structure
  • Seborg Textbook Mapping: All chapters mapped to Odibi
  • Process Simulation Guide: Deep dive on stateful functions
  • Solutions Index: Exercise solutions

Framework Guides:

  • Chemical Engineering Simulation
  • Thermodynamics Transformers
  • Unit Conversion
  • Custom Functions Reference


👉 Get Started Now

  1. Read Course Overview (10 min)
  2. Install Odibi (5 min)
  3. Start L00: Setup (45 min)
  4. Progress through L01-L05 at your own pace (4 hours)
  5. Advance to Part II (L06-L09) for control topics (4 hours)
  6. Master Part III (L10-L14) for advanced strategies (5 hours)
  7. Build L15 digital twin capstone project (2-3 hours)

Alternative Paths

Path A - Just the Basics (5 hours): L00 → L01 → L02 → L03 → L04 → L05

Path B - PID Focus (~5 hours): L00 → L01 → L03 → L06 → L07 → L08

Path C - Advanced Only (6 hours): L00 (setup) → L10 → L11 → L12 → L13 → L14

Path D - Digital Twin Sprint (4 hours): L00 → L02 → L15


👉 Start with L00: Setup & Basics


Built with ❤️ for Chemical Engineers learning Data Engineering
Part of the Odibi Framework - Explicit over implicit, Stories over magic