ChemE × Data Engineering Course - START HERE
A self-paced course teaching Chemical Engineers data engineering through process control
What's Inside
16 Lessons · 35 YAML Examples · ~20-25 Hours
Every lesson includes:
- Theory recap from the Seborg textbook
- Runnable YAML examples (use odibi run to execute)
- Data engineering concepts explained
- Validation patterns and quarantine
- Exercises with solution hints
- Real plant operation connections
- Progressive difficulty from beginner to expert
Part I time: ~5 hours (beginner-friendly)
Quick Start (5 Minutes)
Step 1: Install Odibi
Step 2: Run Your First Example
# Navigate to course examples
cd examples/cheme_course
# Run first simulation
odibi run L00_setup/tank_data.yaml
# View the generated data
python -c "import pandas as pd; print(pd.read_parquet('data/tank_data.parquet').head())"
Step 3: Start Learning
Open L00: Setup and follow along.
You're learning!
Your Learning Path
Part I: Foundations
L00: Setup & Basics (45 min)
🎯 Install Odibi, run first pipeline, understand data formats
Examples: tank_data.yaml, tank_data_parquet.yaml, tank_realistic.yaml, multi_entity.yaml
L01: CV/MV/DV and Time Series (45 min)
🎯 Controlled/Manipulated/Disturbance variables
Examples: mixing_tank_ph.yaml, heat_exchanger.yaml
L02: Degrees of Freedom + Balances (60 min)
🎯 DoF analysis, mass/energy balances
Examples: tank_mass_balance.yaml, cstr_energy_balance.yaml
L03: First-Order Dynamics (45 min)
🎯 prev() and ema() stateful functions
Examples: tank_temperature_fo.yaml, ema_filtering.yaml
L04: FOPTD Transfer Functions (45 min)
🎯 First-Order Plus Time Delay modeling
Examples: tank_foptd.yaml, heat_exchanger_foptd.yaml, foptd_parameterized.yaml
L05: Second-Order Systems (60 min)
🎯 Damping ratio, overshoot, settling time
Examples: pressure_underdamped.yaml, valve_actuator.yaml, damping_comparison.yaml
Part II: Feedback Control & System ID
L06: PID Basics (60 min)
🎯 P/I/D actions, pid() function, anti-windup
Examples: tank_pi.yaml
L07: PID Tuning Methods (60 min)
🎯 Ziegler-Nichols, Cohen-Coon tuning
Examples: ziegler_nichols.yaml, cohen_coon.yaml
L08: Disturbance Rejection (60 min)
🎯 Load rejection, feedwater disturbances
Examples: load_disturbance.yaml, feedwater_disturbance.yaml
L09: System Identification (90 min)
🎯 Step test, pulse test for parameter estimation
Examples: step_response.yaml, pulse_test.yaml
Part III: Advanced Control Strategies
L10: Interacting Control Loops (60 min)
🎯 MIMO systems, loop interactions
Examples: dual_temperature.yaml, pressure_flow.yaml
L11: Cascade Control (60 min)
🎯 Primary/secondary loops, fast inner loops
Examples: temperature_cascade.yaml, level_flow_cascade.yaml
L12: Feedforward Control (60 min)
🎯 Anticipatory control, ratio control
Examples: simple_feedforward.yaml, ratio_control.yaml
L13: Nonlinear Systems (60 min)
🎯 Valve characteristics, pH neutralization
Examples: valve_nonlinearity.yaml, ph_neutralization.yaml
L14: Model Predictive Control Intro (90 min)
🎯 Prediction, optimization, constraints
Examples: mpc_basics.yaml, constrained_control.yaml
Capstone: Real-World Digital Twin
L15: CSTR Digital Twin (2-3 hours)
🎯 Complete reactor model with mass, energy, kinetics
Examples: cstr_full_model.yaml, optimization.yaml
💪 What You'll Learn
Data Engineering Skills
- Generate realistic plant time-series data
- Use CSV, Parquet, and Delta Lake formats
- Validate data quality with range checks
- Implement quarantine patterns
- Build reproducible simulations
- Handle multi-entity pipelines
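As a rough illustration of the range-check and quarantine idea from the list above, here is a minimal plain-Python sketch. It does not use Odibi's validation configuration (the lessons cover that); the function name `split_by_range`, the column `level_pct`, and the tank IDs are hypothetical and chosen only for this example.

```python
def split_by_range(rows, column, low, high):
    """Split rows into (valid, quarantined) using an inclusive range check.

    Failing rows are kept, not dropped, and tagged with a reason so they
    can be reviewed later. That is the essence of a quarantine pattern.
    """
    valid, quarantined = [], []
    for row in rows:
        value = row.get(column)
        if value is None:
            quarantined.append({**row, "quarantine_reason": f"{column} is missing"})
        elif low <= value <= high:
            valid.append(row)
        else:
            quarantined.append(
                {**row, "quarantine_reason": f"{column} outside [{low}, {high}]"}
            )
    return valid, quarantined


# Hypothetical tank level readings in percent of span.
readings = [
    {"tank_id": "T-101", "level_pct": 42.0},
    {"tank_id": "T-101", "level_pct": 104.5},  # physically impossible level
    {"tank_id": "T-102", "level_pct": None},   # sensor dropout
]

valid, quarantined = split_by_range(readings, "level_pct", 0.0, 100.0)
print(len(valid), "valid,", len(quarantined), "quarantined")
```

Keeping the rejected rows alongside a reason, rather than silently filtering them, makes it possible to trace bad sensor data back to its source.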
Process Control Skills
- Map process control to data schemas (CV/MV/DV)
- Implement mass and energy balances
- Model first- and second-order dynamics
- Design and tune PID controllers
- Perform system identification
- Build cascade and feedforward controllers
- Handle nonlinear processes
- Apply basic model predictive control
Chemical Engineering
- CSTR modeling (kinetics, thermodynamics)
- Heat exchanger dynamics
- Tank level control
- pH neutralization
- Operating point optimization
Career Skills
- Portfolio of 35+ working examples
- Production-ready data pipelines
- Bridge between ChemE and data roles
🎯 What You'll Be Able to Do
After completing this course, you can:
- Build simulations generating millions of rows of realistic process data
- Implement PID controllers from the Seborg textbook
- Tune controllers using industry methods (Z-N, Cohen-Coon)
- Model complex systems (CSTR, heat exchangers, cascades)
- Explain to data engineers: "This is a controlled variable"
- Explain to process engineers: "This is a Parquet file"
- Build production-ready data pipelines
- Create digital twins of chemical processes
Course Statistics
| Metric | Count |
|---|---|
| Total Lessons | 16 (L00-L15) |
| YAML Examples | 35 |
| Course Time | 20-25 hours |
| Exercises | 40+ hands-on problems |
| Seborg Coverage | Chapters 1-20 |
🛠️ Course Structure
Where Everything Lives
YAML Examples:
examples/cheme_course/
├── L00_setup/                # 4 examples
├── L01_cv_mv_dv/             # 2 examples
├── L02_dof_balances/         # 2 examples
├── L03_first_order/          # 2 examples
├── L04_foptd/                # 3 examples
├── L05_second_order/         # 3 examples
├── L06_pid_basics/           # 1 example
├── L07_tuning/               # 2 examples
├── L08_disturbances/         # 2 examples
├── L09_system_id/            # 2 examples
├── L10_interacting_loops/    # 2 examples
├── L11_cascade/              # 2 examples
├── L12_feedforward/          # 2 examples
├── L13_nonlinearity/         # 2 examples
├── L14_mpc_lite/             # 2 examples
├── L15_cstr_digital_twin/    # 2 examples
└── README.md                 # Quick reference
Lesson Documentation:
docs/learning/cheme_data_course/
├── START_HERE.md             # This file
├── index.md                  # Course overview
├── lessons/
│   ├── L00_setup.md through L15_cstr_digital_twin.md
│   └── (16 lesson files with theory + exercises)
└── solutions/
    └── index.md              # Solutions hub
Key Concepts & Patterns
Row Number Counter (Essential Pattern)
# Use this for step changes and time-dependent logic
- name: row_num
  data_type: int
  generator:
    type: derived
    expression: "prev('row_num', -1) + 1"
# Then create step changes:
- name: setpoint
  data_type: float
  generator:
    type: derived
    expression: "50.0 if row_num < 100 else 60.0"
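To see what the two derived columns above produce, the following plain-Python sketch evaluates them row by row. It assumes that `prev('row_num', -1)` returns the previous row's value and falls back to the default `-1` on the first row. That reading is an assumption based on the expression, not a description of Odibi's internals, and `simulate_rows` is a hypothetical helper.

```python
def simulate_rows(n_rows, step_at=100, low=50.0, high=60.0):
    """Evaluate row_num and setpoint one row at a time."""
    rows = []
    prev_row_num = -1  # assumed default for the first row: prev('row_num', -1)
    for _ in range(n_rows):
        row_num = prev_row_num + 1                        # prev('row_num', -1) + 1
        setpoint = low if row_num < step_at else high     # step change at row 100
        rows.append({"row_num": row_num, "setpoint": setpoint})
        prev_row_num = row_num                            # becomes prev() for the next row
    return rows


rows = simulate_rows(102)
for row in (rows[0], rows[99], rows[100]):
    print(row)
```

Under these assumptions the counter starts at 0, and the setpoint steps from 50.0 to 60.0 on the row where `row_num` reaches 100.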
PID Controller
- name: controller_output
  data_type: float
  generator:
    type: derived
    expression: >
      pid(
        process_variable,
        setpoint,
        Kp,    # Proportional gain
        Ki,    # Integral gain
        Kd,    # Derivative gain
        dt,    # Sample time (seconds)
        min,   # Output minimum
        max,   # Output maximum
        true   # Anti-windup enabled
      )
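For intuition about what a call like this computes, here is a minimal discrete PID sketch in plain Python. It is not Odibi's implementation. The error sign (`setpoint - process_variable`), the rectangular integration, and the anti-windup rule (freeze the integral while the output is saturated and the error pushes further into saturation) are all assumptions made for illustration, and `PIDState`, `pid_step`, and `run` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PIDState:
    """Memory the controller carries from one sample to the next."""
    integral: float = 0.0
    prev_error: float = 0.0


def pid_step(state, pv, sp, kp, ki, kd, dt, out_min, out_max, anti_windup=True):
    """One PID update; argument order mirrors the pid() call above."""
    error = sp - pv                                   # assumed sign convention
    derivative = (error - state.prev_error) / dt      # backward difference
    candidate = state.integral + error * dt           # rectangular integration
    unclamped = kp * error + ki * candidate + kd * derivative
    output = min(max(unclamped, out_min), out_max)    # clamp to actuator limits

    # Anti-windup by conditional integration: skip the integral update
    # while saturated in the direction the error is pushing.
    pushing_high = unclamped > out_max and error > 0
    pushing_low = unclamped < out_min and error < 0
    if not (anti_windup and (pushing_high or pushing_low)):
        state.integral = candidate

    state.prev_error = error
    return output


def run(anti_windup, steps=5):
    """Hold a constant error of 10 so the output saturates at out_max."""
    state = PIDState()
    outputs = [
        pid_step(state, pv=0.0, sp=10.0, kp=1.0, ki=1.0, kd=0.0, dt=1.0,
                 out_min=0.0, out_max=10.0, anti_windup=anti_windup)
        for _ in range(steps)
    ]
    return state, outputs


protected, protected_out = run(anti_windup=True)
unprotected, unprotected_out = run(anti_windup=False)
print("integral with anti-windup:   ", protected.integral)
print("integral without anti-windup:", unprotected.integral)
```

Both runs produce the same clamped output, but without anti-windup the integral term keeps growing while the actuator is pinned at its limit, which is what causes sluggish recovery once the error changes sign.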
EMA Smoothing
- name: smoothed_value
  data_type: float
  generator:
    type: derived
    expression: "ema('raw_value', alpha, default)"
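The standard exponential moving average recursion is y[k] = alpha * x[k] + (1 - alpha) * y[k-1]. The sketch below applies it in plain Python, assuming that `default` seeds the previous smoothed value before the first row. How Odibi's `ema()` actually uses `default` is not stated here, so treat that seeding as an assumption; `ema_series` is a hypothetical helper.

```python
def ema_series(values, alpha, default):
    """Exponential moving average: y[k] = alpha * x[k] + (1 - alpha) * y[k-1]."""
    smoothed = []
    prev = default  # assumed seed for the value "before" the first row
    for x in values:
        prev = alpha * x + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


raw = [10.0, 10.0, 20.0, 20.0]  # a step from 10 to 20 at the third sample
smoothed = ema_series(raw, alpha=0.5, default=10.0)
print(smoothed)  # [10.0, 10.0, 15.0, 17.5]
```

A smaller `alpha` gives heavier smoothing and more lag. The filtered signal approaches the new value exponentially after a step, which is the same shape as a first-order process response.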
Teaching Philosophy
1. Hands-On First
Don't just read - run code, generate data, see results immediately.
2. ChemE Problems → Data Skills
Learn Parquet by simulating tanks. Learn validation by modeling reactors.
3. Incremental Complexity
L00: Simple CSV files → L15: Production digital twin pipelines
4. Real-World Focus
Every lesson connects to actual plant operations.
5. Portfolio Building
By the end, you have 35+ working examples for LinkedIn/interviews.
FAQ
Q: Do I need the Seborg textbook?
A: No! Lessons recap key concepts. But it helps for deeper theory.
Q: Can I skip lessons?
A: Start with L00-L03 to learn basics. Then pick topics you need.
Q: How long does the full course take?
A: 20-25 hours total. Part I (L00-L05) takes ~5 hours.
Q: Can I use this to teach others?
A: Absolutely! Share, improve, contribute back.
Q: What if I get stuck?
A: Check solutions, consult docs, or ask questions.
Q: Can I run these on Databricks?
A: Yes! All examples work on local Pandas or Databricks Spark.
Additional Resources
Course Materials:
- Course Overview - Philosophy and structure
- Seborg Textbook Mapping - All chapters mapped to Odibi
- Process Simulation Guide - Deep dive on stateful functions
- Solutions Index - Exercise solutions
Framework Guides:
- Chemical Engineering Simulation
- Thermodynamics Transformers
- Unit Conversion
- Custom Functions Reference
Get Started Now
Recommended Path
- Read Course Overview (10 min)
- Install Odibi (5 min)
- Start L00: Setup (45 min)
- Progress through L01-L05 at your own pace (4 hours)
- Advance to Part II (L06-L09) for control topics (4 hours)
- Master Part III (L10-L14) for advanced strategies (5 hours)
- Build L15 digital twin capstone project (2-3 hours)
Alternative Paths
Path A - Just the Basics (5 hours): L00 → L01 → L02 → L03 → L04 → L05
Path B - PID Focus (8 hours): L00 → L01 → L03 → L06 → L07 → L08
Path C - Advanced Only (6 hours): L00 (setup) → L10 → L11 → L12 → L13 → L14
Path D - Digital Twin Sprint (4 hours): L00 → L02 → L15
Start with L00: Setup & Basics
Built with ❤️ for Chemical Engineers learning Data Engineering
Part of the Odibi Framework - Explicit over implicit, Stories over magic