ChemE × Data Engineering Course¶

Learn Process Control AND Data Engineering Together

What is This Course?¶

This is a self-paced, hands-on course that teaches you:

1. Process Dynamics & Control (from Seborg's textbook) - refreshed for working engineers
2. Data Engineering fundamentals - CSV, Parquet, time series, validation, pipelines

The approach: You already know ChemE. We'll use process control problems you understand to teach data engineering concepts.

By the end: You'll confidently model process systems with Odibi AND understand modern data engineering practices.

Who Is This For?¶

Chemical engineers in operations/analytics who want to learn data engineering
ChemEs who forgot process control and want a practical refresher
Data engineers learning process industries
Anyone tired of generic e-commerce data tutorials

Prerequisites:

Basic ChemE degree (understand tanks, reactors, control loops)
Python basics (can run scripts, install packages)
Willingness to learn by doing

Course Structure¶

15 Lessons + 1 Capstone (~20-30 hours total)¶

Part I: Foundations (Modeling + Data Basics)¶

Learn: First-order systems, transfer functions, time series, schemas, validation

L00: Setup - Odibi Basics & Data Formats (45 min)
L01: CV/MV/DV and Time Series Data (45 min)
L02: Degrees of Freedom + Mass/Energy Balances (60 min)
L03: First-Order Dynamics (45 min)
L04: FOPTD Transfer Functions (45 min)
L05: Second-Order Systems + Overshoot (60 min)
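
To give a feel for the Part I material, here is a minimal plain-Python sketch of a first-order-plus-dead-time (FOPTD) step response in discrete time. It is not Odibi YAML and does not use Odibi's API. The function name `simulate_foptd`, the forward-Euler discretization, and the parameter values (gain 2, time constant 10 s, dead time 5 s, 1 s sampling) are all assumptions chosen for illustration.

```python
def simulate_foptd(u, gain, tau, dead_time, dt, y0=0.0):
    """Simulate a first-order-plus-dead-time process with forward Euler steps.

    Continuous model: tau * dy/dt = -y + gain * u(t - dead_time)
    """
    delay_steps = int(round(dead_time / dt))
    y = [y0]
    for k in range(1, len(u)):
        # The input applied at step k-1 drives this update, delayed by delay_steps samples.
        idx = k - 1 - delay_steps
        u_delayed = u[idx] if idx >= 0 else 0.0  # assume zero input before the record starts
        y_prev = y[-1]
        y.append(y_prev + (dt / tau) * (gain * u_delayed - y_prev))
    return y


# Hypothetical test: unit step in the input at t = 0, sampled every second for 60 s.
dt = 1.0
u = [1.0] * 61
y = simulate_foptd(u, gain=2.0, tau=10.0, dead_time=5.0, dt=dt)
```

With these values the output stays at zero until the delayed input reaches the process, then rises toward gain × input = 2.0. Forward Euler is only a reasonable approximation when `dt` is small relative to `tau`.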

Part II: Feedback Control + Identification¶

Learn: PID control, tuning, disturbances, system ID, train/test splits

L06: PID Basics + Constraints (60 min)
L07: Tuning + Anti-windup (60 min)
L08: Disturbances + Setpoint Strategies (60 min)
L09: System Identification (PRBS/ARX) (90 min)
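
As a rough preview of the PID, constraint, and anti-windup topics in Part II, the sketch below implements a discrete PID controller in plain Python and closes the loop around a simple first-order process. It is not Odibi's `pid()` function. The class name, the conditional-integration anti-windup scheme, the tuning values, and the process parameters are assumptions picked for illustration.

```python
class PID:
    """Discrete PID controller with output limits and conditional integration."""

    def __init__(self, kp, ki, kd, dt, out_min, out_max):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        # No derivative action on the first call, when there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error

        candidate_integral = self.integral + error * self.dt
        unclamped = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        output = min(self.out_max, max(self.out_min, unclamped))

        # Anti-windup: keep the new integral only when the output is not saturated.
        if output == unclamped:
            self.integral = candidate_integral
        return output


def run_loop(steps=300):
    """Closed loop around a hypothetical first-order process (gain 2, tau 10 s, dt 1 s)."""
    pid = PID(kp=2.0, ki=0.5, kd=0.0, dt=1.0, out_min=0.0, out_max=10.0)
    y, tau, gain = 0.0, 10.0, 2.0
    history = []
    for _ in range(steps):
        u = pid.update(setpoint=5.0, measurement=y)
        y += (1.0 / tau) * (gain * u - y)  # forward-Euler process update
        history.append((u, y))
    return history


history = run_loop()
```

On the first step the controller asks for more than the actuator limit, so the output is clamped at 10 and the integral is held. Once the output leaves saturation, integration resumes and the process value settles at the setpoint of 5.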

Part III: Multiloop + Advanced Topics¶

Learn: Cascade, feedforward, interactions, gain scheduling, MPC basics

L10: Interacting Loops (60 min)
L11: Cascade Control (60 min)
L12: Feedforward/Ratio Control (60 min)
L13: Nonlinearity + Saturation (60 min)
L14: MPC-lite (IMC Basics) (90 min)

Capstone: Real-World Pipeline¶

Build: Bronze → Silver → Gold CSTR "digital twin" with lineage and validation

L15: Digital Twin Pipeline (2-3 hours)

How to Use This Course¶

The Learning Path:¶

Start with L00 - Get Odibi installed and working
Do lessons in order - Each builds on previous concepts
Run the YAML - Don't just read; actually execute the examples
Do the exercises - Concepts stick when you struggle a bit
Check solutions - After attempting exercises, compare your approach
Reflect - Each lesson ends with "how does this relate to real plants?"

Each Lesson Contains:¶

Prerequisites: What you need to know first
Learning Objectives: 3-5 specific skills you'll gain
Theory Recap: Seborg concepts translated to discrete-time
Odibi Hands-On: Minimal YAML + realistic YAML you can run
Data Engineering Focus: The new DE concept for this lesson
Validation: How to check your work (tests, quarantine)
Exercises: 2-4 tasks to reinforce learning
Solutions: Full answers + YAML files
Reflection: Connection to real-world plant operations

Recommended Pace:¶

Intensive: 2-3 lessons per day → Done in 1 week
Steady: 1 lesson per day → Done in 2-3 weeks
Relaxed: 2-3 lessons per week → Done in 6-8 weeks

There is no deadline. Go at your own pace.

What You'll Learn¶

Process Control Skills:¶

First-order and second-order system dynamics
Transfer function models (FOPTD, SOPTD)
PID controller design and tuning (Ziegler-Nichols and IMC methods)
Anti-windup and constraint handling
Cascade and feedforward control strategies
System identification from test data (PRBS, ARX)
Multivariable interactions (RGA basics)
Gain scheduling for nonlinear processes

Data Engineering Skills:¶

Time series data modeling (timestamps, sampling, resampling)
Data formats (CSV vs Parquet vs Delta Lake)
Schema design and validation
Data quality checks and quarantine patterns
Partitioning strategies for time-series data
Train/test splits for model validation
Feature materialization and storage
Pipeline dependency management
Lineage tracking (where data came from)
Incremental loading and idempotent runs
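
Two of the ideas above, idempotent incremental loading and time-ordered train/test splits, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how Odibi implements them: the function names, the integer timestamp field `t`, the `level_m` column, and the in-memory list standing in for a table are all hypothetical.

```python
def incremental_load(target, batch, watermark):
    """Append only rows newer than the watermark and return the new watermark.

    Re-running with the same batch adds nothing, which makes the load idempotent.
    """
    new_rows = sorted((r for r in batch if r["t"] > watermark), key=lambda r: r["t"])
    target.extend(new_rows)
    return new_rows[-1]["t"] if new_rows else watermark


def time_ordered_split(rows, train_fraction=0.8):
    """Train on the earliest rows and test on the latest, with no shuffling."""
    ordered = sorted(rows, key=lambda r: r["t"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


table = []  # stand-in for a stored table
batch = [{"t": t, "level_m": 1.0 + 0.01 * t} for t in range(10)]

watermark = incremental_load(table, batch, watermark=-1)
watermark = incremental_load(table, batch, watermark)  # rerun: no duplicates added

train, test = time_ordered_split(table)
```

Splitting by time rather than at random matters for process data: a model validated on samples that precede its training data can look better than it will perform on the plant.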

Odibi Framework Skills:¶

YAML pipeline configuration
Generators: random_walk, derived, range, constant
Stateful functions: prev(), ema(), pid(), delay()
Validation tests and quality gates
Quarantine routing for bad data
Multi-entity simulations and joins
SCD2 pattern for tracking equipment changes
Delta Lake integration
Medallion architecture (Bronze/Silver/Gold)

Course Materials¶

Textbook Reference:¶

Seborg → Odibi Mapping Guide - Complete chapter mapping

Technical References:¶

Process Simulation Guide - Stateful functions deep dive
YAML Schema Reference - All configuration options
Simulation Generators Reference - Generator catalog

Example Code:¶

All lesson YAML files: /examples/cheme_course/L##_name/

Solutions:¶

Full exercise solutions: Solutions Index

Teaching Philosophy¶

1. Learn by Doing¶

You won't memorize theory. You'll build working simulations.

2. Familiar Problems, New Skills¶

We use ChemE problems you already understand (tanks, reactors, PID loops) to teach data concepts.

3. Mistakes Are Learning¶

Exercises are designed to make you struggle a bit. That's where learning happens.

4. Real-World Context¶

Every lesson connects back to actual plant operations. No academic ivory tower.

5. Build Your Portfolio¶

By the end, you'll have 15+ working examples to show in interviews or on LinkedIn.

What You'll Build¶

By Lesson 5:¶

Realistic FOPTD tank temperature data with noise, delays, and validation

By Lesson 9:¶

System identification pipeline that fits ARX models from PRBS tests

By Lesson 14:¶

Multi-loop cascade control with feedforward and constraint handling

By Lesson 15 (Capstone):¶

Production-grade CSTR "digital twin":

Bronze: Raw sensor CSV ingestion
Silver: Filtered, validated, quarantine routing
Gold: Control outputs, KPIs, features
Lineage: Track data provenance
Schedule: Incremental runs
(Optional) Cloud: Azure Blob + SQL sink
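
The Bronze, Silver, and Gold layers can be illustrated with a small standard-library Python sketch. It is not the capstone pipeline and does not use Odibi. The sample CSV, the `reactor_temp_k` column, the 250 to 500 K validity range, and the function names are hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical raw sensor export with one blank value and one spike.
RAW_CSV = """timestamp,reactor_temp_k
2024-01-01T00:00:00,350.2
2024-01-01T00:01:00,351.0
2024-01-01T00:02:00,
2024-01-01T00:03:00,9999
2024-01-01T00:04:00,349.6
"""


def bronze(raw_text):
    """Bronze: ingest rows exactly as received, keeping every value as a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver(bronze_rows, low=250.0, high=500.0):
    """Silver: parse types and route rows that fail checks to quarantine."""
    clean, quarantine = [], []
    for row in bronze_rows:
        try:
            temp = float(row["reactor_temp_k"])
        except ValueError:
            quarantine.append({**row, "reason": "not a number"})
            continue
        if low <= temp <= high:
            clean.append({"timestamp": row["timestamp"], "reactor_temp_k": temp})
        else:
            quarantine.append({**row, "reason": "out of range"})
    return clean, quarantine


def gold(silver_rows):
    """Gold: aggregate validated rows into a simple KPI."""
    temps = [r["reactor_temp_k"] for r in silver_rows]
    return {"n_readings": len(temps), "mean_temp_k": mean(temps)}


bronze_rows = bronze(RAW_CSV)
silver_rows, quarantined = silver(bronze_rows)
kpis = gold(silver_rows)
```

Bad rows are kept in quarantine with a reason rather than dropped, so they can still be inspected later. The Gold KPI is computed only from rows that passed validation.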

Why This Matters¶

Traditional ChemE education:

"Here's Laplace transforms and Bode plots"
Often disconnected from plant reality
Doesn't teach data engineering

Traditional data engineering:

"Here's e-commerce clickstream data"
Boring, generic, doesn't leverage your ChemE skills
Doesn't teach process control

This course:

Process control concepts you'll actually use
Data engineering skills that are immediately practical
Builds on knowledge you already have
Creates unique portfolio content (ChemE + DE)

Success Stories (What You'll Be Able to Do)¶

After completing this course:

At Work:¶

Simulate process behavior before building expensive pilots
Generate realistic test data for analytics projects
Validate control strategies with data pipelines
Communicate with both process engineers AND data engineers

In Interviews:¶

Show working examples (not just theory)
Demonstrate unique ChemE + data engineering combo
Have portfolio projects that aren't generic

On LinkedIn:¶

Post content that stands out (no boring clickstream demos)
Showcase process control knowledge + modern data skills
Attract roles that value both domains

Getting Started¶

Step 1: Installation¶

Go to L00: Setup and install Odibi

Step 2: Run Your First Pipeline¶

Complete the L00 exercises (15 minutes)

Step 3: Start Learning¶

Move to L01: CV/MV/DV

Need Help?¶

Questions? Open an issue: github.com/henryodibi11/Odibi/issues
Found a bug? Submit a PR
Want to contribute? Add your own lessons or examples

Let's Begin¶

Ready to learn process control AND data engineering?

👉 Start with L00: Setup

This course is part of the Odibi Framework - making data engineering accessible to chemical engineers.