Altitud
Edition · 26 April 2026

AI TRAINING

Reinforcement Learning for Operational Optimisation

Build and deploy RL agents that outperform heuristics on real-world pricing, routing, and scheduling problems.

Format: programme
Duration: 30–45h
Level: advanced
Group size: 6–16
Price per participant: €3K–€6K
Group price: €25K–€55K
Audience: ML engineers and data scientists with supervised learning experience who need to tackle sequential decision-making problems in operations
Prerequisites: solid Python skills, familiarity with NumPy/PyTorch, and hands-on experience training supervised or unsupervised ML models

What it covers

This practitioner-level programme covers the full RL stack: Markov Decision Processes, policy gradient methods (PPO, A3C), value-based approaches (DQN, Rainbow), and multi-agent settings. Participants work in simulation environments (Gymnasium, RLlib) to tackle canonical ops problems — dynamic pricing, vehicle routing, and job-shop scheduling — then learn how to move agents from simulation to production. The programme balances theory lectures (40%) with hands-on coding labs (60%), culminating in a capstone where teams deploy an RL policy against a business KPI benchmark.

What you'll be able to do

  • Formulate a real ops problem (pricing, routing, scheduling) as an MDP with correctly specified state space, action space, and reward function
  • Implement and tune a PPO agent in RLlib against a custom Gymnasium environment (a minimal sketch follows this list)
  • Diagnose and fix common RL failure modes: reward hacking, instability, and slow convergence
  • Compare RL against supervised ML and OR baselines to make a justified build-vs-buy decision
  • Deploy a trained RL policy to a staging environment and monitor it against a business KPI
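
To make the first two bullets concrete, here is a minimal sketch, not course material: a toy dynamic-pricing problem formulated as an MDP inside a custom Gymnasium environment, then trained with RLlib's PPO. The price grid, linear demand curve, horizon, and hyperparameters are all invented for illustration, and the training calls assume a Ray 2.x RLlib install (config.build() may be named build_algo() in the newest releases).

    # Illustrative sketch: toy dynamic-pricing MDP as a custom Gymnasium env,
    # trained with RLlib PPO. All numbers below are invented assumptions.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class PricingEnv(gym.Env):
        """Single-product pricing: choose a price each step, earn revenue
        from a noisy, price-sensitive (assumed linear) demand curve."""

        def __init__(self, env_config=None):  # env_config lets RLlib construct it
            self.prices = np.linspace(5.0, 15.0, 11)       # discrete price grid
            self.action_space = spaces.Discrete(len(self.prices))
            # State: [remaining inventory fraction, fraction of horizon elapsed]
            self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)
            self.horizon = 50

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)                       # seeds self.np_random
            self.inventory = 500.0
            self.t = 0
            return self._obs(), {}

        def step(self, action):
            price = self.prices[action]
            # Assumed demand model: linear in price plus noise, capped by stock.
            demand = max(0.0, 40.0 - 2.5 * price + self.np_random.normal(0.0, 3.0))
            sold = min(demand, self.inventory)
            self.inventory -= sold
            self.t += 1
            reward = float(price * sold)                   # reward = revenue
            terminated = self.inventory <= 0.0
            truncated = self.t >= self.horizon
            return self._obs(), reward, terminated, truncated, {}

        def _obs(self):
            return np.array([self.inventory / 500.0, self.t / self.horizon],
                            dtype=np.float32)

    if __name__ == "__main__":
        from ray.rllib.algorithms.ppo import PPOConfig

        config = (
            PPOConfig()
            .environment(PricingEnv)
            .training(gamma=0.99, lr=3e-4, train_batch_size=4000)
        )
        algo = config.build()
        for _ in range(5):
            result = algo.train()  # metric key names vary across Ray versions;
                                   # inspect `result` for episode return stats

Note how the environment class doubles as the MDP specification: observation_space is the state space, action_space the action set, and the revenue term in step() the reward function, which is exactly the formulation exercise the first bullet describes.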

Topics covered

  • Markov Decision Processes: states, actions, rewards, discount factors (formalised after this list)
  • Value-based methods: DQN, Double DQN, Rainbow
  • Policy gradient methods: REINFORCE, PPO, A3C
  • Simulation environment design with Gymnasium and RLlib
  • Multi-agent RL for fleet and supply-chain settings
  • Dynamic pricing and demand-responsive RL policies
  • Vehicle routing and job-shop scheduling as RL problems
  • Sim-to-real transfer, reward shaping, and safe exploration
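
For orientation, the objects the first topic names fit together in two standard definitions (textbook RL notation, nothing specific to this programme): the discounted return the agent maximises, and the Bellman optimality equation that value-based methods such as DQN approximate.

    % Discounted return from step t, with discount factor \gamma \in [0, 1):
    G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

    % Bellman optimality equation for the action-value function Q^*:
    Q^*(s, a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} Q^*(S_{t+1}, a') \,\middle|\, S_t = s,\ A_t = a \right]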

Delivery

Delivered as a 5-week blended programme: two 3-hour live virtual sessions per week led by an RL practitioner, supplemented by async reading and coding assignments. All labs run on cloud GPU instances (provided); participants need a laptop and a GitHub account. A private Slack workspace supports peer Q&A between sessions. In-person cohort delivery at client premises is available for groups of 10+, adding a full-day capstone hackathon.

What makes it work

  • Start with a small, well-scoped sub-problem where a simulator already exists or can be built cheaply before scaling
  • Involve domain experts (ops managers, logistics engineers) in reward function design and environment validation from day one
  • Establish clear baseline KPIs from OR or rule-based methods before training any agent, so improvement is measurable (see the sketch after this list)
  • Run parallel shadow deployments before switching RL policies into production to build stakeholder trust
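
A minimal sketch of the baseline-first point, assuming the hypothetical PricingEnv from the earlier example has been saved as a module: score a rule-based heuristic in the same environment before any agent is trained, so the RL result has a number to beat.

    # Illustrative only: establish a rule-based baseline KPI before training.
    import numpy as np
    from pricing_env import PricingEnv   # hypothetical module from the sketch above

    def evaluate(env, pick_action, episodes=100):
        """Mean episode return for any action-selection rule."""
        returns = []
        for _ in range(episodes):
            obs, _ = env.reset()
            done, total = False, 0.0
            while not done:
                obs, reward, terminated, truncated, _ = env.step(pick_action(obs))
                total += reward
                done = terminated or truncated
            returns.append(total)
        return float(np.mean(returns))

    env = PricingEnv()
    baseline_kpi = evaluate(env, lambda obs: 5)   # heuristic: fixed mid-grid price
    print(f"rule-based baseline return: {baseline_kpi:.1f}")
    # Train the agent only after this number exists, then report uplift:
    # (agent_kpi - baseline_kpi) / baseline_kpi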

Common mistakes

  • Designing a reward function that is easy to optimise but misaligned with the true business objective, leading to reward hacking (e.g. rewarding units sold rather than margin teaches a pricing agent to slash prices)
  • Skipping the simulation fidelity step and attempting sim-to-real transfer with an environment that does not capture key real-world constraints
  • Applying RL to problems where a well-tuned heuristic or mixed-integer programme already delivers near-optimal results at a fraction of the cost
  • Underestimating infrastructure complexity: RL agents in production require continuous monitoring and periodic retraining as environment dynamics shift

When NOT to take this

Skip it if your optimisation problem has a stable, fully observable state space and a well-defined objective function that integer-programming solvers already handle within acceptable time. In that case, adding RL introduces unnecessary complexity, training cost, and interpretability risk with no measurable gain.


This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.