Altitud
Edition · 26 April 2026

AI TRAINING

Reinforcement Learning for Operational Optimisation

Build and deploy RL agents that outperform heuristics on real-world pricing, routing, and scheduling problems.

Format: programme
Duration: 30–45h
Level: advanced
Group size: 6–16
Price per participant: €3K–€6K
Group price: €25K–€55K
Audience: ML engineers and data scientists with supervised learning experience who need to tackle sequential decision-making problems in operations
Prerequisites: solid Python skills, familiarity with NumPy/PyTorch, and hands-on experience training supervised or unsupervised ML models

What it covers

This practitioner-level programme covers the full RL stack: Markov Decision Processes, policy gradient methods (PPO, A3C), value-based approaches (DQN, Rainbow), and multi-agent settings. Participants work in simulation environments (Gymnasium, RLlib) to tackle canonical ops problems — dynamic pricing, vehicle routing, and job-shop scheduling — then learn how to move agents from simulation to production. The programme balances theory lectures (40%) with hands-on coding labs (60%), culminating in a capstone where teams deploy an RL policy against a business KPI benchmark.

What you'll be able to do

  • Formulate a real ops problem (pricing, routing, scheduling) as an MDP with correctly specified state space, action space, and reward function
  • Implement and tune a PPO agent in RLlib against a custom Gymnasium environment (a minimal sketch follows this list)
  • Diagnose and fix common RL failure modes: reward hacking, instability, and slow convergence
  • Compare RL against supervised ML and OR baselines to make a justified build-vs-buy decision
  • Deploy a trained RL policy to a staging environment and monitor it against a business KPI
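
To make the first two bullets concrete, here is a minimal sketch, not course material: a toy dynamic-pricing problem formulated as an MDP inside a custom Gymnasium environment, then trained with RLlib's PPO. The price grid, linear demand curve, horizon, and hyperparameters are all invented for illustration, and the training calls assume a Ray 2.x RLlib install (config.build() may be named build_algo() in the newest releases).

    # Illustrative sketch: toy dynamic-pricing MDP as a custom Gymnasium env,
    # trained with RLlib PPO. All numbers below are invented assumptions.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces

    class PricingEnv(gym.Env):
        """Single-product pricing: choose a price each step, earn revenue
        from a noisy, price-sensitive (assumed linear) demand curve."""

        def __init__(self, env_config=None):  # env_config lets RLlib construct it
            self.prices = np.linspace(5.0, 15.0, 11)       # discrete price grid
            self.action_space = spaces.Discrete(len(self.prices))
            # State: [remaining inventory fraction, fraction of horizon elapsed]
            self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)
            self.horizon = 50

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)                       # seeds self.np_random
            self.inventory = 500.0
            self.t = 0
            return self._obs(), {}

        def step(self, action):
            price = self.prices[action]
            # Assumed demand model: linear in price plus noise, capped by stock.
            demand = max(0.0, 40.0 - 2.5 * price + self.np_random.normal(0.0, 3.0))
            sold = min(demand, self.inventory)
            self.inventory -= sold
            self.t += 1
            reward = float(price * sold)                   # reward = revenue
            terminated = self.inventory <= 0.0
            truncated = self.t >= self.horizon
            return self._obs(), reward, terminated, truncated, {}

        def _obs(self):
            return np.array([self.inventory / 500.0, self.t / self.horizon],
                            dtype=np.float32)

    if __name__ == "__main__":
        from ray.rllib.algorithms.ppo import PPOConfig

        config = (
            PPOConfig()
            .environment(PricingEnv)
            .training(gamma=0.99, lr=3e-4, train_batch_size=4000)
        )
        algo = config.build()
        for _ in range(5):
            result = algo.train()  # metric key names vary across Ray versions;
                                   # inspect `result` for episode return stats

Note how the environment class doubles as the MDP specification: observation_space is the state space, action_space the action set, and the revenue term in step() the reward function, which is exactly the formulation exercise the first bullet describes.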

Topics covered

  • Markov Decision Processes: states, actions, rewards, discount factors (formalised after this list)
  • Value-based methods: DQN, Double DQN, Rainbow
  • Policy gradient methods: REINFORCE, PPO, A3C
  • Simulation environment design with Gymnasium and RLlib
  • Multi-agent RL for fleet and supply-chain settings
  • Dynamic pricing and demand-responsive RL policies
  • Vehicle routing and job-shop scheduling as RL problems
  • Sim-to-real transfer, reward shaping, and safe exploration
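
For orientation, the objects the first topic names fit together in two standard definitions (textbook RL notation, nothing specific to this programme): the discounted return the agent maximises, and the Bellman optimality equation that value-based methods such as DQN approximate.

    % Discounted return from step t, with discount factor \gamma \in [0, 1):
    G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

    % Bellman optimality equation for the action-value function Q^*:
    Q^*(s, a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} Q^*(S_{t+1}, a') \,\middle|\, S_t = s,\ A_t = a \right]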

Delivery

Delivered as a 5-week blended programme: two 3-hour live virtual sessions per week led by an RL practitioner, supplemented by async reading and coding assignments. All labs run on cloud GPU instances (provided); participants need a laptop and a GitHub account. A private Slack workspace supports peer Q&A between sessions. In-person cohort delivery at client premises is available for groups of 10+, adding a full-day capstone hackathon.

What makes it work

  • Start with a small, well-scoped sub-problem where a simulator already exists or can be built cheaply before scaling
  • Involve domain experts (ops managers, logistics engineers) in reward function design and environment validation from day one
  • Establish clear baseline KPIs from OR or rule-based methods before training any agent, so improvement is measurable (see the sketch after this list)
  • Run parallel shadow deployments before switching RL policies into production to build stakeholder trust
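
A minimal sketch of the baseline-first point, assuming the hypothetical PricingEnv from the earlier example has been saved as a module: score a rule-based heuristic in the same environment before any agent is trained, so the RL result has a number to beat.

    # Illustrative only: establish a rule-based baseline KPI before training.
    import numpy as np
    from pricing_env import PricingEnv   # hypothetical module from the sketch above

    def evaluate(env, pick_action, episodes=100):
        """Mean episode return for any action-selection rule."""
        returns = []
        for _ in range(episodes):
            obs, _ = env.reset()
            done, total = False, 0.0
            while not done:
                obs, reward, terminated, truncated, _ = env.step(pick_action(obs))
                total += reward
                done = terminated or truncated
            returns.append(total)
        return float(np.mean(returns))

    env = PricingEnv()
    baseline_kpi = evaluate(env, lambda obs: 5)   # heuristic: fixed mid-grid price
    print(f"rule-based baseline return: {baseline_kpi:.1f}")
    # Train the agent only after this number exists, then report uplift:
    # (agent_kpi - baseline_kpi) / baseline_kpi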

Common mistakes

  • Designing a reward function that is easy to optimise but misaligned with the true business objective, leading to reward hacking (e.g. rewarding units sold rather than margin teaches a pricing agent to slash prices)
  • Skipping the simulation fidelity step and attempting sim-to-real transfer with an environment that does not capture key real-world constraints
  • Applying RL to problems where a well-tuned heuristic or mixed-integer programme already delivers near-optimal results at a fraction of the cost
  • Underestimating infrastructure complexity: RL agents in production require continuous monitoring and periodic retraining as environment dynamics shift

When NOT to take this

Skip it if your optimisation problem has a stable, fully observable state space and a well-defined objective function that integer-programming solvers already handle within acceptable time. In that case, adding RL introduces unnecessary complexity, training cost, and interpretability risk with no measurable gain.


This training is part of a Data & AI catalog built for leaders serious about execution. Take the free diagnostic to see which trainings your team needs.