Markov Decision Processes Interactive App

The controls fall into two groups: Controls & Display and Algorithms. The algorithms are further split into Prediction (evaluating a fixed random policy) and Control (learning an improved policy).

Building the MDP

  • Add states: Double-click empty space
  • Add actions: Double-click a state
  • Connect: Shift+drag from an action to a state
  • Set type: Right-click a state to mark it as Initial, Terminal, or Regular
  • Edit rewards: Select a state, then Up/Down arrows (±1)
  • Edit transition probabilities: Select an edge, then Up/Down arrows (±0.05)
  • Delete: Select items, then Delete/Backspace
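The elements above (states with rewards and types, actions, and probability-weighted edges) amount to a small data model. A minimal sketch of that structure; the class and field names are illustrative, not the app's actual internals:

```python
from dataclasses import dataclass, field

# Illustrative data model only -- the app's real internals are not shown here.
@dataclass
class Action:
    name: str
    # Successor state name -> transition probability (edited in +/-0.05 steps).
    transitions: dict[str, float] = field(default_factory=dict)

@dataclass
class State:
    name: str
    reward: float = 0.0                  # edited in +/-1 steps
    kind: str = "regular"                # "initial" | "terminal" | "regular"
    actions: dict[str, Action] = field(default_factory=dict)

# Two states joined by one action, as the Shift+drag gesture would create.
s0 = State("s0", kind="initial")
s1 = State("s1", reward=1.0, kind="terminal")
s0.actions["go"] = Action("go", transitions={"s1": 1.0})
```

For a valid MDP, each action's transition probabilities should sum to 1, which is why the edge editor adjusts them in small ±0.05 steps.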

Selection & Editing

  • Select: Click (single), Shift+Click (add), Cmd/Ctrl+Click (toggle)
  • Box select: Click and drag on empty space
  • Select all: Cmd/Ctrl+A
  • Move: Drag selected state nodes
  • Undo: Cmd/Ctrl+Z
  • Copy/Paste: Cmd/Ctrl+C copies the selection (or the full graph as JSON); Cmd/Ctrl+V pastes

Algorithms

  • Expected Backup: Select states, then compute Q(s,a) = Σ P(s'|s,a)[R + γV(s')] over the full transition distribution
  • Sample Backup: Select states, then sample one outcome and apply the TD update Q ← Q + α[R + γV(s') − Q]
  • Value Iteration: Full dynamic-programming sweeps until the values converge
  • TD(λ): Temporal-difference learning with eligibility traces (λ=0 is TD(0); λ=1 behaves like Monte Carlo)
  • Monte Carlo: Learn from complete sampled episodes (prediction or control)
  • SARSA: On-policy TD control (bootstraps from the action actually taken)
  • Q-Learning: Off-policy TD control (bootstraps from the max over next actions)
  • Dyna-Q: Model-based RL combining Q-learning with n simulated planning steps per real step
  • Stop: Click a running algorithm's button to stop it
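The two backup operations above differ only in whether they average over the transition distribution or sample one successor from it. A minimal sketch, assuming state-based rewards R(s') and a stored value table V (the function names and dict layout are mine, not the app's):

```python
import random

def expected_backup(transitions, reward, value, gamma=0.9):
    """Q(s,a) = sum over s' of P(s'|s,a) * (R(s') + gamma * V(s'))."""
    return sum(p * (reward[s2] + gamma * value[s2])
               for s2, p in transitions.items())

def sample_backup(q, transitions, reward, value, alpha=0.1, gamma=0.9):
    """TD update from one sampled successor: Q <- Q + alpha*(R + gamma*V(s') - Q)."""
    successors, probs = zip(*transitions.items())
    s2 = random.choices(successors, weights=probs)[0]
    return q + alpha * (reward[s2] + gamma * value[s2] - q)

# One action with a 50/50 split between two successors.
trans = {"s1": 0.5, "s2": 0.5}
reward = {"s1": 0.0, "s2": 1.0}
value = {"s1": 0.0, "s2": 10.0}
q_full = expected_backup(trans, reward, value)   # 0.5*(0+0) + 0.5*(1+9) = 5.0
```

The expected backup needs the full transition model; the sample backup is noisy but only needs the ability to simulate one step, which is why the sampled estimate converges to the expected one only over many updates.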
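The SARSA/Q-Learning distinction comes down to a single term in the update rule. A sketch under assumed conventions (Q as a nested dict of state → action → value; the names are mine):

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action a2 the agent actually took in s2.
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the best action available in s2.
    Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])

# Same transition, different bootstrap targets.
Q = {"s": {"a": 0.0}, "s2": {"x": 1.0, "y": 2.0}}
q_learning_update(Q, "s", "a", 1.0, "s2")        # uses max(1.0, 2.0) = 2.0

Q2 = {"s": {"a": 0.0}, "s2": {"x": 1.0, "y": 2.0}}
sarsa_update(Q2, "s", "a", 1.0, "s2", "x")       # uses Q["s2"]["x"] = 1.0
```

When the behavior policy happens to pick the greedy action, the two updates coincide; they diverge exactly when exploration picks a non-greedy a2.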