Part 7: Planning and The Generative Shift

Welcome to Part 7 of "Robotics Zero to Hero." We can model geometry, compute kinematics, and stabilize dynamics. The next step is autonomy: getting from Point A to Point B without crashing. This is Motion Planning — and it is the clearest example in all of robotics of how high dimensionality changes the kind of algorithm you must use.

The Math: Configuration Space and the Curse of Dimensionality

Humans plan in the 3D world (task space). Robots must plan in Configuration Space (C-space) $\mathcal Q$, the $n$-dimensional manifold from Part 1 where each point is a complete posture. Obstacles in the world map to complicated, warped C-obstacles $\mathcal Q_{obs}$, and planning means finding a continuous path through the free space $\mathcal Q_{free} = \mathcal Q \setminus \mathcal Q_{obs}$ from $\mathbf q_{start}$ to $\mathbf q_{goal}$.
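To make the mapping from world obstacles to C-obstacles concrete, here is a minimal sketch for a hypothetical 2-link planar arm. The link lengths, the circular obstacle, and every function name are illustrative assumptions, not part of the series' code. It rasterizes $\mathcal Q = [-\pi, \pi)^2$ and marks each joint-angle pair whose links touch the obstacle.

```python
import math

# Hypothetical 2-link planar arm: link lengths and the circular workspace
# obstacle are illustrative assumptions, not values from the series.
L1, L2 = 1.0, 1.0
OBSTACLE = (1.2, 0.6, 0.3)  # (center_x, center_y, radius)

def forward_kinematics(q):
    """Return base, elbow and tip positions for joint angles q = (q1, q2)."""
    q1, q2 = q
    elbow = (L1 * math.cos(q1), L1 * math.sin(q1))
    tip = (elbow[0] + L2 * math.cos(q1 + q2), elbow[1] + L2 * math.sin(q1 + q2))
    return (0.0, 0.0), elbow, tip

def segment_point_distance(a, b, p):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def in_collision(q):
    """True if q lies in Q_obs, i.e. either link touches the obstacle."""
    base, elbow, tip = forward_kinematics(q)
    cx, cy, r = OBSTACLE
    return (segment_point_distance(base, elbow, (cx, cy)) <= r
            or segment_point_distance(elbow, tip, (cx, cy)) <= r)

def cspace_map(resolution=36):
    """Rasterize Q = [-pi, pi)^2 into booleans (True = C-obstacle cell)."""
    step = 2 * math.pi / resolution
    return [[in_collision((-math.pi + i * step, -math.pi + j * step))
             for j in range(resolution)] for i in range(resolution)]

if __name__ == "__main__":
    # Rows index q1, columns index q2; '#' marks the warped C-obstacle.
    for row in cspace_map():
        print("".join("#" if cell else "." for cell in row))
```

Even though the workspace obstacle is a plain disc, the printed region in joint-angle coordinates is not: that distortion is what "warped" means above.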

Why not just lay down a grid and run A*? Because of the curse of dimensionality. A grid with resolution $r$ per axis has

$$N_{cells} = r^{\,n}$$

cells. At a modest $r=100$: a 2-DOF planar robot needs $10^4$ cells (trivial), a 6-DOF arm needs $10^{12}$ (already infeasible), and a 20-DOF hand needs $10^{40}$ (more cells than atoms in your body). The cost of grid and exact methods grows exponentially in $n$, so they stop being usable after only a handful of joints. This single fact is why high-dimensional robotics looks nothing like low-dimensional robotics.
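The arithmetic is easy to reproduce. The sketch below recomputes the three cell counts; the one-byte-per-cell storage figure is an optimistic assumption added purely for scale, and `grid_cells` is a hypothetical helper name.

```python
def grid_cells(resolution: int, dof: int) -> int:
    """Number of cells in a uniform grid with `resolution` bins per joint."""
    return resolution ** dof

for name, dof in [("2-DOF planar robot", 2), ("6-DOF arm", 6), ("20-DOF hand", 20)]:
    cells = grid_cells(100, dof)
    exponent = len(str(cells)) - 1  # cells is an exact power of ten here
    # Assumption: one byte per cell, ignoring any graph or cost bookkeeping.
    print(f"{name}: 10^{exponent} cells, about {cells / 1e12:.3g} TB at 1 byte/cell")
```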

Sampling-based planning: trading completeness for tractability

The escape is to sample C-space rather than enumerate it. Rapidly-exploring Random Trees (RRT) grow a tree by repeatedly shooting toward random configurations:

1. Initialize a tree at $\mathbf q_{start}$.
2. Sample a random configuration $\mathbf q_{rand}$.
3. Find the nearest tree node $\mathbf q_{near}$.
4. Step from $\mathbf q_{near}$ toward $\mathbf q_{rand}$ to get $\mathbf q_{new}$.
5. If the edge is collision-free, add $\mathbf q_{new}$.
6. Repeat until the tree reaches $\mathbf q_{goal}$.

RRT is probabilistically complete (the probability of finding a solution, when one exists, → 1 as samples → ∞), but its paths are jagged and not optimal. RRT* (Karaman & Frazzoli, arXiv:1105.1186) adds a rewiring step and is asymptotically optimal: the path cost converges almost surely to the minimum. The crucial property is that the sampling effort scales with the difficulty of the problem (how cluttered or narrow $\mathcal Q_{free}$ is), not with the $r^n$ cells of a grid, which is why sampling survives in high dimensions where grids cannot.
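The rewiring step is easier to see in code. This is a simplified sketch of the local update RRT* performs after a new node is attached, not the full algorithm from the paper. The function name, the list-based tree layout, and the fixed `radius` are assumptions (the paper shrinks the radius as the tree grows), and cost changes are not propagated to descendants here.

```python
import math

def choose_parent_and_rewire(nodes, parent, cost, i_new, radius,
                             edge_free=lambda a, b: True):
    """RRT*-style local update for node i_new, already attached to the tree.

    nodes:  list of configurations (tuples)
    parent: list of parent indices (None for the root)
    cost:   list of cost-to-come values from the root
    edge_free: collision check for a straight edge (assumed always free here)
    """
    q_new = nodes[i_new]
    near = [i for i in range(len(nodes))
            if i != i_new and math.dist(nodes[i], q_new) <= radius]

    # 1) Choose the neighbor that gives q_new the cheapest cost-to-come.
    for i in near:
        c = cost[i] + math.dist(nodes[i], q_new)
        if c < cost[i_new] and edge_free(nodes[i], q_new):
            parent[i_new], cost[i_new] = i, c

    # 2) Rewire: route neighbors through q_new when that shortens their path.
    #    A full implementation would also update the costs of their descendants.
    for i in near:
        c = cost[i_new] + math.dist(q_new, nodes[i])
        if c < cost[i] and edge_free(q_new, nodes[i]):
            parent[i], cost[i] = i_new, c

# Tiny example: a detour 0 -> 1 -> 2, then a new node 3 attached to node 1.
nodes = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (1.0, 0.0)]
parent = [None, 0, 1, 1]
cost = [0.0, math.sqrt(2), 2 * math.sqrt(2), math.sqrt(2) + 1.0]
choose_parent_and_rewire(nodes, parent, cost, i_new=3, radius=1.5)
print(parent, cost)
```

In the example, node 3 switches its parent to the root and node 2 is rerouted through node 3, so the jagged detour over node 1 is replaced by a straight line along the x-axis.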

Python Implementation: A Simple Path Planner

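The RRT loop described earlier can be sketched in a few dozen lines of plain Python. This is a minimal illustration, not a production planner: it assumes a 2D point robot (so C-space and workspace coincide), circular obstacles, a brute-force nearest-neighbor search, and a small goal bias, which is a common addition that the basic algorithm does not require. The names `rrt`, `steer`, and `collision_free` are hypothetical.

```python
import math
import random

def collision_free(p, q, obstacles, resolution=0.05):
    """Check the straight edge p -> q against circles by dense interpolation."""
    n = max(1, int(math.dist(p, q) / resolution))
    for i in range(n + 1):
        t = i / n
        x, y = p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
            return False
    return True

def steer(q_near, q_rand, step):
    """Move from q_near toward q_rand by at most `step`."""
    d = math.dist(q_near, q_rand)
    if d <= step:
        return q_rand
    t = step / d
    return (q_near[0] + t * (q_rand[0] - q_near[0]),
            q_near[1] + t * (q_rand[1] - q_near[1]))

def rrt(start, goal, obstacles, bounds=(0.0, 10.0), step=0.5, goal_tol=0.5,
        goal_bias=0.1, max_iters=5000, seed=0):
    """Return a list of waypoints from start to goal, or None on failure."""
    rng = random.Random(seed)
    nodes, parent = [start], [None]
    for _ in range(max_iters):
        # Sample q_rand (occasionally the goal itself, to pull the tree toward it).
        if rng.random() < goal_bias:
            q_rand = goal
        else:
            q_rand = (rng.uniform(*bounds), rng.uniform(*bounds))
        # Nearest node by linear scan; real planners use a k-d tree or similar.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_rand))
        q_new = steer(nodes[i_near], q_rand, step)
        if not collision_free(nodes[i_near], q_new, obstacles):
            continue
        nodes.append(q_new)
        parent.append(i_near)
        # Stop once the new node can be connected directly to the goal.
        if math.dist(q_new, goal) <= goal_tol and collision_free(q_new, goal, obstacles):
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            path.reverse()
            if path[-1] != goal:
                path.append(goal)
            return path
    return None

if __name__ == "__main__":
    obstacles = [(5.0, 5.0, 2.0)]  # one disc blocking the diagonal
    path = rrt((1.0, 1.0), (9.0, 9.0), obstacles)
    print("no path found" if path is None else f"path with {len(path)} waypoints")
```

The linear nearest-neighbor scan and the interpolated collision check are the two costs that dominate in practice, and they are the first things a real implementation would replace.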

The planner zoo and its trade-offs

| Planner | Guarantee | Strengths | Shortcomings |
| --- | --- | --- | --- |
| Grid + A* / Dijkstra | Resolution-complete, optimal-on-grid | Simple, optimal in low-dim | $O(r^n)$ cells; impractical above ~4–5 DOF |
| PRM | Probabilistically complete | Reusable roadmap for multi-query | Poor in narrow passages; build cost |
| RRT | Probabilistically complete | Fast single-query, high-dim friendly | Jagged, sub-optimal paths |
| RRT* | Asymptotically optimal | Converges to optimum | Slow convergence; memory grows |
| Trajectory optimization (CHOMP/TrajOpt) | Locally optimal | Smooth, respects dynamics/costs | Local minima; needs a seed |
| MPPI (sampling MPC) | Stochastic, real-time | Handles nonlinear dynamics, GPU-parallel | No global guarantee; tuning-sensitive |
| Diffusion planners | Learned, multimodal | Captures complex constraints from data | Needs demonstrations; inference cost |

The Generative Shift: Diffusion Models

For cluttered scenes or nuanced contact-rich tasks (grasping an irregular object), classical planners struggle: the cost landscape is riddled with local minima and the "right" behavior is multimodal (there are many equally good ways to act). The modern answer is generative planning with diffusion models.

A diffusion model gradually corrupts training data with Gaussian noise until only static remains, and a network is trained to reverse that process. In robotics the "data" is a trajectory $\boldsymbol\tau \in \mathbb{R}^{H\times D}$ ($H$ time steps, $D$ degrees of freedom). The forward (noising) process is a stochastic differential equation

$$d\boldsymbol\tau = \mathbf f(\boldsymbol\tau, t)\,dt + g(t)\,d\mathbf w$$

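where $\mathbf f$ is the drift, $g(t)$ the noise scale, and $\mathbf w$ a Wiener process. Planning runs the reverse SDE, which depends on the learned score $\nabla_{\boldsymbol\tau}\log p_t(\boldsymbol\tau)$ and is driven by a Wiener process $\bar{\mathbf w}$ running backward in time: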

$$d\boldsymbol\tau = \big[\mathbf f(\boldsymbol\tau,t) - g(t)^2 \nabla_{\boldsymbol\tau}\log p_t(\boldsymbol\tau)\big]dt + g(t)\,d\bar{\mathbf w}$$

Starting from pure noise and denoising, we generate a feasible trajectory; conditioning the reverse process on the current scene (a camera image) or on a cost (classifier guidance) steers the sample to avoid obstacles and reach the goal. Two foundational works: Diffuser (Janner et al., arXiv:2205.09991) folds planning into sampling, and Diffusion Policy (Chi et al., arXiv:2303.04137) shows this excels precisely in high-dimensional, multimodal action spaces — the exact regime where RRT and trajectory optimization strain.
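To see the reverse SDE at work without training anything, the toy sketch below integrates it backward in time with Euler–Maruyama steps. Everything specific is an assumption made for illustration: a constant noise rate `BETA` (so $\mathbf f = -\tfrac12\beta\boldsymbol\tau$ and $g = \sqrt\beta$), a one-dimensional "trajectory" whose waypoints are independent Gaussians, and an analytic `score` function standing in for the learned network. No conditioning or guidance is included.

```python
import math
import random

BETA = 2.0      # constant noise rate (assumed): f = -0.5*BETA*tau, g = sqrt(BETA)
T_END = 5.0     # long enough that p_T is very close to N(0, 1)
DATA_STD = 0.5  # spread of the toy "demonstrations" around each waypoint

def score(tau, t, data_mean):
    """Analytic score of p_t when the data is N(data_mean, DATA_STD^2).

    In a real planner this function is a trained neural network.
    """
    decay = math.exp(-BETA * t)
    mean_t = data_mean * math.sqrt(decay)
    var_t = DATA_STD ** 2 * decay + (1.0 - decay)
    return -(tau - mean_t) / var_t

def sample_trajectory(data_means, steps=250, rng=random):
    """Denoise pure noise into one waypoint per entry of data_means."""
    dt = T_END / steps
    tau = [rng.gauss(0.0, 1.0) for _ in data_means]  # start from pure noise
    for k in range(steps):
        t = T_END - k * dt  # integrate from t = T_END down to t = 0
        for h, mu in enumerate(data_means):
            # Reverse-SDE drift: f - g^2 * score
            drift = -0.5 * BETA * tau[h] - BETA * score(tau[h], t, mu)
            # Stepping backward in time flips the sign of the drift term.
            tau[h] += -drift * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
    return tau

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "demonstrated" waypoints: a short ramp from 0 to 2.
    print(sample_trajectory([0.0, 0.5, 1.0, 1.5, 2.0], rng=rng))
```

Each call returns a different trajectory scattered around the demonstrated waypoints, which is the sampling behavior that lets a diffusion planner represent several valid solutions instead of a single average.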

High-Dimensional vs. Low-Dimensional, Summarized

| | Low-dim (≤ ~4 DOF) | High-dim (arms, hands, tentacles) |
| --- | --- | --- |
| Best tool | Grid/A*, analytic | Sampling (RRT*), optimization, learned/diffusion |
| Why | Enumeration is cheap | $r^n$ enumeration impossible |
| Optimality | Achievable | Asymptotic or local only |
| Bottleneck | Almost none | Nearest-neighbor queries, collision checks, local minima |

The throughline of this entire series lands here: dimension dictates the algorithm. What is a solved problem at 2 DOF is a research frontier at 20.

Focus on the Octopus: Imagining Grasping Motions

For our metallic continuum octopus, grasping is a nonlinear nightmare. A rigid robot pinches with a two-finger gripper; an octopus wraps its whole tentacle around an object, conforming to its shape — a contact-rich, hyper-redundant motion (Part 3) with an enormous configuration dimension.

Computing such a conforming wrap with RRT is hopeless: the C-space is too high-dimensional and the contact constraints too intricate. Instead we collect thousands of teleoperated demonstrations of successful wraps across many shapes and train a diffusion policy on those trajectories. Faced with a new, unseen object, the model effectively imagines how to wrap — denoising a random trajectory conditioned on the live camera feed until a feasible grasp emerges. This is the generative shift made physical: where enumeration fails, we learn the distribution of good behavior and sample from it.

In Part 8 we look at the hardware needed to run these heavy models on the robot itself.

Further reading: LaValle, "Planning Algorithms" (2006); Karaman & Frazzoli (arXiv:1105.1186); Janner et al. (arXiv:2205.09991); Chi et al. (arXiv:2303.04137); Williams et al. on MPPI (arXiv:1707.02342).