In short, dynamic programming is a way of using cached information to optimize choice in a recursive problem. By eliminating the non-optimal choices, it reduces the complexity (possibly exponentially). It's actually a bit of an understated concept, since a lot of problems can naturally be expressed with recursion. A lot of the problems are also generally not feasible without it. Obviously the larger the problem, the larger the run-time improvement.
The specific problem has to do with optimally traversing a graph. Since the problem was simple enough, I ended up writing the solution in yasm (amd64) assembly.
"Find the maximum total from top to bottom in triangle.txt, a 15K text file containing a triangle with one-hundred rows."
So, a valid move on the graph follows a directed edge to the left or right child. An exhaustive search is $\Theta(n2^{n-1})$ with respect to height ($n$ nodes per path, $2^{n-1}$ paths). It's estimated there are $10^{80}$ atoms in the known universe. So given only a few hundred rows, the number of distinct traversals possible is roughly equivalent to the number of atoms in all of known existence.
Obviously, a pointer based tree structure implies the complexity, since it implies an exhaustive search. However, there are more ways than one to represent a tree. One possible representation to avoid the complexity is the same layout that's used for heaps (Eytzinger).
Finally, the dynamic programming step is comparing each of the possible traversals, and caching the optimal one as if it had been taken...
segment .data a dq \ 0, \ 59, \ 73, 41, \.
.
.
l equ 100
segment .bss
segment .text
global _start
_start:
xor rsi, rsi
mov rdi, 1
mov rdx, 1
level:
inc rdx
cmp rdx, l
cmovg rax, [a + rsi*8]
jg max
mov rcx, rdx
mov rbx, rcx
shl rbx, 3
inc rsi
inc rdi
mov rax, [a + rsi*8]
add [a + rdi*8], rax
mov rax, [a + rsi*8 + rbx - 16]
add [a + rdi*8 + rbx - 8], rax
dec rcx
inc rdi
next:
dec rcx
jz level
mov rax, [a + rsi*8]
inc rsi
cmp rax, [a + rsi*8]
cmovl rax, [a + rsi*8]
add [a + rdi*8], rax
inc rdi
jmp next
max:
dec rdx
jz _end
inc rsi
cmp rax, [a + rsi*8]
cmovl rax, [a + rsi*8]
jmp max
_end:
xor rdi, rdi
mov rax, 60
syscall
With dynamic programming, run-time becomes $2n^2 - 4n - 4$ or $O(n^2)$ (the math is left as an exercise).
And final execution time is...
real 0m0.001s
user 0m0.000s
sys 0m0.000s
So, other than what would probably be an extremely large scale solution taking the intuitive approach, the problem is essentially infeasible. Using dynamic programming, the problem can be solved in under a millisecond.