How would you propose to change the algorithm to come to a short solution fast?!
I haven't looked at your code, just read your description. However, the classical way is to search breadth-first instead of depth-first. Outline:
- 0. Let the starting position be PS. Let k be 0. Goto 15 if PS is a solution.
- 1. Keep a cache C of "seen" positions.
- 2. Keep a FIFO queue Q of "todo" positions, each paired with the number of the next move to be made from it.
- 3. Add PS to C. Push tuple (PS, 1) onto Q.
- 4. If Q is empty, goto 14.
- 5. Shift first tuple of Q. Let this be (P, k).
- 6. Calculate all possible moves of position P. Call this set of moves M. Unless the starting position doesn't allow for any moves, M will never be empty.
- 7. If M is empty, goto 4.
- 8. Remove a move m from M. Apply m to P, yielding position P'.
- 9. If P' in C, goto 7.
- 10. If P' is a solution, goto 15.
- 11. Add P' to C.
- 12. Push tuple (P', k+1) on Q. (So this tuple will be inserted at the end).
- 13. Goto 7.
- 14. Terminate unsuccessfully. (No solution possible).
- 15. Terminate with success. Minimum number of moves is k.
In fact, this is just Dijkstra's algorithm applied to a graph in which each vertex is a possible position of the puzzle, and there's an edge between two positions if one can go from one to the other in a single move. Since every edge has the same weight, a plain FIFO queue is enough; no priority queue is needed.
I leave it up to the reader to turn the TAOCP style of describing the algorithm into a "real" program.
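For anyone who wants a starting point, here is one possible translation of the outline above into Python. It is a minimal sketch, not a drop-in solver: the names `min_moves`, `is_solution` and `successors` are made up for illustration, positions are assumed to be hashable, and `successors(P)` is assumed to return the positions reachable from P in one move (steps 6 and 8 folded together). The start position is put in the cache up front so it is never queued a second time.

```python
from collections import deque

def min_moves(start, is_solution, successors):
    """Breadth-first search; returns the minimum number of moves, or None."""
    if is_solution(start):                 # step 0
        return 0
    seen = {start}                         # cache C (assumes hashable positions)
    todo = deque([(start, 1)])             # FIFO queue Q of (position, move number)
    while todo:                            # step 4
        position, k = todo.popleft()       # step 5
        for nxt in successors(position):   # steps 6-8 (hypothetical helper)
            if nxt in seen:                # step 9
                continue
            if is_solution(nxt):           # step 10
                return k                   # step 15
            seen.add(nxt)                  # step 11
            todo.append((nxt, k + 1))      # step 12
    return None                            # step 14: no solution possible

# Toy puzzle (an assumption, not the original one): start at 1,
# a move either adds 1 or doubles, and the goal is to reach 10.
print(min_moves(1, lambda p: p == 10, lambda p: [p + 1, p * 2]))  # prints 4
```

Because positions are taken off the queue in the order they were added, the first solution found is also one with the fewest moves. To recover the moves themselves rather than just their count, one option is to store each position's predecessor alongside it in the cache.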