I will have to eat my words about saying it should only be a few thousand nodes. I did an A* in C++, using the heuristic sum(distance from final position). The 6x6 took 100k nodes, 50 moves as you said.
I then optimized the program for 4x4 puzzles, getting about 280k node/sec on my computer using the C++ STL priority queue (a binary heap, not optimal for this problem). The 4x4 you mentioned as being difficult took 54 moves, not 62. It used 8.1M nodes, with the final heap (aka priority queue) being 4.1M nodes, using 12bytes/node.
However my program doesn't generate solutions, sadly. So it's possible I'm wrong (although it was working fine for smaller puzzles). I would have to write my own heap to do that. I'll edit this post when I do that, maybe tonight, maybe not :)
This is a good problem for id-A* as someone mentioned. 50MB of RAM is not worth it for a 4x4 puzzle :)
Edit:
I implemented the iterative deepening A*, and now it goes crazy fast (probably since the program fits in the cache?). I get 2.6M nodes/s now.
Here is the 54-step solution:
(Whoops, L = right, R = left. :P)
1 11 7 5
14 3 4 13
2 9 0 12
10 8 6 **
DDDLULULDRDRUULLURRRDLLDRRU
LLURRDLLLDRDLURRRDLUULLURRR
depth 54, nodes 8405436
3.204 s
I tried various random 4x4's and most take a few seconds. Here are the hardest I found if you want to try to optimize your solver:
10 ** 14 3
12 6 11 0
13 1 9 2
8 5 4 7
depth 55, nodes 899119579
UURULDRDDLUUURDDRDLLURRUULDLULDD
RDLURDRURDLULUULDRRDRUU
342.741 s
5 4 11 10
** 14 6 9
1 13 12 8
0 7 2 3
depth 57, nodes 919366955
URRULDDDRUULLURRDRULLDDDRURDLLLU
URRDLDLUURRDRULURDLDDRUUU
345.206 s
Here is source and .exe:
http://media2.uploadjar.com/file.php?file=uploads/slidepuzzle.zip
(The source is hard to read because I went general A* -> 4x4 A* -> 4x4 IDA*, and there's a lot of optimized bit manipulation)