Well, I made some relatively easy headway on this, so I guess this is a WIP now?
Anyway, this saves 86 input frames vs. andymac's run and is 62 frames faster in real time, all of it from better movement (speed control in particular).
Further progress will come more slowly because different routes need to be tested (stage 3 in particular). The published TAS makes at least one poor routing choice, so I expect at least half a minute to come from better routing alone.
Improved version here.