Wikipedia says that this is Mode 7's transformation principle:
That's very simple to implement in Lua. The hard part (for me, at least) would be fetching the pixels from the source image.
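To make that concrete, here is a minimal sketch. It assumes the formula in question is the usual affine form, x' = a(x - x0) + b(y - y0) + x0 and y' = c(x - x0) + d(y - y0) + y0. The names `mode7_transform`, `sample`, and `render` are made up for illustration, and the source image is assumed to be a plain Lua table indexed as `image[row][col]`, which is not how any particular emulator exposes pixels.

```lua
-- Map a screen coordinate (x, y) to a source-image coordinate (x', y').
-- m holds the matrix entries a, b, c, d and the origin x0, y0 (hypothetical layout).
local function mode7_transform(m, x, y)
  local dx, dy = x - m.x0, y - m.y0
  return m.a * dx + m.b * dy + m.x0,
         m.c * dx + m.d * dy + m.y0
end

-- Nearest-neighbour lookup in a source image stored as image[row][col] (1-based).
-- Coordinates wrap around the edges, so the map tiles in every direction.
local function sample(image, width, height, sx, sy)
  local col = math.floor(sx + 0.5) % width
  local row = math.floor(sy + 0.5) % height
  return image[row + 1][col + 1]
end

-- Build an out_w by out_h output image by transforming every screen pixel
-- and fetching the source pixel it lands on.
local function render(image, width, height, m, out_w, out_h)
  local out = {}
  for y = 0, out_h - 1 do
    local line = {}
    for x = 0, out_w - 1 do
      local sx, sy = mode7_transform(m, x, y)
      line[x + 1] = sample(image, width, height, sx, sy)
    end
    out[y + 1] = line
  end
  return out
end
```

With `a = d = 1` and `b = c = 0` this is the identity; setting `a = d = cos(t)`, `b = -sin(t)`, `c = sin(t)` rotates the image about (x0, y0). The pixel fetch is the part that depends on where the source image actually lives, so `sample` is only a stand-in.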
Now, you used a track from Mario Kart as your example image, which is a bit harder to replicate: it would require perspective effects, which Wikipedia says go a bit beyond what Mode 7 alone can do. I still think it wouldn't be so bad, but I would need to sit down and give it some careful thought.
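As a rough starting point for the perspective part, one common approach is to treat the map as a flat floor and work out, for each screen pixel below a horizon line, which floor point it looks at. This is a sketch only: the `cam` table and its fields (`x`, `y`, `angle`, `height`, `focal`, `horizon`, `center_x`) are assumptions of mine, not values read from the game, and it is not claimed to match what Mario Kart does internally.

```lua
-- For a screen pixel (x, y), return the floor (map) coordinate it shows,
-- or nil if the pixel is at or above the horizon.
-- cam fields (all hypothetical): x, y = camera position on the map,
-- angle = heading in radians, height = camera height above the floor,
-- focal = focal length in pixels, horizon = screen row of the horizon,
-- center_x = screen column straight ahead of the camera.
local function floor_point(cam, x, y)
  local dy = y - cam.horizon
  if dy <= 0 then
    return nil
  end
  -- Rows closer to the horizon look at floor that is farther away.
  local dist = cam.height * cam.focal / dy
  -- Sideways offset grows with distance, which gives the perspective look.
  local side = (x - cam.center_x) * dist / cam.focal
  local c, s = math.cos(cam.angle), math.sin(cam.angle)
  return cam.x + dist * c - side * s,
         cam.y + dist * s + side * c
end
```

Each screen row ends up with its own scale factor, which is why a single fixed matrix is not enough for this effect.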
I'm intrigued by the idea, but I'm very busy at the moment and probably will remain so through the month (though I tend to get a lot of work done on side projects when I'm putting off bigger work).
Also, I will never forgive you for causing Phil Hartman's death.
Edit: Since I'm guessing you want this for one of the Atlas encodings, I'm a little confused as to how you plan on using it. Are you sure you don't wish to undo the Mode 7 transformations? After all, the analogous video for Mario Kart would superimpose the karters' positions on the top-down view. If that's your goal, I would instead recommend building the encode from the ground up: First, get all karters' x, y, and z positions and orientations every frame and output them to a text file. Then have a script draw the karters on the map you've provided (I assume a program like Adobe After Effects can do this). Finish off the encode by syncing the video with the run's audio.
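For the text-file step, something like the following would do. It is only a sketch: the one-record-per-line layout (frame, karter id, x, y, z, angle) and the names `format_record` and `parse_record` are my own invention, and reading the actual values out of the game's memory is left out entirely because I don't know where they live.

```lua
-- Turn one karter's state on one frame into a line of text.
-- Layout (hypothetical): frame id x y z angle, separated by single spaces.
local function format_record(frame, id, x, y, z, angle)
  return string.format("%d %d %.3f %.3f %.3f %.3f", frame, id, x, y, z, angle)
end

-- Parse a line written by format_record back into a table.
-- Returns nil if the line does not have the expected six fields.
local function parse_record(line)
  local f, id, x, y, z, a =
    line:match("^(%d+) (%d+) (%S+) (%S+) (%S+) (%S+)$")
  if not f then
    return nil
  end
  return {
    frame = tonumber(f), id = tonumber(id),
    x = tonumber(x), y = tonumber(y), z = tonumber(z),
    angle = tonumber(a),
  }
end
```

The logging script would call `format_record` once per karter per frame and write the result to a file; the drawing script would read the file back with `parse_record` and place each karter on the map at its x and y.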