Submission #6432: Doomsday31415, BrunoVisnadi, Masterjun's SNES Super Mario World "game end glitch" in 00:41.68

Console: Super NES
Emulator: lsnes rr2-β23
Game Version: USA
Frame Count: 2505
ROM Filename: Super Mario World (U) [!].smc
Frame Rate: 60.0988138974405
Branch: game end glitch
Rerecord Count: 23139
Authors: Doomsday31415, BrunoVisnadi, Masterjun
Game: Super Mario World
Submitted by Doomsday31415 on 6/21/2019 1:49:29 AM

Submission Comments
Because Super Mario World wasn't beaten fast enough already, we decided to further optimize this category to save an additional 8 frames over the previous run.
Video with ghosts of previous runs:
Video with commentary:

General Information

  • Emulator: lsnes rr2-β23
  • Version: U
  • Objective: Reach the credits as quickly as possible
  • Categories:
    • Heavy glitch abuse
    • Corrupts memory
  • Genre: Platform

The Trick So Far

The primary catalyst for this credits warp is unchanged from when Masterjun discovered it: having Yoshi eat a specific Charging Chuck.
Charging Chucks are normally immune to Yoshi's tongue, so how is this possible? Grabbing a coin with Mario just after Yoshi starts to eat it leaves a null sprite on Yoshi's tongue. If the Charging Chuck spawns at this moment, it ends up being the target of Yoshi's ravenous stomach.
Since Charging Chuck was never meant to be eaten, when the game tries to apply the effect of eating one, the assembly jumps to the garbage location $14A13. $14A13 is located in Open Bus.

Open Bus

Open Bus isn't so much a location as it is a lack of one: the last value in the Data Bus is used instead of what's at that location. In this case, the Data Bus is 0x01, so the assembly run is 01 01, or ORA ($01,x).
X is a register that will be 0x09 thanks to the previous assembly, so the instruction reads its pointer from $0A (0x01 + 0x09), which is a temporary. What temporary, you may be asking? Since the current object being processed is Yoshi, the last thing it was used for was calculating the high byte of Yoshi's X position. In order for the trick to work, the pointer read from there needs to be 0x0107, which points to tile information that happens to be 0x17 for this level.
With a pointless ORA complete, the program counter moves forward two bytes and reads from $14A15, which is (surprise) still Open Bus. The Data Bus is now 0x17, however, which results in 17 17, or ORA [$17],Y (on the 65816, opcode 17 is the long indirect form). Conveniently, $17 and $18 contain a copy of the controller data, which makes it possible to perform a miracle...
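As a rough illustration of the mechanism described above, the following Python sketch models an instruction fetch from Open Bus: every fetched byte, opcode and operand alike, is whatever value was last on the Data Bus. The opcode table only covers opcodes named in this write-up, using standard 65816 mnemonics and lengths. The function names and the `memory` dictionary are hypothetical, and the model assumes a direct page register of 0.

```python
# Minimal model of instruction fetch from Open Bus on the 65816.
# Assumption: every fetched byte equals the last value on the data bus.

# opcode -> (mnemonic template, total instruction length in bytes)
OPCODES = {
    0x01: ("ORA (${0:02X},X)", 2),         # direct page indexed indirect
    0x17: ("ORA [${0:02X}],Y", 2),         # direct page indirect long, indexed by Y
    0x06: ("ASL ${0:02X}", 2),
    0x26: ("ROL ${0:02X}", 2),
    0x02: ("COP #${0:02X}", 2),
    0x4D: ("EOR ${0:02X}{0:02X}", 3),
    0xFE: ("INC ${0:02X}{0:02X},X", 3),
    0xFC: ("JSR (${0:02X}{0:02X},X)", 3),
    0xF8: ("SED", 1),
    0xFB: ("XCE", 1),
}


def fetch_from_open_bus(bus):
    """Return (raw bytes, disassembly) for an instruction fetched from Open Bus."""
    template, length = OPCODES[bus]
    raw = " ".join(f"{bus:02X}" for _ in range(length))
    return raw, template.format(bus)


def ora_dp_x_operand(memory, dp_operand, x):
    """Value read by ORA (dp,X): follow the 16-bit pointer stored at dp_operand + X."""
    pointer_addr = (dp_operand + x) & 0xFFFF
    pointer = memory[pointer_addr] | (memory[pointer_addr + 1] << 8)
    return memory[pointer]


# Hypothetical memory snapshot matching the values described in the text.
memory = {0x000A: 0x07, 0x000B: 0x01, 0x0107: 0x17}

print(fetch_from_open_bus(0x01))
next_bus = ora_dp_x_operand(memory, 0x01, 0x09)
print(fetch_from_open_bus(next_bus))
```

The value returned by `ora_dp_x_operand` is the last byte read, so it becomes the Data Bus value that the next Open Bus fetch decodes.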

The Miracle

This part was glossed over in Masterjun's original explanation, so let me explain in more detail just how unlikely this thing is.
We want to get to somewhere between $4218 and $421F because that's where the raw controller data is stored. This is no simple task, since we started at $14A13, which is both later than our target and in the wrong bank. To make it even harder, the area just before $4218 is various CPU registers, many of which will cause a BRK instruction that will send us right back to $4A13 if interrupts haven't been disabled. The only thing we can do is set the Data Bus to something, which has to then not only get us to before $4218, but also disable interrupts before we hit $4xxx.
That's exactly what happens here, albeit in a roundabout way:
  • First, we manipulated the controller data to be 0xE0A0, which causes the game to load 0x06 into the Data Bus.
  • 06 06 is ASL $06. $06 in this case is 0xFF, which is changed to 0xFE and loaded into the Data Bus. Note: $06 is also changed to 0xFE.
  • FE FE FE is INC $FEFE,X. This location happens to be 0x01, which incremented is 0x02.
  • 02 02 is COP $02. This triggers a very brief interrupt that immediately returns to $14A1E, putting 0x01 in the Data Bus again.
  • Once again, 0x01 results in 0x17, which results in 0x06.
  • This time, $06 is 0xFE, which is changed to 0xFC and loaded into the Data Bus.
  • FC FC FC, or JSR ($FCFC,X), is where the first magic happens. The entire area around $FCFC is also Open Bus. Because of the way indirect JSR works, this happens to be the low byte of where the instruction was read. Since this was called at $14A24, this causes the jump to go to $12626 with 0x26 in the Data Bus.
  • 26 26 is ROL $26. The value at $26 depends on the position of Mario; here we'll go with $27, resulting in $4D in the Data Bus.
  • 4D 4D 4D is EOR $4D4D, which occurs in a loop until enough time passes and an interrupt occurs. This interrupt once again jumps back to $1xxxx, putting 0x01 in the Data Bus.
  • For the third time, 0x01 results in 0x17, which results in 0x06.
  • Now $06 is 0xFC, which is changed to 0xF8 and loaded into the Data Bus.
  • F8 is SED, which just sets the decimal flag. As this doesn't change the Data Bus, this happens over and over until we reach $14016.
  • Remember how I mentioned there are CPU registers prior to our target? As it turns out, this location is actually only partially Open Bus. By what can only be described as a miracle, the resulting instruction here is FB (XCE), which swaps the emulation and carry flags.
This causes the system to enter emulation mode, effectively disabling interrupts and allowing us to run a few hundred garbage instructions before eventually reaching $4219, near the start of the range we were aiming for. Success!
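The arithmetic in the chain above can be checked with a short Python sketch. It covers the repeated ASL on $06, the byte left on the bus by the indirect JSR, the partially driven read that turns F8 into FB, and the flag swap done by XCE. The helper names are hypothetical. The sketch assumes that the read at $14016 drives only its two lowest bits, both as 1, while the remaining bits keep the Open Bus value; that choice is an assumption picked because it reproduces the FB described above.

```python
def asl8(value):
    """8-bit ASL: returns (result, carry_out)."""
    return (value << 1) & 0xFF, bool(value & 0x80)


def asl_chain(start, steps):
    """Apply ASL repeatedly, as happens each time 06 06 (ASL $06) runs."""
    values, value = [], start
    for _ in range(steps):
        value, _carry = asl8(value)
        values.append(value)
    return values


def jsr_bus_byte(opcode_addr):
    """Low byte of the return address (opcode address + 2) pushed by a 3-byte JSR.

    Assumption: this pushed byte is what Open Bus returns for the jump pointer.
    """
    return (opcode_addr + 2) & 0xFF


def partial_open_bus_read(bus, driven_mask, driven_bits):
    """Bits in driven_mask come from hardware; all other bits keep the bus value."""
    return (bus & ~driven_mask & 0xFF) | (driven_bits & driven_mask)


def xce(carry, emulation):
    """XCE swaps the carry and emulation flags; returns (carry, emulation)."""
    return emulation, carry


print([f"{v:02X}" for v in asl_chain(0xFF, 3)])        # values of $06 after each pass
low = jsr_bus_byte(0x4A24)
print(f"{(low << 8) | low:04X}")                       # both pointer bytes read the same value
print(f"{partial_open_bus_read(0xF8, 0x03, 0x03):02X}")
# The last ASL (on 0xFC) leaves carry set, and SED does not touch it.
print(xce(carry=asl8(0xFC)[1], emulation=False))
```

With carry set going into XCE, the swap leaves the emulation flag set, which is the switch into emulation mode described above.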

The Payload

In order to warp to the credits, several steps need to happen:
  • Emulation mode needs to be disabled so interrupts work again.
  • Decimal mode needs to be disabled so calculations work as expected.
  • The game mode $100 needs to be set to 0x18.
  • The data bank needs to be 0x00.
  • The current cutscene $13C6 needs to be set to 0x08.
  • The main game loop at $8072 needs to start running again.
If all of the above is done correctly, the game will fade out and start playing the credits!

What's New?

All of the above was already done in the previous submissions, but with better movement and a more efficient payload, further improvements have been made.

New Movement

The first improvement to movement comes when landing on Yoshi: by landing when the vertical subpixel is sufficiently high (.a-.f if not holding B), Mario will be able to start moving a frame faster.
The rest of the movement centers around grabbing the coin from the opposite side. This difference allows the fireballs, which move slower than Mario, to be spit out much sooner than before. Accordingly, the shell can be picked up earlier, allowing Mario and the camera to be further to the right.
In order to reduce lag, various objects are only active every so many frames. In the case of the fireball, it can only collide with objects every four frames. The fireballs that can collide on a given frame vary depending on the ID of the fireball. For example, 2, 6, and 10 would collide on one frame while 3, 7, and 11 would collide on the next. By spitting out the first set of fireballs so the right number are out, this frame rule can be manipulated to hit the shell on whatever frame is optimal.
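The frame rule can be sketched in Python as follows. Only the grouping by ID (2, 6, 10 together, then 3, 7, 11 on the next frame) comes from the description above; the exact phase, that is, which frame counter value matches which ID, is an assumption, and the function names are hypothetical.

```python
def collides_this_frame(frame, fireball_id, period=4):
    """True when this fireball's collision is active on the given frame.

    Assumption: a fireball is active when frame and ID agree modulo the period.
    """
    return (frame - fireball_id) % period == 0


def active_fireballs(frame, fireball_ids):
    """IDs whose collision is active on the given frame."""
    return [i for i in fireball_ids if collides_this_frame(frame, i)]


def frames_until_active(frame, fireball_id, period=4):
    """Frames to wait before this fireball can next collide (0 means now)."""
    return (fireball_id - frame) % period


ids = range(2, 12)
print(active_fireballs(2, ids))
print(active_fireballs(3, ids))
print(frames_until_active(3, 2))
```

Since the active frame depends on the ID, changing how many fireballs are already out changes which ID the next fireball receives, and therefore the frame on which it can hit the shell.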
Ultimately, Mario hopped at the maximum 49 speed instead of straddling 47-49 at the end in order to make the shell hit the wall as soon as possible. It touched the fireball on its rightmost pixel on the frame its collision is active.
The biggest thing holding back this new movement is Yoshi's position. As explained above, Yoshi needs to have a certain high byte for his X position or else the trick doesn't work. Unfortunately, Yoshi effectively must bounce off the wall to be far enough left, so going high is not an option. If there were a way to get around this limitation, several more frames could be saved.

New Payload

The original payload was broken up into five frames, doing all the things mentioned above directly while sacrificing 5 of the 8 bytes in a frame to loop to the beginning and get new controller data.
The new payload reduces that to three frames by taking advantage of the "intended" logic to start the credits. The new payload looks like this:
(E0) 0A FB 64 10 CB -- --
 D8  0B AB 64 10 CB 80 F8
 CA  20 20 CA 20 72 80 F8
0A        ASL A      Only purpose is clearing carry
FB        XCE        Disables emulation mode
64 10     STZ $10    Needed for controller data to fully update during interrupt
CB        WAI        Waits for the next interrupt (next instruction will be new controller data)
80 F8     BRA $F8    Jump back to $4218
D8        CLD        Disables decimal mode
0B        PHD        Puts 0x00 on the stack
AB        PLB        Sets the Data Bank to 0x00
CA        DEX        Decrements X to 0x08
20 20 CA  JSR $CA20  Calls a location that sets $100 to 0x18 and $13C6 to X
20 72 80  JSR $8072  Starts running the main game loop again
Those paying close attention above will notice the last "80" is used by both 20 72 80 and 80 F8. It was the final byte saved to get it down to three frames.
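To make the control flow easier to follow, here is a small Python sketch that walks the three frames of controller bytes in execution order. It is a trace, not an emulator: it assumes execution enters at $4219, that each WAI resumes after the controller bytes have been replaced by the next frame's, and that JSR $CA20 returns normally. The `trace` function and the table layout are hypothetical.

```python
BASE = 0x4218  # address of the first controller-data byte

# One row per frame, bytes at $4218-$421F. None marks the don't-care bytes
# shown as "--" above; the E0 in the first row is never executed.
FRAMES = [
    [0xE0, 0x0A, 0xFB, 0x64, 0x10, 0xCB, None, None],
    [0xD8, 0x0B, 0xAB, 0x64, 0x10, 0xCB, 0x80, 0xF8],
    [0xCA, 0x20, 0x20, 0xCA, 0x20, 0x72, 0x80, 0xF8],
]

# opcode -> (mnemonic, instruction length in bytes)
OPCODES = {
    0x0A: ("ASL A", 1), 0xFB: ("XCE", 1), 0x64: ("STZ", 2),
    0xCB: ("WAI", 1), 0x80: ("BRA", 2), 0xD8: ("CLD", 1),
    0x0B: ("PHD", 1), 0xAB: ("PLB", 1), 0xCA: ("DEX", 1),
    0x20: ("JSR", 3),
}

MAIN_LOOP = 0x8072  # a JSR here hands control back to the game


def trace(frames, start=0x4219):
    """Return a list of (frame_index, address, disassembly) in execution order."""
    pc, frame, log = start, 0, []
    while True:
        row = frames[frame]
        offset = pc - BASE
        name, size = OPCODES[row[offset]]
        operands = row[offset + 1 : offset + size]
        next_pc = pc + size
        if name == "STZ":
            text = f"STZ ${operands[0]:02X}"
        elif name == "BRA":
            # Signed 8-bit displacement relative to the next instruction.
            disp = operands[0] - 0x100 if operands[0] & 0x80 else operands[0]
            next_pc = pc + size + disp
            text = f"BRA ${next_pc:04X}"
        elif name == "JSR":
            target = operands[0] | (operands[1] << 8)
            text = f"JSR ${target:04X}"
        else:
            text = name
        log.append((frame, pc, text))
        if name == "JSR" and target == MAIN_LOOP:
            return log
        if name == "WAI":
            frame += 1  # the interrupt refreshes the controller bytes
        pc = next_pc


for frame, address, text in trace(FRAMES):
    print(f"frame {frame + 1}  ${address:04X}  {text}")
```

In the third frame the trace decodes the byte at $421E twice: first as the opcode of the BRA that returns to $4218, then as the high operand byte of JSR $8072.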

Two Frame Payloads

It's entirely possible to make two-frame payloads that work on an emulator. These payloads avoid sacrificing bytes for the interrupt and instead loop back to $2000~$4000 repeatedly while hoping the controller data isn't updated at the wrong time.
Unfortunately, none of these worked on console, so three frames is the best we have.

Special Thanks

BrunoVisnadi: He came up with most of the new movement mentioned above.
dwangoAC: I threw a bunch of payloads at him and he happily tried each one until we managed to get one to work on console. He also did the commentary in the console-verified video!
Ilari: Open Bus is a complicated beast, and I was only able to walk through most of it thanks to him.
Masterjun: For making a miracle happen.

Suggested Screenshot

Nach: Not a big improvement over the previous, but it is a nice one. Audience likes it. Accepting to moons.
Spikestuff: Published.

Last Edited on 1/1/2022 6:13 PM