This is the first and fastest Super Mario Bros. "game end glitch" (ACE) TAS, using the original FDS ROM with no cartridge swap.
Although this TAS is slower than the any% (warps) TAS, its primary goal is to achieve the impossible. It has everything: Second Quest, intentional death, killing Bowser, touching the axe, and rescuing (or encountering) five Princesses.
This TAS has been fully console verified at least 3 times on original hardware by Kosmic. However, the success rate isn't 100% due to CPU/PPU alignment and other factors, so further testing is needed.
threecreepio
I started looking into what options we might have for ACE in SMB1 shortly after our ACE project in SMB2J wrapped up in early 2025. I was, of course, confident that it was not possible, but this seemed like a good time to actually check. I focused on the Famicom Disk System (FDS) version because interrupt requests won't crash the game like they do on the cartridge version. It also gives us access to a far more interesting -1 glitched “minus world” area, which interprets some of the game’s program code as enemy data.
Initially, AndrewG and I briefly discussed options using the vine glitch back on March 30th, 2025. In underwater levels, the vine glitch changes the low byte of an EnemyData pointer to be 8px above Mario's starting position. This shifts, and sometimes glitches, the enemies that load in. In -1, however, the vine glitch gives us nothing to work with: interesting, but a dead end, which would become a theme. It did at least make me aware of just how many more enemies there were to potentially work with in -1.
So I made a ROM to easily check through the map data, and found that we had a very large number of enemies to work with. To exploit the same bug used for the SMB2J ACE, we need two Long Firebar or Bowser enemies and a Green Koopa. The Long Firebar and Bowser enemies occupy two enemy slots. If you spawn one when only a single slot is available, the lookup for the second slot overflows and finds the first zero byte in memory. A Green Koopa has Enemy ID 0 and so will be corrupted by the glitch. In -1 we have a lot of Bowser enemies, and some are swarmed with the Koopas we want for our ACE. What luck! If I can just spawn in the right set of these enemies, the rest should be easy.
From April to October I kept coming back to this and giving up again. I tried every combination of enemy spawns I could get the game to give me: with and without the vine glitch, from the midway point, delaying movement, anything really. Unfortunately, there were countless dead ends. The most important problem is that in underwater stages (like -1), a lot of enemy logic gets skipped in ways that limit manipulation: we can’t stomp enemies, and they don’t turn around when colliding. The later Bowsers are simply too far into the level to reach in time. The Koopas are just not laid out in the enemy data in a way that makes them actually spawn when we need them, regardless of how angry I got at them. At the end of the year I started reaching out to Kosmic, Lain and HappyLee, describing some of my issues and hoping they could see something I’d missed. This led to us setting up a dedicated chat with myself, Lain, Simplistic6502 and Kosmic on Christmas Day 2025.
This led to sessions lamenting the many ways in which it was close, but impossible. One day, Kosmic and I were once again going through the ways it doesn't work: if just the one single Goomba in the stage could walk to the right, we would have the glitch. Then I had a sudden, blinding realization: Goombas turn into Buzzy Beetles after you beat the game once. Buzzy Beetles turn to face the player when landing, and this particular Buzzy Beetle spawns way up in the air. If Mario is on the right side of the screen, we can turn that beetle around and walk it right up to Bowser, filling the last remaining slot we needed to trigger the glitch. It’ll be very slow, but using this we can definitely do... something!
There's an issue, though. The DuplicateEnemyObj glitch we use for ACE works by overflowing the memory area for flags used by the entities, setting the first zero byte it can find to a value between $80 and $84. In the SMB2J ACE, the first Long Firebar fills an empty padding byte. The second one overwrites an enemy with ID 0 (the Green Koopa I mentioned) with the value $84, which happens to make the game execute from a memory address right next to the player's controller inputs at $074A. In SMB1 we have no Koopa; instead, we have a Buzzy Beetle. We’re going to move past all the Enemy IDs, hit a second empty padding byte and do absolutely nothing. That means we would need to trigger the glitch three times to do anything useful, and we're all out of Bowsers.
One important detail: in SMB2J we spawn Long Firebars, and a Long Firebar writes a value into that empty padding slot and does nothing, as expected. In SMB1 we use Bowser enemies, which happen to also overwrite the Enemy ID of their duplicated slot. This lines up so that the first Bowser used for the glitch fills both of the padding bytes in one go. So this problem actually neatly resolved itself!
So now the second glitched Bowser happily continues and overwrites the first zero byte found in the entity state data, the first of which is the player's state, responsible for selecting a routine to use depending on if the player is jumping, climbing, etc. The first 4 values we can write here do nothing useful, but if Bowser spawns into the final enemy slot we actually jump to an address outside of the list, specifically to memory address $2060. That’s actually legitimate ACE, Mario has escaped PRGatory and can run any code that the player desires! All that we need now is to deliver a payload through this mechanism and we're home free.
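To make the $80-$84 arithmetic concrete, here is a small Python model of the two steps described above. It rests on two assumptions. First, the value written is the spawning enemy slot with bit 7 set, which is consistent with the $80-$84 range and with the final slot producing $84. Second, JumpEngine doubles the index with a left shift before reading a two-byte little-endian pointer. The pointers for states 0-3 are hypothetical placeholders; only the last pair of bytes is chosen to reproduce the $2060 target reported above.

```python
# Hypothetical model: the overflowing DuplicateEnemyObj write stores the
# spawning slot with bit 7 set, and JumpEngine uses that value as an index.

def duplicate_flag_value(slot: int) -> int:
    """Value written by the out-of-bounds write (assumed: slot | $80)."""
    return 0x80 | slot


def jump_engine_target(state: int, table: bytes) -> int:
    """Double the index like ASL A (bit 7 is lost), then read a
    little-endian pointer from the byte list that follows the call."""
    offset = (state << 1) & 0xFF
    return table[offset] | (table[offset + 1] << 8)


# Placeholder pointers for states 0-3 (hypothetical addresses), followed by
# the two bytes that, per the write-up, turn entry 4 into $2060.
POINTERS = bytes([0x00, 0xB0, 0x10, 0xB0, 0x20, 0xB0, 0x30, 0xB0, 0x60, 0x20])

for slot in range(5):
    state = duplicate_flag_value(slot)
    target = jump_engine_target(state, POINTERS)
    print(f"slot {slot}: Player_State=${state:02X} -> jump to ${target:04X}")
```

Because the shift drops bit 7, $80-$83 land on exactly the same entries as $00-$03. That is why only the final enemy slot gets us anywhere.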
But this memory area contains the PPU registers. These essentially return data leaked by the Picture Processing Unit while it controls graphics rendering, which is inherently very difficult to work with and behaves somewhat inconsistently. We aren’t even sure whether any emulators are accurate enough to trust their results. Suffice it to say that testing would be difficult, if it’s even possible, and it feels unlikely that we can use this. I went through every other avenue this Buzzy Beetle might open up for us, hoping something would give us better options. Of course there was nothing; this is what the game will give us.
Fortunately, 100th_Coin came to our rescue. He had been working on PPU accuracy for a while and was not quite as easily deterred by this as a sensible person would be. After he assured us that all hope was not lost quite yet, we started a journey down this very deep rabbit hole. I'll let his write-up speak for itself. Suffice it to say that after some long, interesting discussions, we somehow found ourselves executing a single triplicated byte of a sprite’s Y position as a CPU instruction. That is limited but controllable. 100th_Coin made a list of every opcode that writes to memory and the exact values it would write. I then checked each of these against the disassembly to figure out precisely how each possible change would alter the game’s behavior. Most were useless, as expected, but there was one possibility: we could break the scroll handler! With 8F 8F8F, a BRK instruction is inserted into the screen scrolling subroutine, causing the game to execute:
BRK
SLO $A9
BRK
RTI
This ends up moving execution to $1061. That's not a great place to be, since it's in the middle of entity X speeds. Worse, this triggers after scrolling the screen 32px, and it’s hard to even reach enemies to manipulate without going 32px into a level. But it isn't exactly nothing, and this is what we've got. And what we’ve got is real work RAM execution. We did it: controllable execution! After experimenting with levels with early enemies, I traced execution all the way to $06E8 without crashing. That is 20 bytes before where the game stores controller inputs! $06E8 contains the offsets for sprite shuffling, which are responsible for the familiar sprite flicker effect. And of course, as luck would have it, there is just no way to get through this block of code. It will always branch you back to an earlier point or just straight up crash. It manages to traverse so much RAM and get so close, but there appears to be nothing more to see here. And even if there is a way, making a workable payload out of this would be next to impossible. So, back to the drawing board; by this point it’s hard to act surprised.
Simplistic6502 notes that we could play as Luigi, changing the value in the CPU's X register when we jump to the PPU. That slightly shifts a few of the memory values we can overwrite by a single byte. I looked through the new options and a single one stood out, $9C9D. Looking at this code, it replaces the opcode of lda PowerUpType inside PowerUpObjHandler, changing the lda to a php which will push the current CPU status to the stack:
09C9B D0 43 bne RunPUSubs ;branch ahead to enemy object routines
09C9D A5 39 lda PowerUpType ;check power-up type
09C9F F0 11 beq ShroomM ;if normal mushroom, branch ahead to move it
09CA1 C9 03 cmp #$03
Is changed to:
09C9B D0 43 bne RunPUSubs ;branch ahead to enemy object routines
09C9D 08 php ; push the current cpu state to the stack
09C9E 39 F0 11 and $11F0,y ; AND with some value
09CA1 C9 03 cmp #$03
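The byte-level effect of that single $08 write can be checked with a tiny disassembler. This Python sketch only knows the six opcodes that appear in the two listings above. It is an illustration, not a general 6502 disassembler.

```python
# Only the opcodes that appear in the two listings above.
OPCODES = {
    0xD0: ("bne", 1, "rel"),
    0xF0: ("beq", 1, "rel"),
    0xA5: ("lda", 1, "zp"),
    0x08: ("php", 0, "imp"),
    0x39: ("and", 2, "abs,y"),
    0xC9: ("cmp", 1, "imm"),
}


def disassemble(code: bytes, origin: int) -> list[str]:
    lines, pc = [], 0
    while pc < len(code):
        mnemonic, size, mode = OPCODES[code[pc]]
        operand = int.from_bytes(code[pc + 1:pc + 1 + size], "little")
        addr = origin + pc
        if mode == "rel":
            # Branch offsets are signed and relative to the next instruction.
            offset = operand - 0x100 if operand & 0x80 else operand
            text = f"{mnemonic} ${addr + 2 + offset:04X}"
        elif mode == "zp":
            text = f"{mnemonic} ${operand:02X}"
        elif mode == "imm":
            text = f"{mnemonic} #${operand:02X}"
        elif mode == "abs,y":
            text = f"{mnemonic} ${operand:04X},y"
        else:
            text = mnemonic
        lines.append(f"{addr:04X}  {text}")
        pc += 1 + size
    return lines


ORIGIN = 0x9C9B
original = bytes([0xD0, 0x43, 0xA5, 0x39, 0xF0, 0x11, 0xC9, 0x03])
patched = bytearray(original)
patched[0x9C9D - ORIGIN] = 0x08  # the single byte written via the PPU registers

print("\n".join(disassemble(original, ORIGIN)))
print("\n".join(disassemble(bytes(patched), ORIGIN)))
```

The one-byte change also shifts the instruction boundaries. The old branch operand F0 11 is swallowed as the address of the new `and`, and decoding happens to line back up at `cmp #$03`.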
With this change, whenever a power-up finishes growing, the stack is corrupted and execution returns immediately to normal instructions, which is an interesting result. Checking the stack at this point, our top value is $0F, and we will push the processor status $33. That means we will return to $0F34, the mirror of $0734 in RAM. That is... actually really good? We are in RAM, and we are very close to the controller inputs at $0F4A that we were using in SMB2J. Luck finally decides to be on our side: everything on the path from $0F34 to $0F4A can be manipulated or worked around. So, surprisingly, this ended up in just about the same place as the SMB2J ACE exploit, but through an entirely different and wildly elaborate route. With some testing and adjustments, execution arrives neatly at player 1's controller inputs. Fifteen minutes after first testing the write to $9C9D, we had easily controllable and flexible ACE. It was a surreal mix of an entire year's worth of hopeless effort actually paying off and the anticlimax of it so suddenly clicking into place.
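The return address arithmetic above can be sketched in a few lines of Python. The model only tracks the two stack bytes mentioned in the text; everything deeper in the stack is omitted.

```python
def rts_target(stack: list[int]) -> int:
    """RTS pulls the low byte, then the high byte, and resumes one byte later."""
    lo = stack.pop()
    hi = stack.pop()
    return (((hi << 8) | lo) + 1) & 0xFFFF


def wram_mirror(addr: int) -> int:
    """Internal work RAM is 2 KiB, repeated every $0800 bytes up to $1FFF."""
    return addr & 0x07FF


stack = [0x0F]       # top stack byte reported above (deeper bytes omitted)
stack.append(0x33)   # PHP pushes the processor status $33 on top of it
target = rts_target(stack)
print(f"return to ${target:04X}, which mirrors ${wram_mirror(target):04X}")
```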
From there Simplistic6502 worked on coming up with the fastest way to use this to trigger the game ending cutscene, you can read all about that process in his writeup!
Meanwhile, I built out a total control toolkit from my earlier work in SMB2J. This creates a few bootstrap stages leading to total control of execution. The first stage is written by shifting a single bit per frame into place, alternating ASL and INC while the game is still running. It's a slow process that can very easily crash, since Mario can still move around while we're using the inputs to build the payload. The built stage looks like:
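The ASL/INC idea can be illustrated with a short Python sketch. It assumes the target byte starts at $00 and that exactly one ASL or INC runs per frame. That is a simplification of what the toolkit actually does, and `build_byte` is a hypothetical name.

```python
def build_byte(value: int) -> list[str]:
    """Sequence of ASL/INC operations that builds `value` from $00,
    most significant bit first (one operation per frame, assumed)."""
    ops, current = [], 0
    for bit in range(7, -1, -1):
        if current:                   # ASL on $00 changes nothing, so skip it
            ops.append("ASL")
            current = (current << 1) & 0xFF
        if (value >> bit) & 1:        # shift in a 1 by incrementing the now-even byte
            ops.append("INC")
            current += 1
    assert current == value
    return ops


print(build_byte(0x20), len(build_byte(0xFF)))
```

Even a single byte can take up to 15 frames this way, which is why only a small first stage is built like this.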
: jsr FDSBIOS_VINTWAIT ; delay until next frame
jsr FDSBIOS_READPADS ; read the controllers
ldx TRAMP_INDEX ; load a write offset
lda READPADS_CTL1 ; load controller 1 inputs
sta EXEC_LOCATION,x ; store the inputs at jump target
inc TRAMP_INDEX ; advance to the next write offset
bne :- ; and loop until TRAMP_INDEX wraps around to zero
jmp EXEC_LOCATION ; then jump to our next stage
So with this code running, normal game execution stops. We read the next stage of the payload into EXEC_LOCATION from controller 1, one byte per frame, and then jump to it. With this next payload you can do anything you want. For example, this loads you into an arbitrary world so you can have a look around!
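The loop above can be modeled in Python to show how many frames of controller 1 input the next stage costs. `run_bootstrap` is a hypothetical name. The model ignores everything except the index, the stored bytes and the frame count.

```python
def run_bootstrap(pad_bytes: list[int], start_index: int = 0) -> tuple[bytearray, int]:
    """Model of the stage-1 loop: store one controller 1 byte per frame at
    EXEC_LOCATION + index until the 8-bit index wraps to zero."""
    exec_location = bytearray(256)
    index, frames = start_index, 0
    for pad in pad_bytes:
        frames += 1                   # FDSBIOS_VINTWAIT: wait for the next frame
        exec_location[index] = pad    # sta EXEC_LOCATION,x
        index = (index + 1) & 0xFF    # inc TRAMP_INDEX
        if index == 0:                # bne :- falls through once it wraps
            break
    return exec_location, frames


stage, frames = run_bootstrap(list(range(256)))
print(frames)  # 256: one frame per stored byte
```

Starting TRAMP_INDEX partway through the page shortens the loading time accordingly.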
lda #$33 ; switch world number
sta WorldNumber ;
lda #0 ; clear our level number
sta LevelNumber ;
sta AreaNumber ;
sta OperMode_Task ; start level load
sta VRAM_Buffer1 ; clear some RAM that would otherwise wreck our state
sta MusicData ;
sta MusicData+1 ;
jsr $7C0B ; reload area pointer
jmp $6054 ; and jump to wait for the next frame
But you could do anything, even load a completely different game using only controller inputs, or whatever else you can think of. Running make in the toolkit project builds the assembly files and creates corresponding TASes containing the inputs needed to enter each stage of these payloads without crashing the game!
I'd like to pretend we've neatly solved all the problems with this approach, but if we've learned anything it's that the problems do not end with this project. This was only the start of a series of compounding headaches. Console verification, alignment issues, all the peril involved with executing from PPU registers. 100th_Coin will take you through that whole process, and Simplistic6502 will go into detail on the technical aspects that I’ve glossed over. All in all, an incredible project with fantastic people, and a great example of how a little persistence, creativity, and a total inability to sensibly value one's own time can lead to unexpected breakthroughs in even the most obviously hopeless situations.
OnehundredthCoin
I have recently been working on a series of accuracy tests designed to assist emulator developers. The main purpose of this test ROM was to help me find issues with the emulator I’m developing, but I figured I could thoroughly explain every test in the ROM so anybody making an emulator can learn what I learned. One of my most recent tests is called “Address $2004 Stress Test”. It covers what happens when the CPU reads from address $2004, and it runs that read on every single PPU cycle of a specific scanline. This test can reveal some interesting behavior of the Picture Processing Unit. Reading from address $2004 returns the current state of the OAM buffer, which is constantly being used during sprite evaluation. The test can also highlight some amusing edge cases during sprite evaluation: what the PPU does when the OAM address overflows, or when 8 or more objects are found on the next scanline. This was something that all of your favorite emulators (BizHawk, Mesen…) were making small mistakes with. It is also, somehow, our gateway to running arbitrary code.
I was watching Kosmic on Twitch when I heard him say that ACE in SMB1 is extremely close but not possible, which would be the topic of a video he’s working on. Instead of patiently waiting for the video to come out, I just sent him a DM asking for the details. Who knows, maybe I could help out in some way. We were incredibly close to running arbitrary code using the same method as the SMB2J ACE exploit, but a tiny detail seemed to block every path to using it. Instead of having Bowser overwrite an enemy ID, we had the option to overwrite Mario’s “state”, which results in a jump to address $2060: the PPU registers. In case you are unaware, addresses $2000 through $3FFF contain the same 8 PPU registers, repeated over and over. So a read from address $2064 is equivalent to a read from address $2004. And I had just done a LOT of research into reads from address $2004.
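The mirroring rule just keeps the low three bits of the address. Here is a quick Python sketch using the conventional NESdev names for the eight registers:

```python
PPU_REGISTERS = ["PPUCTRL", "PPUMASK", "PPUSTATUS", "OAMADDR",
                 "OAMDATA", "PPUSCROLL", "PPUADDR", "PPUDATA"]


def ppu_register(addr: int) -> str:
    """Name of the PPU register that a CPU address in $2000-$3FFF maps to."""
    if not 0x2000 <= addr <= 0x3FFF:
        raise ValueError(f"${addr:04X} is outside the PPU register range")
    return PPU_REGISTERS[addr & 0x7]   # the registers repeat every 8 bytes


for addr in (0x2060, 0x2064, 0x2065, 0x2066):
    print(f"${addr:04X} -> {ppu_register(addr)}")
```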
With some tracelogging, I realized that the CPU would read the opcode from address $2064, and that this happened around PPU cycle 240 of the scanline. So the PPU would be in the middle of the “OAM Evaluation” step of sprite evaluation, likely after the OAM address had overflowed. From there until cycle 257, every PPU cycle alternates between reading from primary OAM and reading from secondary OAM. The primary OAM address still changes every other cycle, but the secondary OAM address does not change. In other words, 50% of the PPU cycles here will be reading from one specific index of secondary OAM.
Fun fact: BizHawk 2.11 and earlier have a slight issue with this read from secondary OAM after the OAM address overflows. They always read from index zero of secondary OAM instead of from the current secondary OAM address. I made a PR for the dev build that fixed this issue, and all the TASing was done with the fixed version under the assumption that it was critically important for the TAS. However, I just realized that the TAS reads from index zero of secondary OAM anyway, as there are no objects on this scanline… so it actually runs just fine in version 2.11. Oh well. That’s probably a good thing anyway.
Without giving an entire dissertation on sprite evaluation, I’ll just state that in this situation in Super Mario Bros., half the PPU cycles will be reading the Y position of the final object in OAM. That object turned out to be a Blooper, whose Y position we could easily manipulate!
So we can run a single instruction based on the Y position of the nearby Blooper. Most PPU registers are “write-only”, so reading from them returns the value of the “PPU data bus”. This data bus is set when reading $2004. Since addresses $2005 and $2006 are write-only, we’d actually get 3 bytes in a row, all with the same value. Could I run something meaningful with 3 identical bytes in a row? Let’s find out! Normally I would boot up the game in my emulator, replay the TAS, and make a savestate just before executing address $2064. I would then hijack the result from address $2064 to be whatever value I want, load the savestate, and run the test again with a different value, looping through all 256 possible values. Alas, my emulator doesn’t yet support Famicom Disk System games. So I just copy/pasted the contents of the “CPU Address Space” from BizHawk into my emulator and ran the test that way. It was a bit clunky, but it revealed that a handful of values read from $2064 perform write instructions to some useful locations. I ended up finding about 20 options to choose from, and threecreepio went to work checking every outcome to see if it could be useful.
Famicom Disk System games are stored on a disk, which is read one byte at a time into 32 KiB of RAM. Since there’s no write protection for this RAM, you can write self-modifying code for the FDS. This also means that our instruction at address $2064 is capable of overwriting a byte in the game’s code! (The code that’s stored in RAM; we’re not overwriting the disk.) Anyway, if we read the value $9C from address $2064, we run the unofficial `SHY $9C9C, X` instruction. This instruction stores, in memory, a bitwise AND of the Y register and the high byte of the operand plus 1. In this case, Y = $0A and the high byte of the operand plus 1 = $9D, so we write $0A & $9D = $08 to address $9C9C+X. But what is X? Well, X was most recently set by running `LDX $0753`, which Simplistic6502 immediately noticed is the address keeping track of which player is playing: 0 for Mario, and 1 for Luigi.
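Both steps, the open-bus tripling and the SHY store, fit in a short Python sketch. It assumes the $2064 read leaves the sprite Y byte on the PPU data bus for the next two reads, as described above. It also deliberately ignores the unstable edge cases documented for SHY, such as page crossing, which should not come into play for $9C9C + 1.

```python
def open_bus_fetch(oam_byte: int) -> tuple[int, int, int]:
    """Opcode from $2064 (OAMDATA) plus two operand bytes from the write-only
    $2065/$2066, which read back the same value still on the PPU data bus."""
    return oam_byte, oam_byte, oam_byte


def shy_abs_x(operand: int, x: int, y: int) -> tuple[int, int]:
    """Unofficial SHY abs,X (opcode $9C): store Y AND (operand high byte + 1)
    at operand + X. Page-crossing edge cases are not modeled."""
    value = y & (((operand >> 8) + 1) & 0xFF)
    return (operand + x) & 0xFFFF, value


opcode, lo, hi = open_bus_fetch(0x9C)            # Blooper Y position read as $9C
for player, x in (("Mario", 0), ("Luigi", 1)):   # X comes from LDX $0753
    addr, value = shy_abs_x(lo | (hi << 8), x, y=0x0A)
    print(f"{player}: write ${value:02X} to ${addr:04X}")
```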
So if we’re playing as Luigi, and the crash occurs with the Blooper manipulated such that we read $9C from address $2064, then we write $08 to address $9C9D. This address is in the middle of the routine that controls how power-ups move around. Instead of running `LDA <$39`, where the game would determine the power-up type, it now runs `PHP`, pushing the processor status flags to the stack. Oh boy! Now whenever we run an `RTS` instruction, we pull off this processor flag byte as part of the return address instead of the correct return address. This moves the PC to RAM, and the rest of the team is here to figure out the setup. Enter world 3-1, bump the item block and manipulate the enemies. Once the mushroom starts moving around after the rising animation, they have total control.
Now that we’ve just executed arbitrary code, it’s a good time to mention that I’m considering this TAS my “ACEvideos” submission for the year.
This was console verified with some… difficulty. The CPU and the PPU are two different components running independently of each other, and communicating between the two is not as elegant as one might assume. Reading from address $2004 has something called “alignment specific behavior”. There are 12 master clock cycles per CPU cycle, and 4 master clock cycles per PPU cycle. So the CPU and PPU clocks could be synced on the same master clock cycle, or the CPU cycle could start 1, 2 or 3 master clock cycles later than the PPU cycle. (4 master clock cycles later would put them back in sync.) That gives four “alignments” between the CPU and PPU clocks, essentially chosen at random the moment the console powers on. To spare you another dissertation: different alignments can result in different behavior when the CPU accesses the PPU registers. To make a really long story short, on one of these alignments, the CPU would actually see a different value from address $2004. (That alignment could also see bitflips when reading from $2004 on some consoles, but shhhh.) So we’re looking at a 3/4 chance that the TAS reads the value we want from $2004. Or are we? You see, Super Mario Bros. reads from address $2002 to check whether a sprite zero hit has occurred; this is how it keeps the HUD in place while the rest of the screen scrolls. Half the alignments can see the sprite zero hit one PPU cycle earlier than the other half. The PPU also drives the CPU’s NMI with alignment specific differences! On top of that, every other frame the PPU skips a single PPU cycle at the end of the pre-render line. Resetting a Famicom doesn’t reset this even/odd latch, so our read from $2004 might land several PPU cycles away from where we expected. Our coinflips have coinflips, which themselves have coinflips. I’m not even going to try to explain how reading from address $2007 behaves when rendering is enabled.
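The clock numbers above can be worked through in Python. The 341 × 262 NTSC frame size is standard PPU timing rather than something stated in this write-up. The point is that neither an even nor an odd frame is a whole number of CPU cycles. As a result, the phase between CPU cycles and the start of a frame shifts from one frame to the next.

```python
from fractions import Fraction

MASTER_PER_CPU = 12
MASTER_PER_PPU = 4
DOTS_PER_LINE, LINES_PER_FRAME = 341, 262   # standard NTSC PPU timing

ppu_dots_per_cpu_cycle = MASTER_PER_CPU // MASTER_PER_PPU

# A CPU cycle can begin 0, 1, 2 or 3 master clocks after a PPU dot begins.
alignments = list(range(MASTER_PER_PPU))

even_frame = DOTS_PER_LINE * LINES_PER_FRAME   # full frame, in PPU dots
odd_frame = even_frame - 1                     # one dot skipped while rendering

for name, dots in (("even", even_frame), ("odd", odd_frame)):
    print(name, dots, "dots =", Fraction(dots, ppu_dots_per_cpu_cycle), "CPU cycles")
print("even + odd =", Fraction(even_frame + odd_frame, ppu_dots_per_cpu_cycle), "CPU cycles")
```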
Furthermore, this alignment specific behavior varies from console to console, and as we found out while trying to console verify this, Kosmic’s console behaves quite differently than any of the consoles I own. This made it incredibly tedious and frustrating to console verify.
Is the emulator we’re using to TAS this accurate enough? Did we fail this time because we got unlucky? Would it be more consistent if we ran it on a different console? Is it failing because of the RGB mod in Kosmic’s Famicom? Could it have been due to VBlank Suppression as the game booted up? Is it due to the PPU skipping a cycle every other frame, and we had the wrong even/odd alignment? Was it due to the alignment between the CPU and PPU clocks? What PPU cycle did the read from $2004 even occur on? It’s hard to tell.
Also, any console with a revision E or earlier PPU is unable to read from address $2004, so it’s not even possible to console verify this on a significant number of consoles. This is just one of many nightmares surrounding what it means to make an “accurate emulator”. The console isn’t consistent, and anybody who dives deep enough into making their own emulator will find out this cold truth.
There’s no singular way this TAS should behave when running this on real hardware. However, one of the possible outcomes is the ACE exploit we aim to present. That being said, we have console verified this TAS at least three times while hanging out in a voice call. It’s just not nearly as consistent as we had all hoped.
HappyLee
My main contributions were optimization and entertainment. I completely redid the TAS for the "Minus World", and in World -3 I used a rare Blooper wall clip, which is allowed by the frame rule.
ACE is not my strongest area, so I'd like to thank my amazing teammates; they taught me a lot. I even learned 6502 instructions, and by manipulating Mario's and the Koopa shell's X positions in 3-1, I improved Simplistic6502's payload, saving 2 frames.
Simplistic6502
On Christmas Day 2025, Kosmic and I were having a discussion about glitch levels in The Lost Levels, and he briefly mentioned that threecreepio was close to achieving ACE in SMB1. Kosmic planned to make the SMB1 ACE investigation part of his next video and wanted to bring me along to see if further analysis would bring us closer to achieving ACE. At this point, I joined the team and got caught up with all of the prior research and existing leads. Things didn’t seem promising, and for a while I was convinced there was no hope for ACE. Little did I know what an ordeal this whole project would end up being!
Because there are a lot of technical details to go over, I’ll let the other authors explain the story of how everything came together. As for what I contributed to the TAS, I investigated the fastest method of triggering ACE after the write to $9C9D occurs and developed the payload used after hitting the power-up block. Note that the payload seen in this TAS is a two-frame optimization over my work, thanks to clever manipulations from HappyLee. In addition, I made a couple of observations that proved crucial to our success. First, using the DuplicateEnemyObj subroutine bug to write 0x84 to Player_State allows for jumping to the PPU registers. Second, playing as Luigi allowed us to set X to 0x01 before jumping to the PPU registers.
This TAS exploits the same DuplicateEnemyObj subroutine bug used in the SMB2J ACE TAS to write to the Enemy_Flag array with an out-of-bounds index. The key difference here is that we aren’t able to overwrite part of the Enemy_ID array since there is no green Koopa Troopa loaded at the time of the second out-of-bounds write, which means there is no 0x00 byte in the array available for corruption. This locks us out of using enemy ID 0x84 as an entry point to ACE, so a craftier solution was needed for this TAS. What follows the Enemy_ID array in RAM is the Player_State variable, which selects what player movement logic should be executed by indexing into a list of pointers using the JumpEngine subroutine. Under normal conditions, Player_State is restricted to 0x00 through 0x03, so larger values will fetch data past the list of pointers. Since JumpEngine shifts out the most significant bit of the index parameter when fetching the correct pointer, writing 0x80 through 0x83 to Player_State only results in us executing standard movement routines. As it turns out, writing 0x84 results in a jump to $2060, a mirror of the PPUCTRL register. By triggering the second out-of-bounds write while on the ground so Player_State holds a value of 0x00, Player_State is overwritten and we manage to escape the confines of normal program execution. Having next to no knowledge about PPU open bus and the rendering process, I naively assumed we couldn’t do anything useful here, but 100th_Coin had way too much knowledge about the PPU to let this opportunity go to waste. 100th_Coin and threecreepio go more in-depth into what can be accomplished by forming instructions from PPU registers, but to briefly explain the result: by manipulating the last OAM slot to have a specific Y position and triggering an OAMDATA read at a specific point in rendering, it is possible to overwrite select bytes in the program code. 
Our options are limited but after some trial and error, threecreepio found that writing 0x08 to $9C9D was our golden ticket to ACE!
Once we reset the game, we find that letting a power-up fully rise out of a block provides a fairly convenient entry point to ACE; this corruption modifies PowerUpObjHandler such that processing a fully grown power-up will execute a PHP instruction, pushing the processor flags to the stack and modifying the subroutine’s return address to $0F34. Note that the Famicom’s internal work RAM is mirrored three times across $0000 to $1FFF, so this entry point functions as a mirror of $0734. Assuming execution can make its way to $0F4A, we can use the controllers to create desired instructions much like ACE in SMB2J! Unfortunately, reaching $0F4A is not guaranteed since some memory values can cause the CPU to freeze or result in us overshooting our destination.
Since we needed to enable hard mode to even get this far, we’re able to choose which world we want to start on before triggering ACE. Although we need to be in world 8 to trigger the ending sequence, 8-1’s only power-up is halfway through the level, so it’s much faster to have the payload modify our world number instead. Because 1-1, 4-1, and 6-1 use the mountain backdrop, these levels create a JAM opcode at $0F42, freezing the CPU and locking us out of executing $0F4A. Of the remaining first levels, 2-1’s first power-up is the quickest to access, but we can’t hit this power-up without $0F3A being 0x02, so ACE in 2-1 would require using the 1-Up or a later power-up instead. With this in mind, 3-1’s first power-up is our fastest option. We need to free up one of the enemy slots before triggering ACE so $0F3A updates to 0x03, but this is easily accomplished by stomping a Koopa Paratroopa and kicking its shell to defeat the other one.
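A quick way to sanity-check the $0F3A values above is against the set of opcodes that halt the 6502. This Python sketch lists the twelve JAM (also called KIL) opcodes and checks both values. It assumes $0F3A is actually fetched as an opcode on the way to $0F4A, and it does not trace the full path from $0F34.

```python
# The twelve opcodes that halt the 6502 (commonly called JAM or KIL).
JAM_OPCODES = {0x02, 0x12, 0x22, 0x32, 0x42, 0x52,
               0x62, 0x72, 0x92, 0xB2, 0xD2, 0xF2}


def freezes(opcode: int) -> bool:
    return opcode in JAM_OPCODES


# $0F3A as discussed above: 0x02 in the 2-1 setup, 0x03 after freeing a slot in 3-1.
for label, value in (("2-1, $0F3A = $02", 0x02), ("3-1, $0F3A = $03", 0x03)):
    print(label, "-> freezes" if freezes(value) else "-> keeps running")
```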
Though 3-1 is a suitable level for executing controller inputs through ACE, we still need to manipulate a couple of other memory values. These modify the stack and place an RTS instruction after the controller inputs so execution safely returns to program code. By adjusting our screen scroll, we manipulate $0F3F to 0x28, giving us a PLP instruction that pulls a byte off the stack and adjusts our return address to $617E, the end of the NMI routine. We also adjust Mario’s on-screen X position so $0F55 has a value of 0x60 when the power-up has fully emerged, creating the RTS we need for a safe return. $0F55 will not change once we begin executing in RAM, so inputs performed on subsequent frames don’t automatically ruin our safe return. However, $0F3F is still able to change, so our payload inputs avoid moving right when possible.
The goal of this TAS is to beat the game as quickly as possible using ACE, so our payload is responsible for setting up the memory values we need to run the 8-4 ending sequence. Thankfully, we don’t need to change much. $075F must be set to 0x07 so we’re considered to be in world 8. $0770 must be set to 0x02 or 0x82 to tell the game to run the end-of-castle subroutines. Since $0772 is 0x03 during standard gameplay, these modifications cause the game to start executing PrintVictoryMessages, which displays the ending text and queues the ending music.
Before discussing the payload in depth, it’s important to discuss how we can form instructions using controller inputs and what limitations affect us. $0F4A and $0F4B contain filtered controller 1 and 2 inputs, respectively; these addresses can be manipulated to hold any value but an input with start or select pressed will prevent the respective memory address from changing until that button is released. This means that we want to avoid inputs with start or select presses unless the next unique input releases that button. While $0F4A and $0F4B are fairly malleable, $0F4C contains 0x00 and isn’t modified in normal gameplay, so any absolute addressing instructions target zero page unless we go out of our way to modify this address. This comes to our aid because we’re initially locked out of using 2-byte instructions; the warp to the underground coin room loads 0x42 into $0F50, creating a CPU freeze that must be avoided with 1-byte or 3-byte instructions.
Considering these limitations, the most efficient solution we found was forming pointers to the addresses we need to modify using entity X positions stored in addresses $86 to $90, then using pre-indexed indirect instructions to modify the values at these addresses. An alternative approach we tested involved forming a pointer to $174C, changing $174C to 0x0F/0x1F, then using absolute addressing instructions to modify the remaining memory values, but this method proved slower after further optimizations. This is the payload we settled on, with each instruction corresponding to one frame of the TAS:
DEC $008C
LSR $008F
STY $008D
LSR $008D
STY $0090
LSR $0090
STY $0087
STA ($8A,X)
ISC ($87,X)
SLO ($87,X)
ISC ($87,X)
SLO ($81,X)
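Each line of this payload is just a pair of controller bytes: $0F4A comes from controller 1, $0F4B from controller 2, and $0F4C stays $00. The Python sketch below turns the listing into those pairs and names the buttons. It assumes SMB's usual bit layout: A in bit 7 down to Right in bit 0, with Select = $20 and Start = $10. It also flags frames that hold Start or Select, the case that the start/select filtering rule above constrains.

```python
# Assumed SMB controller byte layout, bit 7 down to bit 0.
BUTTONS = ["A", "B", "Select", "Start", "Up", "Down", "Left", "Right"]

# Standard 6502 encodings for the instructions used in the payload.
OPCODES = {
    ("DEC", "abs"): 0xCE, ("LSR", "abs"): 0x4E, ("STY", "abs"): 0x8C,
    ("STA", "(zp,X)"): 0x81, ("ISC", "(zp,X)"): 0xE3, ("SLO", "(zp,X)"): 0x03,
}

# (mnemonic, mode, low operand byte); $0F4C supplies the $00 high byte for abs.
PAYLOAD = [
    ("DEC", "abs", 0x8C), ("LSR", "abs", 0x8F), ("STY", "abs", 0x8D),
    ("LSR", "abs", 0x8D), ("STY", "abs", 0x90), ("LSR", "abs", 0x90),
    ("STY", "abs", 0x87), ("STA", "(zp,X)", 0x8A), ("ISC", "(zp,X)", 0x87),
    ("SLO", "(zp,X)", 0x87), ("ISC", "(zp,X)", 0x87), ("SLO", "(zp,X)", 0x81),
]


def buttons(byte: int) -> str:
    held = [name for bit, name in enumerate(BUTTONS) if byte & (0x80 >> bit)]
    return "+".join(held) or "(none)"


for mnemonic, mode, operand in PAYLOAD:
    p1 = OPCODES[(mnemonic, mode)]   # controller 1 -> opcode at $0F4A
    p2 = operand                     # controller 2 -> operand at $0F4B
    warn = "  <- holds Start/Select" if (p1 | p2) & 0x30 else ""
    print(f"{mnemonic} {mode:7} P1=${p1:02X} [{buttons(p1)}]  P2=${p2:02X} [{buttons(p2)}]{warn}")
```

Under this layout, every Start or Select frame is followed by a different input that releases the button. Two consecutive frames share P2 = $90 with Start held, but since that value does not change, the filtering has no effect there.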
We begin by using absolute addressing instructions to form a pointer to $175F at $8C and a pointer to $1750 at $8F. Since the power-up’s X position at $8C is 0x60 once revealed, decrementing this value prepares the low byte for $175F. Similarly, because $8F contains 0xA0 from the brick block we hit, a right shift prepares the low byte for $1750. Since the Y register contains the power-up’s ID of 0x2E every time we execute controller inputs, we prepare both high bytes in the same way; by storing the contents of Y at $8D and $90 then right shifting both addresses, both high bytes are set to 0x17. Before switching over to indirect instructions, we store Y at $87, the kicked shell’s X position, to set up the high byte for the pointer to $1F70 later on.
From this point forward, we use indirect instructions to modify the memory values at the addresses we formed in zero page; note that the X register contains the power-up’s slot index of 0x05 every time we execute controller inputs, so the operands for these instructions were adjusted accordingly. First, we write the accumulator’s value of 0xFE to $1750, which circumvents the CPU freeze and adjusts instruction alignment so 2-byte instructions safely return to program code. We then adjust $175F from 0x02 to 0x07 using two increments and a left shift; unfortunately, since the carry flag is clear when executing controller inputs, a rotate left does not save an extra instruction. By this point, Mario’s X position at $86 has reached 0x70 and the kicked shell’s X position at $87 has reached 0x1F; this optimization by HappyLee is what forms our pointer to $1F70. We adjust $1F70 from 0x01 to 0x02 using a left shift, and set the controllers down since our work is done.
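The pointer juggling above can be replayed in Python against a flat model of the 2 KiB work RAM. Only memory writes are modeled; the ORA half of SLO, the SBC half of ISC, and all flags are ignored. The starting values and the objects' mid-payload X positions come from the text, not from the game. Note that $1750 and $0F50 are the same byte under mirroring, which is how the STA gets rid of the freeze.

```python
RAM = bytearray(0x800)   # 2 KiB of work RAM; $0800-$1FFF mirror it


def read(addr: int) -> int:
    return RAM[addr & 0x07FF]


def write(addr: int, value: int) -> None:
    RAM[addr & 0x07FF] = value & 0xFF


def indexed_indirect(zp: int, x: int) -> int:
    """(zp,X): fetch a little-endian pointer from zero page at zp+X."""
    base = (zp + x) & 0xFF
    return read(base) | (read((base + 1) & 0xFF) << 8)


# Register and memory values described in the write-up.
A, X, Y = 0xFE, 0x05, 0x2E
write(0x8C, 0x60)     # power-up X position once revealed
write(0x8F, 0xA0)     # left over from the brick block
write(0x075F, 0x02)   # world number before the payload
write(0x0770, 0x01)   # $1F70 before the final shift

write(0x8C, read(0x8C) - 1)          # DEC $008C   -> $5F
write(0x8F, read(0x8F) >> 1)         # LSR $008F   -> $50
write(0x8D, Y)                       # STY $008D
write(0x8D, read(0x8D) >> 1)         # LSR $008D   -> $17
write(0x90, Y)                       # STY $0090
write(0x90, read(0x90) >> 1)         # LSR $0090   -> $17
write(0x87, Y)                       # STY $0087
write(indexed_indirect(0x8A, X), A)  # STA ($8A,X) -> $FE into $1750
for step in ("inc", "shift", "inc"):  # ISC, SLO, ISC ($87,X) on $175F
    addr = indexed_indirect(0x87, X)
    write(addr, read(addr) + 1 if step == "inc" else read(addr) << 1)

# Objects keep moving between frames; by the last instruction, per the write-up:
write(0x86, 0x70)                    # Mario's X position
write(0x87, 0x1F)                    # kicked shell's X position
addr = indexed_indirect(0x81, X)     # SLO ($81,X) -> $1F70
write(addr, read(addr) << 1)

print(f"$075F=${read(0x075F):02X}  $0770=${read(0x0770):02X}  $0F50=${read(0x0F50):02X}")
```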
While this TAS uses ACE to complete hard mode in record time, this exploit can be used for so much more. threecreepio has already adapted his SMB2J total control toolkit to this game, opening the door to all kinds of absurd and entertaining demonstrations only previously possible with 100th_Coin’s cart swap ACE exploit for the cartridge version. Since the FDS is equipped with 32KiB of PRG-RAM and 8KiB of CHR-RAM, total control on the disk version is considerably more flexible; it’s only a matter of time until someone plays the Bad Apple!! music video on this version. As for me, I’m content with having contributed to SMB2J ACE and now SMB1 ACE. Being a part of this journey was informative and I had the pleasure of working with some brilliant minds. Certainly time well spent, even if we haven’t solved every mystery yet.
Kosmic