Submission #9626: OnehundredthCoin's NES Super Mario Bros. 3 "Intercycle Cart Swap" in 00:00.02

Nintendo Entertainment System
Super Mario Bros. 3
Intercycle Cart Swap
TriCNES v1.0.1
1
60.0988138974405
0
PowerOn
MissingRerecordCount
Super Mario Bros. 3 (U) (V1.0) [!].nes
PRG0
5f9019040fe23cb412a484e1ef430e59e589f9b4
Submitted by OnehundredthCoin on 4/1/2025 4:04 PM
Submission Comments
"Gotta go fast!"[1]
Notes for judges/TASVideos staff: the actual TAS file is not the one I submitted, due to my own TAS file type obviously not being supported. I have currently nested a ".3ct TAS file" inside the .bk2, and this .3ct is the file I am submitting here. I have also uploaded it in the form of a .txt file as a userfile: UserFiles/Info/638791195450995494 which would need to be renamed as a .3ct file to run it in my emulator. It's a long story, but to save you some time looking for the emulator, I've put a link to the GitHub right here. Thanks! I guess .3ct files don't have any way of tracking rerecord counts. Let's just say this one had 101405 rerecords, since my code to find the best list of cartridges found that many potential working sets of cartridges, and this one was chosen due to how much I liked the cartridges in this list.
For a video based explanation of this run, see the following:

Introduction to the Objectives:

  1. Do a little tomfoolery
  2. Stop n' Swop but without stopping
  3. Beat the game in 0.00001676 seconds from power on

Introduction to the Introduction

ACEVideos day has arrived once again, so I want you to consider the following hypothetical. You want to play a good ol' fashioned run of Super Mario Bros. 3, but your bedtime is in 17 microseconds. By following this handy tutorial, we might just be able to get a full run in, with a quarter of a microsecond to spare.
Back in 2021 when I submitted this run of Super Mario Bros. 3, some people asked me if this was "the fastest possible completion" of the game. One of the first ideas I had to improve it was "in theory, if you could swap NES cartridges between CPU cycles, you could run arbitrary code by knowing where the PC is and the value of that address in every single NES cartridge". I call this method of running arbitrary code "Intercycle Cartridge Swapping", and in this TAS I aim to use this in order to complete Super Mario Bros. 3 as quickly as possible.
Before we actually talk about the run, I need to dump a couple years worth of NES programming knowledge upon you, so bear with me. (And if you aren't interested in how the NES CPU works, I have provided an optional "Story Mode" at the end of each section.)
I want to just give a quick overview of how this came to be. I was making a custom NES emulator in late 2021 around the time I submitted my first TAS here on TASVideos. When I optimized the run down to 13 frames, 3 of which have inputs, it started to sound realistic that the game could be brute forced. Could the game be beaten in only 2 frames of inputs? I had to find out. I can't recall how accurate my emulator at the time was, (I know it didn't have open bus support) but I ran all combinations of 2 controllers for those two frames. My emulator at the time determined the 13 frame TAS I submitted in 2021 was the best I could make. But people asked me, and I still wondered, "Could it be done faster if we get looser with the rules?"
People joke about creating some sort of high-energy-particle-emitter and firing it at the NES to cause bit flips, and while that would be amusing, I like the idea of using official hardware with my exploits. When making a NES emulator, you become quite familiar with how the CPU works, and while implementing the MMC3 mapper, I had an interesting thought. The way the code is swapped out in ROM banks completely changes what bytes exist in the ROM address space. But what if instead of swapping banks intentionally, I maliciously used the cartridge slot as another input for a TAS?
This idea sat in the back of my mind for a few years, and I didn't really consider making it until 2023, when I modified BizHawk to allow cart-swapping mid-TAS. That's when I realized, "Hey, this might not be as far-fetched of an idea as I thought."
After submitting the Bad Apple TAS, I got to work making a new NES emulator. The old ones I had made never really had any amount of accuracy tests done on them, so this time I wanted to do it right. As I got the emulator working, I began testing the idea of swapping carts between CPU cycles, and it worked. Perhaps I was jumping the gun a bit, for my emulator still had accuracy issues to work out, but in that moment I knew I had to make a video about this. At the time, I was still working on my video that explains how the Bad Apple TAS works, but I included a little teaser at the end saying "What if I could beat Mario 3 in 5 millionths of a second?". I wrote a script, but realized something important. I still never made a video explaining the 13 frame TAS from 2021. "Alright, a two-parter," I said to myself.
And so I got to work, making the NES emulator, and passing as many accuracy test ROMs as I could find. I had a lot of help from people in the NesDev Discord, and after 8 months, I was satisfied.
This has been the culmination of all my knowledge about the NES, several months of work creating my own NES emulator to pull this off, more months of work to spread the word about this, and the eventual console verification of this method. I present to you, the Intercycle Cart Swap TAS Proof of Concept.

Introduction to the 6502 CPU

In this context, a byte is an 8 bit value. This means a single byte could represent any number from 0 to 255. If we're using base 16 (hexadecimal) then that range is conveniently every possible 2-digit hex number. $00 to $FF. (the '$' symbol is used in 6502 Assembly to denote the following number is in base 16.)
Suppose we have more than one byte, and each of these bytes could have a different value. If the CPU wants to read one of these bytes, it needs to have some way of referring to each byte separately. We'll give each byte an "address". The first byte will be address 0, the next will be address 1, and so on. Now when the CPU wants to read the value of a specific byte, we can also say which byte we want to read by providing the address.
The CPU of the NES has a 16-bit address space, which means there are 65,536 addresses, which we will label in hexadecimal from $0000 to $FFFF. The program data of a game cartridge will be accessible from address $8000 through $FFFF. While the CPU is running it processes bytes one at a time. The CPU needs to keep track of which byte to process next, and it tracks this information in a 16-bit register called the Program Counter, or PC for short. So if the PC is at address $8000, when the CPU processes the next byte, the value of the byte at address $8000 will be read, and then the PC will increment to address $8001 for the next byte to be processed.
To provide even more detail, when the CPU needs to read a byte, it needs to put the target address on the 16-bit "Address Bus". Then, the value read will be put on the 8-bit "Data Bus". So if the CPU needs to process another byte, the value of the program counter is transferred to the address bus, which will then be used to determine the byte to read.
The CPU works one cycle at a time. Each cycle is either a read, or a write. For an example, let's study the instruction LDA $8000. The bytes that create this instruction in the program data of the cartridge would look like this: $AD $00 $80.
  1. The value of the PC is transferred to the address bus, and Cycle 1 will read the byte determined by the address bus. This would be $AD. The program counter is incremented. The value of $AD tells the CPU we're running the LDA Absolute instruction.
  2. The value of the PC is transferred to the address bus, and Cycle 2 will read the byte determined by the address bus. This would be $00. The program counter is incremented. That value of $00 will be stored in the "data latch" of the CPU.
  3. The value of the PC is transferred to the address bus, and Cycle 3 will read the byte determined by the address bus. This would be $80. The program counter is incremented. That value of $80 and the value stored in the data latch, ($00) are combined to form the value $8000 which is stored in the address bus.
  4. The value of the address bus is NOT overwritten by the PC, and Cycle 4 will read the byte determined by the address bus. This would be whatever value is at address $8000. Since this is the LDA instruction, the value read by cycle 4 will be stored in the CPU's A register.
The instruction STA $8000 would behave very similarly. The first 3 cycles would actually be nearly identical, with the exception of the value read in cycle 1 being $8D instead. Since that value represents the STA Absolute instruction, cycle 4 will instead write to the value at address $8000.
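To make the cycle-by-cycle description concrete, here is a minimal Python sketch that traces the four read cycles of LDA Absolute. It is an illustration only, not code from any emulator: the function name `lda_absolute` and the dictionary standing in for memory are assumptions made for this example.

```python
def lda_absolute(memory, pc):
    """Trace the four read cycles of LDA Absolute, starting with the PC at `pc`."""
    trace = []

    # Cycle 1: the PC goes on the address bus, the opcode is read, the PC increments.
    opcode = memory[pc]
    trace.append((1, pc, opcode))
    pc = (pc + 1) & 0xFFFF
    if opcode != 0xAD:
        raise ValueError("this sketch only handles LDA Absolute ($AD)")

    # Cycle 2: read the low byte of the target address into the data latch.
    data_latch = memory[pc]
    trace.append((2, pc, data_latch))
    pc = (pc + 1) & 0xFFFF

    # Cycle 3: read the high byte and combine it with the data latch.
    high = memory[pc]
    trace.append((3, pc, high))
    pc = (pc + 1) & 0xFFFF
    address_bus = (high << 8) | data_latch

    # Cycle 4: the address bus is NOT overwritten by the PC; read the target into A.
    a = memory[address_bus]
    trace.append((4, address_bus, a))
    return a, pc, trace


# The bytes $AD $00 $80 placed at $8000, as in the example above.
memory = {0x8000: 0xAD, 0x8001: 0x00, 0x8002: 0x80}
a, pc, trace = lda_absolute(memory, 0x8000)
for cycle, address, value in trace:
    print(f"cycle {cycle}: read ${address:04X} -> ${value:02X}")
```

In this toy setup the instruction happens to load its own opcode, since the byte stored at $8000 is $AD.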
To fully understand how the CPU works, I'd basically need to explain every instruction of the CPU, which I don't plan to do here. I will however explain the 5 most important ones for this run. LDA Immediate, STA Absolute, JMP, BRK, and RESET.
LDA Immediate is a 2 cycle instruction that loads the A register with the value read during cycle 2. E.g. LDA #$5A loads the A register with the value $5A.
STA Absolute was already explained, but I'll explain it again for good measure. STA Absolute is a 4 cycle instruction that will store the value of the A register in the byte at a target address.
JMP is a 3 cycle instruction that moves the program counter to a target address.
BRK is a 7 cycle instruction that moves the program counter to a specific address. The value of address $FFFE and $FFFF are combined to form the new location of the Program Counter. This instruction also pushes 3 bytes to the stack, but don't worry about that. The logic for this instruction is primarily used in interrupts[2] to move the PC to a specific function to handle interrupts, but again, don't worry too much about that.
RESET is a 7 cycle instruction, reusing logic from the BRK instruction. When powering on the console, this is the first instruction the CPU will process, but unlike the BRK instruction, this uses the value of addresses $FFFC and $FFFD to determine where to move the PC. It's worth knowing the timing of this, as it will be relevant when the run begins. Address $FFFC is read during cycle 6, and address $FFFD is read during cycle 7.
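As a small illustration of how the two vector bytes form the new PC, here is a Python sketch. The name `read_vector` and the dictionary used as memory are assumptions for this example, and the byte values are just sample data.

```python
RESET_VECTOR = 0xFFFC  # low byte read on cycle 6 of RESET, high byte ($FFFD) on cycle 7
BRK_VECTOR = 0xFFFE    # used by BRK (and IRQs) in the same way


def read_vector(memory, vector):
    """Combine the byte at `vector` (low) and `vector + 1` (high) into a 16-bit PC."""
    low = memory[vector]
    high = memory[vector + 1]
    return (high << 8) | low


# Sample data only: low byte $24, high byte $F9.
memory = {0xFFFC: 0x24, 0xFFFD: 0xF9, 0xFFFE: 0x00, 0xFFFF: 0x80}
print(f"RESET -> ${read_vector(memory, RESET_VECTOR):04X}")
print(f"BRK   -> ${read_vector(memory, BRK_VECTOR):04X}")
```

Because the low and high bytes are read on separate cycles, nothing in this logic requires them to come from the same source.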
So to recap, when you power on the console, the first 7 CPU cycles will be spent running the RESET instruction, which moves the PC to some specific address where the game's logic will begin. Each instruction will begin by reading from where the PC currently is, and every time the PC is transferred to the address bus, it is also incremented. The LDA instruction loads a value into the A register, while the STA instruction stores the value of the A register in a target address. JMP can relocate the PC to any target address, and now you're pretty much all caught up on how the CPU works for the purposes of this run.
With the CPU of my custom emulator implemented and fully functional according to some accuracy tests, I figured it was time to make the run. It was to my horror, that the run failed. I had already tested it before, and I even wrote about it being done in 5 microseconds at the end of my Bad Apple TAS explanation video, but somewhere along the way, something was fixed, and the run no longer worked.
(Image unrelated, the game actually crashed before loading the victory screen, making the run impossible.)
Or so I thought...
You see, the run might not work from the moment you power on the console, but the main issues include the MMC3 chip putting the credits code in another spot at the moment the console boots, and some magic number SMB3 reads and if the value isn't there, the game resets. If I run my TAS at the moment I press the RESET button, assuming I'm on the title screen, it once again works. Phew! But what to do about the run from power on?

Introduction to the MMC3 mapper chip

Amazingly, we're still not ready to talk about the run, as we need to talk about the mapper chip[3] inside the cartridge of Super Mario Bros. 3. As stated before, the program data of the inserted cartridge can be read from address $8000 to address $FFFF (32,768 bytes) but the program data of Super Mario Bros. 3 is 262,144 bytes! 262,144 is greater than 32,768,[citation needed] so it raises the question, "how can we fit all the data from the game into a smaller space?"
The answer is to break the program data of the game down into smaller 8 kibibyte "banks" that can be swapped in and out when needed. If you need to run the logic for how mario moves, swap in banks 7 and 8. If the background graphics need updating, swap in banks 25 and 26. By splitting the game into banks like this, the program data could be dramatically larger than the CPU address space. But this mapper chip has some rules you need to follow.
First off, the final 8 kibibyte bank of program data will always be accessible at address $E000 through $FFFF. This way, when instructions like RESET read from address $FFFC, these bytes will always be the same, as you cannot swap another bank into this address. Secondly, the 2nd to last program bank will also always be loaded, but the MMC3 chip has a bit of leniency on this one. If you (the game developer) would prefer, the 2nd to last program bank could always be loaded at address $8000 through $9FFF, or if you'd prefer, it could be loaded at address $C000 through $DFFF.
You might be wondering, "How does a programmer utilize the MMC3 chip to swap banks in and out, or decide where to put the 2nd to last program bank?". At this point, I'm going to once again display this footnote[3] as the documentation on the NesDev Wiki will do a better job than I, but I'll still give a brief rundown. In short, the MMC3 chip has "registers" you can access and modify by running a store instruction to address $8000 and $8001. The value written to $8000 is the "bank swap mode", and the value written to $8001 is the value of the bank you want to swap in.
The bank swap mode determines both the location of the 2nd to last program bank (determined by bit 6 of the value written to $8000), as well as which bank slot the next write to $8001 will swap (determined by the lowest 3 bits). So in the case of Super Mario Bros. 3, which always has the 2nd to last bank at address $8000 through $9FFF, every single time a bank is being swapped out, it also needs to make sure bit 6 of the value written to $8000 is set.
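To illustrate, here is a hedged Python sketch that decodes a value written to $8000. The field layout (register select in the lowest 3 bits, PRG mode in bit 6, CHR inversion in bit 7) follows the NesDev Wiki's MMC3 page as I understand it; the function names are made up for this example.

```python
def decode_bank_select(value):
    """Split a byte written to $8000 into its MMC3 bank select fields."""
    return {
        "register": value & 0x07,          # which bank register the next $8001 write updates
        "prg_mode": (value >> 6) & 1,      # where the 2nd to last PRG bank is fixed
        "chr_inversion": (value >> 7) & 1,
    }


def fixed_second_to_last_range(value):
    """Address range holding the 2nd to last PRG bank for this bank select value."""
    if decode_bank_select(value)["prg_mode"]:
        return (0x8000, 0x9FFF)
    return (0xC000, 0xDFFF)


fields = decode_bank_select(0x5A)
low, high = fixed_second_to_last_range(0x5A)
print(fields)
print(f"2nd to last bank at ${low:04X}-${high:04X}")
```

With a value such as $5A, bit 6 is set, so the 2nd to last bank lands at $8000 through $9FFF, which is the layout Super Mario Bros. 3 expects.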
The MMC3 chip also features "program RAM", which contains $2000 extra bytes for the CPU to store data in from address $6000 to $7FFF. This RAM is enabled by writing a value of $80 to address $A001.
But of course I wasn't going to stop just because my initial run didn't work. If this could work from RESET, it can work from POWER too. I spoiled it in the previous story mode section, but setting up the MMC3 registers wasn't enough. We need to write to some "magic" location in the cartridge's external Program RAM, and also make sure we move the PC to the function which prepares the victory screen. Let's look through the SMB3 disassembly and figure out what to do.
Delightfully devilish, 100th_Coin!

Introduction to the Super Mario Bros. 3 code

Okay, now that we've learned the basics of how the CPU works, and how the MMC3 chip inside the Super Mario Bros. 3 cartridge works, let's talk a little bit about how Super Mario Bros. 3 is organized. It was mentioned that the second to last program bank is loaded from address $8000 to $9FFF, and inside this bank at address $8FE3, we'll find the function that prepares the victory screen of the game. This function prepares some values in RAM, jumps to a subroutine that swaps in the program banks for the credits, and then jumps to address $B85A inside one of the newly loaded program banks.
Every frame, an Interrupt Request (or IRQ for short) occurs. This IRQ is used to jump to a function that will update the graphics so the checkerboard floor can remain in one place while the curtains can drop or raise during the credits. However, (and the reason for this is beyond me) inside the IRQ function, the game will check if address $7964 contains the value $5A, and if it does not, the PC is moved to the same location used when the game is initially powered on. Everything in RAM gets cleared, and the game reboots. It's also worth noting that the only time this value of $5A is ever written to address $7964 is during the code executed while preparing to load the title screen, and this one line of code in the IRQ function is the only use for it.
And finally with the knowledge of where the function is that loads the credits, the fact that we need to write $5A to address $7964, and that we would need to write to address $8000 to move the 2nd to last bank in the correct location such that the credits function is in fact at address $8FE3, we're finally ready to talk about the run.
I was feeling confident. Perhaps too confident. This run initially worked in my emulator, but to test it on actual hardware, I wrote a ROM Hack of SMB3 that simply executes the code of my TAS from power on. It crashes.
Just when I thought I had everything figured out, another roadblock appears.
The Audio Processing Unit.

Introduction to the Audio Processing Unit

Just kidding. Before we talk about the run, we also need to talk about the APU's "Frame Counter"[4].
Approximately 59.99908 times a second (keep in mind, the NES PPU actually has a frame rate of approximately 60.098814 frames a second), the audio chip of the console will create an Interrupt Request. The APU will also keep asserting this IRQ indefinitely unless you either disable this behavior, or read from address $4015. Since taking an IRQ sets the CPU's interrupt disable flag (which prevents another IRQ from running), there won't be another IRQ until the flag is either cleared, or an RTI instruction occurs (since RTI will restore the previous state of the flag), but that RTI would immediately lead into another IRQ, and this pattern would never stop. Since the IRQ routine of Super Mario Bros. 3 does not read address $4015, the RTI at the end of the IRQ routine would immediately be followed by another IRQ, causing an infinite loop.
This behavior can be disabled, though it's enabled by default. To disable it, we need to write to address $4017 with a value that has bit 6 enabled.
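As a quick check of which values do the job, here is a tiny Python sketch. The constant and function names are assumptions for illustration; only the "bit 6" rule comes from the text above.

```python
FRAME_IRQ_INHIBIT = 0x40  # bit 6 of the value written to $4017


def frame_irq_inhibited(value):
    """True if writing `value` to $4017 disables the APU Frame Counter IRQ."""
    return bool(value & FRAME_IRQ_INHIBIT)


for value in (0x00, 0x40, 0x5A, 0x80):
    print(f"${value:02X}: inhibited={frame_irq_inhibited(value)}")
```

Any value with bit 6 set works, so a value already sitting in a register for another purpose (such as $5A) can be reused here.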
Now that we know where the function to prepare the credits is located, how to make sure the bank containing that code is in the correct spot, how we need to write $5A to address $7964, and how to disable the APU Frame counter IRQ's, we can finally talk about the run, for real this time.
And finally the run works. The new custom Rom Hack that runs this code from power on doesn't crash, and I felt this huge wave of relief. I had a proof of concept for my proof of concept.
In the context of this run, I don't believe there's such a thing as going too far. Some will say "rules were meant to be broken", but I like to think my angle is a bit more "rules were meant to be discovered" than anything else. Perhaps that rule is "This is allowed", who knows. If you are a TASVideos staff member reading this, I just wanted to say "oops" for making this. I can almost hear a faint musical number beginning...
♫ 100th_Coin with his crazy innovations! ♫
♫ TASVideos staff are gonna update regulation ♫
♫ When they see Coin's outrageous emulation. ♫
♫ There'll be trouble in town tonight! ♫

Introduction to the Intercycle Cartridge Swap

Okay, just one more tangent.
For this run, we won't be using the controllers. Instead, we're going to be swapping cartridges between CPU cycles in order to run custom code. Here is the assembly code I wish to run, with comments:
RESET:
    LDY #$80  ; Load the Y register with the value $80.
    LDA #$5A  ; Load the A register with the value $5A.
    STY $A001 ; Store $80 in address $A001 to enable the MMC3 PRG RAM.
    STA $4017 ; Store $5A in address $4017 to disable the APU Frame Counter IRQs.
    STA $8000 ; Store $5A in address $8000 to move the 2nd to last program bank to the $8000 - $9FFF range.
    STA $7964 ; Store $5A in address $7964, as without it, Mario 3 will reboot during the IRQ function.
    JMP $8FE3 ; Move the PC to address $8FE3 in order to run the function that loads the credits for Super Mario Bros. 3
If we were to "assemble" those instructions into bytes, we would get this string of hexadecimal values: A0 80 A9 5A 8C 01 A0 8D 17 40 8D 00 80 8D 64 79 4C E3 8F.
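For anyone who wants to check the assembly by hand, here is a minimal Python sketch of a table-driven assembler covering only the five opcodes this program uses. It is a toy for illustration, not a real assembler; the names `OPCODES`, `PROGRAM`, and `assemble` are invented for this example.

```python
# Opcode bytes for the only instruction/addressing-mode pairs used by the program.
OPCODES = {
    ("LDY", "imm"): 0xA0,
    ("LDA", "imm"): 0xA9,
    ("STY", "abs"): 0x8C,
    ("STA", "abs"): 0x8D,
    ("JMP", "abs"): 0x4C,
}

PROGRAM = [
    ("LDY", "imm", 0x80),
    ("LDA", "imm", 0x5A),
    ("STY", "abs", 0xA001),
    ("STA", "abs", 0x4017),
    ("STA", "abs", 0x8000),
    ("STA", "abs", 0x7964),
    ("JMP", "abs", 0x8FE3),
]


def assemble(program):
    """Emit the opcode, then a 1-byte operand (imm) or a little-endian 2-byte operand (abs)."""
    out = bytearray()
    for mnemonic, mode, operand in program:
        out.append(OPCODES[(mnemonic, mode)])
        out.append(operand & 0xFF)      # immediate value, or low byte of the address
        if mode == "abs":
            out.append(operand >> 8)    # high byte of the address
    return bytes(out)


code = assemble(PROGRAM)
print(len(code), "bytes:", code.hex(" ").upper())
```

Note that absolute addresses are stored low byte first, which is why STA $7964 becomes 8D 64 79.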
Here's the plan. After the CPU reads A0 from some cartridge, we will then remove the currently inserted cartridge from the NES, and insert another cartridge. The PC will be incremented, and the cartridge we just inserted would have the value 80 at the new address of the PC. That value is read, and we then swap the cartridge out for yet another cartridge. Whenever the CPU is writing, we'll insert Super Mario Bros. 3 to make sure the MMC3 registers/PRG RAM get updated.
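The plan can be sketched in a few lines of Python. This is a simplified model that only covers read cycles where the PC is on the address bus; the cart names and their contents are hypothetical placeholders, not real cartridge data.

```python
def fetch_stream(carts, schedule, pc):
    """Read one byte per cycle from whichever cart is inserted on that cycle.

    `carts` maps a cart name to a dict of address -> byte, and `schedule`
    lists the cart inserted for each consecutive read cycle.
    """
    stream = bytearray()
    for cart_name in schedule:
        stream.append(carts[cart_name][pc])  # the CPU reads at the current PC
        pc = (pc + 1) & 0xFFFF               # then the PC increments
    return bytes(stream), pc


# Hypothetical carts: each one only needs the right byte at the right address.
carts = {
    "Cart A": {0xF924: 0xA0},
    "Cart B": {0xF925: 0x80},
    "Cart C": {0xF926: 0xA9, 0xF927: 0x5A},
}
schedule = ["Cart A", "Cart B", "Cart C", "Cart C"]

stream, pc = fetch_stream(carts, schedule, 0xF924)
print(stream.hex(" ").upper(), f"-> PC now at ${pc:04X}")
```

From the CPU's point of view the result is indistinguishable from a single cartridge containing A0 80 A9 5A at those addresses.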
And without any further ado, let's see how the run goes.
This run initially came together in August 2024, and I'm loosely following the order of events here with this "story mode", but surely you have started to ask an important question. How were you testing these runs in your emulator? How do you even get a list of cartridges for this to work?! I mean, logically you can't run such a TAS without making it, so how do we figure this all out?
Oops, I turned the story mode into nerd-talk. If you're not into this sort of thing, feel free to skip this one.
Let's talk about how this run was put together. By now you should understand the basics of how the run works. The cartridge is replaced with another cartridge, the PC is initially moved by the RESET instruction, but then incremented during instructions.
Let's start with the "from RESET" run to understand how this was put together. If I wanted to immediately move the PC to address $8FE3, could this be done in the RESET instruction? We would need to find a game with the value $E3 in address $FFFC, and $8F in the address $FFFD. If there are games that meet these criteria, how do we determine this?
Assume for the sake of this programming exercise, that you have a copy of every NES ROM. The data is stored in the following way: The header, the Program ROM, and then the Character ROM. By first reading the contents of the iNES header, we could determine the length of the Program ROM. With this length determined, we know exactly which bytes are for the Program ROM and which bytes are for the Character ROM. See the iNES header info here: https://www.nesdev.org/wiki/INES
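Here is a minimal Python sketch of that split, based on the iNES layout described on the NesDev Wiki: byte 4 of the header counts 16 KiB Program ROM units, byte 5 counts 8 KiB Character ROM units, and bit 2 of byte 6 marks an optional 512-byte trainer (a detail the description above skips). The function name and the synthetic test ROM are assumptions for illustration.

```python
HEADER_SIZE = 16
TRAINER_SIZE = 512
PRG_UNIT = 0x4000  # 16 KiB
CHR_UNIT = 0x2000  # 8 KiB


def split_ines(rom):
    """Return (prg_rom, chr_rom) from the raw bytes of an iNES file."""
    if rom[:4] != b"NES\x1a":
        raise ValueError("not an iNES file")
    prg_size = rom[4] * PRG_UNIT
    chr_size = rom[5] * CHR_UNIT
    # A trainer, when present, sits between the header and the Program ROM.
    prg_start = HEADER_SIZE + (TRAINER_SIZE if rom[6] & 0x04 else 0)
    chr_start = prg_start + prg_size
    return rom[prg_start:chr_start], rom[chr_start:chr_start + chr_size]


# Synthetic ROM for demonstration only (one PRG unit of $EA, one empty CHR unit).
fake_rom = b"NES\x1a" + bytes([1, 1]) + bytes(10) + bytes([0xEA]) * PRG_UNIT + bytes(CHR_UNIT)
prg, chr_rom = split_ines(fake_rom)
print(len(prg), len(chr_rom))
```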
Let's analyze a NES ROM. Correction- let's analyze every NES ROM. Alas, reading the value at a target address is not as simple as reading the index into the ROM. Some games have mapper chips, and of course, index $10 into the ROM is the first byte of the Program Data, which we would find at address $8000, which could potentially be mirroring the values at address $C000 if the cartridge is small enough. It's all sorts of complex! Luckily, right now, we're just after the interrupt vectors.
Regardless of mapper chip, the Interrupt Vectors are always the final 6 bytes of the Program ROM, so the RESET Vector is always 4 bytes before the end of the ROM. This means, if we were to analyze every NES game and check the value 4 bytes and 3 bytes before the end of the program data, we can check if any games have the values we need!
The function to do this would simply be, take the length of the Program ROM, and then look at the byte at index $10 + ProgramRom.Length - 4.
The function to check every NES game for a specific value in the interrupt vectors would be as easy as looping over every NES game and checking if that target address has the value we want.
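A rough Python sketch of both functions follows. It works on Program ROM bytes that have already been separated from the header, so the RESET vector is simply the bytes 4 and 3 from the end. The library of "games" here is synthetic, and every name in the code is hypothetical.

```python
def reset_vector_bytes(prg):
    """Return (low, high): the bytes 4 and 3 before the end of the Program ROM."""
    return prg[-4], prg[-3]


def games_with_vector_byte(library, low=None, high=None):
    """Names of games whose RESET vector matches the requested low and/or high byte."""
    matches = []
    for name, prg in library.items():
        lo, hi = reset_vector_bytes(prg)
        if (low is None or lo == low) and (high is None or hi == high):
            matches.append(name)
    return matches


def make_prg(reset_vector, size=0x4000):
    """Build a blank synthetic Program ROM with only the RESET vector filled in."""
    prg = bytearray(size)
    prg[-4] = reset_vector & 0xFF
    prg[-3] = reset_vector >> 8
    return bytes(prg)


# Synthetic stand-ins, not real games.
library = {"Game A": make_prg(0x9000), "Game B": make_prg(0xC000)}

print(games_with_vector_byte(library, low=0x00))
print(games_with_vector_byte(library, high=0x90))
print(games_with_vector_byte(library, low=0xE3))
```

The low and high bytes are searched separately because, with a cart swap between cycles 6 and 7, they do not have to come from the same game.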
And the moment you've been waiting for! For address $FFFC, we could use... none. There are no games with the value $E3 in address $FFFC. Zero. Oh- and the same applies to a value of $8F in address $FFFD. (And the unbelievably clever among you might recognize that the title screen and credits of SMB3 use the same program banks. Could we try a RESET to address $B85A, where the credits-preparation function at address $8FE3 jumps to after loading the already-loaded program banks? Nope, there aren't any games featuring a value of $B8 in address $FFFD, but good try!)
That's a bit of a setback. I guess we'll need to be able to read other addresses than just the interrupt vectors. Ignoring games with mappers other than NROM (since those can have nondeterministic power on states) let's assume address $10 into the ROM file will be address $8000 of the CPU address bus. If the Program ROM is only $4000 bytes long, we'll duplicate these values to address $C000-FFFF as well. Now, reading from a specific address is as simple as subtracting $7FF0. You want to read address $9000? $9000-$7FF0 = $1010, so we read index $1010 of the ROM. Remember, this only applies to games with the "NROM Mapper", and a handful of other mappers without Program Bank swapping. More info on NROM can be found here: https://www.nesdev.org/wiki/NROM
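The address arithmetic, including the mirroring for $4000-byte games, can be sketched like this in Python. The function name is an assumption, and the sketch assumes a 16-byte header with no trainer.

```python
HEADER_SIZE = 0x10


def nrom_file_index(address, prg_size):
    """Map a CPU address in $8000-$FFFF to an index into an NROM .nes file."""
    if not 0x8000 <= address <= 0xFFFF:
        raise ValueError("address is outside the cartridge's program space")
    # The modulo mirrors a $4000-byte Program ROM into $C000-$FFFF.
    return HEADER_SIZE + (address - 0x8000) % prg_size


print(hex(nrom_file_index(0x9000, 0x8000)))  # $9000 - $7FF0
print(hex(nrom_file_index(0xC000, 0x4000)))  # mirrored back to the first PRG byte
print(hex(nrom_file_index(0xFFFC, 0x4000)))  # RESET vector of a $4000-byte game
```

For a $8000-byte Program ROM the modulo never kicks in, so the result is exactly "address minus $7FF0".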
With that sorted out, can we jump to $9000? There are over a hundred NES games with the value $00 in address $FFFC, so we got that covered. Are there any games with a value of $90 in $FFFD? Yes! Just one. "Dash Galaxy in the Alien Asylum".
So, if we move the PC to address $9000, we could branch to address $8FE3 faster than we could JMP to address $8FE3. (Well, technically the JMP takes fewer CPU cycles, but we could swap to SMB3 sooner with the branch.) Let's talk about branches. We need a branch instruction that will be taken, and the operand is a signed value for how many bytes to move the PC. I chose a BCC instruction, since the Carry flag is clear for the majority of a frame on SMB3's title screen.
The BCC instruction has the value $90. Are there any games that have the value $90 at address $9000? Let's ignore games with mapper chips here, as they tend to have nondeterministic power-on states, and we want to guarantee the values exist where we need them. There's a handful of games with the value we need, but the best of them is Kung Fu. This game is the most "first-party" of the options, so I went with it.
The operand of the branch needs to move the PC to address $8FE3 from address $9002, which means the value needed is $E1. By searching for a value of $E1 at address $9001, I found the game Pipe Dream. And after that, we can swap to SMB3.
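The $E1 comes from two's complement arithmetic: the branch offset is measured from the address right after the 2-byte branch instruction. A short Python sketch (the function name is made up for this example):

```python
def branch_operand(branch_address, target):
    """Operand byte for a 6502 relative branch located at `branch_address`."""
    next_pc = branch_address + 2          # the PC after reading opcode and operand
    offset = target - next_pc
    if not -128 <= offset <= 127:
        raise ValueError("target is out of range for a relative branch")
    return offset & 0xFF                  # signed offset stored as a single byte


operand = branch_operand(0x9000, 0x8FE3)
print(f"${operand:02X}")
```

Here the offset is $8FE3 - $9002 = -31, and -31 stored as a byte is $E1.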
So that's how the "from RESET" run was created, but the "from POWER" run is 30 cycles long. Surely we can automate the process of looking through the entire NES address space for a 19-byte string, right? Yeah we can!
I basically just looped from address $8000 to $FFFF until we found all 19 bytes in a row across various NES cartridges. (I also implemented ways of checking the MMC3 and MMC1 mappers.) The end result was less impressive than I hoped, for I barely recognized any of the games in this list:
Adventures of Lolo 3
NES Open Tournament Golf
The Addams Family
Dig Dug II - Trouble in Paradise
Alpha Mission
Archon
Muppet Adventure - Chaos at the Carnival
Attack of the Killer Tomatoes
Battle Chess
Formula One - Built to Win
The Addams Family
Jimmy Connors Tennis
Dig Dug II - Trouble in Paradise
Adventures in the Magic Kingdom
Skate or Die 2 - The Search for Double Trouble
Castlevania II - Simon's Quest
Casino Kid
Amagon
Romance of the Three Kingdoms
To improve this list, I assigned a "popularity score" to each game, and calculated the total score for every possible set of cartridges that works. This was entirely opinion based, and a very arbitrary metric, but the end result had a lot of cartridges I was familiar with, which makes me pleased. (see the cartridges listed below for the final list)
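For the curious, here is a hedged Python sketch of this kind of search and scoring. It is not the tool used for this run: it only handles NROM-style carts, the carts and scores are synthetic, and it assumes the total score is a plain sum of per-cart scores (in which case picking the top-scoring cart for each byte gives the best total).

```python
def read(prg, address):
    """Read a CPU address from an NROM-style Program ROM (with mirroring)."""
    return prg[(address - 0x8000) % len(prg)]


def find_cart_lists(carts, code):
    """Yield (start, per_byte) for each start address where every byte has a candidate cart."""
    for start in range(0x8000, 0x10000 - len(code) + 1):
        per_byte = []
        for i, value in enumerate(code):
            names = [n for n, prg in carts.items() if read(prg, start + i) == value]
            if not names:
                break                      # no cart has this byte here; try the next start
            per_byte.append(names)
        else:
            yield start, per_byte


def best_list(per_byte, score):
    """Pick the highest-scoring candidate for each byte position."""
    return [max(names, key=lambda n: score.get(n, 0)) for names in per_byte]


def make_cart(index, value, size=0x4000):
    prg = bytearray(size)
    prg[index] = value
    return bytes(prg)


# Synthetic carts and scores, for demonstration only.
carts = {
    "Cart A": make_cart(0x10, 0xA0),
    "Cart B": make_cart(0x11, 0x80),
    "Cart C": make_cart(0x12, 0xA9),
    "Cart D": make_cart(0x12, 0xA9),
}
score = {"Cart A": 1, "Cart B": 1, "Cart C": 2, "Cart D": 5}

results = list(find_cart_lists(carts, bytes([0xA0, 0x80, 0xA9])))
for start, per_byte in results:
    print(f"${start:04X}:", best_list(per_byte, score))
```

A real search would also need a RESET vector that lands on the chosen start address, plus the mapper handling mentioned above.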

Introduction to the Run

We power on the console. At this point, it doesn't yet matter what cartridge is inserted.
The first few CPU cycles of the RESET instruction are "dummy reads" which means they don't do anything.
| Cycle | Program Counter | Address Bus | Value | Cartridge Inserted | Purpose |
|---|---|---|---|---|---|
| 1 | $???? | $???? | $?? | Any cartridge | Reading the opcode of the next instruction. The CPU "reset flag" is set, so this value is discarded and replaced with $00. |
| 2 | $???? | $???? | $?? | Any cartridge | Reading the operand of the RESET instruction. This does nothing. |
| 3 | $???? | $0100 | $00 | Any cartridge | Reading from the stack. The value read is unused. The stack pointer is decremented. |
| 4 | $???? | $01FF | $00 | Any cartridge | Reading from the stack. The value read is unused. The stack pointer is decremented. |
| 5 | $???? | $01FE | $00 | Any cartridge | Reading from the stack. The value read is unused. The stack pointer is decremented. |
| 6 | $???? | $FFFC | $24 | BurgerTime | Reading from $FFFC. This will be the low byte of the PC once the RESET instruction ends. |
| 7 | $???? | $FFFD | $F9 | Krusty's Fun House | Reading from $FFFD. This value will become the high byte of the PC. The PC is now at $F924. |
| 8 | $F924 | $F924 | $A0 | BreakThru | Reading the opcode of the next instruction. $A0 is for the LDY Immediate instruction. |
| 9 | $F925 | $F925 | $80 | Super Mario Bros. 3 | Reading the operand of the LDY Immediate instruction. Y = $80. |
| 10 | $F926 | $F926 | $A9 | Super Mario Bros. 3 | Reading the opcode of the next instruction. $A9 is for the LDA Immediate instruction. |
| 11 | $F927 | $F927 | $5A | Bomberman II | Reading the operand of the LDA Immediate instruction. A = $5A. |
| 12 | $F928 | $F928 | $8C | Mickey Mousecapade | Reading the opcode of the next instruction. $8C is for the STY Absolute instruction. |
| 13 | $F929 | $F929 | $01 | Gradius | Reading the first operand of the STY Absolute instruction. This will be the low byte of the target address. |
| 14 | $F92A | $F92A | $A0 | BurgerTime | Reading the second operand of the STY Absolute instruction. This will be the high byte of the target address. ($A001) |
| 15 | $F92B | $A001 | Y ($80) | Super Mario Bros. 3 | Store $80 at address $A001 to enable the MMC3 chip's PRG RAM. |
| 16 | $F92B | $F92B | $8D | Super Mario Bros. | Reading the opcode of the next instruction. $8D is for the STA Absolute instruction. |
| 17 | $F92C | $F92C | $17 | Athena | Reading the first operand of the STA Absolute instruction. This will be the low byte of the target address. |
| 18 | $F92D | $F92D | $40 | BurgerTime | Reading the second operand of the STA Absolute instruction. This will be the high byte of the target address. ($4017) |
| 19 | $F92E | $4017 | A ($5A) | BurgerTime | Store $5A in address $4017 to disable the APU Frame Counter IRQs. |
| 20 | $F92E | $F92E | $8D | Super Mario Bros. 3 | Reading the opcode of the next instruction. $8D is for the STA Absolute instruction. |
| 21 | $F92F | $F92F | $00 | Kid Icarus | Reading the first operand of the STA Absolute instruction. This will be the low byte of the target address. |
| 22 | $F930 | $F930 | $80 | Super Mario Bros. 3 | Reading the second operand of the STA Absolute instruction. This will be the high byte of the target address. ($8000) |
| 23 | $F931 | $8000 | A ($5A) | Super Mario Bros. 3 | Store $5A in address $8000 to set up the MMC3 chip so the 2nd to last program bank is loaded from address $8000 to $9FFF. |
| 24 | $F931 | $F931 | $8D | Soccer | Reading the opcode of the next instruction. $8D is for the STA Absolute instruction. |
| 25 | $F932 | $F932 | $64 | Jaws | Reading the first operand of the STA Absolute instruction. This will be the low byte of the target address. |
| 26 | $F933 | $F933 | $79 | Kirby's Adventure (PRG1) | Reading the second operand of the STA Absolute instruction. This will be the high byte of the target address. ($7964) |
| 27 | $F934 | $7964 | A ($5A) | Super Mario Bros. 3 | Store $5A in address $7964 of the MMC3 Program RAM. This prevents Super Mario Bros. 3 from rebooting every frame. |
| 28 | $F934 | $F934 | $4C | The Legend of Zelda | Reading the opcode of the next instruction. $4C is for the JMP instruction. |
| 29 | $F935 | $F935 | $E3 | Zanac | Reading the first operand of the JMP instruction. This will be the low byte of the Program Counter. |
| 30 | $F936 | $F936 | $8F | Super Mario Bros. 2 | Reading the second operand of the JMP instruction. This will be the high byte of the Program Counter. ($8FE3) |
| 31 | $8FE3 | $8FE3 | $A9 | Super Mario Bros. 3 | Before this cycle actually runs, stop the timer with Super Mario Bros. 3 inserted. Without any further input from us, the game will begin preparing the victory screen. |
Keep in mind, Super Mario Bros. 3 will be inserted before the 31st CPU cycle, so the run is only 30 CPU cycles long. And if all goes according to plan, we should see the victory screen!
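To see what those last rows add up to, here is a minimal Python sketch that stitches the fetched bytes back into instructions. It is only an illustration, not code from TriCNES. The opcode at $F92B is an assumption inferred from the cycle 17 row (which reads the first operand of an STA Absolute), and the helper name `disassemble` is made up for this example.

```python
# Bytes the CPU fetches on cycles 16-30, keyed by address. Each byte comes
# from whichever cartridge is inserted on that cycle.
# Assumption: the opcode at $F92B is $8D, as implied by the cycle 17 row.
fetched = {
    0xF92B: 0x8D, 0xF92C: 0x17, 0xF92D: 0x40,  # STA $4017
    0xF92E: 0x8D, 0xF92F: 0x00, 0xF930: 0x80,  # STA $8000
    0xF931: 0x8D, 0xF932: 0x64, 0xF933: 0x79,  # STA $7964
    0xF934: 0x4C, 0xF935: 0xE3, 0xF936: 0x8F,  # JMP $8FE3
}

# Only the two opcodes used in this part of the run; both take a 16-bit operand.
MNEMONICS = {0x8D: "STA", 0x4C: "JMP"}


def disassemble(memory, start, count):
    """Decode `count` three-byte absolute-mode instructions starting at `start`."""
    out = []
    pc = start
    for _ in range(count):
        opcode = memory[pc]
        # 6502 operands are little-endian: low byte first, then high byte.
        target = memory[pc + 1] | (memory[pc + 2] << 8)
        out.append(f"${pc:04X}: {MNEMONICS[opcode]} ${target:04X}")
        pc += 3
    return out


listing = disassemble(fetched, 0xF92B, 4)
for line in listing:
    print(line)
```

No single cartridge has to contain this program. Each byte only has to exist at the right address in some cartridge.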
Uh-
So this was actually found out the day before I uploaded my video about this. I had a deadline to stick to, and there wasn't time to fix this, so this ended up as a footnote in the section about the PPU reset flag. I was pretty shocked (to say the least) that I had overlooked this. I wasn't really able to do any tests on my console about the PPU Reset Flag, since the Everdrive I'm using does some funny stuff behind the scenes on the first few frames after a RESET, so the flag is already cleared by the time my custom ROM would begin running. I later learned the exact timing of it, wrote it down in the footnote for the video, and didn't make the realization that the flag would be set during the SMB3 runs too!
I'm going to pretend this is fine. Once the curtains drop, it's visually identical to any other run anyway.

Introduction to the Picture Processing Unit's Reset Flag

When the console is powered on or reset (but only when powered on for the Famicom), the Picture Processing Unit (PPU) sets a "reset flag" that is cleared on dot 1 of the 261st scanline.[5] This reset flag prevents writing to a handful of the PPU registers. Most importantly, while the flag is set you cannot enable the NMI, enable rendering, or change the 'v' register of the PPU[6] (the read/write address). Since the read/write address of the PPU cannot be modified until the 261st scanline, a lot of the princess's chamber isn't properly rendered, as the game attempts to write to the PPU before the reset flag is cleared. This results in the default pattern of PPU RAM being shown for much of the background. This default pattern differs per RAM chip, but on my console it looks like this:
F0 F0 0F 0F F0 F0 0F 0F F0 F0 0F 0F F0 F0 0F 0F 0F 0F F0 F0 0F 0F F0 F0 0F 0F F0 F0 0F 0F F0 F0 (Numbers shown in hexadecimal)
The F0 value corresponds to the '0' character being drawn, and the 0F value is that other character next to the 0's, which is the upper right of a leaf. The fun way the color palettes are assigned to these tiles also has to do with this default pattern.
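The effect of the flag can be sketched with a toy model in Python. This is not how TriCNES implements it. It assumes the PPU starts at dot 0 of scanline 0, ignores CPU/PPU clock alignment, and maps the three blocked actions onto PPUCTRL ($2000), PPUMASK ($2001) and PPUADDR ($2006). The class name `ToyPPU` is hypothetical.

```python
DOTS_PER_SCANLINE = 341             # NTSC PPU dots per scanline
DOTS_PER_CPU_CYCLE = 3              # NTSC PPU dots per CPU cycle
CLEAR_SCANLINE, CLEAR_DOT = 261, 1  # where the text says the flag clears

# Assumed mapping of the blocked actions onto registers: PPUCTRL (NMI
# enable), PPUMASK (rendering enable), PPUADDR (the 'v' address).
BLOCKED = {0x2000, 0x2001, 0x2006}


class ToyPPU:
    """Hypothetical model: tracks only the reset flag and register writes."""

    def __init__(self):
        self.dot = 0            # dots elapsed since power on
        self.registers = {}

    @property
    def reset_flag(self):
        return self.dot < CLEAR_SCANLINE * DOTS_PER_SCANLINE + CLEAR_DOT

    def run_cpu_cycles(self, n):
        self.dot += n * DOTS_PER_CPU_CYCLE

    def write(self, address, value):
        if self.reset_flag and address in BLOCKED:
            return False        # the write is silently ignored
        self.registers[address] = value
        return True


ppu = ToyPPU()
ppu.run_cpu_cycles(30)            # the entire 30-cycle run
dots_at_end = ppu.dot
early = ppu.write(0x2006, 0x20)   # attempted while the flag is still set
ppu.run_cpu_cycles(29_640)        # roughly one frame later
late = ppu.write(0x2006, 0x20)    # attempted after the flag has cleared
print(dots_at_end, early, late)
```

In this model the whole run fits inside the first scanline, while the flag stays set for almost a full frame. Any PPU address writes the game attempts in that window are lost.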
With the PPU Reset flag implemented in my emulator, the run from reset looks broken, and you've already seen how the run from power on looks. It was a bit disappointing to see the runs no longer looking identical to a regular completion of the game. Oh well. Easy come, easy go I guess.
Anyway, I forgot to make a Steamed Hams reference here.

Introduction to the Physics Problem

Okay, cool. So this has been a big ol' mess of various NES programming knowledge, and it's quite possible you don't care about any of that, so let's talk about something entertaining for a paragraph. To recap, the idea here involves swapping NES cartridges between CPU cycles. If we were to perform this on actual hardware, we would likely need to use a "toploader console". This is where I would mention that the toploader is missing the CIC chip as well as how the CIC chip works, but I said this paragraph would be entertaining.
Assume the cartridge needs to be inserted with a depth of 1.6 inches. The "M2 pin" of the cartridge slot is the clock, and when M2 has low voltage, the address bus is unstable. Taking into account the clock rate of 1789772.72 cycles per second and the 5/8 duty cycle of the M2 line, that leaves us with about 175 nanoseconds to take out a cartridge and 175 more nanoseconds to insert a different cartridge. Some quick math gives me 9.16 million inches per second, or 520,661 miles per hour (837,922 kilometers per hour).
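For anyone who wants to check the quick math, here is the same arithmetic as a short Python sketch. It follows the paragraph's own assumptions: a 1.6 inch insertion depth, and a swap window of 5/8 of a CPU cycle split evenly between removal and insertion.

```python
CPU_CLOCK_HZ = 1_789_772.72   # NTSC CPU clock rate quoted in the text
M2_DUTY = 5 / 8               # duty cycle of the M2 line quoted in the text
INSERT_DEPTH_IN = 1.6         # assumed insertion depth, in inches
INCHES_PER_MILE = 63_360
KM_PER_MILE = 1.609344

cycle_s = 1 / CPU_CLOCK_HZ
window_s = cycle_s * M2_DUTY    # time available for one full swap
half_window_s = window_s / 2    # remove one cartridge, or insert the next

inches_per_s = INSERT_DEPTH_IN / half_window_s
mph = inches_per_s * 3600 / INCHES_PER_MILE
kmh = mph * KM_PER_MILE

print(f"{half_window_s * 1e9:.1f} ns per movement")
print(f"{inches_per_s / 1e6:.2f} million inches per second")
print(f"{mph:,.0f} mph, {kmh:,.0f} km/h")
```

This assumes constant velocity over the whole 1.6 inches. A real cartridge would also have to accelerate and stop inside the same window, which only makes things worse.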
Did I say this section would be entertaining? I meant it would have math. At any rate, those cartridges would be the fastest man-made objects as of writing this, which only complicates things further if we account for air resistance. It goes without saying, the cartridges would be destroyed. Also the console. Probably everything within a few hundred feet too. I passed the numbers to a friend of mine, who calculated the Reynolds Number[7] of this hypothetical cartridge in this situation as 10 trillion.
Well, if we're not able to move these cartridges fast enough to perform this, just how do we test it?
I hope the numbers haven't lost all meaning. Sometimes you just see big numbers and it becomes white noise. A metaphor for "big". I don't even know what the heck a Reynolds Number is, but 10 trillion sounds mighty impressive.
But you bring up a good point. This run, these hypotheticals... that's not a TAS. Data alone cannot be submitted to TASVideos, even on ACEVideos day. I'll need something concrete. Something that can be submitted in a .zip file, and run on your own computer. But how...

Introduction to the TriCNES emulator

We emulate it, of course! Since swapping cartridges between CPU cycles is not a feature of Bizhawk (and I highly doubt it ever will be), I had to take matters into my own hands and make an emulator that could achieve just this.[8] (Oh man- I bet the judges will have a few words to say about this.) I spent 8 months working on emulator accuracy. Is it perfect? No. Is it just as good as Bizhawk? I'd say so. There are some tests where TriCNES fails but Bizhawk succeeds, and there are tests that Bizhawk fails and TriCNES succeeds. No emulator is perfect, but I'm happy with what I made.
Also, if you're curious, TriCNES is the abbreviated version of the name. The full name is "Coin's Contrabulous Cartswapulator", but I figured Tri-C NES was a good shorthand for it.
Let's talk about this TAS file. Since TriCNES was designed with cart-swapping TASes in mind, we needed to create a custom file format. Introducing a .3ct TAS file! (which of course stands for Tri-C (Coin's Contrabulous Cartswapulator) TAS)
The format works like this. The first line is the number of cartridges used in this TAS; let's call this value 'n'. The following 'n' lines are a list of the cartridges used, each in the form of a local file path from the TriCNES "roms" folder. Every line after that until the end of the file is in the format "x y", where x and y are integer values separated by a space. With this format, just before cycle x, the emulator will swap to index y of the cartridge array.
Let's see the submitted TAS as an example.
16
BurgerTime (U) [!].nes
Krusty's Fun House (U) [!].nes
BreakThru (U) [!].nes
Super Mario Bros. 3 (U) (V1.0) [!].nes
Bomberman II (U) [!].nes
Mickey Mousecapade (U) [!].nes
Gradius (U) [!].nes
Super Mario Bros. (W) (V1.0) [!].nes
Athena (U) [!].nes
Kid Icarus (UE) (V1.0) [!].nes
Soccer (JU) [!].nes
Jaws (U) [!].nes
Kirby's Adventure (U) (V1.1) [!].nes
Legend of Zelda, The (U) (V1.0) [!].nes
Zanac (U) [!].nes
Super Mario Bros. 2 (U) (V1.0) [!].nes
6 0
7 1
8 2
9 3
11 4
12 5
13 6
14 0
15 3
16 7
17 8
18 0
20 3
21 9
22 3
24 10
25 11
26 12
27 3
28 13
29 14
30 15
31 3
Just before cycle 6, the emulator will swap in cartridge 0, BurgerTime (U) [!].nes, before continuing. Just before cycle 7, the emulator will swap in cartridge 1, Krusty's Fun House (U) [!].nes, before continuing. And so on. See the table in "Introduction to the run" to compare with the .3ct file.
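The format is simple enough to read with a few lines of Python. The sketch below is an illustration of the layout described above, not the parser TriCNES actually uses. The function names `parse_3ct` and `cartridge_at` are made up, and returning `None` before the first swap is an assumption, since the format description does not say which cartridge is inserted at that point.

```python
def parse_3ct(text):
    """Parse .3ct text into (cartridge paths, {cycle: cartridge index})."""
    lines = text.splitlines()
    count = int(lines[0])
    cartridges = lines[1:1 + count]   # paths may contain spaces, keep whole lines
    swaps = {}
    for line in lines[1 + count:]:
        if not line.strip():
            continue                  # tolerate a trailing blank line
        cycle, index = (int(field) for field in line.split())
        if not 0 <= index < count:
            raise ValueError(f"cartridge index {index} out of range")
        swaps[cycle] = index
    return cartridges, swaps


def cartridge_at(cartridges, swaps, cycle):
    """Cartridge inserted during `cycle`, or None before the first swap."""
    earlier = [c for c in swaps if c <= cycle]
    return cartridges[swaps[max(earlier)]] if earlier else None


# A shortened, hypothetical file in the same layout as the submitted one.
example = """3
BurgerTime (U) [!].nes
Athena (U) [!].nes
Super Mario Bros. 3 (U) (V1.0) [!].nes
17 1
18 0
20 2
"""
cartridges, swaps = parse_3ct(example)
print(cartridge_at(cartridges, swaps, 19))
```

A cycle with no entry of its own keeps whichever cartridge was swapped in last, which is why cycle 19 in the submitted file has no line.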
With the TriCNES emulator open, click the dropdown menu "TAS > Run .3ct TAS" and select the .3ct TAS file you wish to watch. Click the "Load Cartridges" button inside the menu, which will check if you have all the required cartridges and load them into an array. Optionally, you could change the CPU/PPU clock alignments, but that has essentially no effect on this TAS. Make sure the run is set to start from power on. Finally, click the "Run TAS" button. You will then see the victory screen, with a few artifacts here and there as discussed in the "Introduction to the Picture Processing Unit's Reset Flag" section.
As absurd as this run has been to make, it has been so much fun. I spent so much time making this emulator, you have no idea how proud I am of it. Sure, there are a couple things it's getting wrong, and a notable lack of support for certain mapper chips, but I made it from the ground up, and the code is available to be dissected and for people to tell me I'm doing something wrong. I will happily make updates and fix issues.
And it was at this point where I thought the story had come to an end, but suddenly, about 48 hours ago, I received a message in my Discord server... someone had done it. A console verification has occurred.

Introduction to the console verification process

This is not a traditional TAS. This could not work with a traditional replay device. For this run, I had the help of Decrazyo, who built a custom board capable of holding multiple cartridges.
Full disclosure, the custom board by Decrazyo would not be able to perform this specific run. This run requires 16 cartridges, which is a bit absurd. However, Decrazyo's board, which holds 5 cartridges, was able to perform the "from RESET" TAS I envisioned[9]. It's not the one I'm submitting (because here at TASVideos, we tend to prefer that our TASes begin at power on), but I hope the fact that this other run works is enough to convince you that this one is also likely to work.
But while we're here, let's talk about how insane it is that the "from RESET" run in 9 CPU cycles (0.000005 seconds) has been confirmed to work. It's absolutely incredible. I feel like a mad scientist watching my creation come to life. This ridiculous theory I had to run arbitrary code by swapping cartridges actually works!
And while this method of electronically switching which cartridge is connected to the console excels in the whole "not destroying the neighborhood" aspect, and also the "This could be presented" aspect, I have to admit, it feels like something is missing.
Nah, just kidding. This is incredible. I still can't believe it works. I cannot thank Decrazyo enough for their efforts to do this. It's amazing.

Introduction to the Conclusion

After figuring out the exact code you want to execute, running a program to determine what address it can be executed at and what cartridges contain all the values you need, assuming it even CAN be executed... by swapping cartridges every CPU cycle, which requires moving them at over 500,000 miles per hour, making this the most superhuman submission ever made, we are left with a run completing the game long before rendering is enabled, even before the scanning beam completes a single scanline, which is hidden by the NTSC overscan[10], effectively stopping input before the screen has done a single thing. Entertainment value is a thing of the past, as this run can only be found entertaining through understanding how it works, for the human eye couldn't even process the run occurring.
This is the most absurd TAS I've ever made. This is one of the most absurd projects I've ever had. This is one of the most absurd ideas I have ever taken seriously.
I strongly encourage people to do silly things and document it. "Remember kids, the only difference between screwing around and science is writing it down."[11]
And so this journey comes to a close. I don't know if I plan to make any more Intercycle Cart Swap runs. It's certainly a gimmick, but it's one I'm really proud of. On top of this one I'm submitting, I also created the "from RESET" run of SMB3, as well as two runs of SMB1. The first one beats the game without fixing any of the graphics (I later found out that the PPU's reset flag would prevent me from enabling the NMI, so that run actually wouldn't work, ha!), and the second SMB1 run does fix the graphics. (That run would work, since I wait for the reset flag to clear before doing stuff.) It was suggested by ais523 that I try compressing the graphics of the SMB1 run and write code in unused areas of RAM to fix the graphics, allowing for the final input to happen before the reset flag is cleared while still fixing everything. I might revisit that some day, but I don't think it's the kind of run I'd submit to TASVideos. I'm not sure where I'll go from here. I mean, how could I one-up this?

Introduction to the Special Thanks

Thank you to Decrazyo for creating the custom board and console verifying the "from RESET" run, which essentially verifies this one as well.
Thank you to everybody in the NesDev community, especially the people who maintain the NesDev Wiki, those who are active in the Discord, and those who have made accuracy test ROMs at any point in the last two decades and more. Special thanks to Blargg for their work making test ROMs, and to lidnariq and Fiskbit for answering countless questions in the NesDev Discord as well as making test ROMs.
Thank you to the TASBot Discord server for your help answering any questions I had, specifically BigBass, TheDot, dwangoAC, CompuCat, and blastermak.
Thank you to ais523 for reaching out to me about a potential way to optimize a different Intercycle Cart Swap TAS. I'm glad to hear I'm not the only one taking this idea seriously.
Thank you to Kosmic, Bismuth, GTAce, ThaRixer, Adef, Big Ted, Pannen, and Lain for your occasional advice, and inspiring projects.
Thank you to Tony, Alli, Luke, and Grey for listening to my non-stop rambling as I developed this emulator for 8 months, then spent 6 months making a video about this TAS. I seriously cannot believe your patience, or thank you enough for it.
Thank you for reading.

Introduction to the References

Last Edited by OnehundredthCoin 1 day ago