Tool-assisted game movies
When human skills are just not enough

Submission #5237: Tyler Kehne, MKDasher, sonicpacker, Snark, SilentSlayers, Gaehne D, Eru, ToT, Plush & sm64expert's N64 Super Mario 64 "1 Key" in 04:21.3

Console: Nintendo 64
Game name: Super Mario 64
Game version: JPN
ROM filename: Super Mario 64 (J) [!].z64
Branch: 1 Key
Emulator: Mupen64 0.5 Re-Recording v8
Movie length: 04:21.3
FrameCount: 15678
Re-record count: 164623
Author's real name: Tyler K, David R, Jeremiah T, S, Martin H, Eru, TT, Fabio G & sm64expert
Author's nickname: Tyler Kehne, MKDasher, sonicpacker, Snark, SilentSlayers, Gaehne D, Eru, ToT, Plush & sm64expert
Submitter: sonicpacker
Submitted at: 2016-09-29 16:31:39
Text last edited at: 2016-11-19 21:04:56
Text last edited by: Fog
Download: Download (3686 bytes)
Status: published
Author's comments and explanations:

(Link to video)
(Link to 4:21.40 previous submission video)
(Link to 4:21.67 initial submission video)

Super Mario 64, the classic platformer from 1996, stars that Italian red plumber dude named Mario. He gets an invitation to eat a cake that Princess Peach has baked for him. So, like any fat Italian man would, he jumps in a green tube then heads to the castle - and so the journey begins. Once Mario arrives, he realizes that there is no cake and there will be no cake. The princess wants him to lose some weight, and she has hired Bowser to be a fitness trainer for Mario. The plan is to motivate him with cake and make him run around the castle for "power stars," but Mario, just as any fat Italian would, skips all of that, loses no weight and demands cake.

Game Objectives

  • Emulator used: Mupen64 0.5 Re-Recording v8
  • Aims for fastest time
  • Uses game-breaking glitches

NOTE: This submission text is still a work in progress. An immense amount of work went into this TAS, and there is a lot to discuss in order to do it justice. It will take some time to catalog all of it.


Swordless Link (23:22:41): You found the method for 1 star, I found the method for 0 star, there's nothing we can't do! ^_^

AKA (23:23:18): We can't theoretically improve the game anymore, apart from opening the door in the moat, which is impossible.

Two decades to the day have passed since the US debut of Super Mario 64, and the holy grail of sequence breaks has finally been achieved: The Moat Door Skip. Everyone who has ever owned this game has tried (in vain) to open the door to the basement while it is still underwater. The purpose in doing so is to obsolete the first of the two keys held by Bowser, and thus make Bowser in the Dark World and the 1st Bowser fight unnecessary. In theory, this should save considerable time in a speedrun. Since every major sequence break in SM64 any% led to a new category that reflects the minimum required collectible items (70 Star->31 Star->16 Star->1 Star-> 0 Star), the team thought it appropriate to name this run “1 Key”.

Many years ago, the SM64 community demonstrated with hacks that the moat door was fully functional underwater, and that there was a 1 frame window where Mario could run underwater and open it. Still, the uncooperative geometry of the Castle Grounds map seemed to indicate that the Moat Door Skip would always be a plumber’s pipe dream. However, in 2015 Tyler Kehne identified the parallel universe (PU) glitch, and he immediately realized that it could potentially be used to solve the problem. After overcoming several seemingly fatal issues with applying the glitch, he theorized a method for Moat Door Skip that takes a detour through Vanish Cap Under the Moat. Tyler and sonicpacker then collaborated to demonstrate this method, which required the collection of 10 stars.

But in order to make Moat Door Skip useful for a speedrun, the team had to find a way to quickly get into VCUtM with only 0 stars. The only known way to do this involves jumping with a very high negative speed, which results in a proportionally large vertical displacement that can traverse the depth of the water in 1 frame. In order to make the trick work, the team had to design a far more potent BLJ than anything ever seen before on Castle Grounds. SilentSlayers and sonicpacker set to work on surveying for potential BLJs, and eventually they found one on the far side of the lake that allowed several thousand speed to be accumulated. Tyler then figured out how to route the speed to an appropriate jump location, and the vision of the Moat Door Skip was fully realized!

Meanwhile, during testing Tyler Kehne had found a new glitch that exploited parallel universes, known as the Overflow Jump (OJ). This powerful glitch allows Mario to quickly move to arbitrary locations in the map provided he has PU speed. The team realized the OJ could be useful in other sections of the game, including Bowser in the Fire Sea. Tyler and MKDasher engineered a substantial improvement for BitFS using an OJ, which saved about 2 seconds.

The team began this TAS in earnest shortly after 0 Star VCUtM entry was accomplished. The 10 Star placeholder route was to be completely transformed. SilentSlayers, MKDasher and sonicpacker optimized the movement up until the lake BLJ, and Tyler Kehne improved the zigzagging route to the jump spot after the BLJ. MKDasher TASed the entire VCUtM section up until the elevator BLJ. By getting to the elevator faster, he was able to completely eliminate the PBLJ that had been in the 10 Star Moat Door Skip Demo. He also did the long basement section up until the wooden door (with some help from SilentSlayers). Gaehne D chipped in with a creative method to quickly set up the basement SBLJ from the wooden door with a jumpkick, which MKDasher put the finishing touches on. Most of these improvements individually saved several seconds over what was shown in the 10 Star route. Tyler performed the VCUtM->Moat Door PU route, and reconfigured it so that Mario could surface at the nearby waterfall, instead of swimming all the way across the moat, which saved an enormous amount of time (double digit seconds). He also found a way to eliminate the fall time and instantly void out after the VCUtM BLJ, saving another couple of seconds.

When this TAS was nearly complete, Tyler found a new, simpler way to set up the 0 Star VCUtM entry that starts on the near side of the lake, and BLJs directly toward the jump spot. This ended up being about 4 seconds faster than the previous method, and it is the strat that appears in this TAS. It was a bittersweet accomplishment, because the team had put an enormous amount of time and effort into the previous route, and had been very satisfied with it. Nevertheless, sonicpacker stepped up and produced a well-tuned BLJ setup, and together with Tyler Kehne optimized the BLJ.

While the new sections in the TAS were primarily a Western effort, TASers from Japan had some significant contributions as well. Since the last published any% TAS from several years ago, snark, Eru and ToT have made gradual optimizations throughout the run, which led to a cumulative improvement of a couple of seconds. They also engineered a BLJ in Bowser 2 that allows Mario to grab Bowser's tail much closer to the mine, which they optimized perfectly. While the sections prior to BitFS have been obsoleted, their work on B2, the Castle 1st Floor, Bowser in the Sky, and Bowser 3 remains, and they were gracious enough to provide the team with their input files. Some of their sections the team hexed in directly (part of B2 and B3), the rest they redid with different camera and/or equivalent movement. In the process, Tyler and sonicpacker were actually able to save a frame on the second floor by fully optimizing the angle and speed of the clock punch.

It is important to note that this run was optimized for performance on original N64 consoles, not for emulators. The parallel universe glitch has a tendency to crash on console if certain precautions aren’t taken, and those precautions use up a small amount of time that could be avoided on emulators. However, the team doesn’t believe that emulation inaccuracies should be taken advantage of, since the ultimate purpose of a TAS is always to play back on console. They also tried to reduce console lag where they could, even if the lag wasn’t present in emulation. In fact, they had to scrap an excellent viewing angle for the BitFS BLJ because it introduced additional lag on console.

This TAS is performed using the Japanese version of Super Mario 64. Peach doesn’t talk in the intro, which makes it slightly shorter than the US version. 0 Star runs were performed on the US version because U catches up to J during the text in the 3 Bowser fights, and slightly outperforms it in lag (Although, MKDasher has recently questioned whether it actually outperforms on console). But since 1 Key skips Bowser 1, J has a clear advantage, although it is only a fraction of a second. No J-exclusive glitches or exploits are used in this TAS, and the movement on both games is exactly the same.

Our final time as measured by Mupen is 4:21.67 (15700 VIs). As stated before, though, the console time is more important to the team, and when exactly console timing should start is up for debate. Ideally, it should start the moment the game starts to run, i.e. when the bootstrap code starts to execute after the console has DMA’d it to RAM. That’s difficult to measure, but the most consistent marker on console is when the controller syncs with the console. Using that as the start, the run takes 4:22.60 (15756 VIs). However, it’s not currently known exactly when controller pairing takes place, although it is done by the game software and not the console. If it is found that controller sync happens after the first frame, then some time will need to be added. Also, the TASbot MKDasher constructed is, according to him, not perfect, and loses a few VIs due to delayed input. Still, the timing is pretty consistent, and his TASbot’s best performance was recorded.
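
As a quick sanity check on the two figures above, here is a small sketch that converts a VI count to a time. It assumes a flat rate of 60 VIs per second, which is inferred from the numbers quoted in this text rather than measured on console, and the helper name is made up for illustration.

```python
def vis_to_time(vis, vis_per_second=60):
    """Format a VI count as m:ss.cc, assuming a fixed VI rate (assumption: 60/s)."""
    centis = round(vis * 100 / vis_per_second)   # total hundredths of a second
    minutes, rest = divmod(centis, 6000)
    seconds, cs = divmod(rest, 100)
    return f"{minutes}:{seconds:02d}.{cs:02d}"

print(vis_to_time(15700))  # VI count quoted for the Mupen time
print(vis_to_time(15756))  # VI count quoted for the console time
```

Under that assumption both VI counts come out to the times quoted in the paragraph above.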

No matter what timing method you use, this is the largest single improvement to SM64 any% since December 7, 2007, when Swordless Link and Mitjitsu released the first “0 Star” TAS. It may very well be the last major improvement to this category, although with this game, you can never be sure.

Improvement Table

Section                 Frames Saved    Frames Saved Overall
Lakitu Skip             null            null
BLJ to BitDW            doesn't exist
BitDW                   hah lol
B1                      no more dance rng
BLJ to Basement         not comparable
DDD Skip                different strategy
BitFS                   85              85
B2                      11              96
To 2nd Key Door         1               97
Spiral Stairs to BitS   1               98
BitS                    42              140
B3                      2               142

Important SM64 Physics Concepts


Glitches Used in This Run


History of the Moat Door Skip


Specifics of the Parallel Universe and Overflow Jump Glitches


Section Analysis


Pipe to Sandbar BLJ (Castle Grounds)

0 Star VCutM Entry/Sandbar BLJ

Vanish Cap Under the Moat

Moat Door Skip


Bowser in the Fire Sea

Bowser 2

Castle 1st Floor to Key Door

Second Floor

For a long time, the strategy for this section has been unchanged:

-Perform a glitchy ledge grab to get up the spiral staircase
-SBLJ on the stairs
-Wall hug across from the 50 star door staircase (by WDW painting) and reflect speed with C^
-Use 50 Star Door rejection text to store speed
-Convert stored speed by using the Z+C^->Punch method, known here as "Clock Punch" because it aims for a corner of the clock, so that Mario doesn't punch a wall and lose his speed.

The method is fast, but not quite ideal, because punching takes 16 frames, and the 50 Star Door text is longer than some other text sources in the map. A couple years ago, snark tried to get a Z+C^->Jumpkick method working, but couldn't get one that was faster than Clock Punch. Tyler Kehne and SilentSlayers also discussed and tested using the sign near the 50 Star stairs to convert speed, but that proved fruitless. So it didn't seem likely that this section would see any improvement. However, when sonicpacker was redoing the section, he had trouble replicating the most optimal Clock Punch that had been used in previous runs. Tyler also tried to replicate it with no success. Frustrated, Tyler decided to fully understand what made the Clock Punch work, in order to know what would make it unsuccessful. As it turns out, the mechanism is quite interesting.

The first important thing to understand is what conditions will cause Mario to punch a wall. When Mario punches a wall, he recoils backwards and loses his forwards speed, so it is necessary to avoid this. Surprisingly, the wall punch check isn't explicitly based on Mario's angle relative to the wall. Instead, the game performs a wall collision test on the point 50 units directly in front of Mario. If this point is within 5 units of a wall, then Mario's punch connects with it. The diagram below visualizes this. You can see that Mario's angle is implicitly involved, but the limit for how closely Mario must face the wall depends on the distance.

So, by considering the geometry, you can see that in order for a wall punch to occur, cos(Δθ) ≥ (d - 5) / 50, where Δθ is the difference between Mario's facing angle and the angle normal (i.e. perpendicular) towards the wall, and d is Mario's distance from the wall. A few things to note from this:

-You can calculate a critical distance D if you know Δθ, i.e. D = 5 + 50 * cos(Δθ). If Mario is within D then a wall punch will occur.
-If Mario is more than 55 units away from the wall, the punch won't connect no matter what Δθ is.
-If Mario is right up against the wall, i.e. d = 50, the critical angle is Δθ = 4704 (2-byte angle, equivalent to 0.451 rad or 25.8°).
-It's possible to punch the wall even when Mario is behind it, if you space it correctly. Mario won't clip through it and will recoil back as normal.
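
The inequality and the notes above can be sketched in a few lines of Python. This is only an illustration of the formula as written here, not the game's actual code: the function names are invented, it assumes 65536 angle units per full turn for the 2-byte angle, and it only models the near-side condition cos(Δθ) ≥ (d - 5) / 50, not the "behind the wall" case.

```python
import math

PUNCH_REACH = 50.0   # test point is 50 units in front of Mario (from the text)
WALL_MARGIN = 5.0    # punch connects if that point is within 5 units of the wall

def angle_units_to_rad(units):
    """Convert a 2-byte angle to radians (assumes 65536 units per turn)."""
    return units * 2.0 * math.pi / 65536.0

def critical_distance(delta_theta_units):
    """D = 5 + 50 * cos(dtheta): inside D, the punch connects with the wall."""
    return WALL_MARGIN + PUNCH_REACH * math.cos(angle_units_to_rad(delta_theta_units))

def punch_hits_wall(d, delta_theta_units):
    """Near-side check only: equivalent to cos(dtheta) >= (d - 5) / 50."""
    return d <= critical_distance(delta_theta_units)

def critical_angle_units(d):
    """Largest dtheta (2-byte units) that still connects at distance d."""
    return math.acos((d - WALL_MARGIN) / PUNCH_REACH) * 65536.0 / (2.0 * math.pi)

print(round(critical_angle_units(50.0)))  # compare with the 4704 quoted above
print(punch_hits_wall(56.0, 0))           # more than 55 units away
```

Evaluating critical_angle_units at d = 50 should land on the critical angle quoted in the list, which is a handy check that the angle conversion is the right one.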

The second thing to be aware of is when the wall punch test occurs relative to the other important Mario physics calculations. It happens as follows:

1. Initial Collision Test

  • Includes wall collision test, at Mario's position, that will push him 50 units away from any nearby walls.
2. Wall Punch Test

3. Quarterframe Movement

  • Mario's main movement calculations happen here. His speed is divided into 4 and applied in 4 "quarterframes".
  • Each qframe includes a wall collision test, but not at Mario's position. It tests the position Mario is targeting, which is essentially Speed / 4 units in front of Mario.
  • If the target position is in a wall that is being tested for collision, the wall will push the target position outward to the edge of the wall (50 units away). Subsequent collision tests with other walls on that qframe will use the new target position, not the original one. So the priority of which walls are tested first can matter.

A successful Clock Punch looks like this:

Only 2 qframes are shown in Step 2 for simplicity. Note that Wall B has priority over Wall A. Mario's target position during qframe movement is deep into Wall A, and since Wall B is tested first, it doesn't register a collision. Wall A then pushes the target position outward. If Wall A had priority, then Wall B would be able to push the target position away as well, which would prevent Mario from being able to move so far to the right of Wall A, and would make the clock punch impossible without a much larger Δθ.

Once the mechanism was understood, it was possible to calculate the actual limitations of the clock punch. In order for Wall B to push Mario past the critical distance for the Wall A wall punch, Mario has to move far enough to the right in Step 2. Well, how far is far enough? Let's look at the math:

To calculate how Wall B collision affects Mario's position in step 3, do the following:
Distance from wall: d = n ⋅ p + negdot
If d < 50, p += (50 - d) * n

-n is the unit normal vector of the wall. It points perpendicularly away from the wall and has a length of 1.
-p is Mario's position
-"⋅" indicates a dot product
-"negdot" is a property of the collision triangle (walls, floors, ceilings) data structure. It is equal to the negative of the dot product of n and any point on the wall. It represents the distance of the wall from the origin.

Wall A happens to align with the Z axis, which makes the calculation a bit easier, since we only have to consider how Mario's Z position is affected by Step 3.
Wall A stats:

  • Z = 7130
  • 50 units in front: Z = 7080
Wall B stats:
  • n (X,Z) = (0.8996062875, -0.4367020726)
  • negdot = 3113.685791

Putting it all together:
pNew = p + (50 - d) * n = p + (50 - n ⋅ p - negdot) * n
zNew = z + (50 - n_x * x - n_z * z - negdot) * n_z
zNew = 7080 + (50 - 0.8996062875 * x - -0.4367020726 * 7080 - 3113.685791) * -0.4367020726
zNew = 7067.7 + 0.39286 * x
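
For anyone who wants to check the last two lines numerically, here is a rough sketch of the push-out rule applied with the Wall B numbers. It works in 2D (X and Z only), the function name is made up, and it simply samples two x values at z = 7080 to recover the constant and the slope of the linear form.

```python
WALL_RADIUS = 50.0  # walls push Mario out to 50 units, per the text

def push_out(p, n, negdot):
    """d = n . p + negdot; if d < 50, p += (50 - d) * n (X/Z components only)."""
    d = n[0] * p[0] + n[1] * p[1] + negdot
    if d < WALL_RADIUS:
        k = WALL_RADIUS - d
        return (p[0] + k * n[0], p[1] + k * n[1])
    return p

# Wall B data as quoted in the text: n as (X, Z), plus negdot.
WALL_B_N = (0.8996062875, -0.4367020726)
WALL_B_NEGDOT = 3113.685791

# Sample x = 0 and x = 10 at z = 7080; both are close enough to be pushed.
z0 = push_out((0.0, 7080.0), WALL_B_N, WALL_B_NEGDOT)[1]
z10 = push_out((10.0, 7080.0), WALL_B_N, WALL_B_NEGDOT)[1]
slope = (z10 - z0) / 10.0

print(round(z0, 1), round(slope, 5))  # constant and slope of zNew = a + b * x
```

The linear form only holds while Mario is within 50 units of Wall B; past that, push_out returns the position unchanged.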

Remember, to avoid the wall punch, Mario must be farther than the critical distance away from Wall A.
7130 - zNew ≥ 5 + 50 * cos(Δθ)

So we can now calculate the x position Mario needs to reach in Step 2, based on Mario's facing angle:
7130 - (7067.7 + 0.39286 * xMin) ≥ 5 + 50 * cos(Δθ)
xMin ≤ 145.853 - 127.272 * cos(Δθ)

But we're not done yet. In order to calculate Mario's minimum speed, we need to know how Step 3 affects Mario's x position when Mario reaches xMin. The difference of those coordinates determines what Mario's minimum X speed, and therefore his H speed, must be on subsequent frames.
xNew ≥ xMin + (50 - n_x * xMin - n_z * z - negdot) * n_x
xNew ≥ xMin + (50 - 0.8996062875 * xMin - -0.4367020726 * 7080 - 3113.685791) * 0.8996062875
xNew ≥ 25.3373 + 0.190709 * xMin
xSpdMin = xNew - xMin = 25.3373 + 0.190709 * xMin - xMin = 25.3373 - 0.809291 * xMin
xSpdMin = 25.3373 - 0.809291 * (145.853 - 127.272 * cos(Δθ))
xSpdMin = 103 * (cos(Δθ) - 0.9)
hSpdMin = xSpdMin / sin(Δθ) (Note that -θ and Δθ are actually the same here, because Wall A is aligned with the Z axis, so Δθ = abs(θ - 0).)
hSpdMin = 103 * (cos(Δθ) - 0.9) / sin(Δθ)

Bingo! We have the minimum H Speed. And the max H Speed is easy. It's just the minimum speed that will cause Mario to clip through wall A. To do that, he has to move 100 units in the Z direction on one qframe, or 400 per frame. So,
hSpdMax = 400 / cos(Δθ) (Since Δθ is small, this is barely going to be more than 400.)

Now, let's find out how low θ can go. A lower θ means you don't have to strain as much after the clock punch, so you can commit more input to gaining speed.

θ                        Minimum Hor. Speed    Maximum Hor. Speed    Possible?
-320 (previous best)     334.2003515           400.1883216           YES
-304                     351.9466194           400.1699538           YES
-288                     371.6562061           400.1525291           YES
-272 (used in the run)   393.67558             400.1360476           YES
-256                     418.4378266           400.1205088           NO
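
The table can be roughly reproduced from the two closed-form expressions. The sketch below uses the rounded constants from the derivation (103 and 0.9), so the minimum speeds only match the table to about a decimal place, and it assumes 65536 angle units per turn. The YES/NO it prints is nothing more than a "minimum ≤ maximum" comparison; the function names are invented.

```python
import math

def to_rad(theta_units):
    """|theta| in 2-byte angle units -> radians (assumes 65536 units per turn)."""
    return abs(theta_units) * 2.0 * math.pi / 65536.0

def h_speed_min(theta_units):
    """hSpdMin = 103 * (cos(dtheta) - 0.9) / sin(dtheta), rounded constants."""
    a = to_rad(theta_units)
    return 103.0 * (math.cos(a) - 0.9) / math.sin(a)

def h_speed_max(theta_units):
    """hSpdMax = 400 / cos(dtheta)."""
    return 400.0 / math.cos(to_rad(theta_units))

for theta in (-320, -304, -288, -272, -256):
    lo, hi = h_speed_min(theta), h_speed_max(theta)
    print(theta, round(lo, 1), round(hi, 4), "YES" if lo <= hi else "NO")
```

Since the comparison ignores how Mario's speed changes during the punch, it says nothing about how tight a given row is in practice.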

The viability of -272 isn't obvious. At first glance it seems definitely possible, because the minimum speed is less than the maximum speed. However, the clock punch occurs over about a dozen frames, and Mario loses 1 unit of speed each frame. Since he has to be at no more than 400.1360476 upon reaching the clock, his speed goes below 393.67558 well before the punch finishes. Luckily, Mario's punch can only connect with the wall for part of the animation, and when that "active" part of the animation runs out, Mario still has about 395 speed, assuming he reaches the clock at about 400 speed. So -272 is possible, but just barely, and it needs basically optimal speed to work. -256 is obviously impossible, and -288 is the best possible angle with the speed used in the previous strat (390.1311646 at clock corner).

There's one more important detail for getting Clock Punch to work. Even if the speed and angle are valid, it can still fail. Step 2 assumes that Mario gets 4 qframes worth of sideways movement along the wall. However, depending on Mario's positioning at the 50 Star Door, he may arrive at the clock corner and begin Step 2 with qframes to spare. In that case, he won't get the full 4qf of movement along the wall, and won't reach the target X position, causing a wall punch on the next frame. It's hard to predict whether this will happen beforehand, which is why it seems a little random. If it fails with valid speed and angle, you can mess with Mario's angle on the last frame to get a different position at the Star Door.

Getting close to 400 speed for the -272 angle was problematic. The previous route got to the 50 Star Door with -438.8380432 speed, but you lose too much speed when converting to forward speed with Z+C^->Punch to get to the clock corner with 400. In order to do that reliably, you want about -458 speed at the star door. But the properties of decelerating in SM64 made this a challenge.

You can only decelerate rapidly in a couple of ways. Jumps and "pressed" jumpkicks cut 20% of Mario's H speed, and dust frames with input cut 2% of speed. "Held" jumpkicks don't cut Mario's speed. Dust frames without input don't cut Mario's speed either. Landing from a jump or either kind of jumpkick gives Mario 3 dust frames, and for each one you have the option of making it "input" or "no input". So you can't just quickly decelerate to an arbitrary speed; you have to work with what the game lets you do. It's also worth noting that in order to cancel the C^ wallhug by the WDW painting, you have to do a pressed AB kick, so at least a 20% speed cut (with 3 optional dust frames when you land on the stairs) is mandatory.

The previous route BLJ'd to about -710 speed, which is the best you can do without another BLJ. Based on this starting speed, it's actually impossible to decelerate to anything close to -458 without losing frames. The route takes 1 extra jump (20% speed cut) and arrives at the star door with -438 speed. If you don't take the extra jump, the next highest deceleration would be a held jumpkick and 6 dust frames (6 2% cuts = 11.4% speed cut), but that leaves Mario with way too much speed (about -485). So you need a different initial speed.

After a lot of testing with different BLJ speeds, Tyler realized that, counterintuitively, BLJ'ing to a significantly lower speed didn't have much of an effect on Mario's ascent to the 50 Star Door. You would think that it would take longer, but a couple of factors limit how fast Mario can perform the ascent. The time it takes to wall hug by the WDW painting is mostly limited by the time it takes Mario to turn around to an appropriate angle, which speed doesn't affect. Mario's movement up the stairs is also capped at about 410 h units/frame because the wall hitboxes on the steps push him away. So Mario can have less speed and still arrive at the Star Door on the same frame.

Taking advantage of this, Tyler tried a starting speed of about -670, and did a held jumpkick with 5 total input dust frames (9.6% extra speed cut). This led to a 50 Star Door speed of -457.9104309, and a clock corner speed of 399.3190613 (improvable by <1 unit). Not only is this 9 units better than before, but it lets the optimal -272 angle be accomplished!
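
The percentages quoted for the dust frames are just compounded 2% cuts, which is easy to check. The sketch below only models the multiplicative cuts described in the text (20% per jump or pressed jumpkick, 2% per input dust frame) and ignores every other source of speed change, so it is not meant to reproduce the exact door speeds. The names are invented.

```python
JUMP_CUT = 0.20   # jumps and "pressed" jumpkicks cut 20% of H speed
DUST_CUT = 0.02   # each dust frame with input cuts 2%

def dust_cut_fraction(input_frames):
    """Fraction of speed lost over n input dust frames (cuts compound)."""
    return 1.0 - (1.0 - DUST_CUT) ** input_frames

def apply_cuts(speed, jumps=0, input_dust_frames=0):
    """Apply only the multiplicative cuts; all other speed changes are ignored."""
    return speed * (1.0 - JUMP_CUT) ** jumps * (1.0 - DUST_CUT) ** input_dust_frames

print(f"{dust_cut_fraction(6):.1%}")  # six input dust frames
print(f"{dust_cut_fraction(5):.1%}")  # five input dust frames
```

Because the available cuts come in these fixed steps, only a discrete set of speeds is reachable from a given BLJ speed, which is why the starting speed itself had to change.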

After Tyler finished that, it was up to sonicpacker to finish the job, which he did nicely. With the improved angle, he was able to complete the ascent to BitS with much less straining to counteract horizontal drift, which allowed for an even more improved speed. That little bit of extra speed adds up, and lets Mario reach the BitS warp 1 frame earlier. That's right, all of that work led to one magnificent frame being saved, which the vast majority of people didn't notice. But it was worth it, and now the clock punch is unimprovable.


Bowser 3

Console Accommodations

There are a number of precautions you have to take when using the PU glitch, to ensure that it does not crash on console:

-Use of Fixed Camera: The camera must stay in the real map at all times, or the game will crash. This can be done by switching the function of the R button from “Mario” cam (close up) to “Fixed”, which can be done in the Pause menu when Mario is at rest in a level. When fixed cam mode is set, you can activate it by pressing and holding R. There are some places in the game where the camera will get stuck in an area even without fixed cam (see the 2nd floor in this run), but these are uncommon. Switching to fixed cam takes 6f.
-Avoiding collisions while running through PUs: If Mario runs into a wall, a ceiling, or OOB, the game will crash. This makes running into the moat door difficult, but there are two ways to avoid this. One is to only use QPU (quadruple parallel universe, i.e. every 4th PU) movement. This has the disadvantage of eliminating 93.75% of possible routes. The other is to use no-input running to return to the real map, which comes with a frame penalty. With perfect precision, this penalty can be reduced to 1f, which is what was done in this TAS.
-Having too much speed: This wasn’t a factor in the TAS, but it is good to know.

For most of the development of this TAS, the reason for these crashes was unknown, and Tyler had to find the workarounds by experimentation (with help from MKDasher, Kyman, Soulcal and other console testers). However, Peter Fedak did some excellent detective work, involving an ingenious setup with an EverDrive that exported crash data to a save file. He found that the ultimate cause of all PU crashes comes from the float-to-integer conversion instructions, usually the TRUNC family. If the float that is being converted is outside the range of a 4-byte integer, the console will throw an Invalid Operation Exception and the game will crash. On Intel processors, however, conversions of that nature will simply overflow the value, so all emulators implement these instructions incorrectly. The contexts where TRUNC is invoked can vary a lot, but it is the ultimate culprit in every PU crash we have found.
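
To make the range condition concrete, here is a minimal sketch of the check described above: does truncating a float toward zero still fit in a signed 4-byte integer? It is only an illustration, not the console's logic. It uses Python doubles rather than the game's floats, treating NaN and infinities as out of range is an assumption, and the function name is made up.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def trunc_out_of_range(value):
    """True if truncating toward zero does not fit in a signed 4-byte integer."""
    if not math.isfinite(value):
        return True  # assumption: NaN/inf count as unrepresentable too
    return not (INT32_MIN <= math.trunc(value) <= INT32_MAX)

print(trunc_out_of_range(1.0e6))   # comfortably inside the 4-byte range
print(trunc_out_of_range(3.0e9))   # past 2^31 - 1
```

According to the findings above, a value failing this check at a conversion is what crashes on console, while emulators carry on.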

Aesthetic Choices


Special Thanks

Nothing693 for helping with the first Bowser fight.

Suggested Screenshot


We all hope you enjoy this massive improvement!

Mothrayas: Judging!

Mothrayas: Delaying, pending a minor improvement in the works.

Mothrayas: Updated movie file with an 8-frame improvement.

Mothrayas: After roughly a decade since it originally was theorized, the moat skip has finally been realized in a Super Mario 64 TAS, and the end result looks amazing. The implementation of the skip is a nice change to the any% route, and allows for showing off some nice tricks, including of course the newer PU strategy that allows the moat skip in the first place.

On the whole, even compared to the already high standard of Super Mario 64 publications, this run appears more technical and more polished than ever. Accepting to Stars as an improvement to the published movie.

Fog: Processing.

Mothrayas: Delaying again pending another improvement.

Mothrayas: Updated movie file with a 3-frame improvement, and resetting to accepted.
