Submission #7625: r57shell & Archanfel's NES Tetris "fastest 999999" in 02:53.13

(Link to video)
Console Nintendo Entertainment System Emulator FCEUX-2.2.3
Game Version USA Frame Count 10405
ROM Filename Tetris (U) [!].nes Frame Rate 60.0988138974405
Branch fastest 999999 Rerecord Count 999999
PowerOn Authors r57shell, Archanfel
Game Tetris
Submitted by Archanfel on 7/31/2022 5:54:41 PM

Submission Comments
This is improvement of 1121 frames over the Acmlm's currently published run of which 596 frames were saved during level 18 and also 525 frames were saved after level 19 speed transition.

Game objectives

  • Emulator used: FCEUX-2.2.3
  • Fastest 999999 score
  • Completes the game without any blocks left
  • Aims for shortest input time
  • Heavy luck manipulation
  • Genre: Puzzle


Alternative starting point

It is possible to take tiny advantage of using different starting points by waiting a few frames in menu. Including I-teromino gives starting height of 7 blocks while two normal terominoes gives only height of 6 blocks. This one additional block of height advantage extends the packing possibilities for Tree#1.

Chessboard build

Instead of making one long well at the central hole we divided it into four small knees. It enormously extends packing variability for level 18. Besides during Tree#2 building and also in the very end of the game it allows to burn faster because I-tetrominoes have to travel shorter distance to the ground.

Tetromino count control

During packing it is possible to place some blocks above the screen where they will be destroyed. So there is a degree of freedom in the number of tetrominoes is used.
X-mode - Extreme packing. Economical utilization of tetrominoes to reach the densest packaging ever possible. N-Mode - Normal packing. Utilization of tetrominoes is not so strict. In this mode 4 blocks can be destroyed above the screen. Four destroyed blocks is equivalent to a whole tetromino. For example one knee in N-Mode require 40 tetrominoes to build while X-Mode require only 39 of them. Also it is possible to build a tree with different amount of tetrominoes as well.

Plans within Plans within Plans

Primary optimization

At first we generated packs of solutions that allow to build glitched Tetris with triples at the top.

Secondary optimization

Then we organized these packs in groups that allow to form one whole knee.

Tertiary optimization

Then checked all sequences for building four knees in a row.

Quaternary optimization

And finally solving whole level 18 with different number of tetraminoes in the first and in the second parts.
Path for solving level 18 in this TAS looks like this:
Or slightly more exact:
Or even more exact:
H07_101 TREE_1_Y_ShiftM_102A 102Y_07A 07_10B T06_N T06_N T06_3X2 08_06B 10_10C 08_10A 08_10B 02_10A 02_04AA 07_04A 07_04B 04_04C
ERZ1_04T ERZ2_10Z ERZ3_06S ERZ4_10I-ZN TREE_2_TRUE_ZN_05_10A 05_04F10 04_02AA 02_02C 02_02C 02_02C 10_10C 10_10C 10_10C 10_10C 6780X 6780Q-80 10_06E08 08_06B

Broken knee

Last knee before Level 19 transition is divided in two parts. This trick is done to take two extra block of height advantage so during solving each tetris after transition, since it has two blocks shorter fall distance.

Tree#3 skip

Tree#3 is skipped completely. Instead we delay burning of four remaining Tetris until very end of the game. Additional benefit of skipping Tree #3 is the possibility to continue building immediately after transition to Level 19.

Post-transition play

At high speed upper left corner is no longer reachable. So we need to play honestly. It is a lot easier. With a little help of combinatorics and graph theory we were able to bruteforce through all possible intermediate incomplete states between tetrises and solve post-transition gameplay as a whole.
Path for solving level 19 in this TAS looks like this:
SHAPE_000_102 SHAPE_102_101 SHAPE_101_251 SHAPE_251_213 SHAPE_213_104 SHAPE_104_239 SHAPE_239_END1_08 END2_10 END3_02 END4_10


For the perfect end we need to get four I-tetromino pieces in a row. It is possible only under special conditions. Spawn counter, value of RAM address 0x1A must be divisible by 8 without a remainder. For our case it is 112/8=14. That's why during solving level 18 it was absolutely necessary to use tetromino count control.

End input earlier

Shortest input time is the most logical universal goal for all TAS. Tetris should not be an exception of this rule. In our movie end of input is done as soon as last I-tetromino rotated and prepared for free fall in desired place. Last tetris and reaching final maxed out 999999 score is done without any additional input.

Technical part

Local terms

a piece in the game Tetris consisting of four cells.
clearing 4 lines at once.
Occupied cell
cell which is not empty because it was occupied previously.
To lock tetromino
to make tetromino locked.
Locked tetromino
tetromino which is no longer movable by player.
Placing (placement)
a process or a sequence of key presses to lock tetromino in desired place.
Other local terms will be defined along the way.


3-line tetris

When three top lines of screen is cleared, due to an error within the code the game perceives this move as clearing four lines. And gives score as if tetris was made. Therefore let's call it 3-line tetris. Also all occupied cells are moved down by one cell. Actually clearing the line is just a copy other cells with an overwrite.
The first line is checked as cleared, then it mistakenly shifts all the cells down. Since the lines are moved instead of overwrite and since cleared lines are counted interleaving with clearing the lines, remaining moved 3 full lines are checked as cleared as well, but copy with overwrite works as intended for them so they are cleared correctly. For more info look for invisible blocks section.

DLR trick

Single frame of the game is separated into multiple sub-states. Pressing simultaneously keys Down+Left+Right in some cases stops the execution in the middle of whole the frame-cycle due to a bug within the game. On the next frame the process is continued from the place where it was stopped, so everything continue as intended. This allows us to intentionally insert something that looks like a lag-frame.

Pause trick

Pressing pause allows to release key during the pause. It require additional frames in the pause but allows to move the tetromino piece faster within the game mechanics.

LR trick

If any of Left or Right keys is just-pressed then the game thinks you just-pressed some sideways key, but which side move is performed is determined by held keys. In case LR pressed, both keys are held, and first checked key is Right, and move to the right is performed. Therefore sequence of " ", "R", "LR" does two shifts to the right within 3 frames, while normally you would have " ", "R" with one shift within 2 frames.

Fast drop

Two things in the game makes tetromino fall:
  • automatical fall according to fall speed
  • holding Down key
Only one of them may happen during a frame. Falling using Down key is ignored when automatical fall happens. Therefore making sure all falls triggered by holding down key happen between automatical fall results in fast drop.

Tricks summary

LR trick and Pause trick are used to move piece in places which are not reachable without those tricks. DLR trick is used to reduce time in pause while awaiting new tetromino pieces. Those waits required for manipulation of getting desired next tetromino pieces. Fast drop just faster. 3-line tetris trick is used for achieving maximum score faster. Remaining 4-th line should not be considered as a copy, because it was actually filled by pieces. So, the only bonus we get from 3-line tetris is additional score for cells we haven't filled.

Invisible blocks

Invisible block
an occupied cell above the screen, so naturally it's invisible.
How is it possible? Well, the game field naturally have 20 * 10 = 200 cells. But it is stored in free 256 bytes. Each cell corresponds to a single byte. In normal circumstances of gameplay only first 200 cells are used. But 3-line tetris trick changes everything. This bug happens because for each filled line game moves everything before it one line down. And for the first line from the top its first cell index is zero. So the first moved cell index should be -10, but the number is stored in a single byte as an unsigned number, so its value wraps around into value 246. So -10 is 246. And it eventually moves all bytes within range [0, 246] to [10, 256]. After several times, bottom occupied cells reach bytes [246-255] and during movement of current piece and during checks of cells above the screen, indices of cells also using unsigned numbers and cells above the screen are negative, so corresponding indices within range [246-255] are checked, if those are occupied, it results into phenomenon called invisible blocks.

Packings generator (C++)

a scheme of cutting of some part of the game field into pieces (tetrominos). Packing consists of borders between two different pieces (tetrominos), or cells already occupied.
Packing may have different representations. Here is an example of a packing as a picture:
Here is a text representation of the same packing:
In text representation different letters corresponds to different tetrominos, and dots represent empty cells. In other words, text is enough to represent a packing, no picture is required.
As an input for packing generator it receive the state of the field in text form, and ending state of the field in text form. Then it enumerates all possible packings. Among possible packings it also filters out impossible.
Impossible packing
a packing without solutions
a sequence of placements leading to desired packing
Also, for each possible solution of single packing it makes a solver.
a representation of all possible solutions for desired packing in convenient format.
Format which was chosen is a graph (more technically speaking: an automaton). Vertices of automaton are states of part of the game field, and edges are placements with required piece and key presses (placement of the piece).
After all solvers for each packing is made, packing generator then unifies them to make unified solver.
Unified solver
a solver which represent all possible solutions for some set of packings as a single solver
Those solvers were saved in human readable text format, friendly to load from Lua scripts. Main benefit was: easier to debug when you can read or edit solver from simplest tool: any text editor.
Same tool is able to perform next step: solver optimization. Why is it required? Well, many placements are suitable to find out all possible packings. But we want fastest speed, so we need to find placements with shortest input. And those require to use Down presses, which increase work load on the generator. Thus, during stages of finding all packings and making solvers and unification Down presses wasn't taken into account to reduce the load. And because of this, optimization was required as an additional step.


First implementation was making all possible Left parts and all possible Right parts then during iteration over all Right parts it was making complementary Left part pattern and looking for all Left parts with the same pattern among generated Left parts. This approach is called meet-in-the-middle. Later on, technique from dynamic-programming was used to enumerate all possible packings without repetitions in very efficient manner. Because it is guaranteed no duplicates it didn't require storage of unique packings found so far, so it required much less memory and checks for duplicates wasn't required anymore.
  • Cache all results for queries about placements. It speed up the process drastically because same queries about possible placements happens a lot because different packings have a lot in common. Also, finding solutions is most expensive step in the process.
  • When finding possible solutions for fixed packing do the following. For each pair of tetromino pieces answer on question: is it possible to place first tetromino after second? If the second is not possible to place after the first, then the first should be placed before the second. Use this information to skip futile ways to solve.
  • It was shown that the only pieces possible to finish 3-line tetris is L, J, I in four possible positions in total. Thus, one tetromino in one of those four configurations must be present in the packing.

Stand-alone brute force program (C++)

This tool as an input receive:
  • the current state of a part of the game field
  • current state of invisible blocks
  • sequence of solvers to perform (apply one on top of other)
  • max number of frames to wait (while manipulating new next tetromino)
  • the number of best candidates to keep
  • the number of candidates to try from the given input. this is setting for ability to feed more inputs than will be considered.
  • multiple inputs, each of them consisting of:
    • current state of important RAM variables: RNG, frame counter, spawn counter, previous piece...
    • previous keys up to state which is defined
    • placement for next tetromino (because we manipulate new next)
    • meta - any additional text which the user want to carry along with the candidate. In the end we stored here sequence of solvers and current state of the field.
Initial version was performing only single solver, and just keeping number of best candidates. Later on, sequence of solvers was implemented. For single solver all possible ways were enumerated. Common number of solutions within single solver is several millions, but some of solvers had hundreds of millions. Among all results only few best candidates were selected and considered as inputs for the next solver. The number of candidates to keep is configured by user.
There are many possible ways to get shortest input, they are easy overflowing any reasonable number of candidates to keep. So, if you set to keep 100 best candidates, it is fairly easy to make 100 of very similar inputs, so to find out best way after few phases it was required to set number of candidates to keep quite high. This was a problem. The solution is: we need to make code which able to detect equivalent inputs.
Equivalent inputs
Two inputs are equivalent if any other additional input added to both of them leads to same result. It's only applied to some fixed initial state after which those inputs are applied. If initial states are different then equivalent input is meaningless term.
Functionality of code responsible for detection of equivalent inputs was fixed multiple times up to very recent version of stand-alone brute. In the end it was a class which would hold pool of inputs with methods to insert input, and clear pool. Insert is doing nothing if we try to insert duplicate or worse input.
Later, main optimization was to store pool for each corresponding state of solver. It increased speed a lot.
One very important thing is handling invisible blocks. They block placements. And the main idea of optimized unified solver was to have premade placements. If some placement is blocked, it doesn't mean that it's impossible to place tetromino in desired position. Probably there still exists placement which is slower but it's still an option. As first solution to address this problem was just to filter out placements which are not suitable for current situation. Another feasible approach would be to look for the best placement in current circumstances. But it requires time of program to enumerate placements. It was postponed.
Later on it was discovered that sometimes shortest input for placement is not always optimal. Sometimes it's reasonable to perform placement one frame longer. One of ways to solve this issue was to generate all solvers with multiple possible placements, not only shortest. In other words, to perform new version of optimization. Fortunately, better way was found. Database of placements was made, with property that it tells what placements it stores if they exists, so it's easy to check: if placement we require is supposed to be in the database, then it should be in the database. If there is no placement we looking for then it doesn't exists. If placement is not supposed to be in the database then either look for placement yourself, or skip it.
So new configuration was added: the database of placements, and policy for placements which shouldn't be in the database: look for them or skip. With this change, all solvers from now on were able to choose best placements from the database.

Placement database generator (C++)

Database is stored in SQLite using official SQLite library. It saves all placements which ever might be used with the limit on maximum desired vertical position of tetromino. Along each placement stored inputs required and set of cells which should be empty. Then user of the database can find all placements for desired tetromino type and orientation and pick the most suitable among them.

Implementation of NES Tetris game logic (C++)

Parts of the game implemented in C++ and tested very carefully. A large number of outputs were compared using Lua from emulator with output from my C++ implementation to make complete match. Those outputs were produced from previous TAS, play by hands, other recordings.
From resulting C++ implementation by trial and error was narrowed down variables which is involved in delays mysteries. In addition to frame counter it turned out to be tetromino vertical position (at address 0x41) and thing which is at address 0x49 named vramRow by meatfighter involved. Then, complete automaton of states with those variables involved was made. Using automaton minification algorithm this automaton was reduced similar states. And confirmed following hypothesis regarding delays:

Tetromino vertical position

  • 0, 1, 2, 3, 4, 5 - equivalent
  • 6, 7, 8, 9 - equivalent
  • 10, 11, 12, 13 - equivalent
  • 18, 19 - equivalent


  • 0, 1, 2, 3 - equivalent
  • 4, 5, 6, 7 - equivalent
  • 8, 9, 10, 11 - equivalent
  • 12, 13, 14, 15 - equivalent
  • 16, 17, 18, 19 - equivalent
Using this knowledge, for each possible combination, ways to make any delay was found and info about its generation was hardcoded into generator of solvers. After this work was done, delay mystery was completely solved (for game A-TYPE).

Tetris lists (Python)

It is script which allows to run multiple stand-alone brute force tasks. As an input it receive list of sequences of solvers. This is basically just a way to try many different sequences of solvers for the same set of inputs. Each solver was given short name, so sequence of solvers is just few short names. Settings for this script is same as settings of stand-alone brute force just with additional list of sequences to try.
Small optimization was made: if there was sequence earlier which starts with same prefix, then take the results of those steps as is, and continue from the step where solvers differ.

Tetris graph (Python)

Evolution of Tetris lists. Instead of a list, it receive a graph of transitions. It allows to reuse outputs more efficiently than optimization with skipping common prefix.

list into graph (Python)

Script which automatically convert list of sequences into corresponding graph

graph to list (Python)

Script to verify that graph was generated correctly

tetris debug (Lua)

Script for FCEUX to display invisible blocks and other handy information

tetris check results (Lua)

Script for FCEUX to avoid copy pasting inputs by hands. It allows to visually check the results. Here is an example video

Additional notes

  • It was r57shell who wrote all those tools and was responsible for all technical aspects.
  • It was Archanfel who planned all strategies.
  • It took us about 2 years to complete this project.

Special thanks to Alexey Leonidovich Pajitnov

Last Edited by r57shell 2 days ago
Page History Latest diff