#4947: dwangoAC, Ilari & p4plus2's SGB Pokémon: Red Version "Pokémon Plays Twitch" in 08:11.42

This total control run of Pokemon Red played on a Super Game Boy by TASBot was presented at Awesome Games Done Quick 2015 and is widely known as "Pokemon Plays Twitch", presented here in a more illustrative way by adding extra button presses after the payload completes thanks to a Lua script from Ilari. If you are unfamiliar with Pokemon Plays Twitch, I recommend starting with this Ars Technica article that describes the event and what we were able to accomplish, as well as the videos showing the presentations we did in front of 100,000+ live viewers (archive.org or on YouTube for part 1 and part 2). As noted below, the submission text that would normally be found here can instead be found in issue 0x10 of the PoC - please consider the article to be the detailed technical description of this run.

Authors

While many, many people contributed in various ways, the movie file itself contains input from three people - Ilari, p4plus2 and dwangoAC. Before I (dwangoAC) go any further I must credit Ilari for all of his hard work, without which this whole project would never have been possible. Credit also goes to p4plus2 for his amazing payload. Masterjun contributed the item swap order converted over from Pokemon Yellow but none of his input ended up in the final run; his contributions are no less notable, however, so I've placed credit for his work in this section. See the end of these notes for the full list of credits.

Accomplishments

Successfully completes a Super Game Boy enabled game
Hijacks said SGB and deliberately crashes part of it
Transfers data at 3,840 bytes a second
Plays back the full screenplay presented at AGDQ 2015
Explains things better than this text can
Allows Pokemon to Play Twitch chat
Shows how TASBot wins the internet

Teaser poster from the article by Ange Albertini

Full size image: http://acbit.net/static/tas/PokemonPlaysTwitchPosterByAngeAlbertini.png

Comments

Usually here at TASVideos, authors use the submission notes to explain what's going on in the game. In this case, the run is going to describe *itself* at a datarate of 3.8k per second (in reality thanks to 5-bit lowercase letter encoding we manage to eke out more characters per second than we would otherwise, but I'm getting ahead of myself - see the description for full details on this and much more). In brief, this run starts with the full screenplay as shown in the second presentation from AGDQ 2015, followed by a rapidfire presentation of all printable characters, emoticons, and emoji, and finally a full description of how everything was accomplished.
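To make the character-rate remark concrete, here is a minimal sketch of the arithmetic and of what packing text at 5 bits per character can look like. The 3,840 bytes-per-second figure comes from these notes; the 32-symbol alphabet, the MSB-first bit order, and the function names are assumptions made purely for illustration and are not the encoding used by the actual payload (the article describes the real scheme).

```python
# Illustrative sketch only: the alphabet and bit order below are assumptions,
# not the encoding used in the real payload.
BYTES_PER_SECOND = 3840  # transfer rate quoted in the submission notes

# Hypothetical 32-symbol alphabet so every symbol fits in exactly 5 bits.
ALPHABET = "abcdefghijklmnopqrstuvwxyz .,!?-"


def chars_per_second(bits_per_char, bytes_per_second=BYTES_PER_SECOND):
    """Characters per second for a given fixed symbol width."""
    return bytes_per_second * 8 / bits_per_char


def pack_5bit(text):
    """Pack text into bytes, 5 bits per symbol, most significant bit first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for ch in text:
        acc = (acc << 5) | ALPHABET.index(ch)
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        # Pad the final partial byte with zero bits on the right.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


if __name__ == "__main__":
    print(chars_per_second(8))         # one byte per character
    print(chars_per_second(5))         # five bits per character
    print(len(pack_5bit("hello wo")))  # eight symbols fit in five bytes
```

Under these assumptions, eight lowercase characters fit in five bytes instead of eight, which is where the extra characters per second come from.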

To properly read the explanation by watching the movie you'll need to frame advance or pause and unpause frequently. While novel, this is perhaps not the best method to consume this information in part because things like B) are interpreted as emoticons and the script strips characters like < and >, but this is the full contents of an article that can be found in issue 10 of the International Proof of Concept or GTFO Journal which was distributed in print form to attendees at ShmooCon 2016.

The text in the article was written by Ilari, p4plus2 and me. The article references this submission by URL and submission ID, and it had to be completed early enough to be sent to the printing press for distribution at ShmooCon, meaning there was an intermediate period where this submission existed without proper documentation. During that period, FractalFusion took it upon himself to transcribe the entire article text from the movie, which amused all of us greatly.

Also, a hardware verification encode is available at: https://youtu.be/NTzrbhCTEhw

Movie file and emulator notes

The original submission file was pretty evil; I uploaded it with the wrong gametype to get past the not-yet-complete (at the time) lsmv file check that cannot currently understand SGB games. Mothrayas was able to correct the problem and replaced the submission file with an updated version that can now be considered an intermediate file. The *actual* movie file that should be associated with this is issue 10 (simply rename the .pdf to .lsmv) but at 55 MB it is too large to submit given current site restrictions. To play the movie, the Gambatte core.so library must first be loaded via File -> Load -> Load shared object / Load linked library. This has been confirmed to work in both Linux and Windows with the lsnes build "Beta - rr2-β23 with SGB Core" (Windows build).
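The rename step can be sketched as follows. This is only an illustration: the function name and the example file name are hypothetical, and the zip check assumes the movie is stored in a zip-based .lsmv container, which these notes do not state.

```python
# Illustrative sketch: copy the PDF under an .lsmv name so the emulator's file
# dialog will accept it. File names here are hypothetical examples.
import shutil
import zipfile
from pathlib import Path


def pdf_to_lsmv(pdf_path):
    """Copy pdf_path to the same name with an .lsmv suffix.

    Returns (new_path, looks_like_zip). The second value only reports whether
    the copy contains a readable zip structure; that is an assumption about
    how the movie is stored, not something the submission notes confirm.
    """
    src = Path(pdf_path)
    dst = src.with_suffix(".lsmv")
    shutil.copyfile(src, dst)  # copy rather than rename, keeping the original
    return dst, zipfile.is_zipfile(dst)


if __name__ == "__main__":
    candidate = Path("issue10.pdf")  # hypothetical local file name
    if candidate.exists():
        print(pdf_to_lsmv(candidate))
```

Copying instead of renaming keeps the original PDF readable alongside the movie file.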

Full Credits (using the same format as the run)

dwangoAC: Main project organizer
and presenter, primary tester,
Stage 0/1 movie files, PR
Ilari: Emulator coder (lsnes),
developer of Stages 3-5, payload
tester, game mechanic researcher
p4plus2: Primary payload author,
encoding scheme creator, SNES
expert, general stage 3-x help
Masterjun: Stage 0/1 original
idea and research, SNES advice
micro500: Wiring harness build,
Python IRC to bot streaming,
poll speed firmware modification
true: Creator of NES/SNES Replay
device, reset handling updates
TheAxeMan: Python script support
ais523: Data encoding assistance
Nach: SGB docs, site defacement
Tompa: On-site hardware support
padz: Pokemon Red research
Vulajin: Camera serial interface
twm: Initial IRC Python script
Full article in next PoC||GTFO!

Noxxa: Updated submission file.

Nach: This run is quite a technical marvel. I want to stress that a ton of effort went into making this run itself, from improving emulators, exploiting mistakes in a game, working around limitations, using overlooked system features, writing new code in two different languages, to displaying a brand new program in an extremely creative and unexpected way. Despite what this run does, however, it must be judged on its own merits, not on the merits of what the same technology did for a demonstration at a popular gaming conference.

To start with, let me lay out any bias I may have due to my contributions before I get into specific analysis of this run itself. I was involved with the team to prepare for the aforementioned gaming conference. I got people together in an IRC channel to work together, I contributed some build scripts to make it easier for some collaborators to get software building, and I packaged up some binaries for various collaborators. I also did research on how to get an SGB to run SNES code from a DMG application which was necessary for this run. Lastly, I provided an interface to allow the aforementioned demonstration to deface the site. Due to my involvement, I am inclined and have incentive to want to see a run like this published. However, I am going to be focusing on what this run itself does and splitting that from the demonstration as a whole. While I assisted in the production, I do not believe my contributions are significant enough to bias any decisions I make, and we can agree that some of them, like defacing the site, don't actually occur with the run submitted here.

To begin analysis, one must realize what this run is not. This run does not deface any websites, it does not control any cameras, and it most certainly does not play Twitch. In fact, this run does no networking of any kind. As a Magician, I appreciate the intricacies of what this run portrays to the uneducated individual. However, in truth, this run is actually Pokémon Plays Notepad. It does nothing more with its payload than receive input which can be converted to text and displayed within a text-based output (emoticons and a brief screenshot aside). All the networking, parsing of IRC-based commands, controlling cameras, and everything else happened outside this run on some other computer during a demonstration. Nothing within the SNES itself was responsible for any of it.

As far as networking goes, this did NOT bring the Super Nintendo or Gameboy onto the Internet, nor introduce them to networking. The novelty of supplying hardware used alongside these consoles to supply network input for another player (or all the players) is hardly anything new. Emulators brought Internet play to games on these consoles long ago by providing their own network stack for input. Further, the XBand Modem precedes the creation of this run by more than two decades, and was officially sanctioned hardware for remote networked play. Various Gameboy consoles (including the SGB2) are also capable of real duplex networking on their own using the gamelink port or infrared communication. Making use of those could actually allow a future Pokémon hack to achieve true duplex networking without having the magician's assistant operating the mechanics behind the scenes.

Feedback for this run at the demonstration was wildly enthusiastic and extremely positive. Much of that enthusiasm spilled over into the discussion thread for this. For what the run itself actually does though (filtering those who were fooled), I could find little positive feedback (although there was some). Compared to some of our other total control runs, positive feedback for what this run actually does is quite slim. Judging this run solely on its own merits makes it hard to consider it significantly entertaining. From the standpoint of a viewer with no prior experience or expectations, this run does nothing more than print out some text somehow and show some imagery.

This run must also be considered against another Pokémon total control run. The precedent has been till now to consider DMG and SGB interchangeable as seen by numerous obsoletion chains. We've also toyed with the idea of drastically different cross-system obsoletions, although I'm not certain if we have a hard precedent to cite regarding this or not. As far as games go, we've had Blue and Red constantly obsolete each other, and we've even had one game obsolete a completely different game, making it difficult for me to consider one Generation 1 Pokémon game different enough from another. These runs don't even make it out of the first room to a point where the gameplay itself significantly diverges (if we were even to consider these differences significantly divergent). Internally, Yellow is actually a later iteration of the engine with considerable improvements removing bugs and a different compilation resulting in memory layout differences (like most other revisions in classic games). However, what can be seen in the run is ultimately what matters.

I've toyed with the idea of publishing this taking into account that it runs SNES code to differentiate it from the existing Pokémon total control run. However, after much deliberation, I find the argument lacking. From the viewpoint of creating this run, it could have been created on any SNES game with critical exploits, such as Super Mario World, and would even have had fewer limitations, but was intentionally not selected because the creators had something else in mind. Pokémon was intentionally selected because of a previous phenomenon known as Twitch Plays Pokémon, where players used an IRC channel to collaborate to play through Pokémon Red (and later, other games). The creators wanted to turn this on its head, and allow the game to take revenge so to speak. This is actually the most crucial determinant in which platform this game is for.

The creators also wanted to show off something more technically challenging than just controlling a DMG game. They wanted to show mastery of the SGB and SNES from a DMG game. However, that latter point is not a technical novelty, as Space Invaders already did this back in 1994, and it's even documented how to do so in the Super Gameboy programming manual, making it an officially sanctioned and somewhat expected feature of the SGB. This component was actually one of the least challenging in creating this run.

As far as TASing goes, we like showing off different techniques, and making use of SNES code is assuredly one of them. However, I have to consider this as subservient to the original game. Therefore, the fact that it runs SNES code is an optional technique to all runs in general for the original game, such as whether the run decides to make use of Fly or a glitch to acquire Mew in a random battle. At best we can consider this a different type of total control run, but it is still in fact a total control run for a first generation Pokémon game with gameplay beginning on a DMG processor.

Based on the lackluster positive feedback, and that publication would require another run to obsolete this on entertainment values, I cannot in good faith accept this for publication. However, the final nail in the coffin is actually that this run is not complete. All our runs complete something in some sense. They reach a game endpoint, or reach a point in the game where there is no new material, or they reach the end of a payload which has some kind of endpoint. All our total control runs with the exception of the previous Pi Day run reach a noticeable endpoint of a final cut scene or a special The End screen. Pi Day itself actually has a natural endpoint of exactly 3:14.15 into the run (as well as other endpoints if dragged on to add additional digits to π). The conclusion to the input to Notepad in this run is entirely arbitrary. Therefore its very mechanics provide no natural conclusion and no definition for what kind of modifications to this run would be acceptable to obsolete it or not. I have no alternative but to reject this.

Nach: This is a bit unorthodox, but due to the nature of this run, dwangoAC asked me to include some groundwork on the kind of discussion and site improvements that would need to occur to make a run like this acceptable. This really isn't the place to discuss this, so if people want to do so, please start a new thread with the following material. Feel free to copy what you want, using each of the following sections as needed in each of their respective threads.

We have a two-fold problem in our current methods in classifying and organizing published movies which currently prevent us from publishing this and other runs. There are various loopholes and rule hacks we could employ to try and shove some runs into the existing framework. However, I think it better we fully analyze the issues at hand and properly deal with them, allowing the site to become better as a whole, inviting new contributions.

The first problem we face is a lack of conscious understanding of how to differentiate different kinds of total control or execution of arbitrary code runs. The site offers a categorization tool known as movie classes. Logistically, this tool allows site users to find runs of a certain genre or make use of certain techniques. More importantly however, it provides a conscious objective understanding of something certain runs do or don't do compared to others. Many of these movie classes share a direct relationship with the various branches of runs that exist for a particular game. Regardless of what the name of the branch is, which sometimes has identical categories under different names for different games (such as the different ">100%" 100% completion branches of the Donkey Kong Country series), there is generally that implicit relationship we have not yet made explicit. We really need some kind of effort to map all kinds of existing branches to movie classes to better consciously understand what objective criteria separate various runs. We then need to see if there are any would-be movie classes missing that our branching would indicate we should have.

The above point is all the more true regarding our various total control or execution of arbitrary code runs. They are clearly different from each other, however we haven't done anything to differentiate between them, and they are all lumped together in a single generic understanding. This lumping, or lack of sub-categorization, prevents us from accepting a wide range of these runs for a single game, as the underlying goal/objective or key differentiating factor is the same for them all. I already raised this point in another discussion nearly two years ago, actually predicting this and other recent runs, and perhaps future runs that we'll be seeing. I believe it's time we restart this discussion, and find some way to categorize different total control runs so we can accept multiple of them for existing games, where we find objective grounds to differentiate between multiple (not to mention organize them better and allow each genre to be searched for individually).

The second problem here is our overall classification of game runs or lack thereof. We relatively recently introduced a concept to the site known as tiers. However, we have as of yet failed to make good use of this system.

The site was built upon showing off Tool-Assisted Superplays. The idea behind the superplay is a run which shows off something which is extremely difficult or impossible for a mere human to do, and doing it again and again in an entertaining manner. A by-product of this superplay is completing a game as fast as the game allows, often faster than is humanly possible. This is the superplay that is also known as the Speedrun.

The site originally accepted movies based on entertainment, but also acknowledged movies which went above and beyond in their superplaying, and best represented the TAS genre, and labeled such runs with a Star. The length aspect of each run was used to obsolete longer runs, and thus also showed the site was in some sense a way to keep track of fastest possible records for a game. However, the superplay non-speedrun known as the playaround looks more at entertainment factors and techniques shown off, and actually longer runs are usually used in this case to obsolete the shorter. Since we wanted to find a way to allow any kinds of runs that were superhuman in completing games, even when lacking entertainment, tiers were born.

We invented the Vault in order to store runs which were not entertaining, but were the fastest completion of games in a mainstream manner. At the same time we shifted the concept of a Star to a pseudo-tier. I say pseudo-tier, because the sentiment on the site regarding Stars is still the pre-Tier definition. As a tier, there really should be an objective divider between what runs do or do not do in order to tier things one way or another. However, with Stars, we are still primarily looking at things at a percentage level which in no way objectively views the accomplishments of each run on its own. There is a sort of middle ground by stating Stars could be for top two or three runs which best portray their genre/platform/franchise, which on the one hand objectively views runs as the best of technique wise, but still pays some attention to outside factors. However, with various genres under-represented due to not being as entertaining to most of the audience compared to other genres, this hasn't really been accomplished either.

Returning to the main problem, we are not really tier-ing our runs, as we lack strong tier boundaries between many of them. We do view Stars as a tier above Moons, and Moons as a tier above Vault, something which I have no intention of changing, but we don't make use of this system to properly and totally differentiate between different runs and classify them properly. Tiers are directly tied to criteria that a run must conform to. Due to the open ended nature of the total control and other kinds of runs, we really do need a different set of established criteria on when to accept runs, how to view obsoletions and so on.

Resistance to creating new tiers generally has two arguments. The first is that we can find ways to ~~hack~~alter the existing rules to find a way to shove in runs we wish to accept. This approach sort of works, but does reject many runs we'd like to publish somehow, muddles some of our rules, and also lacks acknowledgment that some runs are drastically different from each other. This first argument by some is also actually predicated on the second argument, that new tiers which by definition are not Moon nor Star must be unentertaining, or minimally, fail to acknowledge the entertainment factors in runs in these new tiers. However, this lack of acknowledgment can be easily overcome if new tiers are created in groups. Meaning, that if we decided we wanted tiers to cover runs which are technically impressive, using techniques like new code entered via controllers which is impossible for a human, we could simply make two of them. Both cover the technically impressive of this nature, but only one of them conveys that the run is also entertaining. The various lists on the site are then modified accordingly to add the new Grade-A tier here to the entertainment qualifying lists.

To be sure, we are seeing runs which take very different approaches to games than others. Overall, when we look at a run, we look at how the run behaves in the gameplay portion of it. We reject runs for sloppy gameplay, and obsolete runs when a newer run shows it performs the gameplay better. We tend to ignore what happens during the non-gameplay segments which change due to different versions of the game (switching between USA and Japan), or mistakes made in utilizing the in-game menus, not proceeding through them to the fullest (forgetting to exploit wrap-around or figuring out the menu system's controller poll mechanics). Yet at the same time, somewhat hypocritically, we do allow for runs to take advantage of these auxiliary features, such as exploiting buffer overflows in menus, or finding corruption bugs in the built-in save management features. A critical look into this phenomenon indicates we should perhaps be splitting these things into two tiers, the runs which focus solely on the gameplay (or try to anyway), and those which exploit non-gameplay features somehow. This way, each can have its own set of rules as to what is considered good or bad play, and how multiple runs for a game compare against each other across versions.

Returning to the problem at hand, in my opinion, we need a way to classify these total control and similar runs. They need to have their own rules regarding completion criteria, as well as overall criteria on what is allowed and disallowed. I find a key factor here is that games which lack these exploits are finite. Barring loops, there is a (nearly incalculable) finite number of ways to complete a game. However, in games where we can add our own code, once we do so, there is now an infinite number of ways to proceed. These two groups are now objectively distinct, and we really do need to come up with a set of rules for each. These rules we develop should take into account how the different groups of these runs differ from each other as I detailed in my presentation of the previous problem, and allow for many runs per game, but limit the infinite.

An important point to consider in developing a tier and rules for it is how a run makes use of the existing game once its payload begins. This run to my knowledge is the first run to no longer make any use of the game whatsoever once the payload begins. Meaning that the final payload can be run on any SNES game once controller input to RAM becomes possible. Since it's not tied to a game, do we accept different runs of Notepad attached to Super Mario World, Super Metroid, Kirby Super Star, and others? Not only do we have to categorize what kind of diversity we allow for any given game, we need to limit how we apply this diversity to many different games, as I do not wish to publish the same virtually identical payload over and over.

I already began work in trying to define a new tier for these kinds of runs in order to spark some discussion. However, I find this initial work to be lacking in that it does not make use of yet to be developed movie class/branch criteria, and it does not supply a series of rules on how to deal with the aforementioned sub-problems in this tier problem. If we can iron these issues out, I'd be happy to enable the site to support it, and then accept this run if it conforms to the new rules.

In closing I want to add that runs like this which include payloads and play them take tool-assistance to their epitome input-wise, and we should put the effort in to properly acknowledge them.

Samsara: It's been 10 years. Let's actually put in the effort to properly acknowledge them. Judging.

Samsara: I'll keep this one short, as there's already a lot of previous judgement text here and I said everything I have to say on #7726: Sauraen, dwangoAC & Savestate's N64 The Legend of Zelda: Ocarina of Time "Triforce% ACE Showcase" in 53:05.30. Please read through the judgement on that run for much more detailed information.

Nearly 10 years have passed since this run's submission, the judgement of which set an unfortunate precedent that caused a divide between TASVideos and what would eventually become the TASBot community. We lived with that precedent and that divide for far too long, and thankfully we now have a system that can finally recognize and support runs like this.

I'd once again like to formally apologize for the way this run, and subsequent TASBot runs, have been treated by the site in the past. Upset as I am that it took us a full decade to correct this gigantic mistake, I'm glad that we were finally able to find a way to correct it. I promise that going forward, whatever gigantic mistakes we may be making now won't take that long to correct!

Accepting to our new Events class!

Spikestuff: Publishing.
