The classic Bad Apple, this time on the Game Boy Color (in Game Boy compatibility mode) with Pokemon Red.

Platform Choice

I've done a Bad Apple TAS on the Game Boy Color before with Crystal, and the Game Boy has already been done with Japanese Yellow. This run aims to use the Game Boy Color again, but this time in Game Boy compatibility mode. This mode is entered when the Game Boy Color bootrom detects that the game does not have Game Boy Color enhancements (indicated in the ROM header). As you'd expect, Game Boy Color enhancements (such as double speed mode and VRAM DMA) are locked out. However, various hardware quirks unique to the Game Boy Color are still present. Two (or arguably three) of these hardware quirks are abused in this TAS.
Note that the emulator is not set to emulate a Game Boy Color within a Game Boy Advance, as one of the hardware quirks used in this run does not work on the Game Boy Advance.

Game Choice

Red is used mainly because a Game Boy only game was needed, and for English Pokemon games, only Red and Blue fit the bill (English Yellow is GBC enhanced, unlike Japanese Yellow). Red saves a slight bit of time over Blue due to a smaller default player name and a slightly faster intro, and it's more thematically appropriate for Bad Apple (who's heard of a Blue apple?). Pokemon being used follows the same sort of reasoning as my Crystal TAS: it's the kind of game you'd expect from an ACE TAS.

Video Basics

More detailed info can be found on the Pan Docs: https://gbdev.io/pandocs/Graphics.html
The Game Boy operates under a tile system like other retro systems at the time. VRAM holds 384 8x8 pixel tiles at addresses $8000-$97FF. Each tile contains 16 bytes, with each pixel using 2 bits to encode a color ID (i.e. 2BPP, 2 bits per pixel). For the background and window layers, these tiles are selected for rendering using one of two 32x32 tile maps stored in VRAM at addresses $9800-$9BFF and $9C00-$9FFF. These tile maps contain 1 byte indexes for the tile to be displayed. Which tile map is used, the origin point for fetching tiles from the tile map, and how these 1 byte indexes address tile data depend on the LCDC register and scrolling registers.
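As a rough illustration of the 2BPP layout described above, here is a minimal Python sketch. The function names decode_tile and tile_address are made up for this example; the bit layout (two bytes per row, low bit plane first, bit 7 as the leftmost pixel) follows the Pan Docs description.

```python
def decode_tile(tile: bytes) -> list[list[int]]:
    """Decode one 16-byte 2BPP tile into 8 rows of 8 color IDs (0-3)."""
    if len(tile) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        # Each row uses 2 bytes: the low bit plane, then the high bit plane.
        low, high = tile[2 * y], tile[2 * y + 1]
        row = []
        for x in range(8):
            bit = 7 - x  # bit 7 is the leftmost pixel
            row.append((((high >> bit) & 1) << 1) | ((low >> bit) & 1))
        rows.append(row)
    return rows


def tile_address(index: int) -> int:
    """VRAM address of tile `index` (0-383), counting up from $8000."""
    if not 0 <= index < 384:
        raise ValueError("only 384 tiles fit in $8000-$97FF")
    return 0x8000 + 16 * index
```

For instance, a row stored as the bytes $FF $00 decodes to eight pixels of color ID 1, and the last byte of tile 383 lands on $97FF.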

Video Rendering

More detailed info can be found on the Pan Docs: https://gbdev.io/pandocs/Rendering.html
The Game Boy does not render a frame all at once; rather, the PPU renders it pixel by pixel, directly to the screen. Each frame consists of 154 rows, or scanlines, the first 144 of which are drawn to the screen, top to bottom, left to right. Each scanline consists of 456 "dots" of time. A dot is the smallest unit of time the PPU operates in, with 1 dot equaling 1 tick of the 4 MiHz (4194304 Hz) clock. This means within 1 CPU m-cycle (assuming regular speed), there are 4 dots.
For each scanline rendered, the Game Boy switches between different "modes", as reported in the rSTAT register:
Mode 2, or OAM scan, is the first mode within a scanline, lasting 80 dots. In this mode, OAM is scanned and objects which fall on the current scanline are inserted into an object buffer, which can fit up to 10 different objects. In this mode, VRAM can be freely accessed by the CPU, but not OAM.
Mode 3, or rendering, is the next mode within a scanline, lasting between 172 and 289 dots. In this mode, the PPU actually renders the scanline, and thus VRAM and OAM are not accessible by the CPU. To keep a rather complicated topic short, this render process involves two FIFO pixel fetchers (one for the background, one for objects), constantly fetching tiles and pushing out pixels. Certain things may cause the render process to stall, extending Mode 3's length. The primary cause of such stalling is an object, which requires waiting for the background fetcher to finish and the object fetcher to fetch a tile before more pixels can be rendered.
Mode 0, or HBlank, is the final mode within a scanline, lasting between 87 and 204 dots. Since the length of one scanline is fixed, Mode 0's length is shortened whenever Mode 3's length is extended. In this mode, the PPU is inactive, so VRAM and OAM are freely accessible by the CPU.
For the final 10 scanlines, none of these modes are used. Mode 1, or VBlank, is used entirely for these scanlines. Similar to HBlank, the PPU is inactive in this mode, so VRAM and OAM are freely accessible by the CPU.
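The timing figures above fit together with some simple arithmetic. The following Python sketch just restates them; the constant and function names are invented for illustration, and the numbers are the ones quoted in this section.

```python
DOTS_PER_SCANLINE = 456
SCANLINES_PER_FRAME = 154
VISIBLE_SCANLINES = 144
OAM_SCAN_DOTS = 80
DOT_CLOCK_HZ = 4 * 1024 * 1024  # 4 MiHz = 4194304 Hz
DOTS_PER_M_CYCLE = 4


def hblank_dots(rendering_dots: int) -> int:
    """HBlank length left over once OAM scan and rendering are done."""
    if not 172 <= rendering_dots <= 289:
        raise ValueError("rendering lasts between 172 and 289 dots")
    return DOTS_PER_SCANLINE - OAM_SCAN_DOTS - rendering_dots


m_cycles_per_scanline = DOTS_PER_SCANLINE // DOTS_PER_M_CYCLE   # 114
vblank_scanlines = SCANLINES_PER_FRAME - VISIBLE_SCANLINES      # 10
dots_per_frame = DOTS_PER_SCANLINE * SCANLINES_PER_FRAME        # 70224
frames_per_second = DOT_CLOCK_HZ / dots_per_frame               # about 59.73
```

The shortest rendering period (172 dots) leaves the longest HBlank (204 dots), and the longest rendering period (289 dots) leaves the shortest HBlank (87 dots).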

Palette

More detailed info can be found on the Pan Docs: https://gbdev.io/pandocs/Palettes.html
When the background FIFO pushes a pixel, it reads the BGP register to determine which color to use. Each 2 bits of the BGP register correspond to a color ID, the value of these 2 bits determines which color maps to the specified color ID. Unlike VRAM, the BGP register is never locked, and may be freely modified during Mode 3. This means for any pixel (subject to timing constraints), the BGP register may contain a different value, and thus potentially change the render output.
When the object FIFO pushes a pixel, it reads the OBP0 or OBP1 register, depending on the OAM palette attribute bit. This works like BGP, except for color ID 0, which is always transparent.
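The palette lookup can be sketched in a few lines of Python. The helper names shade and object_shade are hypothetical; the 2-bits-per-color-ID layout is the one described above, with color ID 0 in the lowest 2 bits.

```python
from typing import Optional


def shade(palette: int, color_id: int) -> int:
    """Look up the shade (0-3) a palette register assigns to a color ID."""
    if not 0 <= color_id <= 3:
        raise ValueError("color IDs are 0-3")
    return (palette >> (2 * color_id)) & 0b11


def object_shade(obp: int, color_id: int) -> Optional[int]:
    """Same lookup for OBP0/OBP1, except color ID 0 is always transparent."""
    return None if color_id == 0 else shade(obp, color_id)
```

With BGP set to $E4, every color ID maps to the shade of the same number, while $1B reverses the mapping. Changing BGP mid-scanline therefore changes what the same tile data looks like from that point onwards.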
Note that this behavior isn't necessarily straightforward on the Game Boy. Due to the exact timing of the write and how BGP is implemented for the LCD, writing to BGP during rendering may result in the old palette and new palette values being OR'd together for one pixel on some Game Boys, or it might result in the old value still being used for a pixel, or it might result in the new value just being used as you'd normally expect. This is dependent on the LCD revision used, and of course changes if you were to, say, replace the LCD with some backlit one. Luckily, on the Game Boy Color this doesn't matter so much, as the new color LCD used does not have this quirky behavior (the same goes for the Super Game Boy and Game Boy Advance).

How To Render 64x144 Arbitrary Video On The Game Boy

All this information is nice, but how does it result in Bad Apple?
Since BGP can be changed at any point, in theory that could result in 160 pixels with different BGP values used. In practice it cannot, as the CPU is not fast enough to do that. With a "standard" unrolled write loop (i.e. ld a,[hl+] / ldh [c],a), you end up with 4 m-cycles per BGP write (i.e. 16 dots). Since BGP represents 4 different color IDs, this means you have 4 pixels per color ID (16 / 4), giving an effective 40 horizontal pixels (160 / 4). This is better than the Japanese Yellow Bad Apple (with its effective 40x36 resolution) despite having the same horizontal resolution, as the vertical resolution can be increased all the way to the maximum 144.
However, the "standard" unrolled write loop is not the fastest way to copy data. A "popslide" technique can be used instead (i.e. pop de / ld [hl],e / ld [hl],d). This works as the pop opcode takes only 3 m-cycles, the theoretical fastest for reading 2 bytes in one opcode, as each memory access requires an m-cycle (1 memory access for the opcode, then 2 for the data read from the stack). This results in 1 byte written around every 3.5 m-cycles (i.e. 14 dots). However, this is only an average, as the read and write times for each byte are not equal. The 7 m-cycles are instead split into 5 m-cycles and 2 m-cycles (i.e. 20 and 8 dots).
This brings up another trick: purposefully using objects to stall Mode 3. An object is able to stall rendering for between 6 and 11 dots. If the delay is 8 dots here for each "long" write within the popslide, it reduces the number of pixels drawn from 20 to 12. With this, the "long" BGP write could represent 3 pixels per color ID (12 / 4), with the "short" BGP write representing 2 pixels per color ID (8 / 4), thus each 20 pixels has 8 different effective pixels possible. Over all 160 pixels, that means 64 different effective pixels, giving a horizontal resolution of 64.
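The resolution arithmetic above can be checked with a short Python sketch. The function effective_width and its (dots, stalled dots) input format are assumptions made purely for illustration; the cycle counts are the ones given in the text.

```python
SCREEN_WIDTH = 160
COLOR_IDS = 4
DOTS_PER_M_CYCLE = 4


def effective_width(writes: list[tuple[int, int]]) -> int:
    """Effective horizontal resolution for one repeating group of BGP writes.

    `writes` holds (dots until the next write, dots stalled by an object)
    for each BGP write in the group.
    """
    # Stalled dots pass without any pixel being drawn.
    pixels = [dots - stalled for dots, stalled in writes]
    if any(p <= 0 or p % COLOR_IDS for p in pixels):
        raise ValueError("each write must cover a multiple of 4 pixels")
    group_pixels = sum(pixels)
    if SCREEN_WIDTH % group_pixels:
        raise ValueError("the group must tile the 160 pixel scanline")
    # Every write contributes one effective pixel per color ID.
    return SCREEN_WIDTH // group_pixels * COLOR_IDS * len(writes)


# ld a,[hl+] / ldh [c],a: 2 + 2 m-cycles per write, no stalls.
standard_loop = [(4 * DOTS_PER_M_CYCLE, 0)]
# pop de / ld [hl],e: 3 + 2 m-cycles with an 8 dot stall, then ld [hl],d: 2.
popslide = [(5 * DOTS_PER_M_CYCLE, 8), (2 * DOTS_PER_M_CYCLE, 0)]
```

Here effective_width(standard_loop) gives 40 and effective_width(popslide) gives 64. Without the 8 dot stall, the "long" write would cover 20 pixels and the 28 pixel group would no longer divide the scanline evenly.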
Of course, we wouldn't want objects to obscure Bad Apple here. Luckily, two options are available:
1. On the Game Boy Color, objects can simply be disabled via the LCDC register. Even when objects are disabled, the Game Boy Color will still proceed to fetch objects, creating Mode 3 stalls. Only when the PPU is mixing pixels does it check LCDC. On the Game Boy, this option is unavailable, as disabling objects will skip the object fetch process entirely.
2. Color ID 0 is transparent. Therefore, if an object uses an all 0 tile, it will always be transparent.
For this TAS, option 2 is used, to reduce unneeded reliance on Game Boy Color hardware quirks.
A nice benefit of abusing BGP writes here is the tile data is rather simple. Only 5 tiles are needed for the background here, using the pattern 00112233|00011122|23330011|22330001|11222333. The tilemap is just filled with these 5 tiles looping.
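The 5 tile pattern falls out of alternating "short" writes (2 pixels per color ID) and "long" writes (3 pixels per color ID). A minimal Python sketch that regenerates it, with hypothetical function names and under the assumption that all 8 rows of each tile are identical (the text only gives one row per tile):

```python
def background_pattern() -> list[str]:
    """Rebuild the 40 pixel repeating row and split it into 8 pixel tiles."""
    runs = []
    for _ in range(2):                 # two short/long pairs per 40 pixels
        for length in (2, 3):          # short write, then long write
            for color_id in range(4):  # one run per color ID
                runs.append(str(color_id) * length)
    row = "".join(runs)
    return [row[i:i + 8] for i in range(0, len(row), 8)]


def row_to_2bpp(tile_row: str) -> tuple[int, int]:
    """Encode one 8 pixel row of color IDs as its (low, high) bit plane bytes."""
    low = high = 0
    for digit in tile_row:
        color_id = int(digit)
        low = (low << 1) | (color_id & 1)
        high = (high << 1) | (color_id >> 1)
    return low, high
```

background_pattern() returns the same 5 rows listed above, and row_to_2bpp("00112233") gives ($33, $0F) for the first tile.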
As a note, using BGP writes as a raster effect isn't an entirely novel idea, it's actually a rather Prehistorik idea.

Frame Duplication

While rendering 64x144 arbitrary video is nice, it ends up taking up a ton of CPU time to do, and seemingly requires constant maintenance. There isn't sufficient free time to read the joypad, with the only free time given during VBlank. A single VBlank is not long enough to acquire all 2304 bytes needed to create a new 64x144 frame. In fact, multiple VBlanks aren't long enough to do so without severely cutting the framerate down to around 2 FPS.
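The 2304 byte figure, and why a single VBlank cannot cover it, can be reproduced with a few lines of Python. The constant names are made up for this sketch; the inputs are the numbers from the text.

```python
EFFECTIVE_WIDTH = 64
VISIBLE_SCANLINES = 144
EFFECTIVE_PIXELS_PER_BGP_WRITE = 4  # one per color ID
M_CYCLES_PER_SCANLINE = 456 // 4    # 114
VBLANK_SCANLINES = 10

writes_per_scanline = EFFECTIVE_WIDTH // EFFECTIVE_PIXELS_PER_BGP_WRITE  # 16
bytes_per_frame = writes_per_scanline * VISIBLE_SCANLINES                # 2304
vblank_m_cycles = VBLANK_SCANLINES * M_CYCLES_PER_SCANLINE               # 1140

# Even at an unreachable 1 byte per m-cycle, one VBlank falls short.
fits_in_one_vblank = bytes_per_frame <= vblank_m_cycles                  # False
```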
The Game Boy Color offers a hardware quirk to resolve this issue. On the Game Boy and Game Boy Color, the LCD can be disabled. After around a frame's worth of time, no VSync signal will be given to the LCD by the PPU (as expected without the PPU active). The LCD normally reacts by "whiting out" with a "whiter than white" color. After enabling the LCD, the first frame rendered by the PPU is not actually displayed, as the PPU does not give a VSync signal to the LCD (like if the LCD was still disabled). This ends up having different results depending on the console:
On the Game Boy, the LCD will just proceed to white out if it hasn't already been whited out. There is no way to avoid the white out with the Game Boy.
On the Super Game Boy, the LCD will not white out ever (as there is no actual LCD here). The previous image will just stick around. While this seems like a potentially ideal platform choice, it isn't really. Subframe input isn't possible on the SGB, unless you just outright hijack the SNES side of the code with the standard JUMP command (at which point, you can just proceed to Bad Apple SNES style without the Game Boy being involved at all, besides with the initial payload).
On the Game Boy Color, if the LCD is enabled within the first 4 scanlines' worth of time after the LCD is disabled, the previous frame will instead stick around. Any longer, and the white out will occur. Using this, the previous frame can be "duplicated." Abusing this quirk thus allows for free time to collect data from the joypad without maintaining BGP writes.
The Game Boy Advance operates similarly to the Game Boy Color, except the timing is instead around 6.5 scanlines' worth of time (the exact pattern for this timing isn't known). This doesn't really matter so much however, since the audio method used cannot be done on the Game Boy Advance.
Note that Gambatte currently emulates this case always as a Game Boy Color (except on the SGB, where no "blanking" happens at all like expected). As such, trying to do this trick with it set to Game Boy emulation can possibly result in frame duplication, which is not actually possible on real hardware, thus isn't valid to abuse while emulating a Game Boy.
As a note, abusing this quirk on the GBC to duplicate frames isn't a novel idea. It's used in various retail GBC games, such as A Bug's Life in its intro.

Audio Basics

More detailed info can be found on the Pan Docs: https://gbdev.io/pandocs/Audio.html
The Game Boy, like other systems at the time, produces sounds using multiple sound generation units. These sound generation units are denoted as channels. The Game Boy has 4 different channels, each with specialized usage.
This system allows for so called "8-bit" or "chiptune" music, but not easily for raw PCM playback that voices generally need. Luckily, one of the channels available, Channel 3, is a simple channel that allows for playing back arbitrary 4-bit PCM samples, with 32 total able to be stored at once (within "wave ram"), possibly set to forever loop.
Channel 3 can be set at any 2097152 / (2048 - x) frequency where x is 0-2047. In this case, x is set to 1934, making a frequency of 2097152 / (2048 - 1934), i.e. 2097152 / 114, i.e. ~18396.07Hz. 114 here is used as that allows for exactly 2 samples to be played back every scanline.
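The frequency choice can be double checked with a small Python sketch. The function names are invented here; the formula is the one given above, and the 2 dots per timer tick follows from 4194304 / 2097152.

```python
from fractions import Fraction

DOTS_PER_SCANLINE = 456


def channel3_sample_rate(x: int) -> float:
    """Sample rate in Hz for a channel 3 frequency value x (0-2047)."""
    if not 0 <= x <= 2047:
        raise ValueError("x is an 11-bit value")
    return 2097152 / (2048 - x)


def samples_per_scanline(x: int) -> Fraction:
    """How many samples channel 3 plays during one 456 dot scanline."""
    dots_per_sample = 2 * (2048 - x)  # the 2 MiHz timer ticks once per 2 dots
    return Fraction(DOTS_PER_SCANLINE, dots_per_sample)
```

With x = 1934, each sample lasts 228 dots, so samples_per_scanline(1934) is exactly 2, which keeps the audio work in lockstep with the per-scanline video work.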
Only 4 bits of amplitude is fairly low quality. The quality, however, can be increased by changing the master volume for each sample. The master volume is a linear control that amplifies (i.e. multiplies) the mixed channel audio, with 8 different values per side (although, for this video, the music is kept mono regardless). This provides roughly 7-bit PCM quality playback (although not exactly, as different sample and volume combinations can multiply to the same value), just enough for fairly high quality PCM playback.
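To see why this is only "roughly" 7-bit, here is a deliberately simplified Python model that treats the output as sample × volume, with the sample as a plain 0-15 integer and the 8 master volume settings as multipliers 1-8. The real hardware mixes through a DAC, so this is an assumption for illustration, not a description of the analog output.

```python
SAMPLE_VALUES = range(16)       # 4-bit wave RAM samples
VOLUME_MULTIPLIERS = range(1, 9)  # master volume settings 0-7 act as x1..x8

combinations = len(SAMPLE_VALUES) * len(VOLUME_MULTIPLIERS)  # 128 = 2**7
distinct_levels = len({s * v for s in SAMPLE_VALUES for v in VOLUME_MULTIPLIERS})
```

There are 128 (sample, volume) combinations, which is where the 7 bits come from, but distinct_levels comes out lower in this model because pairs such as 2 × 6 and 3 × 4 produce the same amplitude.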

Audio Transfer

Obviously, Bad Apple will take way more than just 32 samples to play back. This brings in another quirk available on the Game Boy Color: writing to wave ram while channel 3 is active. Normally, you are not supposed to write to wave ram while the channel is active. However, on the Game Boy Color, this ends up just writing to whichever byte channel 3 last read a sample from. Using this trick, wave ram can be continuously written to, effectively perfectly streaming PCM audio (although the bytes written will end up being 32 samples "ahead" as the written sample won't be read until the channel loops back around).
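The "32 samples ahead" behavior can be illustrated with a toy Python model of the quirk. This is a simplification under stated assumptions: exactly one CPU write happens per byte played, the write always lands on the byte the channel just read, and the function names are made up for this sketch.

```python
def nibbles(data) -> list[int]:
    """Split bytes into 4-bit samples, upper nibble first."""
    out = []
    for byte in data:
        out += [byte >> 4, byte & 0x0F]
    return out


def stream_through_wave_ram(initial, data) -> list[int]:
    """Return the samples played while `data` is streamed into wave ram."""
    ram = list(initial)
    if len(ram) != 16:
        raise ValueError("wave ram is 16 bytes (32 samples)")
    played = []
    for i, byte in enumerate(data):
        index = i % 16                   # byte channel 3 is currently reading
        played += nibbles([ram[index]])  # both samples of that byte play
        ram[index] = byte                # the CPU write lands on the same byte
    return played
```

Streaming 48 bytes into silent wave ram plays 32 samples of the initial contents first, followed by the streamed data from its beginning, i.e. every written byte is heard one full loop (32 samples) later.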
Note that writing to wave ram while the channel is active differs on the Game Boy, which prevents the write from going through, except on the exact cycle in which wave ram is read by channel 3, in which case the write does go through in the same manner as the Game Boy Color. However, this greatly increases the timing constraints for the payload, and would result in some minor audio artifacting as the volume write couldn't be aligned to wave sample changes. Certainly not an absolute deal breaker like frame duplication, but a notable improvement only available with the Game Boy Color.
On the Game Boy Advance, writes to wave ram while the channel is active will always fail. As such, this trick would not work at all on the GBA and does not work with the emulator set to GBA mode as it accurately emulates such. Due to this, the run must be done with the console mode set to GBC rather than GBA.
Note this audio quirk, along with the volume trick, is nothing new. It was already used anyways in MrWint's Yellow TAS (and the general idea has likely been used even before that TAS).

The first payload

The first payload is somewhat similar to MrWint's Yellow TAS and the old Pi TAS it was based on. However, the differences between Red/Blue's memory layout and Yellow's mean rather massive deviations are required.
In Yellow, hJoyInput (updated with joypad input at the VBlank interrupt) is located at $FFF5. This can be accessed with a ldh a,[$FFF5] opcode (i.e. $F0 $F5). While $F0 is not available in the Rival name, $F5 is (with the character), with $F1 being available (with the × character) for $F0, decreased by 1 using an item toss.
In Red, hJoyInput is located at $FFF8 instead. This is rather bad, as $F8 is not available in the Rival name, and neither is $F0, and only one byte can abuse item tossing. As such, a different approach has to be used.
The . character corresponds to $F2, which is the ldh a,[c] opcode. If the c register contains $F8, then this will read $FFF8. This thus gives a way to read hJoyInput.
In order to do this, I've ended up using the following payload:
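ld bc,$2E00      ; filler, see the explanation of the loop point below
nop
ldh a,[$FFB4]    ; a = hJoyHeld
ld c,a           ; c = low byte of the HRAM address to read
ldh a,[c]        ; a = [$FF00+c], i.e. hJoyInput when c = $F8
halt             ; sleep until the next interrupt
ld [hl+],a       ; store the input byte at hl, then increment hl
jp hl            ; loop point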
$FFB4 contains hJoyHeld. This is a joypad variable only updated in the overworld loop, not in the VBlank interrupt, thus for this payload it will stay the same, and is more or less freely customizable. The Rival name has the u character available for $B4. This byte also ends up corresponding to the glitch JOHN item.
For this payload, a jp hl is used as a loop point. At a glance, this seems fairly useless in the end, as with hl constantly increasing it will eventually just start eating into this meager payload and cause the ACE payload to just implode within itself. However, this is where the seemingly useless ld bc,$2E00 / nop comes in. These opcodes correspond to the bytes $01 $00 $2E $00. After $01 is eaten, you are left with $00 $2E $00. This corresponds to the opcodes nop / ld l,$00. This "resets" the l register back to 0, which will end up making the jp hl jump back to the next payload.
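The byte-level trick is easier to see with a tiny disassembler. This Python sketch only knows the handful of SM83 opcodes that appear here, and the table and function names are invented for illustration.

```python
# opcode -> (mnemonic template, number of operand bytes)
OPCODES = {
    0x00: ("nop", 0),
    0x01: ("ld bc,${1:02X}{0:02X}", 2),  # 16-bit operand is little-endian
    0x22: ("ld [hl+],a", 0),
    0x2E: ("ld l,${0:02X}", 1),
    0x4F: ("ld c,a", 0),
    0x76: ("halt", 0),
    0xE9: ("jp hl", 0),
    0xF0: ("ldh a,[$FF{0:02X}]", 1),
    0xF2: ("ldh a,[c]", 0),
}

PAYLOAD = bytes([0x01, 0x00, 0x2E, 0x00, 0xF0, 0xB4, 0x4F, 0xF2, 0x76, 0x22, 0xE9])


def disassemble(code: bytes) -> list[str]:
    """Decode a byte string using only the opcodes in the table above."""
    out = []
    i = 0
    while i < len(code):
        template, operand_count = OPCODES[code[i]]
        operands = code[i + 1:i + 1 + operand_count]
        out.append(template.format(*operands))
        i += 1 + operand_count
    return out
```

disassemble(PAYLOAD) gives back the listing above, while disassemble(PAYLOAD[1:]) starts with nop / ld l,$00 and then continues with the same ldh a,[$FFB4] onwards, which is the re-decoding that happens once execution begins one byte later.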
In order to create this payload, the Rival is named ×. u, and the trainer ID is manipulated to $F3E9. After such, these two party swaps are done:
10 <-> 11
10 <-> 9
This moves the rival name into items and places various $FFs beforehand into items. After these swaps, an item swap and various item tosses are done to further manipulate memory. Note the JOHN item is rather quirky in this regard. The game does not allow for tossing key items. The check for whether an item is a key item ends up being buggy if the item is some glitch item, as the game may read an out of bounds buffer address to determine if the item is a key item. For the JOHN item, this out of bounds buffer address happens to correspond to the same place where some text appears to be buffered. In this case, the last item toss amount character ends up going into this byte. Therefore, whether or not JOHN is a key item depends on the number of items last tossed.
After this item manipulation is done, a few more party swaps are done:
9 <-> 10
15 <-> 17
16 <-> 13
This ends up putting $F301 into wMapScriptPtr, thus making the game jump to $F301 (with HL set to such on entry). This allows up to 15 bytes to be used for the second payload.

The second payload

The second payload isn't anything rather impressive per se. With only 15 bytes available, there isn't enough space to have a "fast" payload writing the final payload. As such, this payload simply ends up writing a slightly larger payload. However, it is upgraded still to be subframe capable, although rather slow comparably (since it is calling into the game's slow ReadJoypad routine). Less than a frame is needed for the second payload to write and jump to the third payload.

The third payload

The third payload still isn't anything special. It simply writes the final payload over, using the fastest joypad routine possible. Bank 0 of SRAM is used to store this final payload, as it is more or less free to plunder completely (only storing sprite buffers and Hall Of Fame data).

The final payload

The final payload is where things get interesting. At the beginning, it simply sets up the console state appropriately for the video and audio playback, then falls into a tight "main loop," performing the jobs (reading the joypad, spamming BGP writes, adjusting volume, writing to wave ram, moving objects downwards) needed for video and audio playback. The code was carefully written to allow for everything to fall into place.
After the video finishes, input ends, and the payload will set up RAM to allow completing the game, along with an extra surprise ;).

Source Code

Code for stringing this run together can be found here: https://github.com/CasualPokePlayer/BadAppleTAS
Note that a lot of conversions were actually just done using ffmpeg, the commands for which aren't listed anywhere (but would be relatively easy to re-create if need be).
The movie submitted isn't actually the full movie, as the full movie is too big. The full movie can be found here: https://mega.nz/file/8lFRHCrR#HN_Dm-4HexpJLjlmdN7r13psXUVy10T4b2veT_jwTj0


TASVideoAgent
They/Them
Moderator
Joined: 8/3/2004
Posts: 16012
Location: 127.0.0.1
Site Admin, Skilled player (1208)
Joined: 4/17/2010
Posts: 11635
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
LOL
Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn into something clear and reusable. I hate depending on unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
MESHUGGAH
Other
Skilled player (1897)
Joined: 11/14/2009
Posts: 1366
Location: 𝔐𝔞𝔤𝑦𝔞𝔯
14.7 MB is too large to submit?
PhD in TASing 🎓 speedrun enthusiast ❤🚷🔥 white hat hacker ▓ black box tester ░ censorships and rules...
Emulator Coder, Judge, Experienced player (927)
Joined: 2/26/2020
Posts: 836
Location: California
MESHUGGAH wrote:
14.7 MB is too large to submit?
The limit is 2 MiB with standard permissions. In theory there is no limit for anyone with the "Override Submission Constraints" permission (in practice, there's a 20 MiB limit on anything that can be uploaded to the site, anything higher hits a 413 error).
