The classic Bad Apple, this time on the Game Boy Color with Pokemon Crystal.
The Game Boy and Game Boy Color are platforms I am already intimately familiar with. Someone has already done Bad Apple on the Game Boy, and the Game Boy Color has various extra tricks available, allowing for substantial improvements to both audio and video quality. As such, the Game Boy Color (with a Game Boy Color game) is used.
Note that the emulator is not set to emulate a Game Boy Color within a Game Boy Advance, as one of the tricks used in this run does not work on the Game Boy Advance.
Game Choice
Crystal is used mainly as it was the simplest movie to convert to a "total control" ACE TAS (as the movie is already somewhat of a total control ACE TAS in a limited capacity). It is also used as Pokemon is generally the kind of game you'd expect for these kinds of ACE TASes, and Crystal (along with Gen 2 Pokemon in general) is more associated with the Game Boy Color than Gen 1 (where only international Yellow gained GBC enhancements, which were so shoddy that Yellow was not even advertised as a GBC enhanced game). I also simply prefer Gen 2 Pokemon over Gen 1 (as evidenced by my Gen 2 TASes).
Video Basics
The Game Boy and Game Boy Color operate under a tile system like other retro systems at the time. VRAM holds 384 (768 in GBC mode) 8x8 pixel tiles at addresses $8000-$97FF. Each tile contains 16 bytes, with each pixel using 2 bits to encode a color ID (i.e. 2BPP, 2 bits per pixel). For the background and window layers, these tiles are selected for rendering using one of two 32x32 tile maps stored in VRAM at addresses $9800-$9BFF and $9C00-$9FFF. These tile maps contain 1 byte indexes for the tile to be displayed. Which tile map is used, the origin point for fetching tiles from the tile map, and how these 1 byte indexes address tile data all depend on the LCDC register and scrolling registers.
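As a rough illustration of the 2BPP format, here is a minimal Python sketch that decodes one 16-byte tile into color IDs. The row layout it assumes (2 bytes per row, the first holding the low bit of each pixel and the second the high bit, with bit 7 being the leftmost pixel) is how the hardware stores tiles, but it is not spelled out above. The name `decode_tile` is purely illustrative and is not taken from the run's source code.

```python
def decode_tile(tile_bytes: bytes) -> list[list[int]]:
    """Decode one 16-byte 2BPP tile into 8 rows of 8 color IDs (0-3)."""
    if len(tile_bytes) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        low = tile_bytes[2 * y]       # holds bit 0 of each pixel's color ID
        high = tile_bytes[2 * y + 1]  # holds bit 1 of each pixel's color ID
        row = []
        for x in range(8):
            bit = 7 - x  # bit 7 is the leftmost pixel of the row
            color_id = (((high >> bit) & 1) << 1) | ((low >> bit) & 1)
            row.append(color_id)
        rows.append(row)
    return rows
```

Which actual color each ID ends up as is decided by the palette, not by the tile data itself.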
On the Game Boy Color, in order to mostly reuse the Game Boy's graphics system, VRAM stores two 32x32 tile attribute maps at addresses $9800-$9BFF and $9C00-$9FFF. These attribute maps correspond to tile maps, giving extra info for the tiles pointed to by the tile map. One of these is selecting which background palette is used among the Game Boy Color's 8 4-color background palettes. This thus maintains the old 2BPP tile format while giving Color to the Game Boy.
This system is intended to allow reuse of various graphics, saving on precious VRAM and CPU time. However, Bad Apple is a full motion video. It does not fit neatly into simple 8x8 tiles (without heavily butchering the video anyways), and requires constant change, potentially for the entire frame of video. As such, the entire display should be filled with unique tiles, in this case, 360 different tiles (20x18 tiles, for the 160x144 screen), while the tile map stays constant.
360 different tiles is a ton of memory, 5760 bytes (360 tiles at 16 bytes each), and that is assuming tile attribute maps are completely untouched. Since Bad Apple is largely black and white anyways, I opt to simply keep the video to only 4 colors, thus avoiding ever needing to touch palettes and allowing the tile attribute maps to stay constant.
The Game Boy Color, as another enhancement, features a second VRAM bank, giving twice the VRAM as the original Game Boy, along with introducing banking at all for VRAM in the GB line. This is what holds the latter 384 tiles, along with the tile attribute maps (hence why their addresses "overlap"). Tile attribute maps also specify which bank tile data should be fetched from. Since only 360 tiles are needed here and the tile attribute maps will stay constant here, the second VRAM bank doesn't need to be touched for playback (outside of initially clearing it).
Tile Addressing Modes
As said before, a tile map contains 1 byte indexes for each tile. However, a single byte only has 256 values, which is certainly not enough for the 360 different tiles needed (nor the maximum 384/768 tiles available). So how can 360 different tiles be used?
The LCDC register contains bits which control how tiles will be addressed. There are two addressing modes, the "$8000 method" and the "$8800 method".
For the $8000 method, tiles are stored at $8000-$8FFF, with the tile index being an unsigned index with $8000 used as a base pointer. This results in a formula of $8000 + index * $10 to get tile data, where index is 0 to 255. Thus index 0 has a tile at $8000-$800F, index 1 has a tile at $8010-$801F, and so on until index 255 with a tile at $8FF0-$8FFF.
For the $8800 method, tiles are stored at $8800-$97FF, with the tile index being a signed index with $9000 used as a base pointer. This results in a formula of $9000 + index * $10 to get tile data, where index is -128 to 127 (in standard two's complement). Thus index 0 has a tile at $9000-$900F, index 1 has a tile at $9010-$901F, and so on until index 127 with a tile at $97F0-$97FF. Then on the other side, index -1 has a tile at $8FF0-$8FFF, and so on until index -128 with a tile at $8800-$880F.
The LCDC register can be modified at any point, even mid-scanline. In this case, only 2 writes to LCDC are needed per frame, once at the beginning to use the $8000 method, then another time halfway through the frame to use the $8800 method, thus allowing 360 unique tiles to be used.
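The two formulas and the mid-frame switch can be sketched in Python as follows. `tile_address` follows the formulas given above directly. `plan_tile` is hypothetical: it assumes the 360 tiles are laid out linearly from $8000 and that the switch happens exactly between tile rows 8 and 9, neither of which is confirmed here as being what the final payload does.

```python
SCREEN_COLS, SCREEN_ROWS = 20, 18
TILE_SIZE = 0x10


def tile_address(index_byte: int, use_8000_method: bool) -> int:
    """Return the VRAM address of the tile data for a tile map byte."""
    if use_8000_method:
        # Unsigned index, base pointer $8000.
        return 0x8000 + index_byte * TILE_SIZE
    # Signed (two's complement) index, base pointer $9000.
    signed = index_byte - 256 if index_byte >= 128 else index_byte
    return 0x9000 + signed * TILE_SIZE


def plan_tile(tile_number: int) -> tuple[int, bool]:
    """Assumed layout: tile n lives at $8000 + n * $10.

    The top 9 rows are rendered with the $8000 method and the bottom
    9 rows with the $8800 method. Returns (tile map byte, use_8000_method).
    """
    row = tile_number // SCREEN_COLS
    use_8000_method = row < SCREEN_ROWS // 2
    return tile_number & 0xFF, use_8000_method


# Every one of the 360 tiles resolves to its own 16-byte slot.
for n in range(SCREEN_COLS * SCREEN_ROWS):
    index_byte, use_8000_method = plan_tile(n)
    assert tile_address(index_byte, use_8000_method) == 0x8000 + n * TILE_SIZE
```

Under this assumed layout, the tile map byte is just the low 8 bits of the tile number, since both methods agree on $8800-$8FFF (indexes 128 to 255).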
Video Transfers
The goal here is now to transfer 5760 bytes to change the entire frame. This presents two problems: reading 5760 bytes from the joypad, and uploading them to VRAM very quickly (preferably within a single frame to avoid tearing).
The first problem ends up bringing up the first trick the Game Boy Color brings to the table: double speed mode. As an enhancement, games can switch to double speed mode, doubling the CPU's clock rate (while other components such as the PPU and APU still operate under the same frequency as before), doubling the amount of CPU cycles available within a frame. Using this mode effectively allows for doubling the amount of bytes that can be read from the joypad.
The second problem brings in the second trick the Game Boy Color brings to the table: VRAM DMA. The Game Boy Color introduces a new DMA unit for VRAM transfers. In this context, it is effectively a unit which transfers blocks of $10 bytes (i.e. a tile's worth of granularity) at 2 "double speed" CPU cycles per byte or 1 "normal speed" CPU cycle per byte. The DMA unit is clocked at the same rate regardless of if double speed mode is active, hence double speed mode taking twice as many cycles (although still the same amount of time as normal speed mode). While the DMA is active, the CPU remains halted.
Despite VRAM DMA not benefiting from double speed mode, it is still incredibly fast. On a normal Game Boy, the fastest way to transfer bytes to VRAM would be a popslide technique (e.g. pop de / ld [hl],d / inc l / ld [hl],e / inc l), usually at best around 9 normal speed cycles for 2 bytes, or around 4.5 normal speed cycles per byte. This technique is much slower than a DMA (around 4.5x slower!), and is still slower on the Game Boy Color with double speed mode (around 2.25x slower). This technique also has an incredible amount of register pressure (like any copy on the CPU would have), which a DMA does not have (as the source and destination for DMA are just I/O registers, which increment while the DMA occurs, so very little is needed in terms of CPU register usage to continuously apply DMAs).
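The ratios above can be checked with a little arithmetic. This Python sketch counts machine cycles per instruction (3 for `pop de`, 2 for each `ld [hl],r`, 1 for each `inc l`). The figure of 154 scanlines of 114 normal speed cycles per Game Boy frame is not stated above and is only used to put the totals in perspective. All constant names are illustrative.

```python
FRAME_BYTES = 360 * 16  # 5760 bytes of tile data per video frame

# pop de / ld [hl],d / inc l / ld [hl],e / inc l, in machine cycles
POPSLIDE_CYCLES_PER_PAIR = 3 + 2 + 1 + 2 + 1
POPSLIDE_CYCLES_PER_BYTE = POPSLIDE_CYCLES_PER_PAIR / 2

# VRAM DMA cost per byte, measured in CPU cycles of each speed mode
DMA_CYCLES_PER_BYTE_NORMAL = 1
DMA_CYCLES_PER_BYTE_DOUBLE = 2

# How much slower the popslide is than DMA in each speed mode
slowdown_normal = POPSLIDE_CYCLES_PER_BYTE / DMA_CYCLES_PER_BYTE_NORMAL
slowdown_double = POPSLIDE_CYCLES_PER_BYTE / DMA_CYCLES_PER_BYTE_DOUBLE

# Normal speed cycles in one Game Boy frame (154 scanlines x 114 cycles)
NORMAL_CYCLES_PER_FRAME = 154 * 114

popslide_total = FRAME_BYTES * POPSLIDE_CYCLES_PER_BYTE
dma_total = FRAME_BYTES * DMA_CYCLES_PER_BYTE_NORMAL

print(f"popslide: {popslide_total:.0f} cycles, DMA: {dma_total} cycles")
print(f"one Game Boy frame: {NORMAL_CYCLES_PER_FRAME} normal speed cycles")
```

In other words, a popslide copy of a full video frame at normal speed would not even fit in one Game Boy frame's worth of CPU time, before accounting for VRAM being locked during rendering.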
With VRAM DMA, I aim to copy 2 tiles every scanline. There's plenty of time between PPU rendering (which locks VRAM access) to DMA way more tiles, but such isn't needed and DMA length has to be balanced out with audio handling regardless. In total, 180 scanlines are used to perform DMAs for all 360 tiles.
Of course, 180 scanlines is more than the 144 total scanlines rendered (and more still if including the 10 "scanlines" for vblank). So how does this end up not tearing? Quite simple: the DMAs start during the frame before the new frame would be displayed, after 8 scanlines have been displayed. Once 8 scanlines are displayed, the first row of tiles can be replaced as they have already been displayed. Rendering will go faster than the DMAs will replace tiles, so there's no risk of the DMAs "winning" the race against the PPU reading VRAM. The DMAs will end up finishing before the PPU ends up reaching anywhere near the end of the new 360 tiles, thus allowing a new frame to be transferred over without any tearing.
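The race can be checked with a simplified model. This Python sketch works at whole-scanline granularity and assumes the new video frame is displayed on the very next Game Boy frame, which is the tightest case. The actual playback loop may well have more slack than this, and the helper names are made up for illustration.

```python
LINES_PER_FRAME = 154      # 144 rendered scanlines + 10 vblank "scanlines"
TOTAL_TILES = 360
TILES_PER_ROW = 20
TILES_PER_DMA_LINE = 2
DMA_START_LINE = 8         # DMAs begin once 8 scanlines have been displayed


def dma_line(tile: int) -> int:
    """Scanline (counted from the start of the old frame) that replaces a tile."""
    return DMA_START_LINE + tile // TILES_PER_DMA_LINE


def old_frame_last_read(tile: int) -> int:
    """Last scanline on which the PPU reads the tile for the old frame."""
    return 8 * (tile // TILES_PER_ROW) + 7


def new_frame_first_read(tile: int) -> int:
    """First scanline on which the PPU reads the tile for the new frame."""
    return LINES_PER_FRAME + 8 * (tile // TILES_PER_ROW)


def tears() -> bool:
    """True if any tile is replaced while still needed, or too late."""
    return any(
        not (old_frame_last_read(t) < dma_line(t) < new_frame_first_read(t))
        for t in range(TOTAL_TILES)
    )


print("tearing" if tears() else "no tearing")
```

Each tile row takes 10 scanlines to replace but only 8 scanlines to render, so the PPU pulls further ahead of the DMAs with every row on the old frame, and the DMAs finish long before the PPU catches up to them on the new frame.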
Reading 5760 bytes from the joypad ends up being the main bottleneck here, however. There simply isn't enough time to read enough bytes from the joypad while also transferring them into VRAM at Bad Apple's original 30 FPS. As such, the framerate is sacrificed, going from 30 FPS down to 20 FPS (i.e. each video frame spans 3 Game Boy frames rather than 2). This extra frame provides enough time to read all the bytes from the joypad and transfer them into VRAM.
Audio Basics
The Game Boy and Game Boy Color, like other systems at the time, produce sound using multiple sound generation units. These sound generation units are denoted as channels. The Game Boy and Game Boy Color have 4 different channels, each with a specialized usage.
This system allows for so called "8-bit" or "chiptune" music, but not easily for raw PCM playback that voices generally need. Luckily, one of the channels available, Channel 3, is a simple channel that allows for playing back arbitrary 4-bit PCM samples, with 32 total able to be stored at once (within "wave ram"), possibly set to forever loop.
Channel 3 can be set at any 2097152 / (2048 - x) frequency where x is 0-2047. In this case, x is set to 1991, making a frequency of 2097152 / (2048 - 1991), i.e. 2097152 / 57, i.e. ~36792.14Hz. 57 here is used as that allows for exactly 4 samples to be played back every scanline.
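To see where the "exactly 4 samples per scanline" comes from, here is a small Python sketch of the arithmetic. It assumes the usual 4194304Hz master clock and 456 clock ticks per scanline, neither of which is stated above. The function names are illustrative only.

```python
from fractions import Fraction

MASTER_CLOCK_HZ = 4194304   # assumed master clock
TICKS_PER_SCANLINE = 456    # assumed master clock ticks per scanline
CH3_BASE_HZ = 2097152


def ch3_sample_rate(x: int) -> Fraction:
    """Channel 3 sample rate in Hz for a period value x (0-2047)."""
    if not 0 <= x <= 2047:
        raise ValueError("x must be 0-2047")
    return Fraction(CH3_BASE_HZ, 2048 - x)


def samples_per_scanline(x: int) -> Fraction:
    """How many channel 3 samples are played during one scanline."""
    scanlines_per_second = Fraction(MASTER_CLOCK_HZ, TICKS_PER_SCANLINE)
    return ch3_sample_rate(x) / scanlines_per_second


print(float(ch3_sample_rate(1991)))   # sample rate in Hz for x = 1991
print(samples_per_scanline(1991))     # samples played per scanline
```

Since each wave ram byte holds 2 samples, 4 samples per scanline means 2 bytes of audio are consumed every scanline.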
Only 4 bits of amplitude is fairly low quality. The quality can however be increased by changing the master volume for each sample. The master volume is a linear control that amplifies (i.e. multiplies) the mixed channel audio, with 8 different values per side (although, for this video, the music is kept mono regardless). With 16 sample values and 8 volume levels, this provides roughly 7-bit PCM quality playback (although not exactly, as multiple multiplications just result in the same value), just enough for fairly high quality PCM playback.
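The "not exactly 7-bit" caveat can be illustrated with a short Python sketch that counts how many distinct output levels the sample and volume combinations produce. The two amplitude mappings used here are idealized assumptions (the real DAC and mixer are analog and not modelled), so the counts are only indicative, and `distinct_levels` is a made-up helper.

```python
import math
from typing import Callable


def distinct_levels(sample_to_amplitude: Callable[[int], int]) -> int:
    """Count distinct products of a 4-bit sample and a volume of 1-8."""
    levels = {
        sample_to_amplitude(sample) * volume
        for sample in range(16)
        for volume in range(1, 9)
    }
    return len(levels)


# 16 samples x 8 volumes: the upper bound behind the "7-bit" figure.
combinations = 16 * 8

# Assumption 1: the sample value is used as-is (0 to 15).
unsigned_levels = distinct_levels(lambda s: s)
# Assumption 2: the sample is centered around zero (-15 to 15, odd steps).
centered_levels = distinct_levels(lambda s: 2 * s - 15)

for name, count in (("unsigned", unsigned_levels), ("centered", centered_levels)):
    print(f"{name}: {count} of {combinations} levels, "
          f"about {math.log2(count):.2f} bits")
```

Either way, fewer than 128 distinct levels come out, since different sample and volume pairs can multiply to the same value (e.g. 2 x 1 and 1 x 2 under the first assumption).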
Audio Transfer
Obviously, Bad Apple will take way more than just 32 samples to play back. This brings in another trick (or rather quirk) available to the Game Boy Color: writing to wave ram while channel 3 is active. Normally, you are not supposed to write to wave ram while the channel is active. However, on the Game Boy Color, this ends up just writing to whichever byte channel 3 last read a sample from. Using this trick, wave ram can be continuously written to, effectively perfectly streaming PCM audio (although the bytes written will end up being 32 samples "ahead" as the written sample won't be read until the channel loops back around).
Note that writing to wave ram while the channel is active differs on the original Game Boy, which prevents the write from going through, except on the exact cycle on which wave ram is read by channel 3, in which case the write does go through in the same manner as on the Game Boy Color. Of course too, if this trick is attempted on a Game Boy game running on the Game Boy Color, it will work just the same as on a Game Boy Color game, as this is just a hardware quirk rather than any explicit enhancement. As such, a potential improvement to a Bad Apple run on the Game Boy could simply be to use the Game Boy Color's Game Boy compatibility mode and take advantage of this quirk (although that could perhaps be considered "cheating" of sorts, but I digress).
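The 32 sample delay can be seen in a toy Python model of the quirk. It is heavily simplified: exactly one CPU write is assumed to land after each byte (2 samples) that channel 3 reads, and no real timing is modelled. `stream_through_wave_ram` is a hypothetical name, not something from the payload.

```python
WAVE_RAM_BYTES = 16  # 32 samples of 4 bits each


def stream_through_wave_ram(initial: bytes, data: bytes) -> bytes:
    """Return the bytes channel 3 plays while data is streamed in."""
    if len(initial) != WAVE_RAM_BYTES:
        raise ValueError("wave ram is exactly 16 bytes")
    wave_ram = bytearray(initial)
    played = bytearray()
    position = 0
    for new_byte in data:
        # Channel 3 reads (and plays) the byte at its current position...
        played.append(wave_ram[position])
        # ...and the CPU write lands on that same, just-read byte.
        wave_ram[position] = new_byte
        position = (position + 1) % WAVE_RAM_BYTES
    return bytes(played)


silence = bytes(WAVE_RAM_BYTES)
audio = bytes(range(1, 41))
played = stream_through_wave_ram(silence, audio)
# What is heard is the stream delayed by one full pass over wave ram.
assert played == (silence + audio)[: len(audio)]
```

In this model, each written byte is only heard once the channel has looped all the way around, which is why the stream has to run 16 bytes (32 samples) ahead of what is currently playing.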
On the Game Boy Advance (running a Game Boy/Game Boy Color game), writes to wave ram while the channel is active will always fail. As such, this trick would not work at all on the GBA and does not work with the emulator set to GBA mode as it accurately emulates such. Due to this, the run must be done with the console mode set to GBC rather than GBA like my existing Crystal TAS does. Some minor work was thus done to "resync" the existing Crystal TAS to the GBC (which in practice was just finding a different TID manip setup).
Note this audio quirk, along with the volume trick, is nothing new. It was already used in MrWint's Yellow TAS (and the general idea has likely been used even before that TAS), although that used ~18KHz instead of the ~36KHz I've used.
The first payload
The first payload is exactly the same as in my existing Crystal TAS. It simply executes opcodes according to the current joypad value. The game's own joypad routine is used, so it is rather slow compared to a hand crafted joypad reading function, and only the HL register is free to be used. This is enough however to write the second payload.
The second payload
The second payload is nothing special. It simply swaps over to bank 3 of WRAM (unused by Crystal) and fills the entirety of that bank with data from the joypad, which comprises the final payload.
The final payload
The final payload is where things get interesting. At the beginning, it simply sets up the console state appropriately for the video and audio playback, then falls into a tight "main loop," performing the jobs (reading the joypad, adjusting volume, writing to wave ram, DMAing over buffered joypad reads) needed for video and audio playback. The code was mostly carefully written to allow for everything to fall into place (although there's some sloppy work present, mostly due to not bothering to clean up some iteration cruft).
Source Code
Note that a lot of conversions were actually just done using ffmpeg, the commands for which aren't listed anywhere (but would be relatively easy to re-create if need be).