Welcome, this is the discussion board of TASVideos.
Inject this code into D364. The pointer above will fall on D36D, as desired. This code relies on the fact that the registers have preset values at the moment of PC hijack:
        24        INC H
input:  76        HALT
        E2        LDH (C) A
        F2        LDH A (C)
        2F        CPL A
        32        LDD (HL) A
        02        LD (BC) A     (item hack)
        18 F8     JR input
        64 D3     (not executed) pointer
In particular, HL is whatever D36D points to (D364), so INC H (making it D464) gives a sane starting point for HL. The byte sequence is written backwards and eventually overwrites the JR opcode, which lets execution fall into the byte sequence. C already holds 0, and A already holds a number whose 4th bit is 1 and 5th bit is 0, so there is no need to initialize them. Finally, DI is not required for this, and HALT paces the loop so that one byte is written per frame.

This code also relies on the fact that a patching program can be written entirely in the 1x and 2x opcode rows. Note that the output byte sequence is limited to 1x and 2x opcodes, and 1x alternates with 2x (this comes from "LDH A (C)" and "CPL A" above). The idea is to load an address 1x2y or 2x2y into DE, load A with the value at that address (which comes from the ROM) and store it at the address pointed to by HL (which should be somewhere in Dxxx). A can also be modified using CPL, DAA, RR, and RL, and the address for HL can be set up quickly thanks to "ADD HL,HL" (essentially HL:=HL*2), among other H- and L-modifying opcodes. All these opcodes come from the 1x and 2x rows.

A self-contained program cannot be written entirely in the 1x and 2x rows (because no backward branches are possible), but since this patching program need not be self-contained, I just write a new program after the byte sequence. I should do more testing.
AF: 5880 BC: AF00 DE: 0D5D HL: D364 (or whatever D36D points to) SP: DFF9 PC: D364
If the desired number is in the 1x2y table, use the two-byte load (0x11):
xxyy  yy= 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
xx=
  10:     FA E0 D4 A7 C8 47 0E 00 11 10 C1 2A 12 14 7B C6
  11:     0A 5F 3E 6B 22 7B A7 28 06 18 F1 3E 63 83 77 D1
  12:     D1 3C 22 F1 77 C9 3E 01 EA 4A CC 3E 40 EA 9A D0
  13:     D1 06 41 C3 5B 3C 21 DB D0 4F 06 00 09 7E C9 F0
  14:     3E 09 38 17 78 FE 4A 3E 0A 38 10 78 FE 74 3E 0B
  15:     D4 CB BE E1 C9 E5 21 2F D4 CB 9E E1 C9 E5 21 2F
  16:     1E F1 CD 7E 3E F1 E0 BA C9 3E 08 E0 C6 CD 64 1E
  17:     20 FC C9 E5 1A FE 50 20 04 44 4D E1 C9 FE 4E 20
  18:     C3 24 17 93 8C 50 93 91 80 88 8D 84 91 50 8F 82
  19:     F0 F9 AB EA 57 D3 79 EA 3A CC 78 EA 3B CC 2A FE
  1A:     28 10 FE 16 28 0C 7E CD 38 22 CD 3E 37 E1 C1 C3
  1B:     2A 12 7B 3C E6 1F 47 7B E6 E0 B0 5F 0D 20 EE C9
  1C:     C9 F0 C6 A7 C8 08 BF FF F0 C7 6F F0 C8 67 F9 F0
  1D:     E0 4A E0 06 E0 07 E0 47 E0 48 E0 49 3E 80 E0 40
  1E:     00 20 CD 9B 49 CD F5 1E CD 6D 3E CD C8 01 F0 D6
  1F:     3C C0 AF EA 43 DA FA 42 DA 3C EA 42 DA FE 3C C0
  20:     CD 93 20 20 DE C3 9B 20 F0 FF E6 0F FE 08 20 D3
  21:     33 21 FA 42 CC C6 60 E0 AC F0 AA FE 02 20 04 3E
  22:     78 18 15 AF EA 2A C0 EA 2B C0 EA 2C C0 EA 2D C0
  23:     C0 F0 B8 F5 3E 01 CD 7E 3E 3E FF EA CA CF CD B7
  24:     3E 03 EA A6 D0 AF EA A0 D0 EA A1 D0 EA A7 D0 CD
  25:     D0 67 23 C3 8D 27 E1 AF EA A0 D0 FA A7 D0 CB 4F
  26:     FA A3 D0 85 30 01 24 EA AC D0 7C EA AD D0 FA A0
  27:     D0 47 FA A2 D0 B8 20 BD AF EA A0 D0 C9 11 5D 27
  28:     84 3E 21 11 CF CB 46 CB 86 20 06 FA 5D D3 CD 8B
  29:     29 3E 02 EA 93 CF F0 B8 F5 3E 01 CD 7E 3E CD A5
  2A:     D7 CB 6F FA 26 CC 0E 07 20 01 0D B9 20 C6 AF EA
  2B:     93 CF A7 20 03 CD 1C 23 3E 01 EA 37 CC FA 29 D1
  2C:     2F D7 CB B6 C3 E8 35 CB 4F C2 33 2D CB 57 C2 C9
  2D:     C9 AF EA 35 CC 3E FF C9 F1 F6 F7 50 7F 7F 7F 7F
  2E:     CD 03 13 F1 EA 1D D1 E1 D1 13 FA 93 CF FE 03 20
  2F:     F1 06 F6 80 12 13 3E 50 12 F1 EA 1D D1 C1 D1 E1
      yy= 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
If the desired number is in the 2x2y table, use two one-byte loads (0x16, 0x1E):
11 2y 1x *2F 1A 22
It is somewhat more complicated if the number is not in the table. I won't say more on that now. Here is the patching program. It begins at D37D and runs for 232 bytes. It sets the HL pointer to D465, then writes the stage 2 program (bytes indicated on the right, with assembly translation). Starred (*) opcodes in the patching program are hack bytes.
16 2x 1E 2y 1A 22
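As a rough illustration of the two encodings, here is a minimal Python sketch that looks a desired byte up in the ROM table and emits the patch bytes for it. The names `ROM_ROWS`, `find` and `patch_bytes` are made up for this example, and only four rows of the table are copied in to keep it short. The filler bytes (2F after the two-byte load, 13 after a real CPL) follow the starred bytes in the patching program; values that need RL or other tricks are not handled.

```python
# Excerpt of the ROM table: {high byte xx: the 16 values at yy = 0x20..0x2F}.
ROM_ROWS = {
    0x11: "0A 5F 3E 6B 22 7B A7 28 06 18 F1 3E 63 83 77 D1",
    0x1B: "2A 12 7B 3C E6 1F 47 7B E6 E0 B0 5F 0D 20 EE C9",
    0x1C: "C9 F0 C6 A7 C8 08 BF FF F0 C7 6F F0 C8 67 F9 F0",
    0x21: "33 21 FA 42 CC C6 60 E0 AC F0 AA FE 02 20 04 3E",
}
TABLE = {xx: bytes.fromhex(row) for xx, row in ROM_ROWS.items()}


def find(value):
    """Return (xx, yy) of the first table cell holding value, or None."""
    for xx, row in TABLE.items():
        idx = row.find(bytes([value]))
        if idx != -1:
            return xx, 0x20 + idx
    return None


def patch_bytes(value):
    """Patch bytes that make the patching program write `value` at (HL)."""
    hit, complement = find(value), False
    if hit is None:
        # Fall back to a cell holding the complement and fix it up with CPL.
        hit, complement = find(value ^ 0xFF), True
    if hit is None:
        raise LookupError("not in the table excerpt, even after CPL")
    xx, yy = hit
    if xx & 0xF0 == 0x10:
        seq = [0x11, yy, xx, 0x2F]   # LD DE,xxyy then filler CPL (hack byte)
    else:
        seq = [0x16, xx, 0x1E, yy]   # LD D,xx then LD E,yy
    seq.append(0x1A)                 # LD A,(DE)
    if complement:
        seq += [0x2F, 0x13]          # real CPL, then filler INC DE (hack byte)
    seq.append(0x22)                 # LDI (HL),A
    return seq


if __name__ == "__main__":
    for wanted in (0xEE, 0x98, 0x21):
        print(f"{wanted:02X}:", " ".join(f"{b:02X}" for b in patch_bytes(wanted)))
```

Every emitted sequence alternates between the 1x and 2x rows, which is the constraint the input routine imposes.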
About the stage 2 program: The initial part (above HALT) sets the size of the stage 3 program (0x10000-0xFEFF = 257 bytes) and the address it is written to (0xDAAA). The part under the input label switches to ROM bank 3 and calls the routine at 0x4004 (thanks p4wn3r) to get the input value into FFF5. I also studied VRAM and experimented a bit. Since there was room for a few more bytes in the stage 2 program, I inserted instructions (above HALT) to write ぇ tiles on the screen. The tiles already exist from before the PC hijack. I'm currently planning what the stage 3 program should do. I'll probably need to study sound as well. I'm not sure how the music keeps playing or where the music data is stored. (I used HALT, unlike bortreb's submission, at least for his first 3 stages.)
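The 257-byte figure can be checked with a small model of the stage 2 counter (INC DE, LD A E, OR D, JP Z). This is only a sketch of the loop arithmetic; the function name is made up for the example.

```python
def bytes_until_wrap(de):
    """Count loop passes until DE, incremented once per pass, wraps to 0000."""
    count = 0
    while True:
        count += 1                    # one byte stored by LDI (HL),A
        de = (de + 1) % 0x10000       # INC DE
        if (de % 256) | (de // 256) == 0:   # LD A,E / OR D sets Z only at 0000
            return count              # JP Z leaves the loop here


if __name__ == "__main__":
    print(bytes_until_wrap(0xFEFF))
```

Starting from DE=FEFF, the loop exits after 0x10000-0xFEFF passes, which matches the stated program size.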
        21 10 2F *1F                          HL=2F10
        29 *1F                                HL=5E20
        29                                    HL=BC40
        11 25 18 *2F                          DE=1825
        19 *2F                                HL=D465
        11 25 1A *2F 1A 2F *13 22             F3  DI
        16 21 1E 21 1A 22                     21  LD HL 98EE
        11 2E 1B *2F 1A 22                    EE
        11 2D 1C *2F 1A 2F *13 22             98
        11 22 11 *2F 1A 22                    3E  LD A 77
        11 2E 11 *2F 1A 22                    77
        11 24 11 *2F 1A 22                    22  LDI (HL) A
        11 24 11 *2F 1A 22                    22  LDI (HL) A
        16 27 1E 2D 1A 22                     11  LD DE FEFF
        11 27 1C *2F 1A 22                    FF
        11 25 14 *2F 1A 22                    FE
        16 21 1E 21 1A 22                     21  LD HL DAAA
        16 21 1E 2A 1A 22                     AA
        11 25 1F *2F 1A 22                    DA
input:
        11 2C 19 *2F 1A *24 17 *25 *13 22     76  HALT
        16 21 1E 2F 1A 22                     3E  LD A 3
        16 24 1E 21 1A 22                     03
        16 22 1E 24 1A 22                     EA  LD (2000) A
        11 27 10 *2F 1A 22                    00
        16 20 1E 22 1A 22                     20
        16 20 1E 20 1A 22                     CD  CALL 4004
        16 21 1E 2E 1A 22                     04
        11 2C 12 *2F 1A 22                    40
        16 20 1E 28 1A 22                     F0  LDH A (FFF5)
        16 23 1E 23 1A 22                     F5
        11 24 11 *2F 1A 22                    22  LDI (HL) A
        16 2E 1E 22 1A 22                     13  INC DE
        11 2E 10 *2F 1A 22                    7B  LD A E
        11 2A 17 *2F 1A 2F *13 22             B2  OR D
        16 23 1E 2C 1A 22                     CA  JP Z DAAA
        16 21 1E 2A 1A 22                     AA
        11 25 1F *2F 1A 22                    DA
        16 22 1E 21 1A 22                     18  JR input
        16 2E 1E 22 1A 2F *13 22              EC
        *1F *2F *1F
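The pointer setup at the top of the patching program is plain 16-bit arithmetic and can be replayed as follows. This is a sketch only; `add16` and `pointer_setup` are names invented for the example.

```python
def add16(a, b):
    """16-bit add with wraparound, as ADD HL,rr behaves on the register value."""
    return (a + b) % 0x10000


def pointer_setup():
    """Replay LD HL 2F10 / ADD HL,HL / ADD HL,HL / LD DE 1825 / ADD HL,DE."""
    steps = [0x2F10]                          # 21 10 2F
    steps.append(add16(steps[-1], steps[-1])) # 29
    steps.append(add16(steps[-1], steps[-1])) # 29
    steps.append(add16(steps[-1], 0x1825))    # 11 25 18, then 19
    return steps


if __name__ == "__main__":
    print(" ".join(f"{hl:04X}" for hl in pointer_setup()))
```

The last value is the D465 target mentioned in the text, reached using only 1x and 2x opcodes and operand bytes.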
... provided that the code is followed by a bunch of 0s or statements that have no significant effect when executed. The jump to D3xx goes into these 0s, and the code is located so that xx D3 falls on D36D, which is the pointer to this code. Of course HL is set to D3xx as well, and the program writes there. The mind-numbing thing is that the program continuously executes (every cycle) whatever is written there. This can be bypassed using some well-placed spaghetti code, provided there is enough room to do so. I'll get to that later. I now change the code to make use of rival name, and avoid switching parity. Here is the 11-byte block (all NOPs are hack bytes):
        76        HALT
        F0 F5     LDH A (FFF5)
        22        LDI (HL) A
        C3 xx D3  JP D3xx
Note that the CALL NC is always taken, since carry is always clear. The initial placement of LDI (HL) A means that the program starts by writing 0x50, and fortunately this is a harmless instruction (LD D B). In terms of taking D343-D34D and swapping it to D322-D32C, this is:
        00        NOP
        00        NOP
        22        LDI (HL) A
        00        NOP
        76        HALT
        00        NOP
        F0 F5     LDH A (FFF5)
        D4 50 D3  CALL NC D350
where $ is money byte, N is rival name byte, * means quantity (when switched to D322-D32C), and . means item ID. Some * bytes must be reduced through item tossing, but . bytes are exact, so parity need not be switched. I then place this 11-byte block at D364-D36E. When executed, this program writes 0x50 at D350, then writes in forward direction whatever input you give it. D350 is also executed every frame (this is why the writing program must be preceded by harmless opcodes such as 0s). Using input, I write the following at D351 (remember this is located just before the writing program):
00 30 00 sp fm pk ed 0 $1 $2 $3 N1 0 0 N2 N3 N4 0 * . * . * . * . * . * 00 00 22 00 76 00 F0 F5 D4 50 D3
Here's an explanation of why this works:
- The program first writes 0x18 so the instruction reads "jump relative by 0". When executed, this doesn't branch anywhere (think about it), so execution normally falls back into the writing program.
- Next time through, the program writes 0x0F so now the instruction reads "jump relative forward by 15". Provided there is enough space, execution still falls back into the writing program.
- Because of the jump instruction, the program will not execute the following 15 bytes through the next 15 cycles, so the program can write whatever it wants there.
- At the destination of the jump, the program then writes 0x18. The instruction there reads "jump relative by 0", and again no harm is done.
- Finally, the program writes 0xEF so now the instruction reads "jump relative backward by 17". When execution hits this instruction, it jumps into the new program written in the 15 bytes between the jumps, and stage 2 begins.
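To make the jump arithmetic concrete, here is a small sketch of how a JR operand resolves. The helper name `jr_target` is made up; the addresses assume the 0x18 opcode is the byte written at D351, as described earlier.

```python
def jr_target(opcode_addr, offset_byte):
    """Address reached by a 2-byte JR located at opcode_addr.

    The offset is a signed 8-bit value counted from the byte after the JR.
    """
    signed = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (opcode_addr + 2 + signed) % 0x10000


if __name__ == "__main__":
    print(f"{jr_target(0xD351, 0x00):04X}")  # JR 0: simply the next instruction
    print(f"{jr_target(0xD351, 0x0F):04X}")  # forward 15: skips the 15 bytes
    print(f"{jr_target(0xD362, 0xEF):04X}")  # backward 17: start of those bytes
```

The backward jump lands exactly on the first of the 15 skipped bytes, which is why the two JR instructions bracket the new program.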
        18 0F     JR there
there2: 16 80     LD D 0x80     (limit)
        21 6F D3  LD HL D36F    (target)
input:  76        HALT
        F0 F5     LDH A (FFF5)
        22        LDI (HL) A
        15        DEC D
        CA 6F D3  JP Z D36F     (target)
        18 F6     JR input
there:  18 EF     JR there2
We'll get into how it works later. For the purposes of this TAS, the following 9-byte sequence takes less time to enter (and requires D350-D365 to be all 0):
D368:   76        HALT
        F0 F5     LDH A (FFF5)
        22        LDI (HL) A
        C3 5B D3  JP D35B
Why? Well, go back to when the rival was named (space) (female) (PK) (END). Memory in D343 at that point is:
D366:   22        LDI (HL) A
        00        NOP
        76        HALT
        00        NOP
        F0 F5     LDH A (FFF5)
        D4 50 D3  CALL NC D350
We want to change it to the 9-byte sequence somehow, so we do this:
* Reset in the middle of saving. This overwrites D162 with FF and the game thinks you have 255 Pokemon. More importantly, you can now switch Pokemon below the 6th slot. This will swap huge chunks of memory.
* Switch any Pokemon 1-9 with the 10th Pokemon. This overwrites D31C with FF and the game thinks you have 255 items.
* Switch the 17th and 20th Pokemon. Apart from other stuff going on, this will swap D322-D32C with D343-D34D.
So now D322, which is close to the beginning of the item list, reads:
D343: 00 00 00 00 30 00 7F F5 E1 50 00
or from the beginning of the item list:
D322: 00 00 00 00 30 00 7F F5 E1 50 00
None of the "items" cause problems when trying to toss them (some values crash the game, or are untossable). The addresses in the even spaces (D31E, D320, D322, ...) can all be reduced through tossing items, and 00 is treated as 0x100 so tossing one gives 0xFF. We now do this: * Toss D321 completely. This shifts it so it looks like this:
D31D: FF FF FF FF FF 00 00 00 00 30 00 7F F5 E1 50 00
* Toss 14 of D323. This changes 0x30 to 0x22.
* Toss 9 of D325. This changes 0x7F to 0x76.
* Toss 13 of D327. This changes 0xE1 to 0xD4.
* Toss 45 of D329. This changes the 0x00 directly after 0x50 to 0xD3.
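The tossing steps are just subtraction on the quantity byte, with 00 counting as 0x100. A minimal sketch that checks the numbers above (the function name `toss` is made up, and tossing a whole stack, which removes the slot, is deliberately not modelled):

```python
def toss(quantity, n):
    """Quantity byte left after tossing n items; a stored 00 acts as 0x100."""
    effective = quantity if quantity != 0 else 0x100
    if n not in range(1, effective):
        raise ValueError("this sketch only covers partial tosses")
    return (effective - n) % 0x100


if __name__ == "__main__":
    for before, n in ((0x30, 14), (0x7F, 9), (0xE1, 13), (0x00, 45)):
        print(f"{before:02X} - {n} = {toss(before, n):02X}")
```

Because quantities can only go down, each target byte must be reachable from a larger starting value (or from 00).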
D31D: FF FF FF FF 00 00 00 30 00 7F F5 E1 50 00 00 00
* Switch D32B and D329. This switches the last 00 00 with 50 D3.
* Switch D329 and D327. This switches the same 00 00 with F5 D4.
D31D: FF FF FF FF 00 00 00 22 00 76 F5 D4 50 D3 00 00
* Toss 16 of D327. This switches the last 0x00 with 0xF0.
D31D: FF FF FF FF 00 00 00 22 00 76 00 00 F5 D4 50 D3
Notice that we have our 9-byte sequence above starting from D324, but it needs to go to D366. So we do some Pokemon switches:
* Switch the 19th and 17th Pokemon. This swaps D322-D32C with D338-D342.
* Switch the 12th and 11th Pokemon. This swaps D322-D34D with D34E-D379, so now D338-D342 ends up at D364-D36E, exactly where we want it.
Notice that the switching ensures that D350-D363 are all 0. Now close the menu. This is what happens:
* At some point, the game reads D350 off of address D36D, which is supposed to hold a ROM address, and jumps to it. The value in register A is 0x50, and HL holds 0xD350, the location of the jump (note that the instruction 00 is NOP, which does nothing).
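The two Pokemon switches can be checked with a toy memory model. This sketch only reproduces the range swaps stated above (11 bytes, then 44 bytes); it does not model anything else the game does during a switch, and the names are made up for the example.

```python
def swap(mem, a, b, length):
    """Swap two non-overlapping ranges of `length` bytes at addresses a and b."""
    mem[a:a + length], mem[b:b + length] = mem[b:b + length], mem[a:a + length]


def relocate_block(block):
    """Place `block` at D322, apply both switches, and return the memory."""
    mem = bytearray(0x10000)
    mem[0xD322:0xD322 + len(block)] = block
    swap(mem, 0xD322, 0xD338, 11)   # 19th and 17th: D322-D32C with D338-D342
    swap(mem, 0xD322, 0xD34E, 44)   # 12th and 11th: D322-D34D with D34E-D379
    return mem


# The 11 bytes at D322-D32C after the item edits.
BLOCK = bytes.fromhex("00 00 22 00 76 00 F0 F5 D4 50 D3")

if __name__ == "__main__":
    mem = relocate_block(BLOCK)
    print(mem[0xD364:0xD36F].hex(" ").upper())
```

After both swaps the block sits at D364-D36E, so its last two bytes (50 D3) land on the pointer at D36D.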
D31D: FF FF FF FF 00 00 00 22 00 76 00 F0 F5 D4 50 D3
The instruction LDI (HL),A (22) writes whatever is in A to the address pointed by HL. When it reaches HALT (76), it waits for the next frame.
D350: 00 00 ... 00 22 00 76 00 F0 F5 D4 50 D3
Now HALT runs a routine that puts key input into FFF5 as a number. The instruction LDH A (FFF5) (F0 F5) places this number into A. Then CALL NC D350 (D4 50 D3) jumps to D350 as a subroutine (we do not care about the "subroutine" part of it) and execution cycles again. Thus, by using LDI (HL) A over and over along with LDH A (FFF5) to read input, we can write a program at D350. However, since D350 is executed every cycle, we must be careful. The program at D350 reads:
D350: 50 00 00 ... 00 22 00 76 00 F0 F5 D4 50 D3
which is harmless. We feed the input 0x18 and now it reads:
50 LD D B
JR 0 means "jump relative by 0", which doesn't go anywhere. We feed the input 0x0F and now it reads:
        50        LD D B
        18 00     JR 0
Notice the significance of the last instruction. It jumps 15 forward but still prior to the instruction LDI (HL),A at D366. That means that now we can input anything for the next 15 bytes and execution will ignore them. We now write:
        50        LD D B
        18 0F     JR D362
Feed the input 0x18:
        50        LD D B
        18 0F     JR there
        16 98     LD D 0x98
        21 6F D3  LD HL D36F
input:  76        HALT
        F0 F5     LDH A (FFF5)
        22        LDI (HL) A
        15        DEC D
        CA 6F D3  JP Z D36F
        18 F6     JR input
there:
Again, when jumping to the label "there", the instruction 18 00 does nothing because it is jump by 0. Now feed the input 0xEF:
        50        LD D B
        18 0F     JR there
        16 98     LD D 0x98
        21 6F D3  LD HL D36F
input:  76        HALT
        F0 F5     LDH A (FFF5)
        22        LDI (HL) A
        15        DEC D
        CA 6F D3  JP Z D36F
        18 F6     JR input
there:  18 00     JR 0
Execution now jumps to "there", which then jumps to "enter", executing the program we set up. This is the stage 2 program, a simple RAM writer that writes 152 bytes to D36F and then executes it. It is left to the reader to verify this. Stage 3 is now constructed, and it is large. Here it is:
        50        LD D B
        18 0F     JR there
enter:  16 98     LD D 0x98
        21 6F D3  LD HL D36F
input:  76        HALT
        F0 F5     LDH A (FFF5)
        22        LDI (HL) A
        15        DEC D
        CA 6F D3  JP Z D36F
        18 F6     JR input
there:  18 EF     JR enter
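For readers who want to verify the RAM writer without an emulator, here is a minimal model of the loop from "enter" to "JP Z". It is a sketch under the assumption that one input byte is available per HALT; `run_ram_writer` and its parameters are names chosen for this example.

```python
def run_ram_writer(inputs, count=0x98, target=0xD36F):
    """Model of the RAM writer: store `count` input bytes from `target` on.

    Returns (memory dict, address jumped to once D reaches zero).
    """
    mem = {}
    hl, d = target, count           # LD D 0x98 / LD HL D36F
    frames = iter(inputs)
    while True:
        a = next(frames)            # HALT, then LDH A (FFF5)
        mem[hl] = a                 # LDI (HL) A
        hl = (hl + 1) % 0x10000
        d = (d - 1) % 0x100         # DEC D
        if d == 0:
            return mem, target      # JP Z D36F
        # otherwise JR input


if __name__ == "__main__":
    mem, pc = run_ram_writer(range(152))
    print(len(mem), f"{min(mem):04X}", f"{max(mem):04X}", f"{pc:04X}")
```

With the counter at 0x98 the model consumes exactly 152 inputs and then jumps to the first byte it wrote.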
It works as such:
* First, it draws out the 8x8 tiles for pi, pi upside down ("pirev"), decimal point, 0, and 1. All other digits are already in the tileset.
* Then, it takes input and changes the VRAM accordingly. Input is as follows:
** If input is from 0x00 to 0xED, the program draws to a cache which can later be dumped into VRAM at 9982. The drawing field is 8 rows by 16 columns. The input is considered as aaacbbbb, where aaa is the row, bbbb is the column, and c=0 for the pi tile and c=1 for the pirev tile. The choice of format is motivated by the fact that going down a row in VRAM is the same as adding 0x20=32 to the address.
** If input is 0xEE, the program dumps the cache into 9982, and then clears the cache using the black tile (value 0x10).
** If input is 0xEF, the program executes its ending sequence.
** If input is 0xF0-0xFF, the program writes directly to 9802, with the drawing field being 12 rows by 16 columns. The input's own number is written in as the value of the tile; all tiles of interest are in the Fx row. Tiles are written serially from left to right, then from top to bottom. Technically, the program can draw into the bottom drawing field used by the first case of input above.
* Its ending routine is a "fake crash" that disables interrupts, floods the bottom screen with pi tiles, then gets itself into an infinite loop (18 FE, or JR -2).
And that's about it. There is no guarantee that VRAM writing occurs in a manner that is safe on a real GB; the program does not check the status of FF40-FF41 (for reference, pages 51-53 of http://marc.rawer.de/Gameboy/Docs/GBCPUman.pdf ).
----
The C++ parser works as follows:
* It recognizes the characters "0123456789abcdef|*s@." as well as uppercase variants. All other characters are delimiters.
* If it first detects a character from "0123456789abcdef", then it expects immediately a second character from "01234567". This is for drawing to the cache representing VRAM at 9982. The first character represents the column, the second character the row. If it is immediately followed by '*', then it uses the pirev tile. Otherwise it uses the pi tile.
* If it first detects the character '|', it counts to 15, inserting 0xED inputs along the way (the do-nothing-visible input), then inserts 0xEE (dump cache). It also resets the count to 1.
* If it first detects the character '@', it counts to 8, inserting 0xED inputs along the way (the do-nothing-visible input), then inserts 0xEE (dump cache).
* If it first detects 's', then it expects immediately a second character from ".0123456789". This is for drawing directly to VRAM at 9802. It uses the second character's corresponding tile.
The parser does not insert the ending code (0xEF); it must be manually hex-edited. Also, the number of inputs between 0xEE (dump cache) commands should not exceed the counts to 15 or 8, whichever is appropriate.
----
Edit: Fixed pointer to data.
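The aaacbbbb input format and the column/row tokens can be sketched in a few lines. This is not the C++ parser itself, only a Python illustration of the cell encoding; `encode_cell` and `decode_cell` are hypothetical names, and the '|', '@' and 's' commands are left out.

```python
PI, PIREV = 0xF3, 0xF4        # tile values used by stage 3
CACHE, VRAM = 0xD600, 0x9982  # cache start and where it is dumped to


def encode_cell(token):
    """Turn a token such as 'a3' or 'a3*' (column, row, optional *) into input."""
    col = int(token[0], 16)           # first character: column 0-f
    row = int(token[1], 8)            # second character: row 0-7
    pirev = token.endswith("*")
    value = row * 0x20 + pirev * 0x10 + col   # aaacbbbb
    if value > 0xED:
        raise ValueError("0xEE and above are reserved for other commands")
    return value


def decode_cell(value):
    """Return (cache address, VRAM address after a dump, tile) for an input."""
    offset = value & 0xEF             # RES 4: drop the tile-select bit
    tile = PIREV if value & 0x10 else PI
    return CACHE + offset, VRAM + offset, tile


if __name__ == "__main__":
    for tok in ("a3", "a3*", "00"):
        v = encode_cell(tok)
        cache, vram, tile = decode_cell(v)
        print(f"{tok}: input {v:02X}, cache {cache:04X}, VRAM {vram:04X}, tile {tile:02X}")
```

Since the row occupies the top three bits, each row step adds 0x20 to the offset, which is the VRAM row stride the format was chosen for.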
          21308F  LD HL 8F30      //where to place data in Tile Data Table
          01D2D3  LD BC data      //where 8x8 data is
          1628    LD D 28         //size of 8x8 data
loop:     0A      LD A,(BC)
          22      LDI (HL),A
          22      LDI (HL),A
          03      INC BC
          15      DEC D
          20F9    JR NZ loop
          2100D6  LD HL D600
          3E10    LD A #10
loop1.5:  22      LDI (HL),A      //clear mirror
          CB44    BIT 0,H
          28FB    JR Z loop1.5    //loop until D700
          1E00    LD E 00         // register DE is offset of BTM
outerloop:
          76      HALT
          F0F5    LD A, FFF5
          FEEF    CP A,#EF
          2869    JR Z exit
          FEEE    CP A,#EE
          282B    JR Z refresh
          3016    JR NC numbers
          4F      LD C A
          CB61    BIT 4,C
          2804    JR Z sprite1
          3EF4    LD A #F4        //pirev
          1802    JR skip
sprite1:  3EF3    LD A #F3        //pi
skip:     CBA1    RES 4,C
          0600    LD B 0
          2100D6  LD HL D600      //start of lower BTM mirror (starting from 9980)
          09      ADD HL,BC
          77      LD (HL),A
          18DD    JR outerloop
numbers:  210298  LD HL 9802      //start of BTM+2
          19      ADD HL,DE
          12      LD (HL),A       //write "Fx" number to BTM
          13      INC DE
          CB63    BIT 4,E
          28D3    JR Z outerloop
          7B      LD A E
          C610    ADD A,#10
          5F      LD E A
          30CD    JR NC outerloop
          14      INC D
          18CA    JR outerloop
refresh:  218299  LD HL 9982      //start of lower BTM
          0100D6  LD BC D600
loop2:                            //loop for 256 bytes
          0A      LD A,(BC)
          22      LDI (HL),A
          3E10    LD A #10
          02      LD (BC),A       //clear mirror
          03      INC BC
          CB40    BIT 0,B
          28F6    JR Z loop2      //that means B is still D6
          18B8    JR outerloop
data:     FF 81 5B DB DB DB DB B9 //f3 pi
          FF 3B B7 B7 B7 B7 B5 03 //f4 pirev
          00 00 00 00 00 30 30 00 //f5 .
          00 38 4C C6 C6 64 38 00 //f6 0
          00 18 38 18 18 18 7E 00 //f7 1
exit:     F3      DI              //disable interrupts, such as current music
          218099  LD HL 9980
          3EF3    LD A #F3        //pi
loop3:    22      LDI (HL), A
          CB54    BIT 2,H         //loop until 9C00
          28FB    JR Z loop3
done:     18FE    JR done
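To see what the tile data looks like, the rows can be rendered as text. A Game Boy tile takes 16 bytes, two per pixel row, and the first loop above stores each data byte twice, so both bit planes of a row are identical. The sketch below (names made up for the example) marks set bits with '#' and also checks that tile F3 starts at 8F30 when tiles are counted from 8000.

```python
PI_ROWS = bytes.fromhex("FF 81 5B DB DB DB DB B9")  # the f3 row of the data table


def tile_address(tile_id):
    """Start of a tile's 16 bytes, counting tiles from 8000."""
    return 0x8000 + tile_id * 16


def render(rows):
    """One text line per byte, '#' for a set bit and '.' for a clear bit."""
    return [format(b, "08b").replace("1", "#").replace("0", ".") for b in rows]


if __name__ == "__main__":
    print(f"{tile_address(0xF3):04X}")
    print("\n".join(render(PI_ROWS)))
```

The pi shape shows up in the clear bits, framed by set bits along the top and sides.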