Hm, I somehow missed the part of p4wn3r's post that said that after a HALT, the game writes input to FFF5. Well then, that makes it even easier.
76 HALT
F0 F5 LDH A (FFF5)
22 LDI (HL) A
C3 xx D3 JP D3xx
... provided that the code is followed by a bunch of 0s or other instructions that have no significant effect when executed. The jump to D3xx goes into these 0s, and the code is located so that xx D3 falls on D36D, which is the pointer to this code. Of course HL is set to D3xx as well, and the program writes there.
The mind-numbing thing is that the program executes whatever has been written there on every pass (that is, once per frame). This can be bypassed using some well-placed spaghetti code, provided there is enough room to do so. I'll get to that later.
I now change the code to make use of the rival name and to avoid switching parity. Here is the 11-byte block (all NOPs are hack bytes):
00 NOP
00 NOP
22 LDI (HL) A
00 NOP
76 HALT
00 NOP
F0 F5 LDH A (FFF5)
D4 50 D3 CALL NC D350
Note that the program will always take the call, since carry is always clear. Placing LDI (HL) A at the start means that the program begins by writing 0x50, and fortunately this is a harmless instruction (LD D B).
In terms of taking D343-D34D and swapping it to D322-D32C, this is:
00 30 00 sp fm pk ed
0 $1 $2 $3 N1 0 0 N2 N3 N4 0
* . * . * . * . * . *
00 00 22 00 76 00 F0 F5 D4 50 D3
where $ is a money byte, N is a rival name byte, * means quantity (when switched to D322-D32C), and . means item ID. Some * bytes must be reduced through item tossing, but the . bytes are already exact, so parity need not be switched.
I then place this 11-byte block at D364-D36E.
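A quick way to double-check the placement is to lay the block out from D364 and see where its last two bytes end up. This is only a sketch: the names BLOCK, START and check_placement are mine, and it assumes the goal is still to have the 16-bit operand sit on D36D as in the first version.

```python
# The 11-byte writing block exactly as listed above.
BLOCK = bytes.fromhex("00 00 22 00 76 00 F0 F5 D4 50 D3")
START = 0xD364  # where the block is placed, per the post

def check_placement(block=BLOCK, start=START):
    """Return (last address, address of the CALL operand, CALL target)."""
    last_addr = start + len(block) - 1
    operand_addr = start + len(block) - 2          # low byte of the operand
    target = int.from_bytes(block[-2:], "little")  # Game Boy is little-endian
    return last_addr, operand_addr, target

last_addr, operand_addr, target = check_placement()
print(f"block ends at {last_addr:04X}, operand at {operand_addr:04X}, "
      f"CALL NC target {target:04X}")
```

So with 11 bytes starting at D364, the operand bytes 50 D3 sit at D36D-D36E and decode to D350.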
When executed, this program writes 0x50 at D350, then writes whatever input you give it to the addresses that follow, one byte at a time. D350 is also executed every frame (this is why the writing program must be preceded by harmless opcodes such as 0s). Using input, I write the following starting at D351 (remember this is located just before the writing program):
18 0F JR there
there2:
16 80 LD D 0x80 (limit)
21 6F D3 LD HL D36F (target)
input:
76 HALT
F0 F5 LDH A (FFF5)
22 LDI (HL) A
15 DEC D
CA 6F D3 JP Z D36F (target)
18 F6 JR input
there:
18 EF JR there2
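The relative offsets in this listing are easy to get wrong by one, so here is a small sanity check. It is a sketch under the assumption that the listing starts at D351 and that JR offsets are signed and counted from the byte after the two-byte instruction; LISTING, jr_target and layout are names I made up for the check.

```python
# (label, bytes) pairs copied from the listing above.
LISTING = [
    (None,     "18 0F"),     # JR there
    ("there2", "16 80"),     # LD D 0x80
    (None,     "21 6F D3"),  # LD HL D36F
    ("input",  "76"),        # HALT
    (None,     "F0 F5"),     # LDH A (FFF5)
    (None,     "22"),        # LDI (HL) A
    (None,     "15"),        # DEC D
    (None,     "CA 6F D3"),  # JP Z D36F
    (None,     "18 F6"),     # JR input
    ("there",  "18 EF"),     # JR there2
]

def jr_target(addr, offset_byte):
    """Destination of a JR located at addr with the given raw offset byte."""
    signed = offset_byte - 256 if offset_byte >= 128 else offset_byte
    return (addr + 2 + signed) & 0xFFFF

def layout(start=0xD351):
    """Return (label addresses, JR destinations in order, first free address)."""
    labels, jumps, addr = {}, [], start
    for label, hexbytes in LISTING:
        raw = bytes.fromhex(hexbytes)
        if label:
            labels[label] = addr
        if raw[0] == 0x18:  # opcode for JR
            jumps.append(jr_target(addr, raw[1]))
        addr += len(raw)
    return labels, jumps, addr

labels, jumps, end = layout()
for name, a in labels.items():
    print(f"{name}: {a:04X}")
print("JR destinations:", [f"{j:04X}" for j in jumps], f"end: {end:04X}")
```

The three jumps land on there, input and there2 respectively, and the listing ends right where the writing program begins at D364.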
Here's an explanation of why this works:
- The program first writes 0x18, and since the byte after it is still 0, the instruction reads "jump relative by 0". When executed, this doesn't branch anywhere (think about it), so execution falls through into the writing program as normal.
- Next time through, the program writes 0x0F so now the instruction reads "jump relative forward by 15". Provided there is enough space, execution still ends up back in the writing program.
- Because of the jump instruction, the program will not execute the following 15 bytes during the next 15 passes, so the program can write whatever it wants there.
- At the destination of the jump, the program then writes 0x18. The instruction there reads "jump relative by 0", and again no harm is done.
- Finally, the program writes 0xEF so now the instruction reads "jump relative backward by 17". When execution hits this instruction, it jumps into the new program written in the 15 bytes between the jumps, and stage 2 begins.
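To convince yourself that every intermediate state is safe, the steps above can be replayed with a toy model. This is a rough sketch, not an emulator: it assumes D350-D363 start out as 0s, it only understands NOP, LD D B and JR, and it treats any other opcode as "stage 2 has started". All names (BASE, WRITER, PAYLOAD, run_until_stop, simulate) are mine.

```python
BASE = 0xD350    # address executed at the start of each pass
WRITER = 0xD364  # start of the 11-byte writing block

# Stage-2 bytes from the listing, written to D351 onward.
PAYLOAD = bytes.fromhex("180F1680216FD376F0F52215CA6FD318F618EF")

def run_until_stop(mem, limit=64):
    """Run from BASE; return the address where execution reaches the
    writer or hits an opcode outside the three modelled ones."""
    pc = BASE
    for _ in range(limit):
        if pc == WRITER:
            return pc
        op = mem[pc]
        if op in (0x00, 0x50):    # NOP / LD D B: harmless here
            pc += 1
        elif op == 0x18:          # JR: signed offset from the next instruction
            e = mem[pc + 1]
            pc += 2 + (e - 256 if e >= 128 else e)
        else:
            return pc             # a real instruction: the new program runs
    raise RuntimeError("execution did not settle")

def simulate():
    """Write 0x50 then the payload one byte per pass; record where each pass ends."""
    mem = {a: 0x00 for a in range(BASE, WRITER)}  # assumed all 0s to begin with
    stops, hl = [], BASE
    for byte in bytes([0x50]) + PAYLOAD:
        mem[hl] = byte
        hl += 1
        stops.append(run_until_stop(mem))
    return stops

for i, stop in enumerate(simulate()):
    print(f"after write {i:2d}: execution ends up at {stop:04X}")
```

In this model every pass except the last returns to the writer at D364, and the pass after 0xEF is written ends up at D353, which is the start of the new program.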