Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I took another look at 'world_map_gba' to try to resolve the issue. The first thing I did was try out the ROM on GBP to see the expected result: The interesting scanline is the one with a single brown pixel that sticks out on the right side of the castle into the grass approximately at the height of the player's eyes. This scanline is indicative of audio DMA delaying execution of the code that updates the reference regs when HBL IRQ happens. GBP seems to internally blend frames together, and this is why the affected scanline is slightly duller and blurrier than the others, as it is flickering between frames. In fact this is exactly what NanoboyAdvance gets if you turn on LCD ghosting (without it the flickering is noticeable, but this is a display issue, not an emulation issue). Then I looked at the code the game was using to see what the driver of the issue is and why GBAHawk wasn't matching hardware. It turns out pretty much everything except FIFO DMA is a non-factor. The timer that ultimately runs the FIFO is turned on after a VBlank IRQ, and the prefetcher isn't turned on at the time, and as these are well studied things, the absolute timing of when FIFO ticks happen is set. The latching of the various PPU parameters involved is also well tested. So really the only movable parameter is the timing of the FIFO DMAs. Before I could mess with that though I had several small bugs in the audio code (off by one errors and not clearing the FIFO on reset) as well as the PPU (a mish-mash of old and new code from when I was working on reference point latching). Once those were fixed it was just a matter of checking which FIFO ticks the DMA needed to run on to match hardware. Having done so however, the results are kind of awkward: new DMAs occur when the sample buffer has 14 samples left, instead of the expected and more natural 16. So something still likely isn't right. I implemented what I had for now though, as it was still better than before and fixes some definite bugs.
A more thorough testing of FIFO DMA is still needed.
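To make the 14 vs. 16 observation concrete, here is a minimal Python sketch of which FIFO ticks a refill DMA would land on for a given threshold. Everything in it is an assumption for illustration, not GBAHawk's code: the function name `dma_ticks`, one sample consumed per timer tick, a 16 sample DMA burst, a 32 sample capacity, and the DMA completing instantly on the tick that requests it.

```python
def dma_ticks(threshold, start_level=32, ticks=64, burst=16, capacity=32):
    """Return the FIFO tick numbers on which a refill DMA fires.

    Hypothetical model: each tick consumes one sample, and a DMA of
    `burst` samples fires as soon as the level drops to `threshold`.
    """
    level = start_level
    fired = []
    for tick in range(1, ticks + 1):
        if level > 0:
            level -= 1                      # timer overflow pops one sample
        if level <= threshold and level + burst <= capacity:
            level += burst                  # DMA refills the buffer
            fired.append(tick)
    return fired


if __name__ == "__main__":
    print(dma_ticks(16))   # the "natural" threshold
    print(dma_ticks(14))   # the threshold that matched hardware in this post
```

With these assumptions the two thresholds fire on different ticks but with the same 16 tick period, so the choice only shifts the phase of the DMAs relative to everything else, which is exactly the kind of shift that moves the affected scanline.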
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I've released version 1.7 of GBAHawk which has more stable linking and improved emulation accuracy, particularly for prefetch. It also cleans up numerous minor bugs. With this release I've finally reached the point where I'll have to develop some of my own tests to make more progress. Things are finally starting to come together and it's refreshing to be on the frontier of emulation for a change. Audio DMA tests will definitely need to be first, so progress will slow down for a while as I figure out how to put things together.
Dimon12321
He/Him
Active player (480)
Joined: 4/5/2014
Posts: 1128
Location: Ukraine
Thank you very much for your contribution! I have an abandoned project of Doom for GBA, but I think I'll redo it on your emulator when it's capable of console verifying a typical non-homebrew GBA game
TASing is like making a film: only the best takes are shown in the final movie.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
Making test ROMs is going pretty well. I found a new (but obvious in hindsight) audio FIFO behaviour (can't add samples to the FIFO manually while audio power is off) and nailed down exact timing of FIFO DMA. I also found an interesting prefetcher behaviour. When the prefetcher is full, it seems to stall until it is empty again, and then continue with non-sequential timing on the next access. This behaviour is actually important for Shrek 2. That game uses EWRAM a lot, and also runs a lot of code from ROM, so the prefetcher has a lot of time to work. When I implement this behaviour in GBAHawk, I get 0x25FC on the final time, only 21 frames from the correct value of 0x2611. I imagine the remaining issues with Shrek are prefetcher related, but I currently have ideas of what to test. I may just make tests running long lists of various instructions and see if anything unusual pops up.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
With a bit more work on the prefetcher I was able to get correct values in RetroEdit's Shrek 2 ROM Hack: I just needed to be a bit more careful with the full prefetch buffer behaviour to get things to work. Metroid Fusion currently desyncs on the Nightmare fight. Probably something is still missing with the prefetcher not covered by Shrek 2, possibly LDM / STM, as I noticed those are used quite a bit in the trace log. Getting really close to console verification now.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I've released GBAHawk v2.0.0 now that Metroid Fusion is finally console verified. This also contains misc. other bug fixes, including to EEPROM timing, where I thought I was following the GBATEK spec for timing but I made a mistake and things were happening instantly. I'll be focusing on resyncing and testing various other games now. There are a lot of cool GBA games, but also a lot of them use Flash or EEPROM, which may have timings imprecise enough to cause problems in verification. For now I'll focus on SRAM games or games without saving to make sure I'm not missing anything common enough to affect a lot of games, then I'll try some of the other save types. In terms of emulation there are still a multitude of untested / unimplemented edge case behaviours, but I'm content for now to take a break and work on verifications. If past verification efforts have taught me anything it's that unexpected emulation issues typically show up in testing, so I'll probably have plenty to keep me busy for a long while to come.
Editor, Publisher, Player (46)
Joined: 10/15/2021
Posts: 371
Can you put an "About" tab that shows the version number so it will be easier to keep track of various versions for movie playback purposes?
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
despoa wrote:
Can you put an "About" tab that shows the version number so it will be easier to keep track of various versions for movie playback purposes?
Yeah, I'll do that for the next release.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
Unsurprisingly I'm facing a desync on console on the very first game I tried after Metroid Fusion (which happened to be Flintstones.) So far I can't figure out what the problem might be. The up side of game testing is that it stresses the console much more than crafted tests do, but the down side is that when a problem shows up, it's very challenging to figure out what it might be, since it could be anything in the system. At first I expected it to still be the prefetcher, and I made several new tests, fixed a few bugs, and cleaned up prefetcher code quite a bit. This didn't fix the desync though. Currently I am out of ideas. I think all I can do for now is keep testing with other games and see if any patterns emerge. So, testing continues.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
It's pretty surprising how easy it still is to write test ROMs that fail in GBAHawk but pass on hardware with just a few assembly instructions. Some things are nonsense, like only writing the first half of a BL instruction in Thumb mode, which hopefully no games do. Other things are important, like writing modes to CPSR with the upper mode bit clear, which is well known in various emulator issue trackers to break Grunty's Revenge if you don't emulate keeping the bit set. So far I haven't been able to conjure up the correct test that fixes Flintstones though. I've tried to resync other games that look short and glitchy enough to be interesting test cases, but without much success. Bionicle desyncs about half way through the game in a way I can't figure out how to fix. But on console it desyncs even earlier than that, I think due to inconsistent input read times. Ty 2 is very sensitive to EEPROM timings, and may or may not be workable depending on EEPROM consistency. I still think the most likely problem for Flintstones is somehow the prefetcher, I just have no idea how. The next game I will try is DK: King of Swing.
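The CPSR mode bit case is small enough to sketch. This is a minimal Python illustration, assuming the behaviour is simply that bit 4 of the mode field always stays set no matter what is written; the names `MODE_BIT4` and `write_cpsr` are made up for the example and are not taken from any emulator.

```python
MODE_BIT4 = 0x10  # upper bit of the 5 bit CPSR mode field


def write_cpsr(value):
    """Model a CPSR write in which the upper mode bit is forced to stay set.

    Assumption for illustration: the bit is simply ORed back in, so a write
    with the bit clear still lands in a valid mode.
    """
    return (value | MODE_BIT4) & 0xFFFFFFFF


if __name__ == "__main__":
    # Writing 0x03 (bit 4 clear) ends up as 0x13, i.e. Supervisor mode.
    print(hex(write_cpsr(0x03)))
```

An emulator that stores the written value unmodified would end up with a mode number it has no register bank for, which is one plausible way a game relying on this could break.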
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
Today I found a new way to crash the GBA. For some background info, the GBA has a prefetch unit that can read instructions from the cartridge ahead of time and store them in a buffer that can be read quickly by the CPU. Also, the ARM7TDMI has two modes of operation: ARM mode, which uses 32 bit instructions, and Thumb mode, which uses 16 bit instructions. Finally, the CPU has a 3 stage pipeline, so the currently read instruction is 2 ahead of the one being executed. The prefetch unit doesn't care what mode the CPU is in, it just reads 16 bit values. If the prefetch buffer has only one 16 bit value available, the next fetch just carries on as normal to complete the 32 bits needed. You can branch back and forth between ARM and Thumb modes with a single instruction. So, if you branch two instructions ahead, the next value you have to read will also be the next value in the prefetch buffer, because of the 3 stage pipeline (if it has that many values available, of course). In this case you can continue to read values from the prefetch unit even if you changed execution modes from 16 to 32 bits. Except, the prefetch unit will halt if its limit is reached, which is eight 16 bit values. In this case it waits until it is empty to do anything else, and it also waits for the start of the next instruction read to be re-enabled. So, if you were to fill the buffer in Thumb mode, then branch to ARM mode 2 instructions ahead with an odd number of values in the prefetch buffer, you will still read from the buffer but run out of values half way through reading a 32 bit ARM instruction. In this case, the console will crash, as it cannot complete the read for the next instruction. (At least this is my understanding of what I'm seeing.) This is not such a contrived setup as it may seem. Games branch 2 instructions ahead pretty commonly, they just don't do so between CPU modes that I've observed.
Also the prefetch buffer can be quickly filled up if repeatedly reading values from EWRAM, and this also happens in games, and is important to emulate for console verification. I'd be curious to know if Nintendo was aware of this possibility. Obviously this isn't very helpful, but it's cool to see something new.
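The crash condition above can be written down as a tiny Python model. This is only a sketch of my reading of the post, not verified hardware behaviour: the function name `arm_fetch_outcome` and the idea of reducing the whole thing to "halfwords left in the buffer" plus "is the prefetcher halted" are assumptions made for illustration.

```python
def arm_fetch_outcome(halfwords_in_buffer, prefetcher_halted):
    """Classify what happens when ARM mode fetches drain the prefetch buffer.

    Hypothetical model: each ARM instruction consumes two 16 bit values.
    A halted prefetcher (it filled all 8 entries earlier) supplies nothing
    new until the buffer is empty and a new instruction read begins.
    """
    remaining = halfwords_in_buffer
    while remaining >= 2:
        remaining -= 2        # one whole 32 bit instruction served from the buffer
    if remaining == 0:
        return "ok"           # buffer emptied on an instruction boundary
    if prefetcher_halted:
        return "crash"        # second half of the instruction never arrives
    return "ok"               # a running prefetcher just carries on with the next read


if __name__ == "__main__":
    for count in range(1, 9):
        print(count, arm_fetch_outcome(count, prefetcher_halted=True))
```

In this model only the combination of an odd count and a halted prefetcher is fatal, which matches the description: an odd count alone is harmless while the prefetcher is still running.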
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I FINALLY figured out how to pass the 'cancel-irq-ie.gba' test. It seems that when enabling a timer after it was previously disabled while it has an internal value of 0xFFFF, an interrupt can occur if interrupts are enabled. I think internally the timer ticks for one cycle before getting reset when it is being enabled, which means it is ticking from 0xFFFF to 0 and overflowing, causing the interrupt. I made a simple test ROM that verifies this behaviour on hardware, and it indeed fixes the issue. So far I only have a rough draft of this fix, I need to think a bit more about how exactly to implement it in a clean way, but at least now the issue is known. In hindsight this probably could have been found sooner since a very similar thing happens with a reload value of 0xFFFF, which is well known since it's part of the mGBA test suite. But, lots of things seem simple in hindsight. So once that is fixed, the only failing test will be one that does a decrement DMA in the ROM area near a 0x20000 boundary. I think I know what to do with this one, but it's messy to implement and needs more testing. This still doesn't fix Flintstones though. That game turns on the timers right after boot and never touches them again, so this couldn't really be a fix anyway. No idea what is happening there.
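Here is a rough Python sketch of the timer enable hypothesis, purely to pin down what "ticks for one cycle before getting reset" would mean. The class and attribute names are invented for the example, and the one tick on enable is the guess from this post, not a confirmed description of the hardware.

```python
class Timer:
    """Hypothetical model of a GBA timer's enable behaviour."""

    def __init__(self):
        self.counter = 0          # internal 16 bit value
        self.reload = 0           # value loaded when the timer is enabled
        self.enabled = False
        self.irq_enabled = False
        self.irq_raised = False

    def enable(self):
        if self.enabled:
            return
        # Guess from the post: the stale internal value ticks once
        # before the reload value is applied.
        self.counter = (self.counter + 1) & 0xFFFF
        if self.counter == 0 and self.irq_enabled:
            self.irq_raised = True    # 0xFFFF -> 0 overflow on enable
        self.counter = self.reload
        self.enabled = True


if __name__ == "__main__":
    t = Timer()
    t.counter = 0xFFFF
    t.irq_enabled = True
    t.enable()
    print(t.irq_raised)
```

The point of the sketch is that the overflow depends on the value left in the counter from before the timer was disabled, not on the reload value being written.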
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I'm starting to make some more involved tests that test interactions amongst system components. The first thing I wanted to do was test an assumption I made a long while ago about how DMA occurs in BIOS. This assumption came about from weird behaviour in the 'haltcnt' NBA test ROM that triggers a halt via a DMA started with an SWI. The result on emulator is 3 cycles off from hardware. My original fix was just guessing that maybe DMA started earlier in BIOS. So I made a test ROM to run the same thing the halt test does but start a timer instead. Results were as expected with normal DMA start time. So what is the proper fix? If you assume: -Halt only stops the CPU on instruction boundaries: you get 3 cycles too few, 2 of them because DMA stops the CPU on memory access boundaries, and the boundary here is after the first cycle of the branch. So two more cycles (pipeline refill) would occur before the halt. -Halt immediately stops clocks to the CPU: you get 1 cycle off, as the 2 refill cycles that need to run happen after halt instead of before as above. This is why I assumed the fix I did, because it was the only way I could think of to get the extra cycle. So the actual fix, whatever it may be, is more complicated than these 2 options. I have no more guesses what it might be. For now I simply reverted the incorrect fix and I'll just have to take the failure until I can more carefully test what's going on.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I've released GBAHawk v2.0.1 as a cumulative bug fix / emulation improvement release. It also adds an about tab. Originally I was going to wait until I fixed Flintstones for this, but right now I'm honestly stumped. The issue is quite simple: a lag frame appears in emulator that doesn't on console. I can see where it is. The game decides what's a lag frame by whether or not the previous frame entered the halt loop. On emulator, this is missed by < 20 cycles (out of 280896 in a frame). The problem is the error isn't necessarily in that frame, as FIFO DMAs and a timer interrupt affect the timing, and those are started at the very beginning of the game loading and not changed. So the first thing I need to do is audit the timing of all the initialization code the game does until starting the timers and FIFO DMA, and if that doesn't fix anything I have to audit the timing on the error frame. So, I'll be doing that in between continuing verification testing on other games, maybe one of the Castlevanias.
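For a sense of scale, the numbers above can be put into a few lines of Python. The frame length is 228 scanlines of 1232 cycles each; the `enters_halt_loop` check is a deliberately simplified stand-in for what the game does (the real check is relative to its own VBlank handling), and both function names are made up for this sketch.

```python
CYCLES_PER_SCANLINE = 1232    # 308 dots * 4 cycles per dot
SCANLINES_PER_FRAME = 228     # 160 visible + 68 VBlank
CYCLES_PER_FRAME = CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME


def enters_halt_loop(work_cycles, timing_error=0, budget=CYCLES_PER_FRAME):
    """Simplified lag check: the frame is fine if its work fits in the budget."""
    return work_cycles + timing_error < budget


def error_fraction(error_cycles):
    """Timing error as a fraction of one frame."""
    return error_cycles / CYCLES_PER_FRAME


if __name__ == "__main__":
    print(CYCLES_PER_FRAME)
    print(f"{error_fraction(20):.6%}")
    # A frame that finishes with 10 cycles to spare is pushed over by a 20 cycle error.
    print(enters_halt_loop(CYCLES_PER_FRAME - 10))
    print(enters_halt_loop(CYCLES_PER_FRAME - 10, timing_error=20))
```

A 20 cycle error is less than a hundredth of a percent of a frame, which is why it can accumulate unnoticed from initialization and only show up when a frame happens to finish right at the edge.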
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I was able to splice one of my timing test ROMs into Flintstones to try to track down the source of the timing error. With this I could break into the test ROM at any point and compare against console. The resolution of this particular test is 14 cycles, and the reported error was 28 cycles. I narrowed down when the error appeared to one loop of code where nothing interesting seemed to be happening. This also happened to be when audio is turned on. The game resets audio by turning on the system, then resetting the FIFO unit, then manually writing 8 words to the FIFO buffers to fill them up before enabling auto filling via DMA. The amount of error seemed like the time one FIFO DMA would take, but I had tested pretty thoroughly and nothing seemed off. Except it seems something was off. It turns out that writing a full 32 bytes to the FIFO seems to actually reset it back to zero. A quick test ROM verified that this seems to be the case and it fixes Flintstones. This is very unexpected. I think I'll have to make a few more tests before committing this fix. On the right track though.
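One simple way to picture the FIFO result is a fill count that wraps at 32 bytes. The Python sketch below assumes exactly that; the class name `AudioFifo` and the modulo implementation are illustrative guesses. Only the observed case (reset, then 8 manual word writes, ending up empty) comes from the test, and the behaviour for other partial fills is not something this post establishes.

```python
class AudioFifo:
    """Hypothetical model of the audio FIFO fill level, in bytes."""

    CAPACITY = 32

    def __init__(self):
        self.count = 0

    def reset(self):
        self.count = 0

    def write_word(self):
        # Assumption: the fill count wraps, so a 32nd byte reads as empty
        # rather than full.
        self.count = (self.count + 4) % self.CAPACITY


if __name__ == "__main__":
    fifo = AudioFifo()
    fifo.reset()
    for _ in range(8):          # the game's 8 manual word writes
        fifo.write_word()
    print(fifo.count)           # the FIFO looks empty again in this model
```

If the FIFO looks empty right after the manual fill, the first refill DMA is requested immediately instead of half a buffer later, which would account for an error of about one FIFO DMA's worth of cycles.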
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I've released GBAHawk v2.0.2 with improvements to Flash RAM, halt, and audio emulation. Improvements to Flash RAM timing are particularly notable as I was able to get stable sync on Sonic Advance (my copy of it anyway, it could vary cart to cart). This could potentially lead to verification of all 5 runs of that game. Not sure how well this will carry over to other games with Flash RAM, but either way it's much better than what it was. Improvements to halt emulation finally allow me to pass the halt_cnt tests, and my own halt tests that I made specifically to test the behaviour. Now the only known test failure I have is the 128k boundary test using decreasing DMA. This change did affect sync on Shrek 2 though. This seems to be another case of the line between lag and non-lag being only a small number of cycles, similar to Flintstones, so some small error is happening somewhere. No idea where. I have one new lead though. The game Incredibles: Rise of the Underminer desyncs at the very first start press on console. This is a good candidate for the test ROM injection strategy I used to track down the issue with Flintstones. This will be the next thing I check once I have a big block of time to focus and put things together.
Dimon12321
He/Him
Active player (480)
Joined: 4/5/2014
Posts: 1128
Location: Ukraine
I wish I could have a similar console testing tool. How useful would it be for the emulator development to have reverse-engineered TASes (when the input is built blindly and is iteratively loaded and played back on console to see if TAS works alright)?
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
Dimon12321 wrote:
I wish I could have a similar console testing tool. How useful would it be for the emulator development to have reverse-engineered TASes (when the input is built blindly and is iteratively loaded and played back on console to see if TAS works alright)?
Not useful for GBA. You have to work through desyncs in order, one at a time. If there is going to be a desync anyway, then you might as well use the emulator to create the TAS, which is orders of magnitude faster and easier. There may be a point where the approach you suggest is the only viable one (and it was already done for the Mario Maker Switch demo, so it is certainly doable). I would imagine that consoles up to and including DS and N64, or consoles of similar complexity, can be emulated with cycle accuracy eventually. For those cases it makes sense to develop TASes on emulator. Past that though, who can say, the complexity of modern consoles quickly skyrockets. And, even if an emulator can get games running, the gap between running games and making console verifiable TASes can be very large. If someone can get a decent workflow for modern-ish cartridge based consoles, like PSP or 3DS, to make TASes on hardware without an emulator, it might be the best approach.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I've been pretty busy lately and haven't been able to focus on emulation, but now I am back on the accuracy grind. Figuring out Incredibles is the first thing I have to do. I made some test ROMs in the same style as Flintstones, and so far I am about half way through the game load process with no errors found. Then I made another test towards the end of the loading process that had a very large error compared to console. It turns out though that this test inadvertently uses accesses to un-mapped memory, so it wasn't really testing game loading, or at least it was mixing game loading and other edge cases of open bus accesses. It should still work in emulator as in console of course, so the first thing I have to do now is overhaul my open bus emulation to make sure I have all the edge cases correct. Looking over the available documentation it seems I definitely missed at least some details.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I improved open bus emulation enough to fix my current tests. After that they pass on emulator the same as on console. So this leaves me with no leads as to why Incredibles doesn't work. After initial loading takes place there is a lot of down time where the CPU is halted, so there shouldn't be any timing sensitive errors. Only VBlank and timer 1 IRQs occur, with audio DMA and not much else happening. So, I don't know. Test ROM injection was my only real strategy for isolating errors, so it's back to the drawing board. I'll need more examples I guess.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
So I finally managed to track down the source of desyncs on console for Incredibles, and it turns out to be something related to gamepad IRQs. I don't have all the details yet, but what I know so far is that if I modify the ROM to not enable them, the existing test run syncs just fine. Gamepad IRQs are tricky because they are triggered at different times on console compared to emulator, since the input resolution is only 4 scanlines. But it also seems clear that my current implementation is somehow wrong. In the end I may need proper sub-frame input timing so that I can match input on the emulator to where it would be on console to get sync. I need to do several tests first, but at least the problem is known now.
Editor, Experienced player (608)
Joined: 11/8/2010
Posts: 4012
Congrats on finding the cause of that long-standing issue. Keep us posted!
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I have released version 2.0.3 of GBAHawk. This version adds back in ffmpeg encoding, which was requested by several people. For emulation improvements, I added basic emulation of stop mode, which I suppose isn't very helpful for TAS, but is a usable hardware feature so is good to have implemented. In the process of adding it I realized that stop mode is disabled on GBP, so when I get around to Gamecube detection I will have to account for that. I also fixed the open bus issues and gamepad IRQ issues mentioned in the previous posts, along with some other minor things. With this, the only known console verification issue is with Shrek 2. I have no leads on it and no way to test anything. I think I will have to simply set it aside for now and test other games. I recently went back to double check Bionicle, and it was indeed just an input timing issue, so nothing fruitful came of it. There is a new test ROM available for Sprite VRAM access timing that GBAHawk currently doesn't pass. It looks like maybe I just have to cut off VRAM access a bit earlier when HBlank is used for VRAM access, but I'm not yet sure. Aside from that the only known failing test is due to decrementing DMAs in the ROM region, which I might fix for next release but isn't really important. I will be focusing on console verifications now, starting with Metroid Zero Mission and associated ROM hacks. With little else to go on, I kind of have to hope to see some desyncs on console that look approachable.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I've been testing Grunty's Revenge, and am getting desyncs that look like they could potentially lead to diagnosing some emulation issues. It appears I am getting bad RNG, and after g0goTBC helpfully told me where to look, I found that the RNG is initially tied to one of the GBA hardware timers for initialization. I confirmed that EEPROM is not an issue, by testing with both an original cart and with a dev cart that doesn't have EEPROM and I got the same results. This is very promising, as it makes this a perfect candidate for some more test ROM injecting. I'll be diving into this over the next couple of weeks, hopefully something interesting shakes out.
Alyosha
He/Him
Editor, Expert player (3535)
Joined: 11/30/2014
Posts: 2732
Location: US
I've made a lot of progress understanding Grunty's Revenge. The game doesn't use halt, and bases RNG off a timer it starts shortly after power on, so everything needs to be perfect to get correct RNG. It's a really good test case. Debugging this was difficult because I was getting confused by 2 unrelated errors that very coincidentally cancelled each other out. What are the odds! One error was in multiplication timing (6 cycles too long) and the other was a new behaviour in the prefetcher that I just discovered thanks to this game (which ended up resulting in execution being 6 cycles too short). On ROM boundaries of size 0x20000, the prefetcher completely stops. These boundaries are known to generate forced non-sequential accesses, but it wasn't known that this causes the prefetcher to halt. I guess the prefetcher can only do sequential accesses. There is apparently also an extra cycle penalty when this happens. After fixing these I was still off a bit, so something else was wrong. It turns out that the prefetcher still runs even when execution is happening from RAM or BIOS. Originally I had this as disabled. This usually doesn't affect anything because execution in these regions is almost always longer than 16 cycles, which is the time it takes the prefetcher to fill its 8 halfword buffer and halt. When execution returns to ROM it is either at a different point, or at the address before the first one in the prefetcher (since the pipeline is two instructions long), triggering a reset. However Banjo Kazooie jumps to BIOS from ROM and runs only 2 instructions after the end of IRQ code, which is short enough that having the prefetcher still active triggers some bus access delays. With all of that fixed, now I can get as far as initial loading with exact execution timing. I can do a couple more tests up to first input, but after that it will just be a matter of running the TAS and seeing if it works.
Unfortunately I can't get good RNG to resync the current run yet, so I'm a bit stuck. Hopefully I can guess and check my way to success. All of this also improved sync on Shrek 2, which started desyncing in mid-run around GBAHawk 2.0.1 due to IRQ changes. Console was known to sync further than this so something was up but I didn't know what. With these fixes it now goes back to desyncing at the next to last level, where it originally desynced. Unfortunately these improvements STILL didn't seem to fix it. The search for a solution continues.
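The two prefetcher details from this post (the 0x20000 boundary and the 16 cycle fill time) are easy to state as code. This Python sketch is only an illustration: the function names are invented, and the 2 cycles per halfword figure is an assumption chosen so that 8 entries take the 16 cycles mentioned above; the actual cost depends on the game's waitstate settings.

```python
ROM_BLOCK = 0x20000  # 128 KiB blocks within the cartridge address space


def starts_new_rom_block(addr):
    """True if a fetch at `addr` is the first access in a new 0x20000 block.

    Per the post, this is where a forced non-sequential access happens and
    where the prefetcher appears to stop.
    """
    return addr % ROM_BLOCK == 0


def prefetch_fill_cycles(entries=8, cycles_per_halfword=2):
    """Cycles for an empty prefetcher to fill all of its 16 bit entries."""
    return entries * cycles_per_halfword


if __name__ == "__main__":
    print(starts_new_rom_block(0x08020000))
    print(starts_new_rom_block(0x08020002))
    print(prefetch_fill_cycles())
```

Any excursion out of ROM that lasts fewer cycles than `prefetch_fill_cycles()` returns while the prefetcher is still busy on the bus, which is the situation the short BIOS detour in Banjo Kazooie creates.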