Alyosha
He/Him
Editor, Emulator Coder, Expert player (3821)
Joined: 11/30/2014
Posts: 2829
Location: US
I'll be using this thread to keep track of work I'm doing related to GBA emulation and testing, with the goal of console verification of GBA TASes. Code: https://github.com/alyosha-tas/GBAHawk
This summer I spent a lot of time working on GBA emulation. I figure this is the next most likely console to see new console verification progress, so I decided to make an emulator core myself to understand the console and hopefully use it to make console verified TASes. Originally I wanted to experiment with the modern time keeping / task scheduling style used in other GBA emulators, but after looking over the specs on GBA Tek, there looked to be too many moving parts for me to easily keep track of. So I went with my usual approach of emulating the whole system one clock cycle at a time. I'm glad I did, as there are a lot of nuances, and I spent pretty much all my time focused just on accurate timing of the basic building blocks (cpu, timers, dma, prefetcher). I haven't even gotten to the ppu or sound much yet. So far though I have really enjoyed the process. This console does a lot of cool things, I'm really impressed with how everything is put together, and it has been fun learning about a slightly more modern architecture than what I am used to.
Anyway, I have just enough working to present some early results. So far I can pass the Aging Cart tests, get 2020 on the mGBA suite ( https://github.com/mgba-emu/suite ) timing tests, and 1256 on the DMA tests, but other emulators can already do that. Additionally, I can get 936 on the mGBA suite count up tests, pass the irq_delay and isr tests ( https://github.com/destoer/gba_tests ), and pass the AGBEEG cart tests ( https://github.com/GhostRain0/AGBEEG-Aging-Cartridge ), all of which I believe no other emulator can do so far.
PPU emulation is my next goal. I only have enough timing details down to pass the basic tests, so there is a lot of work left to do. I haven't properly started rendering yet either, so I have a long road ahead of me.
That's about it for now. My next immediate test goal is the misc. edge case tests on the mGBA suite, for which, aside from PPU timing, I also need to sort out a few details of open bus emulation. I'm hoping to be able to make a decently detailed post every month or two with progress and ideas.
Alyosha
I haven't been able to focus on this too much over the past few months, but I have a bit of time now and am back to making progress. The biggest thing I've done is port the core over to C++. The C# core was just way too slow, and was falling under 60 fps even in simple cases. I expected I would have to do this, so I built the C# core to be conversion friendly from the start, but it still took a bit of effort. Converting to C++ and making a few basic optimizations about doubled the speed, and now I at least have a bit of headroom to work with. I expect more complicated scenes to still fall below 60 fps until I optimize a bit more, but that's a concern for further down the road once the rendering pipeline is finished.
Aside from that I made a bit of progress with rendering timing and open bus emulation, so I can get past the misc edge case timing test on the mGBA test suite. I still plan to develop the rendering pipeline in C# as it's just much easier to fix bugs in it, so for now both cores will be developed together. Sprites are the next thing to do.
Alyosha
Made some good progress over the past few weeks. Most of the rendering effects are done now, mainly thanks to the Tonc demos (although I probably missed some alpha blending cases somewhere.) EEPROM saving is done, and various bugs are fixed. Now I can pass the entire mGBA test suite (with the caveat that Misc edge tests -> H-Blank bit -> Flip #1 matches hardware but not what the test says) and all of the NBA tests. The only major element left to work on is sound. In the past this has always given me the most trouble, but I already have the FIFO channels working and the others are just slightly modified GB audio channels, so I'm hopeful I can make relatively quick work of it. I also haven't done VRAM access timing yet; I'll hold off on that until I see how many games really need it. Anyway, I know of a few bugs I need to sort out besides audio. After that console testing can start, hopefully in a month or two.
Alyosha
One of the things that keeps me interested in emulation is just how close you can get to being right while still ultimately being wrong. I was recently working through the newer NBA ppu tests and somehow things just were not working out. It seemed like there was some off by one issue with DMA even though many other stringent DMA timing tests worked. Several hours of tinkering got me nowhere, then by happenstance I was scrolling through my code and noticed my execution loop, where it turns out I had made a very basic error. Here is what things looked like before:
dma_Tick();   // DMA
pre_Tick();   // prefetcher
ser_Tick();   // serial port
tim_Tick();   // timers
cpu_Tick();   // CPU
The problem here is that both the CPU and the DMA channels can read and write to the timer registers and access ROM, but the DMA tick is happening before the timers and prefetcher while the CPU tick is happening after. Somehow I had gone through all the grueling timer and prefetcher tests (some of which do use DMA) without realizing this. It really is amazing that it all worked. Once I put things in the correct order, a cascade of other small errors showed up. Fixing everything up took a while, but now I can get through the NBA tests that were giving me trouble while everything else still works.
I still have a small list of things to do before tackling VRAM access timing, with the top of the list being video capture DMA, but I'm pretty sure most everything else is solid. There are also some untested cases that could affect TAS console sync, like exact timing of audio FIFO DMA and DMA IRQs, multiplication timing, and probably various horrible prefetcher edge cases, but those are things I'd worry about after VRAM. Still a lot to do.
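As a rough illustration of why the tick order matters, here is a toy model (not GBAHawk's actual code). MiniSystem, dma_seen, cpu_seen, and both cycle functions are hypothetical names, and the "grouped" order is only one assumed way of keeping both bus masters on the same side of the timer; the post does not say which order was finally used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature system: one free-running timer and two bus masters
// (DMA and CPU) that each sample the timer once per cycle.
struct MiniSystem {
    uint32_t timer = 0;
    uint32_t dma_seen = 0;  // timer value the DMA observed this cycle
    uint32_t cpu_seen = 0;  // timer value the CPU observed this cycle

    void tim_Tick() { ++timer; }
    void dma_Tick() { dma_seen = timer; }
    void cpu_Tick() { cpu_seen = timer; }

    // Order from the post: DMA runs before the timer, the CPU after it.
    void cycle_split() { dma_Tick(); tim_Tick(); cpu_Tick(); }

    // Assumed fix: both bus masters run on the same side of the timer.
    void cycle_grouped() { tim_Tick(); dma_Tick(); cpu_Tick(); }
};

// Returns how far apart the two observations are after one cycle.
inline uint32_t skew_after_cycle(bool grouped) {
    MiniSystem s;
    if (grouped) s.cycle_grouped(); else s.cycle_split();
    return s.cpu_seen - s.dma_seen;
}
```

With the split order the CPU and DMA disagree by one timer count within the same cycle, which is the kind of off by one error described above. With the grouped order they agree.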
Alyosha
Making good progress towards accurate VRAM access timing. Backgrounds and palettes are pretty much done, so all of those access timing tests work now. This allows stuff like the infamous coin flip in the Madden NFL games to work. It also makes the Tonc tte demo profiling work correctly (after I fixed a bug in cpu timing that had me stuck for quite a while.) The only one not correct is the obj one, which I haven't gotten to yet.
With numerous other small bug fixes happening along the way, things are shaping up pretty well. I expect sprite access timing to be a fair bit more difficult than the others, so it will probably take a while. I also have a fair bit of refactoring to do before the ppu can really be considered cycle accurate, but I plan to do that incrementally rather than with a sweeping rewrite. It's been working out well so far.
PPU emulation aside, I still need to do things like Flash save RAM, gyro controls, RTC, and possibly solar sensor. I might do some of these before tackling sprite timing just to change things up a bit. Link cable emulation would also be cool, but I don't think I can make it run full speed.
GJTASer2018
He/Him
Joined: 1/24/2018
Posts: 303
Location: Stafford, NY
Alyosha wrote:
Link cable emulation would also be cool, but I don't think I can make it run full speed.
How would that even work in theory? Is it something that would happen online, or between two instances of BizHawk running on the same machine?
c-square wrote:
Yes, standard runs are needed and very appreciated here too
Dylon Stejakoski wrote:
Me and the boys starting over our games of choice for the infinityieth time in a row because of just-found optimizations
^ Why I don't have any submissions despite being on the forums for years now...
Alyosha
GJTASer2018 wrote:
How would that even work in theory? Is it something that would happen online, or between two instances of BizHawk running on the same machine?
It would be the same as for Gameboy: just two copies of the system running at the same time, on the same instance, with some linking logic between them. Currently for very complicated games, such as the quake demo, my core only runs at 65 fps, so for two systems with linking logic it would probably be around 30 fps. Compare that to mGBA, which can run at around 400 fps for the same game, so it has plenty of room to run 2 systems.
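To make the "two copies plus linking logic" idea concrete, here is a minimal sketch. Console, LinkedPair, the byte swap, and linked_fps are all hypothetical placeholders, not GBAHawk's or GBHawk's real linking code, and the throughput estimate simply assumes the host time is split evenly between the systems.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one emulated console.
struct Console {
    uint64_t cycles = 0;
    uint8_t serial_out = 0;  // byte this console is presenting on the link
    uint8_t serial_in = 0;   // last byte received from the other console

    void tick() {
        ++cycles;
        serial_out = static_cast<uint8_t>(cycles);  // placeholder traffic
    }
};

// Two consoles advanced in lockstep inside one emulator instance.
struct LinkedPair {
    Console left, right;

    void tick() {
        left.tick();
        right.tick();
        // Placeholder linking logic: swap the presented bytes every cycle.
        left.serial_in = right.serial_out;
        right.serial_in = left.serial_out;
    }
};

// Rough throughput estimate: the systems share the same host time budget.
inline double linked_fps(double single_fps, int systems) {
    assert(systems > 0);
    return single_fps / systems;
}
```

Under that even-split assumption, linked_fps(65.0, 2) gives 32.5 fps before any linking overhead, in line with the "around 30 fps" estimate.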
Alyosha
After my previous post about VRAM access timings, it occurred to me that there were a lot of edge cases that my current code would not be able to handle. It seemed I would have to do a cycle accurate background pipeline sooner rather than later, so I went ahead and implemented it. Now every VRAM and palette fetch for backgrounds occurs exactly when it should. This allows me to correctly display the 'mode 7 demo' ( https://github.com/ladystarbreeze/mode7demo ), which I think is pretty cool.
The resulting rendering code is considerably longer and more complicated than it was before though, and it comes at a pretty steep performance cost of 10% (or more for complicated scenes.) Some games now run under 60 fps at times. I still have to rewrite the sprite pipeline too, but I don't think that will have as big an impact, so things shouldn't get much slower. Still, I'll definitely need to focus a lot on optimization now to claw back some of that lost fps.
Alyosha
With some basic optimization I was able to recover about half of the performance I lost with the rendering rewrite. It still needs a lot of improvement, but it's getting there. For now I'm taking a bit of a break from rendering stuff and working on some other features. I added basic Flash RAM support. There is a lot of variability in Flash RAM, and what I have now is only rudimentary, but it's a good start. I also added tilt control support, so now Yoshi Topsy Turvy and Koro Koro Puzzle work. I'll probably add in solar sensor etc. for the Boktai games next.
Alyosha
I decided to implement cycle accurate sprite emulation since I had some free time. As expected it wasn't that difficult or impactful on performance, since it was only a small modification from what I already had. Now all the TTE demo profiles are timed correctly. Additionally, all the NBA RAM access tests work correctly. I fixed numerous other bugs along the way as well, so now some of the more complicated homebrews like Vroom3D work correctly. Unfortunately, all of this still isn't enough to make Metroid Fusion sync on console. There are still a few more things I need to work on, so hopefully one of those fixes it.
Alyosha
I haven't had time to focus on this recently, but going into the summer I hope to make a big push towards getting accuracy high enough to get first party games console verified. I realized I had skipped some cases in my prefetcher emulation, which was causing some accuracy issues. Fixing this made the Inside-cap visual novel (Higurashi no Nakukoroni (J)) finally pass the emulator check. Note that to actually play this on GBAHawk you need to fix the header, which requires adding the Nintendo logo and fixing the checksum (address 0xBC to be 0x23.)
At the same time this also helped Fire Emblem sync past the title screen (because I wasn't making SRAM accesses impact the prefetcher.) It still desyncs a little ways into the first level, but it's getting further at least. Surprisingly this didn't affect Metroid Fusion at all, oh well.
I'm not quite done with the prefetcher yet. The new version of the AGBEEG cart tests adds some tests that I currently fail, but these seem to involve enabling and disabling the prefetcher (which games don't really do, they tend to just turn it on and leave it), so I don't think that will affect much. I'll still fix it anyway though. So the accuracy grind continues.
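For anyone repairing a header by hand, here is a hedged sketch of the header complement check as GBATEK documents it: the bytes from 0xA0 through 0xBC are subtracted from zero, 0x19 is subtracted, and the low 8 bits are stored at 0xBD. The post above names 0xBC, so treat the 0xBD offset here as following GBATEK rather than the post. The function names are hypothetical and the sketch does not add the Nintendo logo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Complement check as documented in GBATEK: subtract every header byte in
// 0xA0..0xBC from zero, subtract 0x19, and keep the low 8 bits.
inline uint8_t gba_header_checksum(const std::vector<uint8_t>& rom) {
    assert(rom.size() > 0xBD);
    uint8_t chk = 0;
    for (std::size_t i = 0xA0; i <= 0xBC; ++i) {
        chk = static_cast<uint8_t>(chk - rom[i]);
    }
    return static_cast<uint8_t>(chk - 0x19);
}

// Writes the computed value into the check byte (0xBD per GBATEK; the
// offset is an assumption relative to the post, which names 0xBC).
inline void fix_header_checksum(std::vector<uint8_t>& rom) {
    rom[0xBD] = gba_header_checksum(rom);
}
```

Calling fix_header_checksum on a ROM image loaded into a byte vector rewrites only the check byte and leaves everything else untouched.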
Editor, Experienced player (570)
Joined: 11/8/2010
Posts: 4036
Sounds like promising progress. Thanks for the update, Alyosha. I like keeping up with your efforts to improve emulation accuracy (with GBHawk, Atari7800Hawk and others too), even if I don't usually comment about it.
Alyosha
Alright, so I improved prefetcher emulation so that the new AGBEEG cart tests now pass, improved sprite horizontal mosaic based on a new test rom by fleroviux, and fixed latching of rotation parameters so that Gadget Racers now displays correctly. Plus I fixed some other miscellaneous bugs. Making steady progress.
I looked around at the various emulator issue trackers and looked over the test roms I have, and it looks like I fail the following tests:
world_map.gba - some kind of hblank timing issue maybe. EDIT: The issue here is audio DMA happening at the end of a scanline, delaying the HBL IRQ code from starting, which in turn delays updates to rotation and scaling parameters until after the start of the next scanline. Need exact audio DMA timing and FIFO emulation.
win_demo.gba - I don't yet implement the glitchy vertical window effect.
vram-mirror.gba - some kind of check for upper VRAM memory map mirrors, haven't figured it out yet.
test_obj_window.gba - glitchy effects of the OBJ window that I haven't looked at yet.
sbb_reg.gba - I need to emulate open bus on the ppu vram accesses for OOB addresses.
So if I can get these working I will be passing all test ROMs that I know of. Aside from that I need to emulate STOP mode and RTC / light sensor for basic functionality (ignoring things like e-reader and other peripherals.) Finally getting close to the frontier of GBA emulation.
Alyosha
I fixed two of the above test roms (vram mirrors and window test.) PPU open bus shouldn't be too hard, I just need to be careful with keeping track of accesses. OBJ window looks like a matter of understanding internal state updates, but I haven't looked at it carefully yet. World Map seems the most difficult, as it relies on audio FIFO DMA timing not interfering with HBL IRQ code execution. I haven't looked at audio since I first got it working, so I imagine there are numerous timing errors to sort out still.
Taking a break from the accuracy grind, I decided to at least put the infrastructure for linking in place, even though there is no linking logic yet. This was surprisingly not too difficult, and for now at least it can even run at above full speed on a modern desktop cpu. I haven't even started serial port emulation, so this is a long way from functional linked play, but it's a start.
Three or four player at full speed is probably out of reach (unless you are a professional overclocker or something) so I don't think I'll implement that like I did for GBHawk. I'm not sure what the TAS possibilities are here, maybe all the interesting ones would be 4 players. Anyway, at least for 2 player it's something I will work on in between accuracy improvements.
GJTASer2018
Alyosha wrote:
Three or Four player at full speed is probably out of reach (unless you are a professional overclocker or something) so I don't think I'll implement that like I did for GBHawk.
Maybe out of reach with today's technology, but I still think that it would be possible sometime in the near future (late 2020s)...
Alyosha
GJTASer2018 wrote:
Alyosha wrote:
Three or Four player at full speed is probably out of reach (unless you are a professional overclocker or something) so I don't think I'll implement that like I did for GBHawk.
Maybe out of reach with today's technology, but I still think that it would be possible sometime in the near future (late 2020s)...
Well I should say that it's out of reach for GBAHawk. Surely with mGBA you could do 4 linked instances at double or triple speed right now without a problem, even on a weak computer. Even a more accuracy focused emulator like Nanoboy Advance could do full 60 fps with 4 linked instances pretty easily on a strong modern day cpu. My core architecture is just really slow, but it also makes actually implementing linking particularly easy.
Alyosha
I've added SubGBAHawk, which has (experimental) support for subframe resets. It should work exactly the same as for GBHawk: pick a cycle in the frame you wish to reset at and hit the power button. At the moment I don't plan to add subframe inputs, unless someone comes up with a specific use case. Subframe resets for save RAM abuse seem to have the most utility, so hopefully this opens the door to some neat examples on GBA. I've also been fixing various bugs, though I haven't fixed any of the 3 remaining failing test roms yet. I'm hoping in the next few weeks to do some work on actual linking. It seems like multiboot is an easy place to start.
Alyosha
Making some progress on linking. Here is a short video demonstrating single pack linking for Mario Kart Super Circuit: Link to video
EDIT: Mario Kart multi pack linking also seems to work, as does Advance Wars 2 single pack linking, so it's a bit more robust than just this one example, though I didn't try anything else yet.
I made this using TAStudio, so at least under initial testing sync is stable. Movie playback and recording were also at least at 60 fps, so surprisingly still full speed (on a modern desktop cpu anyway.) So far what I have is only very rudimentary and takes a lot of shortcuts, but it's a worthwhile proof of concept. Linking for GBA is pretty well documented in GBA Tek, so getting things working is really just a matter of carefully sorting through everything there, but it is much more complicated than ordinary Gameboy as there are several modes and more shared state to keep track of. Now that I have a basic proof of concept working, refining and improving it over time is a more approachable task. Back to accuracy improvements for now though.
fleroviux
She/Her
Joined: 6/3/2023
Posts: 1
Hello, good job on this emulator so far. I have been following this thread for a while. It's exciting to finally see another GBA emulation project focusing on cycle accuracy.
Alyosha
fleroviux wrote:
Hello, good job on this emulator so far. I have been following this thread for a while. It's exciting to finally see another GBA emulation project focusing on cycle accuracy.
Hello and thanks! Your test ROMs have of course been a big help in this process, so thanks for your efforts there as well. Still a lot of work to do, but I have enjoyed the process so far. The GBA is a pretty neat piece of hardware.
Alyosha
The initial release of GBAHawk led to the uncovering of a variety of bugs. Several small bugs were cleaned up straight away. There was however one very serious bug in the cpu emulation that somehow eluded detection until now. This bug made it so that unsigned bytes could sometimes be loaded as signed bytes, due to not properly clearing the flag that decides which to do. This occurred partly because I was sloppy in managing the flag, but also partly due to a much more subtle error. I was setting the flag on instruction decode, but clearing it on instruction execution. This is normally fine, but ARM instructions can be executed conditionally, so the flag could be set and then never cleared. Surprisingly this didn't cause chaos everywhere, as I only noticed it by happenstance when looking over the unreleased Another World. It could have been affecting RNG or enemy behaviour in other games in ways that weren't detectable so far, so I'm glad I stumbled upon this before making too many TASes.
Anyway, I finally got around to implementing VRAM open bus for BG fetches, resolving another test ROM, leaving only 2 that fail. It seems I should do something similar for OAM accesses, but it is not clear to me how several edge cases should be resolved, so I'll hold off for now. The next immediate thing to do though is work on the RTC so gen 3 Pokemon can be played. This will probably also lead to more work on linking.
EDIT: also fixed the obj window test and associated demo, so that leaves only world map not working.
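The sticky flag can be reproduced in a toy two-stage model. Everything below (MiniCpu, decode_buggy, decode_fixed, run_sequence) is a hypothetical sketch rather than GBAHawk's cpu code, and assigning the flag on every decode is just one assumed way to fix it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-stage model: decode picks the load flavour, execute()
// performs it only when the instruction's condition passes.
struct MiniCpu {
    bool sign_extend = false;  // the flag described in the post

    // Buggy decode: only ever sets the flag; relies on execute() to clear it.
    void decode_buggy(bool is_signed_load) {
        if (is_signed_load) sign_extend = true;
    }

    // Assumed fix: assign the flag for every instruction.
    void decode_fixed(bool is_signed_load) {
        sign_extend = is_signed_load;
    }

    // Loads a byte; when the condition fails nothing runs, so nothing is cleared.
    uint32_t execute(bool cond_passed, uint8_t value, uint32_t old_reg) {
        if (!cond_passed) return old_reg;
        uint32_t result = value;
        if (sign_extend && (value & 0x80)) result |= 0xFFFFFF00u;
        sign_extend = false;
        return result;
    }
};

// Runs a skipped signed byte load followed by an unsigned byte load of 0x80.
inline uint32_t run_sequence(bool use_fixed_decode) {
    MiniCpu cpu;
    uint32_t reg = 0;
    if (use_fixed_decode) cpu.decode_fixed(true); else cpu.decode_buggy(true);
    reg = cpu.execute(false, 0x80, reg);  // condition fails, flag left set
    if (use_fixed_decode) cpu.decode_fixed(false); else cpu.decode_buggy(false);
    reg = cpu.execute(true, 0x80, reg);   // unsigned load of 0x80
    return reg;
}
```

In the buggy variant the second, unsigned load comes back as 0xFFFFFF80 instead of 0x80, which is the kind of silent corruption that could change RNG or enemy behaviour without crashing anything.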
Alyosha
I released version 1.5.1 of GBAHawk, which contains numerous bug fixes including initial compatibility for Pokemon (though RTC still doesn't work.) I also realized I missed the 'archived' NBA tests in my accounting, where I currently fail some of the IRQ tests. There seem to be some non-trivial things happening there that I haven't looked at yet. I think the ppu tests are working, so the current number of failed tests is 3. For now though it's finally time to finish RTC and solar sensor.
Alyosha
I released GBAHawk v1.6 with support for RTC and solar sensor. So now Pokemon RTC events should work and the Boktai games can be played. I also added a setting for disabled RTC, which I guess is what Pokemon speedruns use. I haven't tested Pokemon linking yet, that's one of the next follow up things to work on. I believe with this the only other incompatibility is in WarioWare, which needs a different kind of gyro sensor than Yoshi uses. I'll implement that eventually. Of course there are many peripherals, but those hold no interest to me, so I'll probably leave it at that. Maybe the only nice thing to have would be the e-reader, but that's in the future. Time now to look at those IRQ tests, then I'll finally be caught up to the point of having to write my own test ROMs.
Alyosha
I released GBAHawk version 1.6.1 mainly as a bug fix release. A major bug was saving cart RAM twice, which obviously bloats the state size needlessly, especially when playing games with 256k RAM. Besides bug fixes I also added support for WarioWare Twisted. I use a different control scheme than mGBA uses, at least in the BizHawk implementation. There, mouse position corresponds to angular rate (which is what the cart ultimately reads.) In GBAHawk mouse position corresponds to angular position, so you control the game with the speed of mouse movement. I actually got a cart (JPN version) just to test this out. I think I captured the feel of the real game pretty well, hysteresis and all. That brings game compatibility up to where I want it for now. Next up is a rewrite of the IRQ handler.
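The difference between the two control schemes comes down to whether the input is treated as a rate or as a position that gets differenced. A minimal sketch of the position-based mapping follows; TiltInput, its units, and the sample period are hypothetical, and the hysteresis mentioned above is not modelled.

```cpp
#include <cassert>

// Hypothetical controller model: the mouse supplies an angular position and
// the cart-facing value is the rate of change between two samples.
struct TiltInput {
    double prev_angle = 0.0;  // angular position at the previous sample

    // Returns the angular rate for this sample; dt is the sample period.
    double sample(double angle, double dt) {
        assert(dt > 0.0);
        double rate = (angle - prev_angle) / dt;
        prev_angle = angle;
        return rate;
    }
};
```

Holding the mouse still therefore reads as zero rate no matter where it sits, while in a rate-based scheme a fixed offset would read as constant rotation.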
Alyosha
I finally redid my IRQ code, and the results are much cleaner and resolve a couple of issues. One issue is the timing of when IME is checked, which would have been hard to fix with my old code but follows the correct timing pretty naturally in the new code. The other is an edge case encountered when trading in Pokemon, where a serial IRQ was triggered at exactly the wrong cycle and skipped an IME enable. The old code had no way to follow up on this and trigger it later, but it's not a problem for the new code. Below is a video of Pokemon trading in action. Link to video
The linked core is able to keep up at 60 fps everywhere except saving. Other games like Advance Wars 2 don't make it to 60 fps with multi-pak linking at all, as there is simply too much code running, but at least there are no linking errors.
I have some more clean ups to do before making an official release. So far in my testing there are no regressions in test ROMs, games, or TASes, so that's good (you never really know what will happen when changing something fundamental like IRQs.) The last thing I want to do is figure out cancel-irq-ie.gba, which seems to be an issue of figuring out what happens when flags/enables are being set/cleared at the same time.
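The "follow up later" behaviour falls out naturally if the IRQ line is treated as a level that is re-evaluated every cycle from IE, IF, and IME, rather than as a one-shot event. The sketch below only shows that gating; IrqState and irq_line are hypothetical names, and none of the cycle-exact delays the post is about are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register snapshot; fields mirror the GBA's IE, IF and IME.
struct IrqState {
    uint16_t ie = 0;      // interrupt enable mask (IE)
    uint16_t iflags = 0;  // interrupt request flags (IF)
    bool ime = false;     // interrupt master enable (IME)

    // Level-style check, re-evaluated every cycle: a request raised while
    // IME is off stays pending in IF and is seen once IME turns on.
    bool irq_line() const { return ime && (ie & iflags) != 0; }
};
```

For example, with the serial bit (bit 7) set in both IE and IF, irq_line() stays false while IME is off and becomes true on the first check after IME is enabled, so the request is not lost.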