One of the things that keeps me interested in emulation is just how close you can get to being right while still ultimately being wrong. I was recently working through the newer NBA PPU tests and somehow things just were not working out. It seemed like there was some off-by-one issue with DMA, even though many other stringent DMA timing tests passed. Several hours of tinkering got me nowhere, then by happenstance I was scrolling through my code and noticed my execution loop, where it turns out I had made a very basic error. Here is what things looked like before:
dma_Tick(); // DMA
pre_Tick(); // prefetcher
ser_Tick(); // serial port
tim_Tick(); // timers
cpu_Tick(); // CPU
The problem here is that both the CPU and the DMA channels can read and write the timer registers and access ROM, but the DMA tick happens before the timers and prefetcher while the CPU tick happens after. Within a single cycle, the two can therefore see timer and prefetcher state that differs by one tick. Somehow I had gone through all the grueling timer and prefetcher tests (some of which do use DMA) without realizing this. It really is amazing that it all worked. Once I put things in the correct order, a cascade of other small errors showed up. Fixing everything up took a while, but now I can get through the NBA tests that were giving me trouble while everything else still works.
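A toy model makes the off-by-one concrete. This is only a minimal sketch and not the emulator's real code: the `Machine` struct, the lowercase `*_tick` helpers, and `view_skew` are all hypothetical names, the prefetcher and serial port are left out, and the "consistent" order shown is just one plausible arrangement in which both bus masters tick on the same side of the timer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model: one free-running timer and two
   bus masters (DMA and CPU) that each sample the timer when they tick. */
typedef struct {
    uint32_t timer;    /* timer counter, incremented once per cycle */
    uint32_t dma_seen; /* last timer value sampled by the DMA */
    uint32_t cpu_seen; /* last timer value sampled by the CPU */
} Machine;

static void tim_tick(Machine *m) { m->timer++; }
static void dma_tick(Machine *m) { m->dma_seen = m->timer; }
static void cpu_tick(Machine *m) { m->cpu_seen = m->timer; }

/* Order described in the post: DMA before the timer, CPU after it. */
static void step_split(Machine *m) {
    dma_tick(m);
    tim_tick(m);
    cpu_tick(m);
}

/* One possible consistent order (an assumption, not necessarily the real
   fix): the timer ticks first, then both bus masters observe it. */
static void step_consistent(Machine *m) {
    tim_tick(m);
    dma_tick(m);
    cpu_tick(m);
}

/* Runs `cycles` steps and returns how far apart the CPU's and the DMA's
   views of the timer are at the end of the final cycle. */
int view_skew(int consistent, int cycles) {
    Machine m = {0, 0, 0};
    assert(cycles > 0);
    for (int i = 0; i < cycles; i++) {
        if (consistent) {
            step_consistent(&m);
        } else {
            step_split(&m);
        }
    }
    return (int)(m.cpu_seen - m.dma_seen);
}
```

With the split order, `view_skew(0, 10)` returns 1: in every cycle the DMA samples the timer one tick behind the CPU. With both masters on the same side of the timer, `view_skew(1, 10)` returns 0. A test that only ever exercises one of the two masters cannot see the difference, which may be part of why so many timing tests still passed.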
I still have a small list of things to do before tackling VRAM access timing, with the top of the list being video capture DMA, but I'm pretty sure most everything else is solid. There are also some untested cases that could affect TAS console sync, like exact timing of audio FIFO DMA and DMA IRQs, multiplication timing, and probably various horrible prefetcher edge cases, but those are things I'd worry about after VRAM. Still a lot to do.