Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn into something clear and reusable. I hate depending on unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
Ah, I replied to feos about this on reddit before noticing these posts here, my bad.
count_errors/fast should fail on real hardware with a regular controller attached. You and feos have already had that conversation in the thread he linked, though.
Like you said, the dmc_dma_during_read4/dma_4016_read does technically test for the same behavior, so I'm not sure count_errors has much purpose, though.
I'll try to type up a list of tests + known duplicates of those tests + useless tests and maybe we can see which ones are worth keeping and which should be thrown out.
For read2004 - I failed NesHawk because the test ended with $00s instead of $FFs, I'm unsure if that's supposed to happen or not, but the test results given by Quietust originally said this should end with $FFs?
I think this is still a bit misunderstood, but having '*' in the count_errors tests is passing, not failing. FCEUX is failing the test by having no '*'. The problem is that the test is inconsistent on things like the PowerPak, since its DMC counter is always running.
I still suspect that if run from power on, and on a cart, there will be consistent (or only a small number) of actual test results.
If anyone is able to do this I think it will really clear things up and help understand DMC start up state.
It took a few hours, but I managed to go over most of the test roms I have collected and put them together.
download link
There's a rather lengthy readme.txt file I've written with a good amount of details about what some of the tests are supposed to display to pass, which tests I think should be excluded, etc.
It contains 289 rom files, of which I'd expect maybe 60+ to be excluded from a test result grid.
Hopefully this helps - let me know what you guys think.
Re: count_errors
The exact result depends on CPU/PPU alignment - a reset on a NES/Famicom will always change the alignment. Emulators typically emulate only a single possible alignment. Mesen has an option to disable the PPU's reset when resetting the console (like what a Famicom actually does), which technically lets you change the alignment to some extent, and changes the test's result on each reset.
On a North American NTSC NES, count_errors should technically only have a maximum of 4 different results (there are only 4 possible CPU/PPU alignments) - but I could be wrong on this.
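To show where the figure of 4 comes from, here is a minimal sketch. It assumes the usual NTSC clock dividers (master clock / 12 for the CPU, master clock / 4 for the PPU) and models each chip as a free-running divider that can start in any phase. The function name and the model itself are illustrative only, and real hardware may behave differently than this idealized count suggests.

```python
from math import gcd

CPU_DIVIDER = 12  # NTSC CPU: master clock / 12 (assumed)
PPU_DIVIDER = 4   # NTSC PPU: master clock / 4 (assumed)


def distinct_alignments(cpu_div: int, ppu_div: int) -> set:
    """Enumerate the distinct relative CPU/PPU phases, in master clocks.

    Two dividers started at phases c and p produce the same edge pattern
    whenever (c - p) is equal modulo gcd(cpu_div, ppu_div).
    """
    period = gcd(cpu_div, ppu_div)
    return {(c - p) % period for c in range(cpu_div) for p in range(ppu_div)}


alignments = distinct_alignments(CPU_DIVIDER, PPU_DIVIDER)
print(sorted(alignments))  # [0, 1, 2, 3]
```

Under this model each alignment would give count_errors one possible result, so an emulator that wants to reproduce them all needs some way to pick the starting phase.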
Joined: 4/17/2010
Posts: 11492
Location: Lake Chargoggagoggmanchauggagoggchaubunagungamaugg
I'm pretty sure we should host these roms in a repo and give access to it to those who know what the tests are doing. I'll ask adelikat if it makes sense to host them in the TASVideos organization.
I'll go through the tests and what I used as an expected result for them and provide my thoughts on the readme... in a few days.
And I request people who know 6502 asm to help with this as well. Together we should make a list of the ROMs people should obey :D
A few months ago, when I reworked the test list at http://wiki.nesdev.com/w/index.php/Emulator_tests , I suggested the same to people on NesDev, but some people didn't seem to like the idea of hosting test roms that other people had made without their permission. That's why I ended up linking directly to blargg's website, or forum attachments for the links on that page.
Ideally for the tests, we'd want to reconfirm some of them on a NES (preferably with a flashable NROM cartridge rather than a PowerPak).
Yeah, except some of them are already hosted in a repo. And if we aim for accuracy, we shouldn't leave it scattered throughout the entire nesdev forum, relying on someone not deciding to wipe all his posts for whatever reason.
If you wish I can also go through them in order to probably find some legal notes, and otherwise contact the authors directly. Yet it's a silly reason they've come up with to prevent making it useful for the community.
Yeah having an updated test rom set hosted in one place would be really valuable. Stumbling across test roms by chance in the forums is frustrating.
Also, we really need to move beyond these fundamental test roms, and the first step in that is just accounting for all of them. True's console testing shows that we still don't have sufficient information to TAS games that require PPU tick level accuracy, but there is nothing new to test to point out what is wrong.
I agree it'd be a lot more convenient to have them all in one place. It's just a matter of trying not to upset anyone in the process - asking people would be best when possible. I'd imagine there are only about 5-6 different people to ask to get permission for the majority of all test roms floating around.
Finding test roms in random threads that I wasn't aware of is pretty much why I updated the wiki's test rom list earlier this year. And even that is most likely missing a handful of roms still.
I can't say I remember the specifics too well - I think a lot of it had to do with the APU frame counter.
They're from here:
http://forums.nesdev.com/viewtopic.php?f=3&t=11174
They've been tested by tepples on a NES, iirc, but apparently #2 and potentially #6 can either pass or fail based on cpu/ppu alignment.
I wrote this backwards - NesHawk displays $FFs at the end of the test, whereas Quietust's post says it's supposed to be $00 for the last values. Also, one thing I noticed by looking at it again, is that there are 5x $06s at the end, before the $00s. Mesen 0.8.1 had 3, the latest BizHawk release has 4. I've managed to tweak it to be 3x $FF at the beginning, and ending with 5x $06 + 5x $00 - nothing seems to have broken in the process, so I'll stick with that for now.
The values at the end are read from uninitialized RAM, which NesHawk happens to set to $FF, so at least that much I'm not worried about. I'll look into the $06s a bit more though.
Some console testing was done on this and the results were highly variable.
The vrc6 tests at the end are based on my best understanding of a translation of some Japanese documentation, but to my knowledge they've never been verified on real hardware. Due to the nature of the tests, they'd need to be done on a real vrc6 chip, not an everdrive or similar.
Yea, I just learned this from trying to add support for BK2 movies to Mesen.
Actually, it looks like someone eventually ran it on a real VRC6 chip and got the "All tests passed" screen:
https://forums.nesdev.com/viewtopic.php?p=138055#p138055
BK2 support in Mesen would be amazing! It should improve the verification and emulator improvement process significantly.
It's somewhat done: commit
There are a number of sync issues, though - and limitations: reset/power is ignored, only supports standard controllers (up to 4 players). I've also ignored the SyncSettings.json file completely for now (no need for basic tests)
Things I recorded in NesHawk and played back in Mesen:
-Super Mario Bros: OK
-Super Mario Bros V.S: OK
-Akumajou Dracula: Desyncs - the FDS implementation between Mesen/NesHawk is pretty different with regards to how the drive is emulated, so this is mostly unavoidable unless we agree on very specific timings.
-Contra: Desyncs near the end of level 1
Things I grabbed off TasVideos:
-Balloon Fight by Weegeechan: OK
-Castlevania 3 (Warp glitch): Desyncs, but seems to desync in NesHawk too?
-Addams Family by ventuz: Desyncs, but seems to desync in NesHawk too?
The main thing I had to modify in Mesen was the moment the frame number is changed (scanline 241 for NesHawk, was scanline -1 for Mesen), else almost everything was desynced.
One thing I noticed in NesHawk, after a reset (Scanline = 0, Cycle = 0), FrameAdvance is called, which calls this:
runppu(postNMIlines * kLineTime - delay);
This might mean that the CPU and PPU are running for ~20 scanlines, before resyncing back to scanline 0? I'm assuming I'm wrong since that would probably screw up a lot of stuff..
At this point, I guess I would need to use NesHawk's trace logger & compare it with Mesen to see why Contra desyncs - the rest is hard to judge since FDS will desync no matter what at this point, and the other 2 also desync in NesHawk.
EDIT: Forgot to mention, this only supports reading bk2 files, not creating them.
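Comparing two trace logs to find where a movie desyncs can be sketched as below. This assumes both logs have already been normalized to the same one-instruction-per-line text format, which is not the case out of the box since each emulator formats its trace differently. The function name and the sample lines are made up for illustration.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def first_divergence(
    log_a: Iterable[str], log_b: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (line number, line from A, line from B) at the first mismatch.

    Line numbers start at 1. A missing line (one log is shorter) is reported
    as None. Returns None when both logs are identical.
    """
    for index, (a, b) in enumerate(zip_longest(log_a, log_b), start=1):
        if a is None or b is None or a.strip() != b.strip():
            return index, a, b
    return None


# Hypothetical, already-normalized trace lines.
neshawk_log = ["C000 LDA #$00", "C002 STA $2000", "C005 LDA $4016"]
mesen_log = ["C000 LDA #$00", "C002 STA $2000", "C005 LDA $4017"]

print(first_divergence(neshawk_log, mesen_log))
```

Because it accepts any iterable of lines, open file objects can be passed in directly, so large logs do not need to be loaded into memory at once.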
Contra uses the DMC so any stray controller read glitches will throw off the run. I imagine we don't emulate that exactly the same so I'm not surprised that one fails.
After reset, BizHawk runs a ppudead frame similar to power on, so no, it doesn't jump into a frame in that awkward way, but it also isn't exactly the same as hitting reset either. This is a work in progress.
I only slightly modified the FDS code to improve sound emulation and otherwise have not looked at all into how it is emulated, so yeah, it's probably hopeless to sync for now.
I'll try to post some BK2's that should sync in mesen in the next couple of days when I have a bit more time.
On a North American NTSC NES, count_errors should technically only have a maximum of 4 different results (there are only 4 possible CPU/PPU alignments) - but I could be wrong on this.
We tested PPU alignment on the frontloader I have. From either hard power or reset, I got something like 8 or 10 different alignments. Unless we didn't understand the test...
I can test what is needed on real hardware, but I am running low on EPROMs and my eraser may have been stolen.
Sour, here are a couple of bk2's to test against Mesen. They are for Bionic Commando. I'll get a few more edited into this post by the end of the day too.
This is a good test since it requires fairly careful timing but not quite down to the ppu tick.
http://tasvideos.org/userfiles/info/37881733414401898
http://tasvideos.org/userfiles/info/37881743719985076
If you are feeling ambitious, this battletoads run is known to sync completely through on console:
http://tasvideos.org/userfiles/info/38673693612623545
Battletoads requires exacting ppu level timing from power on to sync, so this run is probably the single strongest piece of evidence that BizHawk is actually doing something correctly beyond what FCEUX is able to do for actual games. You might need to fiddle with power on timing to get it to sync though.
EDIT:
Here are 2 more test runs for Streemerz. They sync on BizHawk but console testing indicates they both should desync. I'll be very interested to see what Mesen does with them.
http://tasvideos.org/userfiles/info/36472647975351756
http://tasvideos.org/userfiles/info/36428673262038989
I tried testing myself but after I compile Mesen and try to run it I get errors (both in x86 and x64):
Thanks, here are the results:
-Both Bionic Commandos sync to the end.
-With a small modification (altered the PPU/APU sync after reset in CPU::Reset), Battletoads syncs at first, but then desyncs a bit after the warp.
-Both streemerz runs desync. streemerz_joe desyncs almost right away. The other one gets relatively far (maybe ~2 mins in?) and eventually desyncs in a room that had ~10 clowns.
For the errors after compilation, the build process is silly and you need to build 2x for it to work properly (otherwise the resource files will be missing from the .exe) - it shouldn't be an issue except the very first time you build it.
Also, I just committed support for MD5 hash checks in bk2 files (I had only implemented SHA1 ones since it seemed like they were the only thing used), you'll need that if you want to try the Bionic Commando/Battletoads movies.
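A hash check that accepts either kind of hash can be sketched roughly like this. It is only an illustration, not Mesen's actual code: the function name is hypothetical, the hash type is guessed from the length of the hex string, and which bytes of the ROM file get hashed (with or without the header) is an assumption left to the caller.

```python
import hashlib


def rom_matches(rom_bytes: bytes, recorded_hash: str) -> bool:
    """Check ROM data against a hex hash string taken from a movie file.

    The digest type is inferred from the string length:
    32 hex characters for MD5, 40 for SHA1.
    """
    recorded = recorded_hash.strip().lower()
    if len(recorded) == 32:
        digest = hashlib.md5(rom_bytes).hexdigest()
    elif len(recorded) == 40:
        digest = hashlib.sha1(rom_bytes).hexdigest()
    else:
        raise ValueError("unrecognized hash length: %d" % len(recorded))
    return digest == recorded


# Placeholder data standing in for the bytes of a ROM.
rom = b"not a real rom"
print(rom_matches(rom, hashlib.md5(rom).hexdigest()))
print(rom_matches(rom, hashlib.sha1(rom).hexdigest().upper()))
```

Normalizing case and whitespace before comparing avoids false mismatches when a movie stores the hash in upper case.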