Alyosha
He/Him
Editor, Emulator Coder, Expert player (3827)
Joined: 11/30/2014
Posts: 2834
Location: US
I'm making this thread similar to the AtariHawk one so I can sort things out easier. This thread is for general bug fixes in NESHawk, please post any issues you have.

Things to emulate:
- DMA on multi-write instructions
- Exact power on and reset behaviour
- FDS / NES audio filtering / mixing
- second look at sprite limit
- VS Dual System
- microphone

Things to investigate:
- FDS Metroid
- FDS Yume Koujou Doki Doki Panic
- Chinese translation for TMNT3
- VRC7 sound (ex Lagrange Point)
- Chaos World (CH) Save Ram

Tests to Pass:
- scanline/scanline
- tvpassfail/tv

Games that don't work:
unsupported low priority Lots of pirate and multi cart stuff

Test build: https://ci.appveyor.com/project/zeromus/bizhawk-udexo/build/artifacts
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
RDY wasn't added until after the relevant code in neshawk was made. most of the bugs in neshawk can be solved by burning it down and replacing it with code based on more up-to-date knowledge. it was accurate at the time but is outdated now.
Skilled player (1743)
Joined: 9/17/2009
Posts: 4986
Location: ̶C̶a̶n̶a̶d̶a̶ "Kanatah"
Well, I know Blues Brothers has different lag on the 2nd/third stage that varies from BizHawk, FCEUX, and a version of FCEUX that fixes a bug in Mahjong. That fix also desyncs a number of other runs too, linked in the post.
Alyosha
He/Him
Editor, Emulator Coder, Expert player (3827)
Joined: 11/30/2014
Posts: 2834
Location: US
I have been (very) slowly working on the NESHawk PPU. So far I have rewritten sprite evaluation to take place simultaneously with background generation, as it does on a real NES. Together with proper read behaviour of $2004 during this time, I finally managed to fix those annoying horizontal lines and shaking in Micro Machines. This is still very early WIP work and generally breaks other things, but the timing is correct, so I should be able to start slowly fixing other timing issues and bugs. Right now it also slows down emulation by a noticeable factor. I don't think my code takes considerably longer to run than the original, so this is the first major issue to be worked out.
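To make the timing idea concrete, here is a minimal sketch of how sprite work can be keyed to the PPU dot counter so that it runs alongside background fetching. The dot ranges follow the commonly documented NTSC PPU timing and are an assumption here, not NESHawk's actual code; `oam_phase_for_dot` and `read_2004_sketch` are hypothetical names. Only the secondary OAM clear window is modelled for $2004 reads; reads during evaluation and sprite fetch are simplified to a plain OAM read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sprite-side phases of a visible scanline, keyed by PPU dot.
   Ranges are assumed from commonly documented NTSC timing. */
typedef enum {
    OAM_IDLE,      /* dot 0 and dots 321-340: no sprite work modelled here */
    OAM_CLEAR,     /* dots 1-64: secondary OAM is filled with $FF */
    OAM_EVALUATE,  /* dots 65-256: sprites for the next line are selected */
    OAM_FETCH      /* dots 257-320: sprite pattern data is fetched */
} oam_phase;

oam_phase oam_phase_for_dot(int dot) {
    if (dot >= 1 && dot <= 64) return OAM_CLEAR;
    if (dot >= 65 && dot <= 256) return OAM_EVALUATE;
    if (dot >= 257 && dot <= 320) return OAM_FETCH;
    return OAM_IDLE;
}

/* Simplified $2004 read. `rendering` means "visible scanline with
   rendering enabled". Only the clear window is special-cased; the
   other phases fall back to a plain OAM read, which is a simplification. */
uint8_t read_2004_sketch(const uint8_t oam[256], uint8_t oam_addr,
                         int dot, bool rendering) {
    if (rendering && oam_phase_for_dot(dot) == OAM_CLEAR) {
        return 0xFF;
    }
    return oam[oam_addr];
}
```

A per-dot PPU step would call `oam_phase_for_dot` on every dot next to the background fetch logic, rather than evaluating all sprites in one go at the end of the line.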
adelikat
He/Him
Emulator Coder, Site Developer, Site Owner, Expert player (3576)
Joined: 11/3/2004
Posts: 4754
Location: Tennessee
wow! You are a beast, sir. Good job.
It's hard to look this good. My TAS projects
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
Here's which tests it doesn't pass so far: http://tasvideos.org/EmulatorResources/NESAccuracyTests.html And here's the reference how accurate an emulator should be: https://github.com/punesemu/puNES
Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn into something clear and reusable. I hate depending on unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
Editor
Joined: 3/31/2010
Posts: 1466
Location: Not playing Puyo Tetris
Alyosha wrote:
I have been (very) slowly working on the NESHawk PPU. So far I have rewritten sprite evaluation to take place simultaneously with background generation, as it does on a real NES. Together with proper read behaviour of $2004 during this time, I finally managed to fix those annoying horizontal lines and shaking in Micro Machines. This is still very early WIP work and generally breaks other things, but the timing is correct, so I should be able to start slowly fixing other timing issues and bugs. Right now it also slows down emulation by a noticeable factor. I don't think my code takes considerably longer to run than the original, so this is the first major issue to be worked out.
Now this is a good thing to have. Been bothering me for a while with that game. Glad to see it becoming 100% perfect/accurate.
When TAS does Quake 1, SDA will declare war. The Prince doth arrive he doth please.
Alyosha
He/Him
Editor, Emulator Coder, Expert player (3827)
Joined: 11/30/2014
Posts: 2834
Location: US
I greatly improved my code and committed it to my fork for testing. So anyone who wants to try it out, please do, and let me know if anything is amiss. I have tested with several games so far (SMB, SMB3, Battletoads, Micro Machines) and found no deficiencies. https://github.com/alyosha-tas/BizHawk
This version also passes 2 accuracy tests that previously failed:
- sprite_overflow_tests/3.Timing
- sprite_overflow_tests/4.Obscure
Alyosha
He/Him
Editor, Emulator Coder, Expert player (3827)
Joined: 11/30/2014
Posts: 2834
Location: US
NesHawk now passes: apu_test/rom_singles/7-dmc_basics
Figuring out what was going on here took some effort, but I now feel the implementation is pretty accurate. Unfortunately it now hangs completely on sprdma_and_dmc_dma. I haven't the slightest idea why. Previously the test resulted in answers off by a factor of about 5, but at least it finished. I'm not sure what to make of this and posted on NesDev looking for some insight. Safe to say it's still a work in progress. But other games I tested that use DMC still work fine, so perhaps I'm just off by a couple of cycles; sprdma_and_dmc_dma is a very exacting test.
EDIT: After a bit of work filling in some of the undocumented opcodes, I am now able to pass instr_misc/instr_misc
So 4 extra passing tests means we should be tied for 2nd place with MyNes, alright!
Joined: 7/29/2009
Posts: 55
This may be unrelated, but do any of these tests correlate with how accurate lag emulation is or could something like this only be really tested by comparing movie files on nesbot and on emu?
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
accurate lag is directly correlated to accurate timing. any time timing is made more accurate, lag is made more accurate. Whether a particular change makes nesbot sync is irrelevant unless that's the only data concerning whether or not the change is accurate. In the case of tests, whether the tests pass is data; nesbot is irrelevant.
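As a rough illustration of why timing and lag are tied together, the sketch below assumes the usual emulator convention that a frame counts as a lag frame when the game never polled the controller during it. The `lag_tracker` type and function names are hypothetical and not taken from NESHawk. If the emulated timing is off by even a few cycles near a frame boundary, the poll can land in the neighbouring frame and the lag count changes.

```c
#include <stdbool.h>

/* Hypothetical per-frame lag bookkeeping. */
typedef struct {
    bool input_polled;   /* did the game read the controller this frame? */
    unsigned lag_count;  /* running total of lag frames */
} lag_tracker;

/* Call whenever the emulated CPU reads the controller port. */
void on_controller_read(lag_tracker *t) {
    t->input_polled = true;
}

/* Call at the end of each emulated frame; returns true for a lag frame. */
bool end_frame(lag_tracker *t) {
    bool lag = !t->input_polled;
    if (lag) {
        t->lag_count++;
    }
    t->input_polled = false;  /* reset for the next frame */
    return lag;
}
```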
Editor
Joined: 3/31/2010
Posts: 1466
Location: Not playing Puyo Tetris
Potato Stomper wrote:
This may be unrelated, but do any of these tests correlate with how accurate lag emulation is or could something like this only be really tested by comparing movie files on nesbot and on emu?
In theory, the more accurate the NESHawk core gets, the more likely it is that NESBot will be able to play back more TASes. However, the TAS MUST be made on the changed and improved NESHawk core. Old TASes will still desync or not work.
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
What about attempting to speed up the core too? FCEUX is so fast because it uses a separate function for every memory address for read and write (and I think Nestopia does too), so it has to do zero checks regarding regions. Considering that reading is done at least once every instruction, and sometimes more, and quite often it also writes, doing this stuff millions of times per second with all the region checks must be causing quite some slowdown. I also tried manually inlining the opcode functions right into the switch, but that gave zero speed up.
creaothceann
He/Him
Editor
Joined: 4/7/2005
Posts: 1874
Location: Germany
feos wrote:
FCEUX is so fast because it uses a separate function for every memory address for read and write
For every byte of the CPU address space there's a function for reading and a function for writing?
feos wrote:
all the region checks must be causing quite some slow down
CPUs have branch predictors, so if there aren't too many tests (and they follow predictable patterns) it should still be fast enough.
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
creaothceann wrote:
For every byte of the CPU address space there's a function for reading and a function for writing?
Yes.
creaothceann wrote:
CPUs have branch predictors, so if there aren't too many tests (and they follow predictable patterns) it should still be fast enough.
How can you predict what memory address a game will need?
creaothceann
He/Him
Editor
Joined: 4/7/2005
Posts: 1874
Location: Germany
- reading blocks of data: predict it's sequential and read ahead
- loops: predicting that the CPU jumps back to the beginning of the loop
- switches: remember the most-travelled path
Alyosha
He/Him
Editor, Emulator Coder, Expert player (3827)
Joined: 11/30/2014
Posts: 2834
Location: US
feos wrote:
What about attempting to speed up the core too?
Well, my current improvements to accuracy slow down the core by a noticeable amount. At 400% speed I can run SMB3 on my laptop at about 180 fps on 1.11.6. The current BizHawk master build can do about 160. My current test build can do about 152. So... we're kind of going in the wrong direction there 8D
These represent almost entirely PPU changes. So that might also be a good place to look for performance improvements.
EDIT: after spending a long time to find a very small bug, oam_stress/oam_stress now passes. I'm also making progress on sprdma_and_dmc_dma, I'm hoping maybe by the end of the month those tests will pass.
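For reference, the fps figures above work out to roughly an 11% drop from 1.11.6 to master (180 to 160), 5% from master to the test build (160 to 152), and about 16% overall (180 to 152). A small helper for that arithmetic, with `slowdown_percent` being a hypothetical name used only for this illustration:

```c
/* Percentage slowdown when the frame rate goes from `before` to `after`. */
double slowdown_percent(double before, double after) {
    return (before - after) / before * 100.0;
}
```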
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
Good work! 180 to 152 is fine with me. CPUs speed up over time, accuracy improves over time. NESHawk isn't made to be fast. But it's a bit worrisome if it continues. The architecture may not support this level of accuracy without the speed degrading to obscene levels. I hope we don't need to burn it all down and rebuild just for speed reasons. By the way, can you please install https://visualstudiogallery.msdn.microsoft.com/c8bccfe2-650c-4b42-bc5c-845e21f96328
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
zeromus, what do you think about the fceux approach I mentioned?
creaothceann
He/Him
Editor
Joined: 4/7/2005
Posts: 1874
Location: Germany
zeromus wrote:
But it's a bit worrisome if it continues. The architecture may not support this level of accuracy without the speed degrading to obscene levels.
How obscene?
zeromus wrote:
I hope we don't need to burn it all down and rebuild just for speed reasons.
I had a look at the source after feos' post, and while I didn't find all these functions (macros?), I saw lots of "hack", "probably shouldn't do it but we do it anyway!" etc. comments. It doesn't really make for an impression of good and stable architecture.
Site Admin, Skilled player (1255)
Joined: 4/17/2010
Posts: 11495
Location: Lake Char­gogg­a­gogg­man­chaugg­a­gogg­chau­bun­a­gung­a­maugg
Language: c

// function pointer types
typedef uint8 (*readfunc)(uint32 A);
typedef void (*writefunc)(uint32 A, uint8 V);

// macros to define the handler signatures
#define DECLFR(x) uint8 x(uint32 A)
#define DECLFW(x) void x(uint32 A, uint8 V)

// example region
static DECLFR(ARAM) {
    return RAM[A];
}

void SetReadHandler(int32 start, int32 end, readfunc func) {
    int32 x;
    // do all needed checks here, when emu starts
    // go through all cells
    for (x = end; x >= start; x--)
        // install the handler for this address
        ARead[x] = func;
}

// example call for a region
SetReadHandler(0, 0x7FF, ARAM);

// and this is used in opcodes
static __inline uint8 RdMem(unsigned int A) {
    return (_DB = ARead[A](A));
}
creaothceann wrote:
It doesn't really make for an impression of good and stable architecture.
It's probably written oddly, the core is old and not many people wanted to refactor it, but it's just regular C, and I see no harm in having an array of functions. And if it speeds things up, that's actually a benefit, since this is probably not a case where being slow is critical or required.
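To show the two approaches side by side, here is a self-contained sketch of a read path that decodes the region on every access next to one that looks up a per-address handler chosen once at startup. It is not FCEUX or NESHawk code: the memory layout (2 KiB RAM mirrored up to $1FFF, a fixed 32 KiB PRG ROM at $8000, everything else returning $FF) and all names are assumptions made for illustration.

```c
#include <stdint.h>

#define ADDR_SPACE 0x10000

static uint8_t ram[0x800];   /* 2 KiB internal RAM, mirrored up to $1FFF */
static uint8_t prg[0x8000];  /* hypothetical fixed 32 KiB PRG ROM at $8000 */

/* Approach 1: decode the region on every single access. */
uint8_t read_with_checks(uint16_t addr) {
    if (addr < 0x2000) return ram[addr & 0x7FF];
    if (addr >= 0x8000) return prg[addr - 0x8000];
    return 0xFF;  /* stand-in for registers / open bus, not modelled */
}

/* Approach 2: one handler per address, assigned once at startup. */
typedef uint8_t (*read_handler)(uint16_t addr);
static read_handler read_table[ADDR_SPACE];

static uint8_t read_ram(uint16_t addr) { return ram[addr & 0x7FF]; }
static uint8_t read_prg(uint16_t addr) { return prg[addr - 0x8000]; }
static uint8_t read_unmapped(uint16_t addr) { (void)addr; return 0xFF; }

/* Install `fn` for every address in the inclusive range [start, end]. */
void set_read_handler(uint32_t start, uint32_t end, read_handler fn) {
    for (uint32_t a = start; a <= end; a++) {
        read_table[a] = fn;
    }
}

void init_read_table(void) {
    set_read_handler(0x0000, 0xFFFF, read_unmapped);
    set_read_handler(0x0000, 0x1FFF, read_ram);
    set_read_handler(0x8000, 0xFFFF, read_prg);
}

/* The hot path is a table lookup plus an indirect call, with no branches. */
uint8_t read_with_table(uint16_t addr) {
    return read_table[addr](addr);
}
```

Both functions return the same byte for every address once `init_read_table()` has run. Whether the table version is actually faster depends on the compiler and CPU (an indirect call against a few predictable branches), so it is something to measure rather than assume.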
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
Feos, be my guest. I don't like the way it looks, and it's a speedhack, but that part of the emulator core is not likely ever to change again, so it's a fine time to speedhack it. Just don't do it for the mappers first, that's too big a job.
Alyosha
He/Him
Editor, Emulator Coder, Expert player (3827)
Joined: 11/30/2014
Posts: 2834
Location: US
creaothceann wrote:
zeromus wrote:
But it's a bit worrisome if it continues. The architecture may not support this level of accuracy without the speed degrading to obscene levels.
How obscene?
Nowhere near that bad. The performance hit so far came from code that runs every pixel (3 times per CPU tick), but that is done now. I think only accurate APU emulation remains that will have a significant negative impact on speed (my guess is another 2-3%). After that I do plan to go over everything and make obvious optimizations to gain back some of the performance losses, but I doubt I can reach parity with 1.11.6 again in terms of speed. I don't really understand what feos is suggesting with the CPU refactor, so I'll leave that up to him. It seems like a huge undertaking.
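To put "runs every pixel" in perspective, here is a minimal sketch of the CPU/PPU stepping ratio, assuming standard NTSC figures (341 dots per scanline, 262 scanlines per frame, 3 PPU dots per CPU cycle). The struct and function names are hypothetical. Under those assumptions a frame has 341 x 262 = 89342 dots, so at about 60 fps any per-dot code runs roughly 5.4 million times per emulated second, which is why small additions there show up in the frame rate.

```c
#include <stdint.h>

enum {
    DOTS_PER_SCANLINE      = 341,  /* assumed NTSC value */
    SCANLINES_PER_FRAME    = 262,  /* assumed NTSC value */
    PPU_DOTS_PER_CPU_CYCLE = 3
};

typedef struct {
    uint64_t cpu_cycles;
    uint64_t ppu_dots;
} clock_sketch;

/* Stand-in for the per-pixel PPU work (background + sprite logic). */
static void ppu_dot(clock_sketch *c) {
    c->ppu_dots++;
}

/* One CPU cycle drives three PPU dots. */
void run_cpu_cycle(clock_sketch *c) {
    c->cpu_cycles++;
    for (int i = 0; i < PPU_DOTS_PER_CPU_CYCLE; i++) {
        ppu_dot(c);
    }
}

/* Number of times the per-dot code runs in one frame. */
uint32_t dots_per_frame(void) {
    return (uint32_t)DOTS_PER_SCANLINE * (uint32_t)SCANLINES_PER_FRAME;
}
```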
zeromus wrote:
By the way can you please install https://visualstudiogallery.msdn.microsoft.com/c8bccfe2-650c-4b42-bc5c-845e21f96328
Sure thing, I installed it, but I can't tell what, if anything, it is doing. How do I use it?
_____________________
I've been testing various games known to be tricky to emulate. The first game that was previously incompatible with BizHawk now works: Fire Hawk! Curiously, this game runs on FCEUX but went to a black screen on previous versions of BizHawk. If anyone knows of other games which are just plain incompatible, please let me know (I am aware of Time Lord currently); having games to test on really helps.
EDIT: oh, the recent fixes to OAM reads also fix cpu_dummy_writes/cpu_dummy_writes_oam, so scratch another one off the list!
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
it makes your bizhawk code contain tabs instead of spaces
Skilled player (1743)
Joined: 9/17/2009
Posts: 4986
Location: ̶C̶a̶n̶a̶d̶a̶ "Kanatah"
I got a suggestion: Try running the console verified runs on BizHawk and see if they sync. Most of them were done on FCEUX, but if for whatever reason it does not sync maybe something is wrong?