Micro Machines depends heavily on the PPU revision. It reads from $2004 to perform some of its graphics routines, but on early NES consoles that address wasn't readable, so the game shows heavy graphical glitches there.
Incidentally, I also rented this game as a kid and don't remember any graphical glitches, so I guess you just got unlucky. Too bad, since it's such a fun game.
I found this information about the NES version of Micro Machines on Wikipedia:
''Micro Machines was completed in September 1990. The game did not go through quality assurance, and as a result, a major bug that caused the game to crash was discovered near completion. The bug occurred when the player tried to reverse on the first race, and escaped notice because none of the testers thought to do so as they thought it was so easy. It was determined that just one binary bit was wrong, and a "mini Game Genie" was installed on the cartridges to correct it, as by the time the bug was discovered, there were plenty of ROM chips containing the bugged version.''
My homepage
--Currently not as motivated for TASing as before...--
But I'm still working.
Joined: 4/17/2010
Posts: 11480
Location: Lake Chargoggagoggmanchauggagoggchaubunagungamaugg
No.
Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn into something clear and reusable. I hate depending on unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
I've been making some progress on NES audio lately. NESHawk now sounds correct in the APU_mixer tests (where it previously sounded very poor), and I implemented the correct behaviour to pass a few more of the test_apu_2 tests.
It doesn't really sound any different to me in actual gameplay, but the tests say it's much closer to the real hardware, so maybe someone with a good ear can tell the difference.
Currently only test_apu_2 tests 2 and 6 fail, but it's not clear that both of them should pass under the default PPU-CPU alignment. Tepples on NesDev tested these previously and found that they randomly pass or fail on real hardware, strongly indicating an alignment dependence. Incidentally, Mesen passes both of these tests, so this might be a good opportunity to test for emulation differences.
I've also fixed a number of buggy games recently. Probably no one is actually interested in playing them, but hey, at least they work now.
There is still quite a bit left to do for test ROMs and mapper compatibility, but it's getting there.
I've made a few more improvements to NESHawk. I cleaned up the code a bit and refactored some of the execution, so it got a little faster, and the code base is also in better shape for future testing as needed.
Also, I noticed a bug in my PPU code which was apparently affecting tests 2 and 6 in test_apu_2. Having fixed that bug, NESHawk now passes all 11 of those tests, so that's good.
I continue to look for things that can possibly impact console testing as well, but so far nothing promising has come up.
Recently I've been trying to improve NESHawk's performance. Working on other cores has really helped me see inefficiencies in the code and pick targets to improve. Unfortunately, most of my ideas don't pan out, but one obvious candidate that did result in big gains was sprite pixel fetching.
Previously, the routine in BizHawk checked every sprite on a scanline for every pixel. This is very inefficient, especially since the sprite information doesn't change during a scanline (it was compiled during the previous scanline).
Instead, I made a buffer that stores the sprite pixel data at the time the sprites are processed, so the per-pixel code only has to read the buffer. This sped up that part of the code considerably, for an overall speed boost of ~5%.
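The idea is roughly the following sketch (illustrative names only, not NESHawk's actual code; priority and transparency handling are heavily simplified):

```csharp
using System;

class SpriteLineBuffer
{
    // One slot per screen pixel; 0 means "no sprite pixel here".
    // Filled once per scanline instead of scanning all sprites per pixel.
    byte[] pixelBuffer = new byte[256];

    // Called once, after sprite evaluation for the next scanline.
    public void CompileScanline(int[] spriteX, byte[][] spritePattern, int spriteCount)
    {
        Array.Clear(pixelBuffer, 0, pixelBuffer.Length);
        // Lower-index sprites have priority, so draw in reverse order
        // and let lower-index sprites overwrite higher-index ones.
        for (int s = spriteCount - 1; s >= 0; s--)
        {
            for (int px = 0; px < 8; px++)
            {
                int x = spriteX[s] + px;
                if (x < 256 && spritePattern[s][px] != 0)
                    pixelBuffer[x] = spritePattern[s][px];
            }
        }
    }

    // The per-pixel fetch is now a single array read.
    public byte FetchSpritePixel(int x) => pixelBuffer[x];
}
```

The per-pixel cost drops from "loop over up to 8 sprites" to one array read, which is why the overall gain is noticeable.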
There is still a long way to go, though. For some context, right now NESHawk can run Battletoads at about 130 fps on my laptop, while Mesen can do around 160 fps. Since I'm out of ideas on how to speed things up any further, that 30 fps gap won't be closing anytime soon. Still, at least the fps is slowly going up now instead of down!
I'm curious why your results are so different from my own tests. E.g. on my computer:
MM2 Stage Select - Mesen: 320fps, Bizhawk 2.2: 110fps
Battletoads pause screen - 315fps vs 110fps
Super Dodge Ball title screen - 440fps vs 113fps
Not sure if something is causing Bizhawk to cap around 110fps on my computer (first gen i5 @ 3.4ghz), but it is running at 100% cpu (and the QuickNes core does go up to 2000+fps). On the other hand, maybe something is causing Mesen to be slow on your computer? What's your laptop's CPU?
Also, just a note, if you're using a build of Mesen that you compiled, then it'll probably be ~10-20% slower than actual releases, since those use PGO to boost performance some more.
Oh, you're right. By hitting the increase-speed hotkey I can get up to ~350 fps in Mesen.
I never bothered doing that because when I set it to 300% speed it only gave me 160 fps instead of the expected 180, so I thought it was maxed out. Weird. 0_0
Any idea why the fps indicator is off?
Well, that honestly makes more sense; Mesen should be way faster given its C++ core. But NESHawk should still be much better than it is. I wonder if I'm missing some really big slowdown somewhere.
That's weird - is it giving 160 fps at all emulation speeds from 300% up until it hits "Maximum speed"? If so, it might be that the emulation thread is sleeping too long on your computer (e.g. the thread asks to sleep for ~5ms, but Windows doesn't wake it until 7+ms have passed).
Once you hit "Maximum speed", the thread doesn't sleep at all, so that would fix that.
Have you been using the built-in performance profiler in VS? It's really easy to use and pretty great at finding bottlenecks, too.
Nope, I tried all speed settings from 100% to Max, and every setting except 100% gives an fps consistently below the expected value. 350%, for example, gives me 190 fps.
Yeah, I've checked the performance profiler several times, but the only thing it really tells me is that everything is slow. Different parts of the core take as long as I'd expect relative to each other (like the time spent in PPU operations compared to the CPU); it's just that each of those things is very slow. I'm not sure why natively compiled cores like QuickNES run so much faster. Maybe something in BizHawk's architecture or in the compiler makes running native DLLs way more efficient; I can't imagine the code itself is that much worse.
As an example, even if I never run the CPU (and as a consequence the PPU never gets turned on), I STILL only get 180 fps max.
That's just one of the limits of managed code, sadly. Despite what everybody likes to say, for these kinds of things, managed code or scripted languages like JS are definitely slower than compiled code. Just compare the Visual 2C02/2A03 and my C++ port - the JavaScript versions are 10-20x slower, and all I did was pretty much copy/paste the JavaScript code and adjust it to compile as C++. (Obviously JS is a good amount slower than C# in general.)
I took a quick look at NESHawk and it looks like these 3 lines on their own account for ~8% of the CPU usage:
That seems odd, but the managed array accesses might be slow in this case (e.g. due to bounds checks, etc.) - maybe try using unsafe code to access them? Unsafe code is already used elsewhere, so there's no real reason not to if it's causing a bottleneck.
Also, since your condition seems to be sl_sprites[1, ...] != 0, I don't think you actually have any reason to reset the other 2 indexes? That would probably help too.
PpuOpenBusDecay being called in runppu is also eating up a lot of processing time (6%) - is there a reason you can't calculate the decay only when it's needed (e.g. when reading a register)?
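A lazy version of that decay could look something like this sketch (not NESHawk's actual code; the names and the decay constant are made up, and real open-bus decay works per bit rather than on the whole latch):

```csharp
class PpuOpenBus
{
    byte latch;        // last value driven onto the PPU data bus
    long refreshCycle; // cycle at which the latch was last refreshed
    const long DecayCycles = 3_200_000; // roughly 600 ms of NTSC PPU cycles (illustrative)

    // Called whenever something drives the bus (register write, readable read, etc.)
    public void Refresh(byte value, long currentCycle)
    {
        latch = value;
        refreshCycle = currentCycle;
    }

    // Instead of decaying the latch on every PPU tick, compute the decay
    // only when a register read actually observes the bus.
    public byte Read(long currentCycle)
    {
        if (currentCycle - refreshCycle > DecayCycles)
            latch = 0; // the stale bits have decayed
        return latch;
    }
}
```

The cost moves from once per PPU cycle to once per register read, which is far rarer.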
There are most likely more ways to speed things up, but I don't think you'd break past 200 fps or so. E.g. comparing to MyNes (C#) and Nintaco (Java), I get 170 fps and 210 fps in those.
And about the weird FPS in Mesen, could I ask you to check the Console::GetFrameDelay() function and see what values you're getting for frameDelay & emulationSpeed in it for a few different emulation speed values in the UI?
I'd be happy if I could get to 180 fps (without making the code unreadable).
I tried not resetting the other 2 array values, but it didn't seem to speed anything up.
I'm not familiar with what 'unsafe' code means. I'm not a programmer in real life, so to me that sounds... unsafe.
Good idea about the decay function, though; I'll definitely look into that.
Sure, I'll report back tomorrow with the frameDelay and emulationSpeed values. I'll edit the results into this post.
Thanks for the help!
Unsafe means the bounds won't be checked, and the managed code could now corrupt or crash the process. This would ordinarily be a bad idea for nice, supposedly bulletproof cores, but that ship has already sailed.
That's for sure, that sounds awful. D:
@Sour: I don't know what happened, but I downloaded fresh from GitHub and rebuilt (since my current local build wasn't running anymore for some reason after I compiled it), and now I'm getting the proper fps. It's a bit frustrating that I'm never able to get any evidence of these anomalies, but whatever. My only theory is that maybe something with the auto-updater is affecting things, but that's just a guess.
I'd be happy if I could get to 180 fps (without making the code unreadable).
I tried not resetting the other 2 array values, but it didn't seem to speed anything up.
I'm not familiar with what 'unsafe' code means. I'm not a programmer in real life, so to me that sounds... unsafe.
Good idea about the decay function, though; I'll definitely look into that.
Sure, I'll report back tomorrow with the frameDelay and emulationSpeed values. I'll edit the results into this post.
Thanks for the help!
C# doesn't have macros, or that's how I'd be doing performance optimizations. As it stands, the C64 core is hardly optimized at all (mostly because it still has some ways to go before it's decently compatible).
C# does favor type and memory safety over performance, but it at least offers "unsafe" access via pointers. I'm quite tempted to use them myself once the C64 core matures. I would refuse to do any sort of addressing with unsafe pointers that I couldn't compute with bit masks, though: those tend to be the least crash-prone, as long as the array is large enough to hold every possible address.
The code above uses multidimensional array access. That's expensive because of all the additional checks needed to retain safety. int[][] is said to perform better than int[,]. It also lets the memory manager place the inner arrays separately instead of allocating one massive chunk. The downside is that each inner array needs to be initialized individually - not a huge deal if it's only ever done once. Alternatively, if you wanted to throw readability right out the window, you could flatten it into one giant array and make the sprite number part of the index.
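A sketch of the three layouts being compared (sizes and names are illustrative, not the core's actual arrays):

```csharp
// int[,] bounds-checks both dimensions and computes a multiplied offset
// per access; int[][] checks each level but the JIT can often hoist the
// inner-array check; a flat array needs only a single check.
int[,] rect = new int[8, 256];        // multidimensional
int[][] jagged = new int[8][];        // jagged: one inner array per sprite
for (int i = 0; i < 8; i++)
    jagged[i] = new int[256];         // each inner array initialized individually
int[] flat = new int[8 * 256];        // flattened into one giant array

int sprite = 3, x = 100;
rect[sprite, x] = 1;
jagged[sprite][x] = 1;
flat[sprite * 256 + x] = 1;           // sprite number folded into the index
```

All three store the same data; only the access cost and the readability differ.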
Those are my two cents.
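The bit-masked unsafe addressing described above might look something like this sketch (Vram is a made-up name; compiling it requires the unsafe compiler option):

```csharp
class Vram
{
    // Power-of-two size, so any index can be masked into range.
    readonly byte[] mem = new byte[0x4000];
    const int Mask = 0x3FFF;

    public unsafe byte Read(int addr)
    {
        fixed (byte* p = mem)
        {
            // addr & Mask can never leave [0, 0x3FFF], so skipping the
            // bounds check here cannot read outside the array.
            return p[addr & Mask];
        }
    }

    public unsafe void Write(int addr, byte value)
    {
        fixed (byte* p = mem)
        {
            p[addr & Mask] = value;
        }
    }
}
```

The mask plays the role of the bounds check: as long as the array covers the whole masked range, the pointer access can't go out of bounds.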
One thing I never noticed before is that the method where that comes up is already marked 'unsafe'. 0_0 So I guess array accesses really are that expensive.
I did try flattening out the array though, and that noticeably sped things up, so thanks for the tip!
I also have an array of structs in there, and I read before that structs are slow as well; maybe I'll try mangling that into a flat array and see how much (if at all) it helps. I'm not sure I want to sacrifice that much readability, but if the performance gain is high enough I might try to make it work.
@Sour: I don't know what happened, but I downloaded fresh from GitHub and rebuilt (since my current local build wasn't running anymore for some reason after I compiled it), and now I'm getting the proper fps.
No problem, let me know if you ever get the same issue again.
About macros, while they do tend to improve performance, this is usually because they are the equivalent of inlining the whole code (instead of making potentially costly function calls). They are a pain to debug though (and sometimes to understand), so I tend to avoid them whenever possible. Normally in C++, you can use __forceinline (or similar, depending on the compiler) to force the compiler to inline functions. You can also do something similar in C# (as of .NET 4.5) by adding this attribute to a function: [MethodImpl(MethodImplOptions.AggressiveInlining)]. This is not exactly the same, but it will allow the compiler to inline the function in most cases, no matter the function's size. Without the attribute, only very small functions are inlined. Whether inlining actually makes things faster really depends on the scenario - only testing will tell.
That being said, it looked like most of the PPU emulation is in a single function already, so there is probably relatively little to gain from function inlining there.
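As an example of the attribute, here is a hypothetical helper marked for aggressive inlining (the palette-mirroring logic is just an illustration, not code from any of the cores):

```csharp
using System.Runtime.CompilerServices;

static class PpuHelpers
{
    // Hint the JIT to inline this even though the body is non-trivial;
    // without the attribute, only very small methods get inlined.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int PaletteAddress(int addr)
    {
        addr &= 0x1F;
        // Mirror $3F10/$14/$18/$1C down to $3F00/$04/$08/$0C.
        if ((addr & 0x13) == 0x10)
            addr &= ~0x10;
        return addr;
    }
}
```

If a helper like this is called once per pixel, removing the call overhead can matter; but as noted, only profiling will tell.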
I don't think structs are slow to access in general; they aren't really any different from any other variable (e.g. ints and structs are both value types, and memory-wise they are stored the same way). But because they are value types, a copy of the struct is made every time one is passed from a function to another as a parameter (the function uses a copy of the struct, not the original). This is the opposite of classes, which are always reference types, meaning the function gets a reference, not a copy of the object itself. (This is all C#/.NET-specific, by the way - structs & classes in C++ are virtually identical in behavior.)
So while passing a struct as a function parameter in C# could potentially be slow (which can be avoided by using the "ref" keyword on the parameter), accessing it in itself shouldn't be any slower than accessing other variables.
e.g:
struct MyStruct
{
    public int a;
}
class Test
{
    int b;
    MyStruct data;
}
In this case, I would expect accessing "this.b" to be pretty much as fast as "this.data.a", unless the CLR is doing something pretty funky with structs that I'm not aware of (which is entirely possible - I'm very familiar with C# itself, but I've never really needed to optimize C# code at such a low level).
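The copy-versus-ref behavior described above can be demonstrated with a small sketch (Counter and the Bump methods are made-up names):

```csharp
struct Counter
{
    public int Value;
}

static class Demo
{
    // The struct parameter is a copy: the increment happens on the copy,
    // and the caller's Counter is left untouched.
    public static void BumpCopy(Counter c) { c.Value++; }

    // "ref" passes a reference instead, so the caller's Counter is modified.
    public static void BumpRef(ref Counter c) { c.Value++; }
}
```

After BumpCopy the caller's Counter is unchanged, while BumpRef increments it in place; that copying on every call is the only place struct parameters really cost anything.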
Ended up writing a wall of text, but maybe something in all of this will be of use.
Sorry if I just ended up saying a bunch of things that you already know!
About macros, while they do tend to improve performance, this is usually because they are the equivalent of inlining the whole code (instead of making potentially costly function calls). They are a pain to debug though (and sometimes to understand), so I tend to avoid them whenever possible. Normally in C++, you can use __forceinline (or similar, depending on the compiler) to force the compiler to inline functions. You can also do something similar in C# (as of .NET 4.5) by adding this attribute to a function: [MethodImpl(MethodImplOptions.AggressiveInlining)]. This is not exactly the same, but it will allow the compiler to inline the function in most cases, no matter the function's size. Without the attribute, only very small functions are inlined. Whether inlining actually makes things faster really depends on the scenario - only testing will tell.
Specifically, it can needlessly fill up the CPU's instruction cache.