Joined: 6/18/2015
Posts: 54
Thanks for the notice about the client.speedmode() function; I'm going to use it from now on. I've been running at the default 200%, which is too slow :P. BWAHAHAHA, I found my issue: apparently I accidentally removed removeWeakSpecies() from my script. It's back now and the script works better. I'm also going to try forcing a garbage collect every time there's a new generation; maybe that will cut down memory usage in general. With removeWeakSpecies() back, memory seems to hover around 600 MB and then drop to 500 or so. I'll leave it running and see whether it keeps climbing higher and higher; if it balloons out of control, I'll be sad. I've also found that the starting generation should have at least as many genomes as there are inputs. What would be REALLY interesting to me is a computer vision script that outputs to the various controls instead of this specialized set of inputs. That way our game-playing neural net could be trained on other games in a more generic manner as well.
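The per-generation collect only needs stock Lua. A minimal sketch (onNewGeneration is a hypothetical name standing in for wherever the script builds a new generation):

```lua
-- Minimal sketch: force a full Lua GC cycle at each new generation.
-- collectgarbage() is standard Lua; "count" reports memory held by Lua in KB.
local function onNewGeneration()
    local before = collectgarbage("count")
    collectgarbage("collect")        -- run a full collection cycle
    local after = collectgarbage("count")
    return before - after            -- KB reclaimed (can be small or zero)
end
```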
Joined: 6/30/2015
Posts: 7
client.speedmode seems to be capped at 6400, at least according to the code in EmuLuaLibrary.Client.cs, but I guess it's not as big an issue. You're right; at the moment I doubt any machine I have outside my main rig has a chance of reaching that cap if it's tied to frame rates. I was assuming it was somehow tied to the clock of the emulated NES CPU itself, which runs at 1.79 MHz and could have been sped up considerably more. My main concern is still that I want to completely strip the display off. (I'd love to run it from the command line at some point, so I can send it off to some of my servers for even more distributed instances, but for right now I just want to eliminate any overhead.)
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
You're pretty confused, and it's a bit baffling to me why you'd be worried about barriers you haven't hit yet. The speed rate controls the THROTTLED speed. If you're trying to run a script as fast as you can, you wouldn't be throttling, so the speedup level is irrelevant. You make more sense talking about stripping the display off. When I hack the code to do that, I go from about 3300 fps in SMB to 6600. With SMW on bsnes I go from 195 to 215. Not worth worrying about too much. Disabling the display should be a feature of whatever headless script/bot feature we may or may not still be working on.
Joined: 6/18/2015
Posts: 54
zeromus, this script is heavily dependent on the speed of the emulator, because it uses frameadvance and not some other method. He's definitely hit the speedup setting's cap at 6400%, because he can't set the speed any faster than that; I have hit it as well. Askye, if you get a custom build going that removes the cap and adds other 'headless' features, I'd be down to work with you on it, as I'm interested in running the simulations much faster. Also, askye, if you want to REALLY speed it up, remove all console.writeline calls and all gui graphics drawing calls. I have been unable to ascertain whether or not the script is still too slow at 6400%. You could also look into modding the ROM to nop out all the draw calls it makes. That may reduce the computational load by a lot, but I'm not sure how much.
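The frameadvance dependence comes from the shape of the script's main loop; a sketch (emu.frameadvance is the real BizHawk call, stubbed here so the control flow can run standalone):

```lua
-- Sketch of a frameadvance-driven trial loop: the per-frame work can never
-- run faster than the emulator emits frames, so emulator fps bounds the bot.
local emu = emu or { frameadvance = function() end } -- stub outside BizHawk

local function runTrial(maxFrames, perFrame)
    for frame = 1, maxFrames do
        perFrame(frame)       -- read RAM, evaluate the net, set joypad here
        emu.frameadvance()    -- blocks until the emulator finishes the frame
    end
end
```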
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
I still think you're confused. If you set the speed, you're doing it wrong. Setting the speed sets the throttle speed. Disable the throttle, and the speed you set is irrelevant. I just nopped the whole frameadvance in QuickNES and got 27000 fps.
Dwood15 wrote:
modding the ROM to nop out all the draw calls used by the ROM
In order to de-garble this: you would want to hack the ROM to turn off the BG and OBJ display. This might or might not send bsnes down a faster codepath. It probably won't make any difference for NES. With SNES you could also use the SNES > Display menu item, which may or may not have the same effect as hacking the ROM.
Joined: 6/30/2015
Posts: 7
I don't entirely understand the internals of the emulator, which is why I was looking to be pointed in the right direction. I didn't realize disabling the throttle made the speedup setting irrelevant :) Thanks for clarifying, and for pointing me to FrameAdvance in QuickNES. I still want to get rid of the rest of the GUI, but that's definitely the most important bit. Dwood, yes, anything that has to be drawn should be avoided. Updating the fitness header every frame and drawing all the neurons/lines/boxes cuts my performance in half.
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
Nopping the frameadvance is a terrible idea. With frameadvance nopped, the NES core won't do anything at all. It just proved that the emulator frontend could run faster.
Joined: 6/18/2015
Posts: 54
zeromus wrote:
I still think you're confused. If you set the speed, you're doing it wrong. Setting the speed sets the throttle speed. Disable the throttle, and the speed you set is irrelevant.
All right, I removed the throttling. (That gave a significant performance boost, nice!)
I just nopped the whole frameadvance in QuickNES and got 27000 fps.
What do you mean? If I run my script without calling frameadvance in Lua to move the game forward, how am I supposed to tell the emu my script is done so it can pick up where it left off?
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
You're not supposed to do anything. It's a terrible idea.
Joined: 6/18/2015
Posts: 54
Ah. Well, I don't know how emulators work, so it's good to know. I'll just stick with what I have now, then, and add a check to make sure the emulator isn't being throttled.
Joined: 6/30/2015
Posts: 7
zeromus wrote:
Nopping the frameadvance is a terrible idea. With frameadvance nopped, the NES core won't do anything at all. It just proved that the emulator frontend could run faster.
I didn't kill the entire frameadvance, just the part that renders stuff (it looks like there was a config option to turn off rendering anyway, but I can't find anywhere to set it). I also killed all the other GL rendering, so right now it's a blank window (taking that all the way to command-line-only is my next step), but profiling shows that Lua is using almost all the execution time, so it's good enough for now. Thanks a lot for pointing me in the right direction here. I confirmed it's still running everything properly despite not displaying anything: a console log in the script shows that each trial uses the same number of frames and comes up with the same fitness as before. But without all the unnecessary rendering procedures, I have 5 instances running at ~1000-1200 fps, as opposed to the 150 I was getting before with 5 open. For one window it's not really worth the effort, but with multiple instances dividing the pool into batches, it's working great. I appreciate your help :)
Joined: 6/18/2015
Posts: 54
That's... really clever, askye. Mind sharing the exact details of how you did that? I'm interested in squeezing more fps out of my emulator, since it's using 90% of my CPU (it's kind of a cheapo CPU). On another note, I'm now forcing Lua to garbage collect every time initializeRun is called, using the generic collectgarbage(). That has dropped the working memory from an average of 400 MB to 300 MB. Considering it was ballooning out of control before and isn't any more, I'm very hopeful I can just leave this AI running over the next 2-3 days and have it put out some good results.

Next up, there are a few problems I need to fix or features to add:
1. The fitness function is completely borked; it's less than half as useful as it should be, so I need to rethink what failure and success mean for the AI. This is despite my fitness and timeout functions being 10x better than the default version, imo.
2. Need to account for Mario in the overworld. What I've got so far is much, much better for fitness in SMW than anyone else has, of that I am 100% certain. On the other hand, it still has some painful points: the second Mario enters the overworld, it's jacked up. I've got to get all the values (x, y, etc.) for the overworld.
3. Missing sprites. I have all the sprite info for the basic characters and the extended sprites, but I don't think I have any for the bosses or any castle-specific sprites.
4. Various other inputs: help boxes, save menus, Yoshi, better/more fine-grained enemy sensing, slope detection, the timer, the direction Mario is facing, and his velocity are all inputs the neural net needs.

Here's an open invitation, however: to anyone who wants to see MarI/O in another game, if you get me sufficient documentation of a game and its RAM addresses, I will port MarI/O over to that game so you can watch the AI play, or even play with it if it's a multiplayer game of some kind.
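For the fitness rework, one hedged shape (every weight here is illustrative, not the script's actual values): reward rightmost progress, penalize dawdling, bonus a level clear, and hard-fail a timer runout.

```lua
-- Illustrative fitness sketch, not the real MarI/O formula: progress minus
-- a frame penalty, a bonus for clearing the level, -1 on timer runout.
local function fitness(rightmostX, framesUsed, finished, timedOut)
    if timedOut then return -1 end
    local score = rightmostX - framesUsed / 2
    if finished then score = score + 1000 end
    return score
end
```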
Joined: 6/30/2015
Posts: 7
I just built it, profiled it with Visual Studio, and used the profiler to figure out what was eating up CPU besides Lua, then turned off anything that wouldn't break things (stuff related to GL/rendering mostly; I think I started at the Render function in MainForm.cs). I'm doing SMB1 instead of SMW almost entirely because it eliminates dealing with an overworld, and with those colored blocks/switch palaces screwing with things. But you could check whether it's in the overworld, and then grant it fitness if the world/level it goes to is higher than the last one it was on. I let it see what world/level it was on, so the inputs for level 2 are less likely to mess with the already-optimized inputs for level 1.

Keep in mind that changing the fitness function could ruin any previous training you've done, and adding inputs almost certainly will. My best suggestion, if you do change it, is to "play top" and make sure it's still performing fairly well; this will also make sure the top genome is getting its fitness updated. Also, I've noticed that additional input seems to increase the training time needed for similar results by more or less a square function: every doubling of the inputs results in a 4x increase in time to make similar progress, though it could certainly be even worse than that.

For IPC, every generation I have each instance grab a block of untested genomes (if it's at the point where it's testing 15 species, you could have each one grab the next available species) and synchronize the results via a file, then spin if no blocks are left until the other instances finish. Whichever instance finishes last generates the next generation, and then they each grab the next block.

Something I've considered doing is making a function to pretrain the network based on actual gameplay. My initial thought is to record the inputs/outputs during a playthrough of a level, and then train the network until it perfectly mimics your outputs for those inputs. You could do this in a separate program outside of BizHawk once the run is recorded, which would be immensely faster since it's not using Lua. Once the network is trained/built, cull any species that don't match your outputs, load it into BizHawk, and let it play on its own.
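The block-claiming step above can be sketched with plain Lua file I/O. The file names are hypothetical, and note that the check-then-write below is not atomic, so two instances could race between the existence check and the claim; a real version would want exclusive-create semantics.

```lua
-- Sketch of file-based block claiming between instances (NOT race-safe).
local function claimBlock(dir, totalBlocks)
    for i = 1, totalBlocks do
        local lockName = dir .. "/block" .. i .. ".lock"
        local existing = io.open(lockName, "r")
        if existing then
            existing:close()               -- someone else has this block
        else
            local lock = assert(io.open(lockName, "w"))
            lock:write("claimed\n")        -- claim block i for this instance
            lock:close()
            return i
        end
    end
    return nil                             -- nothing left: spin and wait
end
```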
Invariel
He/Him
Editor, Site Developer, Player (169)
Joined: 8/11/2011
Posts: 539
Location: Toronto, Ontario
askye, you might want to watch this: http://techcrunch.com/2013/04/14/nes-robot/
Joined: 6/18/2015
Posts: 54
Askye: I was actually thinking about doing the same thing, where we dump the state every so often, save the inputs and outputs, and 'train' the network on those pairs. I think that would be a faster and smarter way of doing things. It would also allow us to use OpenCL and take advantage of graphics processing, which I'm sure would make the training 10x+ faster. At that point, however, I wonder if we would be better off training a neural network on images instead? We could screen-capture every 5th frame and dump the inputs, then train the network based on that. Then we could use our RAM addresses to further evaluate fitness functions. And yeah, I've restarted every time I've made a major fitness formula adjustment. It's a pain, but when the formula is bad, Mario has a tendency to oscillate and not progress in the level. So I've made sure to add fitness = -1 if the game timer runs out. :P
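Dumping input/output pairs every 5th frame could look something like this (recordPair and the pipe-separated format are made up for illustration):

```lua
-- Sketch: log (inputs -> outputs) every 5th frame for offline training.
-- Returns true only on frames where a pair was actually written.
local function recordPair(file, frame, inputs, outputs)
    if frame % 5 ~= 0 then return false end
    file:write(table.concat(inputs, ","), "|", table.concat(outputs, ","), "\n")
    return true
end
```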
Joined: 6/30/2015
Posts: 7
Invariel, I loved that presentation. A lot of his techniques would do well; he's definitely thought the training portion through, but he could definitely do better with fitness in some of his examples (stopping to break a block for 50 points is a possible net loss versus the extra score boost from finishing the level earlier). But I like the portability he has. Dwood, I don't understand enough of the NEAT internals yet to attempt making a graphics card run it :/ but it would be great at some point, especially if you can do it! I know CUDA better, but I haven't the slightest idea how I would get NEAT to run on it. Even plain C is going to be a lot faster than Lua. A lot of speedup could also be achieved by building a C library that handles all the NEAT internals during unsupervised training, and limiting Lua to scanning the inputs and calling into it. (CL/CUDA probably won't give much speedup here, since you only have one frame's worth of input at any given time; GPGPU processing is usually only worth it when you have a lot of data, such as possibly in the supervised training case.) The way SethBling handles timers (which I think is a decent approach) is to just terminate if Mario doesn't make progress in a while. Since most of the time he's either running right, stuck, or stopped, it's pretty easy to detect when he's hung. I increased my timeout a bit, and I kill him off if he loses any lives as well (timeout, falling down a hole, goomba; make sure to update the check if he gains a life).
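The stall detection described above boils down to a small piece of state per trial; a sketch with hypothetical names:

```lua
-- Sketch of the progress-timeout check: end the trial if the rightmost
-- position hasn't improved within `timeout` frames, or if a life is lost.
local function makeProgressMonitor(timeout)
    local best, stale = -math.huge, 0
    return function(x, livesLost)
        if livesLost then return true end      -- died: end the trial now
        if x > best then
            best, stale = x, 0                 -- new rightmost progress
        else
            stale = stale + 1
        end
        return stale > timeout                 -- hung: end the trial
    end
end
```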
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
You guys should be able to select other C emulator cores to combine with C bot work. You can still use BizHawk for visualizing your most recent results, but it wouldn't be convenient to visualize it as it runs. If you choose the right core, you could even parallelize it across CPUs (if the core is what we call multi-instance capable). I'm not sure whether any useful SNES cores are multi-instance, or structured well enough for this task at all. There should be something okay NES-wise, but I'm not sure what it is offhand.
Site Admin, Skilled player (1236)
Joined: 4/17/2010
Posts: 11269
Location: RU
Since botting SMW isn't going to wane as this idea gains popularity, maybe it's about time to add snes9x to BizHawk?
Joined: 6/18/2015
Posts: 54
If someone wants to add the snes9x core to BizHawk, I'd be down to use that. If you want to craft a neural net bot in C++ for max performance, you'll need a C/C++-based open-source emulator. There's higan, which is GPLv3, but my command prompt isn't reading the PATH variable to run make (ugh), so it's being a pain in general. If you get higan to build, let me know if you had to do anything special.

Askye: I think we can get by using C# instead of Lua via the C# interface behind the scripts. It would probably also make BizHawk more extensible if we could use .NET languages to create scripts instead of Lua, by making our own plugin interface. That way, we could write a .dll in C++ or C# that uses functions BizHawk exposes. It gives us way more control and better debugging, since we could debug our plugins and scripts in Visual Studio. C# would be faster than Lua, and it could also be used to build an API for C++ implementations as well. If we go with C#, there's a library called CUDAfy that lets us program as if we're using CUDA but build to OpenCL or CUDA depending on the targeted graphics card. That would be nice for me because I'm running AMD in my PC, which obviously doesn't support CUDA. Also, using Visual Studio allows us to navigate others' code much faster and more easily.

I'm also thinking we should create a more abstract set of functions for getting the neural net's inputs, so we can more easily train the network on different Mario games by pre-defining the memory space behind the inputs we use. On another note, if there's a NEAT/neural-network library where someone's done the legwork for us, we should use it. There's ES-HyperNEAT in C#, based on SharpNEAT. These places kind of just throw software into the wild without documentation, however (lol). I'm going to take a look at these different NEAT implementations. My real question is: what does it mean to train a network on static inputs/outputs, and how is that different from a standard network?
Masterjun
He/Him
Site Developer, Skilled player (1971)
Joined: 10/12/2010
Posts: 1179
Location: Germany
Is this bot better than TASers yet?
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
dwood15, if you're thinking about using higan, stop and use lsnes instead.
Joined: 6/18/2015
Posts: 54
Masterjun wrote:
Is this bot better than TASers yet?
Hahahaha, that's funny. Nope! Still lots of work to be done. The network is heavily limited by its inputs, the fitness function, and the variables used for training it. TASers don't have those limitations; that is, unless their tools are bad. Even after running for 2 days non-stop, my AI couldn't beat the level it was on in a consistent manner. Then again, I'm sure that with all the inputs the AI could be as good, if not better. Just don't expect it to overflow the stack to create minigames for itself.
darkszero
He/Him
Joined: 7/12/2009
Posts: 181
Location: São Paulo, Brazil
If all you want is to be able to write a DLL in C++, you can use Lua's C bindings, then load the DLL directly from Lua via require.
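On the Lua side that is just a require. Here "neatlib" and its evaluate function are hypothetical names; the DLL would need to export luaopen_neatlib and sit somewhere on package.cpath for require to find it.

```lua
-- Sketch of loading a native library from Lua. pcall keeps the script alive
-- when the DLL isn't present (e.g. when running outside the prepared build).
local ok, neat = pcall(require, "neatlib")
local outputs
if ok then
    -- hand the per-frame inputs to C, read controller outputs back
    outputs = neat.evaluate({0, 1, 0, 1})
end
```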
Editor, Emulator Coder
Joined: 8/7/2008
Posts: 1156
Well... that's a decent idea. It will be a bit odd due to needing to move a lot of data out of BizHawk, into Lua, and from there to the DLL, and the opposite: calling BizHawk methods (savestate, for example) by, curiously, passing delegates out of Lua and into C++. But it might be easier overall than bolting a new controller onto a C++ core, and you can do it piecemeal instead of taking the plunge all at once. Certainly worth considering.
Joined: 6/18/2015
Posts: 54
zeromus wrote:
Well... that's a decent idea. It will be a bit odd due to needing to move a lot of data out of BizHawk, into Lua, and from there to the DLL, and the opposite: calling BizHawk methods (savestate, for example) by, curiously, passing delegates out of Lua and into C++. But it might be easier overall than bolting a new controller onto a C++ core, and you can do it piecemeal instead of taking the plunge all at once. Certainly worth considering.
From what I can tell, what he's suggesting would let us bypass Lua entirely. I haven't even downloaded BizHawk's source, so I don't know how tightly coupled the functions BizHawk exposes are to Lua the language. If they're not too tightly coupled, we could make a plugin API so people can write their own scripts as managed .dlls. At any rate, if I were to actually do it, that's exactly the plan I'd try: just build the whole thing out that way. It may actually be faster, since there are about a bajillion NEAT implementations in C#/C++ we could take advantage of instead. None in C# use CUDA/OpenCL, but even plain C# would offer better memory management than Lua. I think I'll do it. I'll download the source, and if I have time this week between classes and studying for tests, I'll try to implement at least a basic NEAT setup using the C# functions that BizHawk exposes to Lua.