Invarial, I loved that presentation. A lot of his techniques would work well, and he's definitely thought through the training portion, but he could do better with fitness in some of his examples (stopping to break a block for 50 points is a possible net loss compared with the extra score boost from finishing the level earlier). But I like the portability he has.
Dwood
I don't understand enough of the NEAT internals yet to attempt making a gfx card run it :/ but it would be great at some point, especially if you can do that! I know CUDA better, but I haven't the slightest idea how I would get NEAT to run on it.
Even with C, it's going to be a lot faster than Lua. A lot of speedup could also be achieved if you built a library in C that handles all the NEAT internals during unsupervised training, and limited the Lua side to scanning the inputs and calling into it. (CL/CUDA probably won't buy much speedup here, since you only have one frame's worth of input at any given time; GPGPU processing is usually only worth it when you have a lot of data, such as possibly in the supervised training case.)
The way SethBling was handling timers (which I think is a decent approach) is to just terminate if Mario doesn't make progress in a while. Since most of the time he's either running right, stuck, or stopped, it's pretty easy to detect when he's hung. I increased my timeout a bit, and I kill him off if he loses any lives as well (timeout, falling down a hole, a Goomba; make sure to update the check if he gains a life).
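Something along these lines works for the check (just a sketch, assuming SMB1 on a BizHawk NES core; the addresses are from the commonly published SMB1 RAM map, and the timeout constant is arbitrary):

```lua
-- Sketch of a "no progress / lost a life" check for SMB1.
-- 0x006D/0x0086 = Mario's x position (page * 256 + offset), 0x075A = lives.
local TIMEOUT_FRAMES = 300          -- arbitrary; tune to taste
local rightmost = 0
local stalledFor = 0
local startLives = memory.readbyte(0x075A)

local function trialShouldEnd()
    local marioX = memory.readbyte(0x006D) * 256 + memory.readbyte(0x0086)
    if marioX > rightmost then
        rightmost = marioX          -- still making progress; reset the stall counter
        stalledFor = 0
    else
        stalledFor = stalledFor + 1
    end
    local lives = memory.readbyte(0x075A)
    if lives > startLives then
        startLives = lives          -- picked up a 1-up; don't treat it as a death
    end
    return stalledFor > TIMEOUT_FRAMES or lives < startLives
end
```

Call it once per frame and reset the locals at the start of each trial.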
I just built it, profiled it with Visual Studio, and used the profiler to figure out what was eating up CPU besides Lua, then turned off anything that wouldn't break things (stuff related to GL/rendering mostly; I think I started at the render function in MainForm.cs).
I'm doing SMB1 instead of SMW almost entirely because it eliminates dealing with an overworld, and with those colored blocks/switch palaces screwing with things. But you could check whether it's on the overworld, and then grant fitness if the world/level it goes to is higher than the last one it was on. I let it see what world/level it is on, so the inputs for level 2 are less likely to mess with the already optimized inputs for level 1.
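The idea is roughly this (a sketch, assuming the SMB1 RAM map addresses 0x075F for world and 0x0760 for level; the bonus value is arbitrary):

```lua
-- Feed world/level to the network as extra inputs and reward level transitions.
local LEVEL_BONUS = 1000
local lastWorld, lastLevel = -1, -1

local function levelInputsAndBonus()
    local world = memory.readbyte(0x075F)
    local level = memory.readbyte(0x0760)
    local bonus = 0
    if world > lastWorld or (world == lastWorld and level > lastLevel) then
        if lastWorld >= 0 then bonus = LEVEL_BONUS end   -- no bonus for the starting level
        lastWorld, lastLevel = world, level
    end
    -- world and level get appended to the network's input vector
    return world, level, bonus
end
```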
Keep in mind that changing the fitness function could ruin any previous training you've done, and adding inputs almost certainly will. My best suggestion, if you do change it, is to "play top" and make sure it's still performing fairly well; this also makes sure the top genome is getting its fitness updated.
Also, I've noticed that additional inputs seem to increase the training time needed for similar results by roughly a square function: every doubling of the inputs results in about a 4x increase in the time to make similar progress, though it could certainly be even worse than that.
For IPC, every generation I'm having each instance grab a block of untested genomes (if it's to the point that it's testing 15 species, you could have each one grab the next available species), synchronize the results via a file, and then spin until the other instances finish if no blocks are left. Whoever finishes last generates the next generation, and then they each grab the next block.
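The claiming step looks roughly like this (paths and block count are made up for illustration, and the check-then-create isn't atomic, so either pre-assign blocks or use a rename trick if two instances might race):

```lua
local SHARE_DIR = "C:/neat_share/"     -- hypothetical shared folder
local NUM_BLOCKS = 15

-- Try to claim one block of genomes by creating a marker file for it.
local function tryClaim(block)
    local marker = SHARE_DIR .. "claim_" .. block .. ".txt"
    local existing = io.open(marker, "r")
    if existing then existing:close() return false end   -- someone already took it
    local f = io.open(marker, "w")                        -- not atomic; see note above
    if not f then return false end
    f:write(tostring(os.time()))
    f:close()
    return true
end

-- Grab the next unclaimed block, or nil if everything is taken this generation.
local function nextBlock()
    for block = 1, NUM_BLOCKS do
        if tryClaim(block) then return block end
    end
    return nil          -- nothing left; spin until the other instances finish
end
```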
Something I've considered doing is making a function to pretrain the network based on actual gameplay. My initial thought is to record the inputs/outputs during a playthrough of a level, and then train the network until it perfectly mimics your outputs for those inputs. You could do this in a separate program outside of BizHawk once the run is recorded, which would be immensely faster since it's not using Lua. Once the network is trained/built, cull any species that don't match your outputs, load it into BizHawk, and let it play on its own.
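The recording half might look something like this (getInputs() stands in for however your script builds its input vector; the button set is the NES pad):

```lua
-- Dump the network inputs plus the buttons actually pressed, one line per frame,
-- for offline supervised pretraining.
local out = io.open("playthrough.csv", "w")

while true do
    local inputs = getInputs()                 -- e.g. the tile grid around Mario
    local pad = joypad.get(1)                  -- buttons held this frame
    out:write(table.concat(inputs, ","))
    out:write(string.format(",%d,%d,%d,%d\n",
        pad["A"] and 1 or 0, pad["B"] and 1 or 0,
        pad["Left"] and 1 or 0, pad["Right"] and 1 or 0))
    out:flush()
    emu.frameadvance()
end
```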
I didn't kill the entire FrameAdvance, just the part that renders stuff (it looks like there was a config option to turn off rendering anyway, but I can't find anywhere to set it).
I also killed all the other GL rendering, so right now it's a blank window (stripping it down to a full command-line build is my next step), but profiling shows that Lua is using almost all the execution time, so it's good enough for now. Thanks a lot for pointing me in the right direction here.
I confirmed it's still running everything properly despite not displaying anything: a console log in the script shows that each trial uses the same number of frames and comes up with the same fitness as it did before. And without running all the unnecessary rendering procedures, I have 5 instances running at ~1000-1200 fps, as opposed to the ~150 I was getting before with 5 open. For one window it's not really worth the effort, but with multiple instances dividing the pool into batches, it's working great.
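For reference, the log is just something like this, one line per trial, so the numbers can be diffed against the stock build (all the parameters are whatever your script already tracks):

```lua
local function logTrial(generation, genomeIndex, framesUsed, fitness)
    console.writeline(string.format("gen %d genome %d: %d frames, fitness %d",
        generation, genomeIndex, framesUsed, fitness))
end
```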
I appreciate your help :)
I don't entirely understand the internals of the emulator, which is why I was looking to be pointed in the right direction. I didn't realize disabling the throttle invalidated the speedup :) Thanks for clarifying, and for pointing me to FrameAdvance in QuickNES. I still want to get rid of the rest of the GUI, but that's definitely the most important bit.
Dwood, yes, anything that has to be drawn should be avoided. Updating the fitness header every frame and drawing all the neurons/lines/boxes cuts my performance in half.
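What I mean is gating all of that behind a flag, something like this (the function and table layout are illustrative, not the actual script):

```lua
-- Only draw the fitness banner and the neuron/box overlay when someone is watching.
local drawOverlay = false

local function drawNetwork(cells, fitness)
    if not drawOverlay then return end
    gui.drawText(5, 5, "Fitness: " .. fitness)
    for _, cell in ipairs(cells) do
        gui.drawBox(cell.x, cell.y, cell.x + 4, cell.y + 4, 0xFF000000, cell.color)
    end
end
```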
client.speedmode seems to be capped at 6400, at least according to the code in EmuLuaLibrary.Client.cs, but I guess it's not as big of an issue. You're right: at the moment I doubt anything I have outside of my main rig has a chance of reaching that if it's tied to frame rates. I was assuming it was somehow tied to the clock of the emulated NES CPU itself, which is 1.79 MHz and could have been sped up considerably more.
My main concern is still that I want to completely strip the display off it. (I'd love to run it via the command line at some point, so I can send it off to some of my servers for even more distributed instances, but for right now I just want to eliminate the overhead.)
All throttling is off. I'm running it with SMB1, not SMW, so it's NES, and I don't see an option to disable the display completely. I do have it set to 0 sprites and the like, but I want to completely turn off graphical rendering (and boost emulator speed past 6400%), which is why I'm looking to make a custom build of BizHawk.
I've added a lot of features (I'm just doing SMB1) to improve fitness and expose new inputs. But with more inputs, learning slows down. I tried running multiple BizHawk instances and having them communicate with file-based IPC so they could split up the genome pool, but the overhead isn't worth it at the moment: I'm progressing faster with one instance at 600+ fps than with 4+ instances at 100-150 fps.
What I'd love is a way to run BizHawk completely headless, as graphical rendering is the biggest bottleneck, especially with multiple instances. Can one of the BizHawk devs point me to the right place in the source to start removing it? I'd love to get a headless build onto a couple of my servers and let them help distribute the load as well. I'd also like to raise the 6400% speed cap when displaying, but I can probably figure that part out :)