Post subject: Bot that plays NES games by making numbers go higher
Patashu
He/Him
Joined: 10/2/2005
Posts: 4048
http://www.cs.cmu.edu/~tom7/mario/ Link to video Ok, this isn't my work but it's really, really cool. Basically, this guy wrote a bot that plays NES games by attempting to increase addresses in memory that have a lexicographical ordering (for instance, score goes up but it has more than one byte - but lexicographically it still increases when you observe someone play) and trying multiple future sequences and greedily picking the best one. Without knowing anything about a NES game except what it sees happen in memory from the game being played once by a human (which it uses to decide what memory addresses are important, e.g. increase lexicographically), it displays emergent skill in games like Super Mario Bros due to being driven to increase score and scroll the camera right (two lexicographical orderings) and therefore to progress and not to die to hazards (preferring to get their score). It even finds glitches in the game and does insanely risky dodges purely because it keeps the bot alive. In the video the gameplay starts at about 6 minutes in. He's published a paper and source code for it. While the bot only gets so far before dying in each game, I think that if it was made more 'death-averse' somehow and looked a bit further ahead it would be able to beat simple NES platformers. It's really exciting that it can get so far with so little knowledge of the game![/i]
Puzzle gamedev https://patashu.itch.io Famitracker musician https://soundcloud.com/patashu Programmer, DDR grinder, enjoys the occasional puzzle game/shmup.
Editor
Joined: 3/31/2010
Posts: 1466
Location: Not playing Puyo Tetris
Wow. I welcome our AI overlords. I am clearance Ultraviolet.
When TAS does Quake 1, SDA will declare war. The Prince doth arrive he doth please.
Noxxa
They/Them
Moderator, Expert player (4138)
Joined: 8/14/2009
Posts: 4094
Location: The Netherlands
This was very interesting. I like how it managed to find all kinds of odd tricks and fancy moves, such as the Super Mario Bros. reverse stomp and wall jump, shooting enemies from behind in Bubble Bobble and doing some cool dodging tricks in Pac-Man. It probably is barely even aware of what is going on, and yet it is already showing off for the viewers.
http://www.youtube.com/Noxxa <dwangoAC> This is a TAS (...). Not suitable for all audiences. May cause undesirable side-effects. May contain emulator abuse. Emulator may be abusive. This product contains glitches known to the state of California to cause egg defects. <Masterjun> I'm just a guy arranging bits in a sequence which could potentially amuse other people looking at these bits <adelikat> In Oregon Trail, I sacrificed my own family to save time. In Star trek, I killed helpless comrades in escape pods to save time. Here, I kill my allies to save time. I think I need help.
Patashu
He/Him
Joined: 10/2/2005
Posts: 4048
I'm reading the research paper now, it's a cool read. (One thing the bot does is, instead of trying inputs randomly, it prefers input chunks that featured commonly in the input file used to train the bot, which both means it can look ahead in larger chunks than single frames and tries to play the same way you did) One thing that strikes me is that the author often laments that things like frame counters and music cursors (that go up regardless of how good your play is) ensure that the bot will include them in their objective function for scoring (since they go up as progress is made) even though they are not discriminating. Maybe the bot, either while computing the objective function or as it plays and observes the objective function set, could notice when variables go up no matter what, and steadily weigh them less and less. (On the other hand, such functions that always go up might NOT go up when the player character is currently dying - and such a variable suddenly IS discriminatory, and the input sequence the bot uses to learn the game might not include any or many deaths. So this isn't a perfect solution either.) EDIT 1: At the end he admits that he hasn't gone as far as he'd like to due to the conference coming up. So, I'm interested to see if this will go further. Since he lists potential improvements and makes the source code available, maybe some collaboration/feedback could even happen? EDIT 2: adelikat and TASVideos in the references at the end :')
Puzzle gamedev https://patashu.itch.io Famitracker musician https://soundcloud.com/patashu Programmer, DDR grinder, enjoys the occasional puzzle game/shmup.
Joined: 10/31/2006
Posts: 134
Really interesting, I'd like to see the result when you feed the bot one of those sub 5 minutes mario run.
Patashu
He/Him
Joined: 10/2/2005
Posts: 4048
Master of Puppets wrote:
Really interesting, I'd like to see the result when you feed the bot one of those sub 5 minutes mario run.
The way it is currently written I suspect it would suspect it wouldn't do better than any good play of SMB1, because what it does with the input file is observe what memory addresses tend to go up over time (and in what ways) and what the input tends to be like for playing the game (identifying motifs).
Puzzle gamedev https://patashu.itch.io Famitracker musician https://soundcloud.com/patashu Programmer, DDR grinder, enjoys the occasional puzzle game/shmup.
Player (96)
Joined: 8/5/2007
Posts: 865
I'm interested in this. Verrrrrrrry interested...
Skilled player (1830)
Joined: 4/20/2005
Posts: 2161
Location: Norrköping, Sweden
This was really awesome. And didn't that input display look suspiciously like FCEUX?
Patashu
He/Him
Joined: 10/2/2005
Posts: 4048
Randil wrote:
This was really awesome. And didn't that input display look suspiciously like FCEUX?
It is an edit of FCEUX's source code, if you read the research paper.
Puzzle gamedev https://patashu.itch.io Famitracker musician https://soundcloud.com/patashu Programmer, DDR grinder, enjoys the occasional puzzle game/shmup.
MESHUGGAH
Other
Skilled player (1931)
Joined: 11/14/2009
Posts: 1358
Location: 𝔐𝔞𝔤𝑦𝔞𝔯
Ah god... I'm 1 day later... also I hate that I need to wait to search again in the forum. I laughed at the karate kid tie and pacman trick.
PhD in TASing 🎓 speedrun enthusiast ❤🚷🔥 white hat hacker ▓ black box tester ░ censorships and rules...
BigBoct
He/Him
Editor, Former player
Joined: 8/9/2007
Posts: 1692
Location: Tiffin/Republic, OH
That was a fascinating watch. Thanks for sharing it!
Previous Name: boct1584
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
Apparently it's now news to do a fraction of what Bisqwit did with BisqBot or what we now do with Lua: http://arstechnica.com/gaming/2013/04/this-ai-solves-super-mario-bros-and-other-classic-nes-games/
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
AnS
Emulator Coder, Experienced player (729)
Joined: 2/23/2006
Posts: 682
Nach wrote:
Apparently it's now news to do a fraction of what Bisqwit did with BisqBot or what we now do with Lua: http://arstechnica.com/gaming/2013/04/this-ai-solves-super-mario-bros-and-other-classic-nes-games/
Well, the huge difference is that our bots are game-specific and heavily engineered, while that guy found an almost reliable criterion (at least for a spoof research) which appears to work for many games (albeit not very well), and that's what's impressive about it. The whole thing is about aestetics, while bots are about achieving actual results.
Joined: 10/31/2006
Posts: 134
Nach wrote:
Apparently it's now news to do a fraction of what Bisqwit did with BisqBot or what we now do with Lua: http://arstechnica.com/gaming/2013/04/this-ai-solves-super-mario-bros-and-other-classic-nes-games/
http://tasvideos.org/forum/viewtopic.php?t=13996 Don't think it can't be compared to Lua/BisqBot
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
While it's true that his is more generic, the presentation on Ars isn't about that, but about botting a game in general, and in that sense, it pails in comparison to what we have achieved.
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
Master of Puppets wrote:
Don't think it can't be compared to Lua/BisqBot
BisqBot made it several worlds into SMB, this bot only made it to 1-3.
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
Patashu
He/Him
Joined: 10/2/2005
Posts: 4048
BisqBot had to be told that increasing x co-ordinate is. good, getting power ups is good, taking warp pipes is good, etc and dying is bad in SMB1. The novelty of this bot is that you don't tell it what to do - it watches you play and assigns itself its goals for the game. BisqBot is great but the idea of this bot is that it decides what it wants, in a way I haven't seen applied to game bots before. (btw, I think this should be split to http://tasvideos.org/forum/viewtopic.php?t=13996 ) (Mod edit: Done --Mothrayas)
Puzzle gamedev https://patashu.itch.io Famitracker musician https://soundcloud.com/patashu Programmer, DDR grinder, enjoys the occasional puzzle game/shmup.
Joined: 4/16/2013
Posts: 8
Hmmm, after looking at that and some older post about BisqBot, I wonder if the BisqBot concept can be taken to a more extreme level. Call it the DeepBlue of TASing. Some ideas of improvements: 1. Make it really think ahead, like 10 seconds ahead. 2. Give it more (weighted) variables as to what is considered "better". Make things like losing bigness a negative. 3. Have some form of smaller loops that save good inputs, so that the full 10 seconds isn't completely random. For example, as soon as it kills a Goompa, consider that input pattern to be a "good one" and keep duplicating that for a while. I guess item #3 would be explained as almost like some sort of multi-looped binary division: 10 second loop --> 2 5s loops --> 4 2.5s loops --> etc. I'm sure there is some form of genetic programming that would help out here, too.
Player (96)
Joined: 8/5/2007
Posts: 865
SineSwiper wrote:
Hmmm, after looking at that and some older post about BisqBot, I wonder if the BisqBot concept can be taken to a more extreme level. Call it the DeepBlue of TASing. Some ideas of improvements: 1. Make it really think ahead, like 10 seconds ahead. 2. Give it more (weighted) variables as to what is considered "better". Make things like losing bigness a negative. 3. Have some form of smaller loops that save good inputs, so that the full 10 seconds isn't completely random. For example, as soon as it kills a Goompa, consider that input pattern to be a "good one" and keep duplicating that for a while. I guess item #3 would be explained as almost like some sort of multi-looped binary division: 10 second loop --> 2 5s loops --> 4 2.5s loops --> etc. I'm sure there is some form of genetic programming that would help out here, too.
I was thinking of something along the lines of "trajectories", by which I mean, "If I were to execute the following button combinations over these frames, how would I expect relevant RAM values to change?" This would allow the bot to peer into the future predictively and relatively efficiently. In principle, it shouldn't be too difficult and should closely model human play. In practice... I'm in over my head.
Banned User
Joined: 3/10/2004
Posts: 7698
Location: Finland
SineSwiper wrote:
1. Make it really think ahead, like 10 seconds ahead.
The problem with that is that for it to be feasible, you need a really good pruning algorithm to discard search branches that will (most probably) result in a bad result. You can't test all possible combinations (because of exponential explosion.) This is actually a rather difficult heuristic to come up with (especially if you want it to be generic.) Sometimes if you go a bit backwards, ie. away from the desired goal rather than always towards it, the end result can be much better than if you just blindly always go towards the desired goal. Sometimes sacrificing something in the short term will give a great benefit in the long term.
Joined: 4/16/2013
Posts: 8
Bobo the King wrote:
I was thinking of something along the lines of "trajectories", by which I mean, "If I were to execute the following button combinations over these frames, how would I expect relevant RAM values to change?" This would allow the bot to peer into the future predictively and relatively efficiently. In principle, it shouldn't be too difficult and should closely model human play. In practice... I'm in over my head.
Well, that's exactly what "save state prediction" is. It's actually asking that very question by doing some random button presses, and then checking RAM values. It's predicting the future by playing the future and then reverting back. But, I guess both you and I are looking for something that is less random in its button presses. Something that wouldn't require a mammoth amount prediction cycles to play through the game. Another technique to consider is Maze solving algorithms. If a set of actions leads to a "dead end", like reducing in size or dying, chop off that branch and move to the next set. I don't know if that gives you much in savings, though, and it gives stuff like size reduction an infinitely negative weight. (This would be appropriate for death, but you might actually have a use for purposely getting hit.)
Patashu
He/Him
Joined: 10/2/2005
Posts: 4048
Just to make sure - if you're talking about improving this bot or BisqBot, have you fully read the research paper here? http://www.cs.cmu.edu/~tom7/mario/mario.pdf
Puzzle gamedev https://patashu.itch.io Famitracker musician https://soundcloud.com/patashu Programmer, DDR grinder, enjoys the occasional puzzle game/shmup.
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
The best part about the research paper is that he seems to be aware of some TASVideos inside jokes, and references them. He gives a mathematical explanation as to why some games are not eligible even for the vault.
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
Joined: 4/16/2013
Posts: 8
Patashu wrote:
Just to make sure - if you're talking about improving this bot or BisqBot, have you fully read the research paper here? http://www.cs.cmu.edu/~tom7/mario/mario.pdf
Partially. Still skimming through it. I'm mostly talking about BisqBot, though. Still trying to get it to compile (bisqbot-mario.cc) without much luck. Not sure how it interacts with the emulator, either.
Joined: 10/20/2006
Posts: 1248
Very nice. I still think that ultimately the best way to program a generalised TASing bot is to make it scientific. Forming and testing hypotheses about how memory addresses interact, how much of it it can control or influence, and under which circumstances. Very playfully at first, with the goal to learn as much as possible. Then finally tear the game apart. Of course this would be _really_ hard to achieve. A general rule to avoid deaths (and increasing the death counter) would have to be somewhat akin to it wanting to avoid RAM states that are very similar to previous ones without really getting any apparent new abilities, even if it increases the score or whatever. (I'll gladly accept the first bot to genuinely discover on its own how to acquire the hadouken in MMX as my new overlord) I swear it could be done if somebody was crazy enough to spend enough time on it. Lock somebody into a cell and don't leave them out until they're done. :p Tetris seems very difficult. And with that, I'll disappear back to whence I came.