Hi,
did anybody ever try to solve a game by using an A*-search?
(http://en.wikipedia.org/wiki/A*_search_algorithm)
This algorithm guarantees (under some assumptions) a perfect and unbeatable result.
IMO this algorithm should work very well for many old jump'n'run-games and deliver perfect solutions in a very reasonable time.
Let me describe the most important part in a few sentences, but be warned: This might be a bit theoretical!
You can read for an hour in the wiki or just believe the following statement:
The core of the algorithm is the evaluation-function of a gamestate: "How long will it at least take from here to the end?".
As long as this function does NEVER report a value that is higher than the value that is actually achievable, the algorithm will definitely find the perfect solution. That's it! Let's leave the algorithm's internals aside and concentrate on this evaluation-function.
You might want to take a shortcut and take the easiest evaluation-function possible: Always estimate the remaining time with "0".
You can do that, since this function fulfills the requirement: NEVER report a value higher than the value that is actually achievable. Thus the algorithm will eventually find a perfect solution.
BUT: With this function, the algorithm will basically brute-force the game, which will take far too long.
You see where this goes: The better the function is at estimating the remaining time (without ever returning a result too high), the faster the algorithm will terminate.
So we should provide a function that comes closer to the actual results.
If you take for instance Super Mario Bros:
Just return the time it would take to run from the current position to the flagpole at top speed. Since Mario rarely slows down during a run, this simple estimate should be very good. In practice you would need some more fine-tuning to account for shortcuts inside the levels.
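To make the idea concrete, here is a minimal sketch of such an estimate. The speed value and units are made up for illustration, not taken from the actual game:

```python
MAX_SPEED = 4  # assumed top speed in pixels per frame (illustrative value)

def heuristic(x, goal_x):
    # Lower bound on frames remaining: straight-line distance at top speed.
    # It never overestimates, because nothing moves faster than MAX_SPEED,
    # so A* using this estimate is still guaranteed to find an optimal run.
    return max(0, goal_x - x) / MAX_SPEED
```

The whole trick is in that "never overestimates" property: as long as it holds, you can make the estimate as clever as you like without losing the optimality guarantee.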
So back to the initial question:
Did anybody ever try to do that in an actual game?
The problem with that is that it doesn't take glitches into account. It ignores the fact that Mario runs at different sub-speed values because of the hopping glitch. It won't take into account frame rules, the flagpole glitch, lag frames, etc., so this is not a good way to measure perfection: it will sometimes give a value lower or higher than what perfection could be, especially if you trigger an end-of-level early. It could be done, but it's not really worth the effort. When you take everything into account, we're better off just looking for improvements in the current records.
Current TAS:
[SNES] Jelly Boy
[NES] Street Fighter 2010
As a general rule, this falls under "trying to solve games programmatically" and has similar flaws. The most glaring is that for an NES you have 8 inputs per turn, which means that each extra step you take in your search adds 8 more states to be evaluated. That means that after 1 second (60 frames), you have 8^60 ~= 1.5 * 10^54 states to deal with. For reference there are estimated to be about 10^80 atoms in the universe.
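The arithmetic is easy to sanity-check, since Python integers don't overflow. This just verifies the numbers as stated in the post:

```python
# Quick check of the state-count estimate above.
per_frame = 8      # the post's branching factor per frame
frames = 60        # one second of NES gameplay
total = per_frame ** frames
assert 10**54 < total < 10**55   # ~1.5e54, vs the estimated ~1e80 atoms
print(total)
```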
In order for a computerized solver to make any kind of progress, it must be able to recognize some states as being suboptimal. In most pathing situations, you can recognize when you reach the same position via multiple paths, and thus throw away all but the shortest path. But in a videogame, things are happening while your path moves. Enemies change position, time passes, etc. Also, your path is not just your position, but also your velocity and other game state; recognizing when a change in all that extra state is "redundant" is very difficult.
Even Super Mario Bros. 1, one of the simplest 2D platforming games, has many situations in which you need to take seemingly-suboptimal actions that don't pay off until quite some time in the future. Is it worth slowing down in order to get a mushroom? What about to manipulate the position of an enemy?
Pyrel - an open-source rewrite of the Angband roguelike game in Python.
It's even worse than that. Each input can have one of two states (pressed or not pressed) so each frame has 2^8 possible inputs.
It seems like something like this could only be practical for some more limited situation where you could consider higher level moves than just "all possible inputs every frame".
That depends on the kind of glitch.
If the glitch takes too long to set up, it will probably not be found, since only "promising" states are examined. The hopping at the beginning of each level will probably be found easily.
As a general rule, this falls under "trying to solve games programmatically" and has similar flaws.
Before diving into the details, I would be interested in knowing who in this thread did actually know A*-search before reading this topic.
Now the details:
I am not sure that this will actually work either (otherwise I would just do it and not post here). BUT: If you do find a good evaluation-function for the state of the game, the algorithm works very efficiently and is NOT a brute-force-approach, but an informed search-algorithm that has to be tailored for each game it should be applied to.
Wikipedia shows a nice picture that illustrates the efficiency and the issues this algorithm has:
1.) Before hitting the obstacle, the algorithm pursues a direct path.
2.) Upon hitting it, it has to go "wide" and examine a wide variety of states to find the perfect way around the obstacle.
Thus: As long as the game does not have many of these obstacles that slow down the progress, it should be very efficient to solve.
SMB falls into this category IMO.
Just to be clear: a tube you can jump over is not a relevant obstacle, since it does not slow down progress. But a platform you have to wait several seconds for would create a severe hit in performance.
Basically, it all comes down to a graph-traversal problem: Which paths are worth evaluating? This graph is really complex for most games. But A* offers two relevant means of significantly reducing the computational complexity:
1.) Detection of duplicate states
Many button-combinations lead to an identical game state. Take the starting position of SMB and a simplified controller with only the buttons "up" and "right". That is 4 combinations per frame. With a pure brute-force approach, after 60 frames there would be 4^60 (about 10^36) states. Sounds frustrating! But if you detect the duplicate states, this number is reduced dramatically.
2.) Detection of inefficient states
Here the evaluation-function comes into play: Each state is rated by adding the amount of frames needed to get there and the (estimated) least amount of frames needed to complete the level (e.g. until after the fireworks have ended). Then the state with the smallest sum will be evaluated.
Here, all the moves that are too inefficient will be ignored.
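A minimal sketch of how these ratings drive the search order, using a priority queue (the state names and numbers are just illustrative):

```python
import heapq

# Each state is scored with f = frames taken so far + estimated frames
# remaining; the priority queue always hands back the smallest f first.
frontier = []
heapq.heappush(frontier, (10, "state_a"))   # f = 1 + 9
heapq.heappush(frontier, (12, "state_b"))   # f = 1 + 11
best = heapq.heappop(frontier)              # state_a is expanded first
```

Nothing is ever permanently rejected; inefficient states just sink to the bottom of the queue and may never get their turn.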
Let me show a very simplified example of how efficiently the algorithm can work:
Imagine a game where you just have to run 10m to the right. You start at position 0 and win on reaching 10. Each second, you have to decide whether to run 1m to the left or 1m to the right.
Thus, when starting the game, you have to decide whether to run 1m to the left or to the right.
A* will try both and end up with two states:
a) 1s into the game, standing on position 1 (1m to the right)
b) 1s into the game, standing on position -1 (1m to the left)
Now each state will be rated:
From state (a) you will need at least 9s to finish the game (9m left, 1m/s). State (a) is thus rated with 1s into the game + 9s to finish = 10s
State (b) is accordingly rated with 1s into the game + 11s to finish = 12s.
Now the algorithm chooses the best-rated state, which is (a), and evaluates its possible actions.
After this, there are three active states:
a) 2s into the game, standing on position 2 (twice to the right)
b) 2s into the game, standing on position 0 (once right, once left)
c) 1s into the game, standing on position -1 (the original state (b))
Let me introduce a shorter notation:
a) 2s, 2m, RR
b) 2s, 0m, RL
c) 1s, -1m, L, already known to have a rating of 12s
States (a) and (b) still need to be evaluated:
(a) will be rated with 10s: 2s into the game + 8s minimal completion time
(b) will be rated with 12s: 2s into the game + 10s minimal completion time.
Again, the best state will get further evaluation:
a) 3s, 3m, RRR, rated with 3s+7s=10s
b) 3s, 1m, RRL, rated with 3s + 9s = 12s
c) 2s, 0m, RL, already known to be rated as 12s
d) 1s, -1m, L, already known to be rated as 12s
You probably get the idea by now. Applying this further, the optimal solution is found, reaching the goal in the perfect time of 10s with only about 20 evaluated states. Compare this to brute-forcing the game: there you would have had to evaluate all 2^10 = 1024 states. That's a very significant reduction, and it grows even larger over longer periods:
If the goal was at 100m instead of 10m:
A* would need only 200 states, while brute-forcing would need 2^100 (about 10^30) states.
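The whole toy example above fits in a few lines of runnable code. This sketch uses the remaining distance as the estimate (which never overestimates, since you move at most 1m/s):

```python
import heapq

def solve(goal=10):
    # Entries are (f, time, position), where f = time so far + distance left.
    frontier = [(goal, 0, 0)]
    seen = set()
    expanded = 0
    while frontier:
        f, t, pos = heapq.heappop(frontier)
        if pos == goal:
            return t, expanded
        if (t, pos) in seen:        # duplicate-state detection (point 1 above)
            continue
        seen.add((t, pos))
        expanded += 1
        for step in (1, -1):
            npos = pos + step
            h = abs(goal - npos)    # admissible estimate of seconds remaining
            heapq.heappush(frontier, (t + 1 + h, t + 1, npos))

print(solve())  # -> (10, 10): perfect time 10s, only 10 states expanded
```

The leftward branches all carry f = 12, so they never get expanded while the f = 10 rightward branch is still alive; counting the generated children rather than the expansions gives roughly the 20 states mentioned above.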
But we still don't have anybody around who actually tried that, do we?
As a general rule, this falls under "trying to solve games programmatically" and has similar flaws.
Before diving into the details, I would be interested in knowing who in this thread did actually know A*-search before reading this topic.
I have implemented A* for my own game work, thanks. It's not a very complicated algorithm.
Now, tell me how A* is going to figure things out like (examples taken from SMB1):
* At the start of each level, turn around, jump, and then start moving forwards
* Collect a mushroom and fireflower so the fortress level(s) can be finished faster
* Going through the wall in 1-2 to get to the warp zone faster
* The flagpole glitch
* The flagpole glitch with jumping off of an enemy
* The vine screen-scrolling glitch
* Avoiding fireworks
Like I said, you run into the problem that each state of the game may or may not be "closer" to your goal state than any other state, because there are a lot of situations that require you to slow down in your visible progress (distance towards goal) in order to set things up to ultimately be faster.
If all you care about is reaching the goal, regardless of the time taken, then sure, that's doable. Search YouTube for "infinite Mario"; there was an AI contest a while back to write a Mario bot that could "solve" procedurally-generated levels, and the winner used A* search. Though, come to think of it, even that would have problems with the "maze" fortress levels.
But winning games is easy when you have tools. The trick is to win optimally. And that requires far more lateral thinking than A* is going to be able to provide.
EDIT: thanks for the correction, zoibot. I thought the number was a bit small...
Pyrel - an open-source rewrite of the Angband roguelike game in Python.
A* only works well if a) you know what the fastest form of movement is, b) that fastest movement is usually achievable.
The problem is that a) is often wrong due to glitches, and b) is often wrong due to frame rules (which completely kill the performance of the algorithm). I can imagine that there would be games where it would work, but those are probably rare.
(I actually have written a bot to play NetHack using A* in the space of goals in the game. It's slow going and doesn't work so amazingly well; it comes up with really insightful strategies sometimes (I once saw it kite an entire roomful of monsters in one of the most dangerous levels in the game by grouping them and then running slightly faster than them, in a wide circle so that they didn't gain advantages from turning corners), but its play is noticeably worse than a good human's because it doesn't make assumptions (and thus ends up exploring in an excessively OCD manner).
Of course, my bot doesn't have the help of savestates, which makes things quite different; it has to dynamically reroute as it gains more information. But I think a savestate-using bot would have the same general issues.)
I have implemented A* for my own game work, thanks. It's not a very complicated algorithm.
Good to know. That makes the discussion a lot easier. Where did you try it? And how did it perform? Did you implement it directly in an emulator, or did you use some interface?
* At the start of each level, turn around, jump, and then start moving forwards
This should IMO be easily found, since AFAIK it pays off after only a few seconds.
* Collect a mushroom and fireflower so the fortress level(s) can be finished faster
IMO this cannot be solved by A*. Here the implementor will have to define different scenarios, let A* evaluate them and choose the best one.
* Going through the wall in 1-2 to get to the warp zone faster
Since this is faster than running on top, A* should find this automatically.
* The flagpole glitch
If it were an unknown glitch, A* would probably not find it. But since it is a known glitch, the target can be defined accordingly: not as reaching the flagpole, but as reaching the next level.
* The flagpole glitch with jumping off of an enemy
Should be found as well IMO.
* The vine screen-scrolling glitch
This is a tough one, since the situation is hard to evaluate. The goal will have to be set by the programmer, just like with the mushrooms.
* Avoiding fireworks
Define the goal as reaching the next level (and not the flagpole) and rate the states accordingly.
The trick is to win optimally. And that requires far more lateral thinking than A* is going to be able to provide.
That's a good summary: A* is not an artificial intelligence that will use some kind of creativity to win a game. Instead, it can be a good tool for finding a perfect way along the paths the programmer has in mind.
Before diving into the details, I would be interested in knowing who in this thread did actually know A*-search before reading this topic.
I used A* for pathfinding in an unfinished prototype of a 2D maze-like game. Actually, at first I used a simple Dijkstra algorithm and all was well, but then I started overengineering for no reason (hence "unfinished") and added a simple heuristic formula (distance to the end tile). Can't say the performance was noticeably better (on some convoluted maps the simpler algorithm was even faster, while A* was better on empty fields with a few obstacles), but at least it worked. In a static map.

Then I added moving obstacles and it became apparent that units have to recalculate their path every frame from scratch, because the map has already changed since the previous calculation. Of course, in real development they use various tricks to minimize this problem (while also reducing the quality of pathfinding, which isn't noticeable when playing in realtime), but I don't think those tricks are applicable when we want perfection. So I started trying to make the evaluation formula more complex (take into account the nature of moving obstacles, like periodic movements, etc). At the same time I noted the apparent need to simplify the game rules in order to make the heuristics adequate to the complexity of the game.
AndiHoffi wrote:
If you do find a good evaluation-function for the state of the game, the algorithm works very efficiently and is NOT a brute-force-approach, but an informed search-algorithm that has to be tailored for each game it should be applied to.
I think finding a good evaluation-function is the lesser problem; the real problem is finding a good way to represent the game as a graph. The more gameplay nuances you omit, the fewer nodes your graph will have (but you also end up skipping nodes that could lead to a shortcut).
AndiHoffi wrote:
Thus: As long as the game does not have many of these obstacles that slow down the progress, it should be very efficient to solve.
Yes, so everything depends on how you define obstacles. If your definition doesn't represent the real picture, the solution will be far from perfect, even though the searching algorithm will honestly produce the best result within the given framework.
AndiHoffi wrote:
1.) Detection of duplicate states
Many button-combinations lead to an equal state of the game.
If you're going to compare hashes of the entire RAM, you likely won't ever get two duplicate states, because the input often affects various irrelevant things.
So, to get anywhere, you'll have to filter out irrelevant variables (or at least those that you think are irrelevant), and this is the point where you build a simplified model of the game, thus risking to produce a solution that is only perfect in terms of your model.
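A sketch of what that filtering might look like. The addresses here are hypothetical, invented for illustration, not real SMB RAM locations:

```python
import hashlib

# Hash only the RAM addresses believed to matter (positions, speeds,
# timers). Everything else is treated as irrelevant -- which is exactly
# where the simplified model, and its risks, come from.
RELEVANT = [0x0086, 0x00CE, 0x0057, 0x009F]  # hypothetical x, y, x-speed, y-speed

def state_key(ram: bytes) -> str:
    relevant_bytes = bytes(ram[a] for a in RELEVANT)
    return hashlib.sha1(relevant_bytes).hexdigest()

# Two RAM snapshots that differ only in "irrelevant" bytes now collapse
# into the same key, so the search treats them as duplicate states.
```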
By the way, have you checked this? http://tasvideos.org/forum/viewtopic.php?t=13996 The guy there found an interesting way to turn videogames into formalized tasks. Of course his way produces extremely simplified models ("make numbers go up", huh), and the results are, naturally, rather pale, but it's already something (out of almost nothing).
AndiHoffi wrote:
2.) Detection of inefficient states
Here the evaluation-function comes into play: Each state is rated by adding the amount of frames needed to get there and the (estimated) least amount of frames needed to complete the level (e.g. until after the fireworks have ended). Then the state with the smallest sum will be evaluated.
OK, this evaluation-function is pretty generic, so it will likely work, as in, produce the generic "run right for justice" movie.
AndiHoffi wrote:
* At the start of each level, turn around, jump, and then start moving forwards
This should IMO be easily found, since AFAIK it pays off after only a few seconds.
Yes, but the algorithm won't even attempt to check those alternative actions unless there's an obstacle that can't be walked around in fewer seconds. So, to ensure perfection, you'd need some huge obstacle at the end of the level, one that would force the algorithm to try different routes starting from the very beginning of the level. Needless to say, this would create a huge performance problem.
I have implemented A* for my own game work, thanks. It's not a very complicated algorithm.
Good to know. That makes the discussion a lot easier. Where did you try it? And how did it perform? Did you implement it directly in an emulator, or did you use some interface?
I'm not talking about playing in an emulator, I'm talking about a game I'm (slowly) developing along with members of the Angband development community. Specifically, the game is Pyrel, a 2D roguelike (dungeon crawler). A* was used for AI pathfinding; however, it wasn't fast enough (Pyrel may have dozens or hundreds of enemies on a level, and running A* for each one is not plausible), so we switched to a heatmap approach.
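For readers unfamiliar with the idea: a heatmap (often called a Dijkstra map in roguelike circles) is a single flood fill from the player that every monster can then read. This is a generic sketch of the technique, not Pyrel's actual code:

```python
from collections import deque

def heatmap(grid, player):
    # grid: list of strings, '#' = wall; player: (row, col).
    # One breadth-first flood fill labels every reachable tile with its
    # distance to the player -- one map serves all enemies at once.
    dist = {player: 0}
    queue = deque([player])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] != '#' and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def best_step(dist, pos):
    # Each monster simply steps to the neighbouring tile with the
    # lowest value; no per-monster pathfinding needed.
    r, c = pos
    reachable = [p for p in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                 if p in dist]
    return min(reachable, key=dist.get) if reachable else pos
```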
* At the start of each level, turn around, jump, and then start moving forwards
This should IMO be easily found, since AFAIK it pays off after only a few seconds.
Did you miss the part where I pointed out that maintaining 1 second's worth of possible paths in an NES game requires more memory than exists in the entire universe? All of your other "A* should find this automatically" comments fall under the same problem. There is such a massive potential search space in a videogame that A* simply is not a plausible way to find optimal solutions. Using A* to pathfind through the game requires the user to simplify the problem space to the extent that it cannot match the results that can be achieved by even a mildly experienced TASer working "by hand".
Let's put it another way: A* boils down to a best-first search. You examine possible paths to a goal until you find the first one that reaches it. There's some prioritization of the order in which paths are processed, and handling for the situation where an intermediate state can be reached from multiple prior states. However, that prioritization cannot reject states; it can merely say that a given state is more likely to lead to a short path than another.
In a videogame, there are functionally no situations in which you can reach the same intermediate state from multiple prior states. If nothing else, the game timer will ruin you, since it advances every unpaused frame. Which means that practically every possible action from every state (i.e. node) is unique, which leads to the "requires more memory than exists" problem. Remember, 1 second of gameplay requires (2^8)^60 ~= 3 * 10^144 possible states, i.e. more than the number of atoms in the universe.
Even a tenth of a second ((2^8)^6) requires 2.8 * 10^14 possible states. If each state took only a single byte, you'd need 256TB of RAM. For one tenth of one second of gameplay. If you had one of the world's most powerful supercomputers you might have that much RAM at your disposal. And you still wouldn't be able to consider any deviation from the "optimal path" that takes more than a tenth of a second to pay off.
Do you understand now why these huge numbers make optimal solving of games completely infeasible?
Pyrel - an open-source rewrite of the Angband roguelike game in Python.
In short, unless you have a heuristic that can tell for sure if a partial path is part of the optimal solution, no path searching algorithm can find the optimal solution faster than by (eventually) trying all possible combinations.
(After it has found one solution, it can reject all partial solutions that are longer than that, but in the case of a video game, that solution will be enormously long, and as said, the search paths suffer from exponential explosion, and any such path can potentially be the optimal one.)
A* is good for navigating simple 2d bird's eye mazes. As soon as the character has abilities that affect movement, there's damage boosting, whatever more complex interactions, it's not going to perform very well when it comes to finding the optimal solution. If you wanted to convert a jump'n'run into a 2d maze it could solve, then you'd have to first invent an algorithm to do that (good luck), but even if you succeeded at that, the resulting 2d maze would only represent our current knowledge of the game. It wouldn't be able to discover anything new. If you allow it to discover new glitches/abuses, then that means it has to try close to absolutely everything, and that won't ever work.
I think an AI capable of finding faster solutions than a good TASer would have to be very intelligent, far beyond a simple A*-search algorithm.
This is highly tangential, so I've been sitting on it for a while, but when it comes to navigating 2-D maps, the jump point search (which augments A*) is orders of magnitude faster than A* alone while remaining optimal.
As for using A* to solve any game, I reiterate that we have this conversation every few months. For a short game, it requires more computations than there are Planck volumes in the observable universe times the number of Planck times since the birth of the universe. Can we please move on to another topic?
The 7th Saga TAS Nitrodon and I did used an A* or similar algorithm to traverse between towns on the grid while avoiding enemies.
Nitrodon gets the credit for the original script, but I did some modifications with the randomness and brute forcing.
To my understanding it first loaded the map from ROM and distinguished between walkable and non walkable tiles. Following this it determined the minimum number of steps required to get from point A to point B via a shortest path algorithm. As the algorithm ran, it rechecked to see that taking a given step made the path one step shorter.
As enemies literally get in the way due to the game's encounter system, any time we would run into an encounter, we would reload a past state and take a different path. Originally the step direction was deterministic, but as we found the need to manipulate enemies on the screen to alter RNG incrementation, we switched to a randomly chosen step sequence with some randomly inserted polarization towards the extremes. We needed to start a few boss fights on extremely specific RNG values, as boss fights are entirely deterministic.
Using a shortest path algorithm in our TAS saved a ton of work, and using it to influence RNG incrementation produced some otherwise nearly impossible results. In one case I ran the algorithm like 2000 times and saved the lowest number of RNG incrementations while still maintaining shortest path.
Using a shortest path algorithm to solve a game entirely is pretty much impossible as most others have discussed in this thread, but I would like to highlight that in a specific case usually of map traversal, especially on a grid, it can be extremely useful.
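A hedged sketch of the approach Kirkq describes, with the details assumed: precompute the distance-to-goal for every walkable tile, and then any step lies on a shortest path exactly when it lowers that distance by 1. That check lets a bot re-route around encounters without re-running the whole search:

```python
from collections import deque

def distances_to(grid, goal):
    # grid: list of strings, '#' = non-walkable; goal: (row, col).
    # Breadth-first search from the goal labels every walkable tile
    # with the minimum number of steps needed to reach it.
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] != '#' and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def step_is_optimal(dist, src, dst):
    # A step stays on a shortest path iff it brings us one tile closer.
    return dst in dist and dist[dst] == dist[src] - 1
```

With this, randomized step sequences (to manipulate the RNG) can be generated freely and checked for shortest-path optimality one step at a time.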
Neat post, Kirkq! It's always interesting to hear about formalizing and automating portions of a TAS. You're right, A* is great for this kind of constrained case.
Pyrel - an open-source rewrite of the Angband roguelike game in Python.