Hi everyone,
I have somewhat of a fascination with writing game bots (well, game cheats in general, but bots provide the most challenge). You could say that this hobby intersects with tool-assisted speedruns.
A while ago I found this TAS video of Kwirk, and I noticed that the solutions in the video were solved by humans, and thus could be non-optimal. I've written a solver for Kwirk (rewrote it several times, learned a lot about search in the process), and - indeed, out of the 19 levels my solver could stomach, it found better solutions for 5 levels (with the record gap being solving 2-3 in 98 steps vs. Nitrodon's 104-step solution). I plan to continue working on my solver and attempt to solve the remaining levels. Personally, I think more effort should be put into algorithmically solving some games (especially puzzles).
Anyway, the main constraint in solving Kwirk is memory, since the solver needs to remember all states it has previously visited. I plan to redesign my solver so that it can swap out nodes that aren't accessed frequently OSLT - even so, the availability of RAM would drastically affect its performance. I wonder if anyone in this community has access to machines with lots (>10 GB) of RAM and wouldn't mind running my Kwirk solver on them.

Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers

I don't know how you're programming it, but I would probably employ a recursive algorithm that marks each location it has been in already, and knows not to retry them unless a block has been moved/will be moved. This could lessen the memory requirements.
If a lot of RAM is needed, perhaps you can create a distributed program? Between my PCs, I can probably muster up 15 GB or so, if the running time isn't too long.
Also, nice going with Kwirk; I did think to myself that some of the levels in the published run used suboptimal solutions.

Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.

Joined: 4/20/2005
Posts: 2160
Location: Norrköping, Sweden

This sounds really awesome. I applaud you for making a bot that finds better solutions on its own, and hopefully we'll even see this improved run in the workbench someday. I hope you can make a bot for the Adventures of Lolo games too. :)

Joined: 1/16/2008
Posts: 358
Location: The Netherlands

Hello, nice work on your project! To avoid memory limitations you could use a different approach... but it's hard to give any advice without knowing what you currently have :) (quite possibly you don't want to share it yet ofc)
for example a classic depth-first tree search has very low RAM cost
I used several Sokoban (a similar game) solvers for my Young Merlin TAS and found quite a few differences in their results (too lazy to make my own ;)

Good luck on this project. I've done computer search for a couple things myself (one simple room in Eggerland:MnF and one boss in Lolo 3), but I didn't think it would be tractable for anything as complicated as a typical Kwirk (or any puzzle game) level.
I figured my Kwirk TAS might be suboptimal, mainly because it's the first TAS of the game (and was done by a human). I also found an improvement to 3-10 (426 steps, 1:14 vs 434 steps, 1:15) the day the run was published, but I'm sure a computer would do even better if it could stomach such a long level.
This probably won't affect the optimal solution too much, but times are measured in frames here instead of steps. Here are the frame counts for various actions:
Move one step normally: 9 frames
Move one step while pushing block: 10 frames
Move one step while pushing spinner: 12 frames (whether moving one space or two)
Filling in a hole: 26 frames (not including the time taken to push the block)
Switching character: 30 frames (32 if you immediately switch again)
Switching character due to finishing with one character: irrelevant, so I didn't check.

for example a classic depth-first tree search has very low RAM cost

The problem with a depth-first search solution (with a minimal amount of RAM usage) is that finding the optimal solution may take a very, very long time. What is worse, when you find a solution, there's no way of knowing if it is the optimal solution (you can only know if it's better than the solutions you have found so far during the search). Basically you have to try all possible permutations before you can be sure that you have found the optimal solution. In typical puzzles that's an O(n!)-time approach, so for puzzles with just a couple of dozen states you may have to wait until the Sun explodes before your algorithm has checked all possible permutations.
A breadth-first solution is better in that the first solution you find is also the optimal one. Its problem is that it may require a huge amount of RAM. (Of course a breadth-first solution may also take a humongous amount of time, but at least it doesn't have to check all possible permutations, only all permutations of paths at most as long as the optimal solution.)
With most puzzle solvers it's best to devise some kind of heuristic approach (preferably one which gives the optimal solution for sure, although that's not always possible).
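To make the contrast concrete, here is a minimal sketch of Warp's point about breadth-first search: the first time the goal is dequeued, its depth is provably minimal, so the search can stop immediately. The six-state puzzle graph here is entirely hypothetical.

```python
from collections import deque

# Hypothetical toy puzzle: states are letters, edges are legal moves.
GRAPH = {
    'S': ['A', 'B'],
    'A': ['C'], 'C': ['D'], 'D': ['G'],   # long route: 4 moves
    'B': ['G'],                           # short route: 2 moves
    'G': [],
}

def bfs_shortest(start, goal):
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, depth = queue.popleft()
        if state == goal:
            return depth        # first hit = optimal, by BFS ordering
        for nxt in GRAPH[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None

print(bfs_shortest('S', 'G'))  # -> 2
```

A depth-first search on the same graph could wander down the four-move route first; breadth-first never explores deeper than the optimum.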

Thanks for the replies, everyone :)
I'm currently using a breadth-first search algorithm with a fixed-size hash table. One state takes 10 bytes: 2 bytes (bitfield) to store the action required to get to that state from its parent state + the earliest frame to get to that state, 4 bytes for the parent state index, and 4 bytes for the index of the next state with the same hash (in case of hash collisions). So memory consumption is about 10*STATE_COUNT + 4*(2^HASH_BITS).
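For reference, a sketch of one such 10-byte node in Python's struct notation: a 16-bit action/frame bitfield, a 32-bit parent index, and a 32-bit index of the next node in the same hash bucket. The exact bit split inside the bitfield is my assumption, not necessarily CyberShadow's layout.

```python
import struct

# "<HII" = little-endian ushort + uint + uint, no padding: 10 bytes total.
NODE = struct.Struct('<HII')
assert NODE.size == 10

ACTION_BITS = 3  # e.g. left/right/up/down/switch (assumed split)

def pack_node(action, frame, parent, next_in_bucket):
    bitfield = (action & 0b111) | (frame << ACTION_BITS)
    return NODE.pack(bitfield & 0xFFFF, parent, next_in_bucket)

def unpack_node(blob):
    bitfield, parent, next_in_bucket = NODE.unpack(blob)
    return bitfield & 0b111, bitfield >> ACTION_BITS, parent, next_in_bucket

blob = pack_node(action=4, frame=1234, parent=42, next_in_bucket=0)
assert unpack_node(blob) == (4, 1234, 42, 0)
```

Storing array indexes instead of pointers is what keeps the per-node cost flat regardless of the machine's pointer width.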
I'm thinking of rewriting it to use depth-first search (which can be limited to the number of steps of a known solution) or iterative-deepening DFS, together with something like splay trees. Depth-first because breadth-first search walks the entire breadth of the state tree (and requires access to a great number of nodes), while with depth-first search the repeated nodes it encounters will be more "localized", so there should be significantly fewer "cache misses". The program would then move nodes that aren't frequently accessed out to slower memory.
Making the algorithm distributed would probably make sense only if network access to read or write a value in another machine's RAM would have a lower latency than HDD.
Nitrodon: thanks for the info - actually, my timing for filling a hole was wrong for some reason (18 frames instead of 26). I wonder if that would affect any solutions (my solver counts frames, not moves).

Is there any way you could do like NASA did once and distribute the testing between more than one machine, without them both checking the same thing in the same order? I have a computer that does virtually nothing, but it has nowhere near 10 gigs of RAM. Good luck with your bot, though.


You could have each worker "check out" blocks of solutions to calculate. There's a distributed WEP cracker that does this: each worker checks out a set of keys, then tries to decrypt using each key. If the working one isn't found, it checks out another block.
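The checkout scheme can be sketched as follows: a coordinator hands out disjoint ranges of the search space, so no two workers ever test the same candidate. The key space, block size, and "correct key" here are all made up for illustration.

```python
from queue import Queue

KEYSPACE_SIZE = 100
BLOCK_SIZE = 10
CORRECT_KEY = 73   # hypothetical "working key"

def make_work_queue():
    q = Queue()
    for start in range(0, KEYSPACE_SIZE, BLOCK_SIZE):
        q.put(range(start, start + BLOCK_SIZE))  # one checkout unit
    return q

def worker(blocks):
    while not blocks.empty():
        block = blocks.get()            # check out the next free block
        for key in block:
            if key == CORRECT_KEY:      # "decryption succeeded"
                return key
    return None

print(worker(make_work_queue()))  # -> 73
```

This works precisely because key testing is embarrassingly parallel: each block can be checked with no shared state, which (as discussed below) is not the case for a graph search.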

Joined: 3/11/2004
Posts: 1058
Location: Reykjavík, Ísland

Warp wrote:

What is worse, when you find a solution, there's no way of knowing if it is the optimal solution (you can only know if it's better than the solutions you have found so far during the search). Basically you have to try all possible permutations before you can be sure that you have found the optimal solution. In typical puzzles that's an O(n!)-time approach, so for puzzles with just a couple of dozen states you may have to wait until the Sun explodes before your algorithm has checked all possible permutations.

Am I not understanding, or is this a slight exaggeration? You don't need to check *all* possible permutations. If there is at least one known solution, then the depth-first search need only go so deep. It doesn't need to find any permutations longer than the known solution. So after a certain number of steps, if a solution has not been reached, it can stop checking anymore in that direction because it will always be longer than a known solution anyway.
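The bound described above can be expressed directly: seed the depth-first search with the length of a known solution and prune any branch that reaches that depth. The six-state puzzle graph here is hypothetical.

```python
GRAPH = {
    'S': ['A', 'B'],          # wrong turn 'A' leads to a 4-move route
    'A': ['C'], 'C': ['D'], 'D': ['G'],
    'B': ['G'],               # 'B' leads to the optimal 2-move route
    'G': [],
}

def dfs_bounded(state, depth, best):
    if depth >= best:         # can't beat the best known solution: prune
        return best
    if state == 'G':
        return depth
    for nxt in GRAPH[state]:
        best = dfs_bounded(nxt, depth + 1, best)
    return best

# A known (suboptimal) 4-step solution bounds the search;
# DFS then proves the 2-step route optimal.
print(dfs_bounded('S', 0, best=4))  # -> 2
```

Note that the bound tightens as better solutions are found mid-search, so later branches are pruned earlier.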

Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers

CyberShadow wrote:

Making the algorithm distributed would probably make sense only if network access to read or write a value in another machine's RAM would have a lower latency than HDD.

Over Gigabit Ethernet? I haven't really done latency tests, but it's definitely faster at transfer times. The algorithm should also ideally be threaded. I can easily get 10 cores working together on something.


Maybe a stupid comment, but if you want to play with more than 4 gigs of RAM, shouldn't you use 64-bit pointers instead of 32-bit?

Precisely, that's why I use 32-bit array indexes instead of 64-bit pointers.
bzb95 wrote:

Is there any way you could do like NASA did once and distribute the testing between more than one machine, without them both checking the same thing in the same order? I have a computer that does virtually nothing, but it has nowhere near 10 gigs of RAM. Good luck with your bot, though.

The problem with parallelizing the solver is that all instances need access to the set of states already examined; otherwise each instance would repeat the work of the others whenever it reached the same state. For example, consider that for level 3-2 my solver had to examine 97,377,428 nodes to find a solution in 180 steps. With no knowledge of previously-examined states, the number of states it'd have to examine would be on the order of D^N, where D is the average number of valid moves from each state and N is the number of moves from start to finish. Even with a very conservative value of 2 valid moves per state, we're looking at astronomical values.
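Making the D^N blow-up concrete: with even a conservative branching factor of 2 and the 180-step solution mentioned above, an uncoordinated search faces on the order of 2^180 states, astronomically more than the roughly 9.7e7 nodes actually examined with a shared visited set.

```python
# D = average valid moves per state, N = solution length (from the
# level 3-2 figures quoted above).
D, N = 2, 180
blind_states = D ** N
print(f"{blind_states:.3e}")          # on the order of 1.5e+54
# Ratio to the 97,377,428 nodes the solver actually examined:
print(blind_states // 97_377_428)
```

The gap is why the visited-state set is not an optimization but a prerequisite: without it the search never terminates in practice.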
ccfreak2k wrote:

You could have each worker "check out" blocks of solutions to calculate. There's a distributed WEP cracker that does this: each worker checks out a set of keys, then tries to decrypt using each key. If the working one isn't found, it checks out another block.

The thing is, problems like cracking a key are fully parallelizable (in that they don't share any data between processes). In programming-language terms, the function that verifies a candidate solution is "pure": it does not refer to external data (data not on the stack). As I explained above, solving Kwirk requires access to the set of examined states to prevent duplicate work.
Derakon wrote:

That's better than arbitrary depth, but you still have no guarantee that you'll find the fastest solution.

With a depth-first search, you simply need to limit the search to the depth of the best known solution, then search the entire breadth of the tree. For iterative-deepening depth-first search this isn't a problem, unless you increase the depth by more than one depth unit per iteration. Iterative-deepening DFS probably doesn't justify itself for this problem anyway.
Nach wrote:

The algorithm should also ideally be threaded. I can easily get 10 cores working together on something.

Yep, it's threaded, but CPU performance isn't the bottleneck :)

One state takes 10 bytes: 2 bytes (bitfield) to store the action required to get to that state from its parent state

So you're storing a separate state for each step? That may be more than you need.
The puzzle isn't solved by walking, but by interacting with the objects (boxes, spinners, level exit, ..), so I propose to store a state only for every interaction done.
Instead of checking every possible walking direction, identify every tile from which an interaction can be started. Use Dijkstra or something to determine if they're reachable without interactions, and if they are, the number of steps to reach them.
The states would look something like this
State 1:
parent: none
frame: 0
action: NONE
action from: none
standing at (x, y)
State 2:
parent: state 1
frame: 1234
action: SPINNER
action from: (x, y)
standing at: (x+2, y)
State 3:
parent: state 2
frame: 4242
action: PUSH
action from: (x, y)
standing at: (x, y+1)
Your states will each consume more memory, but you'll need far fewer states, and the tree will be comparatively flat.
Or maybe I overlooked something and you have your reasons for doing things the way you do? Or maybe I completely misunderstood your idea?
On another note, have you employed manual optimizations already? Adding checks like "the long block must not be sunk anywhere but at this specific location" or "if more than two small blocks are stuck in a corner, it's game over" could cut some branches a generic algorithm would not.
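The reachability idea above can be sketched with a plain flood-fill (BFS is enough; Dijkstra is overkill when every step costs the same): compute the step count to every tile from which a push could start. The level layout and coordinates here are made up for illustration.

```python
from collections import deque

LEVEL = [
    "#####",
    "#...#",
    "#.B.#",   # B = a pushable block
    "#...#",
    "#####",
]

def walkable(x, y):
    return LEVEL[y][x] == '.'

def reachable_steps(start):
    """Steps to every tile reachable without touching any object."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if walkable(*nxt) and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist

dist = reachable_steps((1, 1))
# The four tiles adjacent to the block are the only interaction points:
push_starts = [(2, 1), (1, 2), (3, 2), (2, 3)]
print({tile: dist[tile] for tile in push_starts})
```

One flood-fill per stored state replaces many per-step search nodes, which is exactly the flattening of the tree described above.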

I suppose you could also shorten the solution from the end of the level; that is, moving the last reference point to a guaranteed optimal position closer to the beginning. Though I guess you've already done that.


Do the set of states have to be saved in RAM? Couldn't you maybe use an online storage drive as virtual RAM on all the computers you're running your bot on? Like GMail Drive, where you download the software and it makes your computer think it's a drive directly hooked up to your computer.
I don't know too much about programming, so sorry if this is a stupid question. :)


That wouldn't work; he might as well store the data on his HD instead of in online storage, and even that is limited in size and speed.
--
You may want to add some conditions for solving the puzzle, like the maximum number of states, or which objects should be moved, and in which order if there is one.
Try to remove as many useless combinations as possible. It might be faster that way. In any case, you should at least always find a solution with the same number of steps as the published run.

I've tried something similar to this with Shining Force 2. I found the largest difficulty to be the emulation time. Unfortunately, even with Gens.emulateframeinvisible(), I can only cut the time down by maybe 10-20x real-time. Additionally, a full search starts with very, very stupid moves by the algorithm.
I think the only way to brute force the game would be to rewrite the calculation aspect of the game in C - much like you intended to do for Puzzle Quest and did for Asteroids. Unfortunately, the AI routines are extremely cumbersome, and would require far more coding ability than I have.
Perhaps, after you finish this project, you may be interested in another?

Sage advice from a friend of Jim: So put your tinfoil hat back in the closet, open your eyes to the truth, and realize that the government is in fact causing austismal cancer with it's 9/11 fluoride vaccinations of your water supply.

One state takes 10 bytes: 2 bytes (bitfield) to store the action required to get to that state from its parent state

So you're storing a separate state for each step? That may be more than you need.
The puzzle isn't solved by walking, but by interacting with the objects (boxes, spinners, level exit, ..), so I propose to store a state only for every interaction done.

There are 5 possible steps (including "switch character"), so 2 bytes would be wasteful if he were storing a separate state for each step. Conversely, storing a location would take 9 or 10 bits, and storing an action and direction with it would still easily fit within 2 bytes. Hence, I assume he's already doing as you proposed.

Instead of checking every possible walking direction, identify every tile from which an interaction can be started. Use Dijkstra or something to determine if they're reachable without interactions, and if they are, the number of steps to reach them.

That's a good point. I briefly thought of this back when the main bottleneck was execution speed, but haven't thought of it since. In more abstract terms, the idea is to divide the problem into "atomic subproblems", which would remove intermediary nodes in a subtree. I'll look into this.
Tub wrote:

On another note, have you employed manual optimizations already? Adding checks like "the long block must not be sunk anywhere but at this specific location" or "if more than two small blocks are stuck in a corner, it's game over" could cut some branches a generic algorithm would not.

ZeXr0 wrote:

You may want to add some condition for solving the puzzle. Like Maximum number of state, every object that should be move. In which order if there's one.
Try to remove as many useless combinaison as possible. It might be faster that way.

I have some optimizations, such as "fake walls" (free space that shouldn't be visited or occupied by a block) or "boring holes" (holes that should never be filled). You'd need to be careful in determining "lose conditions" for levels, as there might be a clever solution you haven't thought of. I'll see if it's possible to add more optimizations for levels.
moozooh wrote:

I suppose you could also shorten the solution from the end of the level; that is, moving the last reference point to a guaranteed optimal position closer to the beginning. Though I guess you've already done that.

That would require knowing the exact state of the rest of the level (blocks, rotators, filled holes), so it's not really feasible.
DarkKobold wrote:

Perhaps, after you finish this project, you may be interested in another?

If I find the time, sure; this community seems welcoming enough :)
Nitrodon wrote:

There are 5 possible steps (including "switch character"), so 2 bytes would be wasteful if he were storing a separate state for each step. Conversely, storing a location would take 9 or 10 bits, and storing an action and direction with it would still easily fit within 2 bytes. Hence, I assume he's already doing as you proposed.

Currently the 2 bytes are a bitfield and store the action (one of left/right/up/down/switch - 3 bits) and the earliest frame one could get to this state (15 bits). This is necessary as, even with breadth-first search, it might be possible to reach a state, then find a shorter path to that state.

This reminds me of the time I strived to write a solver for PeterBox
― http://tasvideos.org/forum/t/2614 ―
― http://bisqwit.iki.fi/jsgames/peterbox/ ―
which is a Sokoban clone and incorporates many of its puzzles.
BisqBot managed to find the optimal solution for a total of 13 rooms, but the rest were exponentially beyond its reach due to the amount of RAM needed to track all the combinations it had yet to visit, even though I used extremely sophisticated methods to minimize RAM usage, starting from premapping all 1-block, 2-block, 3-block, etc. combinations up to 8 blocks, where the puzzle becomes definitely unsolvable.
Unfortunately, the C++ code seems to have suffered some bitrot, and no longer compiles.
If someone wants to take a gander, here's the source code of the completer. Note: It produces a lot of output. Once you get it to compile, the following steps may help you (if memory serves me right):
– The dead*.dat files can be deleted if you want; completer simply regenerates them on demand. This takes time, though.
– While it is generating the dead*.dat files (e.g. "Collecting Dead4 data"), completer can be interrupted safely, and it will resume later from that point.
– Edit settings.hh and set CHOOSE_LEVEL to your choice (try some easy one first, such as 1).
– Completer produces a lot of output, and it is best to redirect it into some file, e.g. ./completer > logfile.txt. Observe this log file with e.g. tail -f logfile.txt. Note that the output includes ANSI color codes.
– Interpreting the process report: A=How many situations queued for checking; S=how many layouts seen and timestamped; ign=how many queued situations skipped due to finding a faster way of reaching the same layout; Tign=how many operations disqualified due to exhausting the time limit; targ=how many operations tested on this round; bal=difference of the A value to that of the last round; dead=list of hit counts in each dead data table. Layout=set of positions of player and the blocks; Situation=layout + timestamp + history of moves done to reach that layout; operation = kicks and pushes from each reachable position
– If your PC is reaching the limits of its RAM capacity and the A value still keeps growing at a fast rate, it is likely that the solution will not be found within a week of runtime, because once it goes to swap, the process will slow down by several orders of magnitude. Also, if your OS is 32-bit, the 2 GB address space limit will probably hit you all too soon, terminating the program, in all but the simplest puzzles.
EDIT: Here: http://bisqwit.iki.fi/jsgames/peterbox/completer-sources.7z

What is worse, when you find a solution, there's no way of knowing if it is the optimal solution (you can only know if it's better than the solutions you have found so far during the search). Basically you have to try all possible permutations before you can be sure that you have found the optimal solution. In typical puzzles that's an O(n!)-time approach, so for puzzles with just a couple of dozen states you may have to wait until the Sun explodes before your algorithm has checked all possible permutations.

Am I not understanding, or is this a slight exaggeration? You don't need to check *all* possible permutations. If there is at least one known solution, then the depth-first search need only go so deep. It doesn't need to find any permutations longer than the known solution. So after a certain number of steps, if a solution has not been reached, it can stop checking anymore in that direction because it will always be longer than a known solution anyway.

True, but the depth-first algorithm can potentially perform enormously more work than a breadth-first solution.
For example, if the optimal solution is 10 steps long, but making the wrong choice at the beginning results in a million other solutions of about one million steps, the depth-first solution might make that wrong choice at first, go through the million suboptimal solutions before it comes back to the beginning and makes the correct choice and finds the 10-step solution. A breadth-first approach would quickly find the optimal solution.
Another problem with the depth-first approach is that it might take (after a small preceding detour) the same path as a previous sub-search already checked, and it has no way of knowing that there's no solution there. So it will check that path again, to no avail. (In order to avoid this, it would be necessary to store in memory all the paths which have already been checked, which is what the depth-first solution was trying to avoid to begin with.) A breadth-first search can check that no search branch merges with another branch.
This is not to say that the breadth-first approach is optimal and fast. Its problem is, of course, that it quickly consumes enormous amounts of memory if the optimal solution is even slightly long, and especially if the search space branches a lot. In the worst case the breadth-first approach cannot find the solution at all because it runs out of physical memory (while a depth-first approach could find the solution eventually).
Both are brute force approaches, and thus have the same problem. OTOH with many puzzles only a brute force approach can be sure to find the absolutely optimal solution, and a faster heuristic approach cannot make that promise (iow. if the problem is provably NP). Sometimes there are ways to speed up even a brute force approach by taking advantage of how the puzzle works, which can often be used to reduce the constant factor of the computational complexity (even though it doesn't change the complexity itself).
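The re-checking problem described above, and the standard fix of sharing one visited-state set (a transposition table) across the whole depth-first search, can be illustrated on a toy graph. The successor function here is hypothetical: every state has two distinct moves that lead to the same next position, so paths merge constantly.

```python
def successors(state):
    # Two different moves, identical resulting state: heavy merging.
    return [state + 1, state + 1] if state < 20 else []

def count_expansions(state, visited=None):
    """Number of nodes a DFS expands, with or without deduplication."""
    if visited is not None:
        if state in visited:
            return 0            # already examined via another path
        visited.add(state)
    total = 1
    for nxt in successors(state):
        total += count_expansions(nxt, visited)
    return total

print(count_expansions(0))         # no memory of visited states: 2097151
print(count_expansions(0, set()))  # with a shared visited set: just 21
```

The visited set reintroduces the memory cost that depth-first search was meant to avoid, which is exactly the trade-off under discussion.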

For example, if the optimal solution is 10 steps long, but making the wrong choice at the beginning results in a million other solutions of about one million steps, the depth-first solution might make that wrong choice at first, go through the million suboptimal solutions before it comes back to the beginning and makes the correct choice and finds the 10-step solution. A breadth-first approach would quickly find the optimal solution.

That's not a problem if the problem and implementation allow cheap "reparenting" of a subtree (which is the case with my solver).
Warp wrote:

Another problem with the depth-first approach is that it might take (after a small preceding detour) the same path as a previous sub-search already checked, and it has no way of knowing that there's no solution there. So it will check that path again, to no avail. (In order to avoid this, it would be necessary to store in memory all the paths which have already been checked, which is what the depth-first solution was trying to avoid to begin with.) A breadth-first search can check that no search branch merges with another branch.

I don't understand - how would it do that, without knowledge of all visited states? Are you talking about detecting branch merging only at the same depth (which doesn't really achieve much)?
Warp wrote:

This is not to say that the breadth-first approach is optimal and fast. Its problem is, of course, that it quickly consumes enormous amounts of memory if the optimal solution is even slightly long, and especially if the search space branches a lot. In the worst case the breadth-first approach cannot find the solution at all because it runs out of physical memory (while a depth-first approach could find the solution eventually).

As I've mentioned earlier in this thread, the approach isn't really very different if you track visited states to prevent duplicate work (and you NEED to, in order to prevent exponential complexity) - and, of course, with cheap "reparenting". Breadth-first search requires you to maintain a queue of nodes to examine, but this queue hardly consumes a considerable amount of memory compared to the set of examined states.

I have access to an 8-core Xeon with 48 GB of RAM. It's my work computer, so I don't know if I'd be able to run any non-work-approved software. I can check with the boss, possibly. The computer tends to sit idle on weekends anyway.
