Posts for CyberShadow


Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Well, since I've been dragged into this discussion already, I might as well ask: what did you mean when you said
the layout of the game's memory changes too much
? Hourglass could wrap memory management routines and simply never unmap any memory behind the scenes (and instead keep the address range reserved so it could restore the memory at that address if needed). Or is the problem due to memory regions passed to sound/video APIs (surface/sound buffer memory)?
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Um, nitsuja has said that his system does not require reverse-engineering and isn't aimed at any particular game.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Haven't solved 2-2 yet. I'm almost done with my new algorithm though. I minimize time (frames), as I mentioned here.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
I have mentioned here why I don't think that a VM is viable for practical TASing.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
I am green with envy that someone has beaten me to it. However, I'll be following this closely and may even contribute at some point.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Cave Story has been nominated. That, or Multimedia Fusion, because all of Nifflas' games are made with it. :D
Post subject: TASing PC games: a TAS mod for No Quarter + further thoughts
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
(May the mods forgive me for opening yet another PC TAS topic. I'm new here.)

I decided to take a break from solving Kwirk and try something different: I created a TAS "mod" for a Windows mini-game (Cryptic Sea's "No Quarter: V-Sides", and specifically the "Hitlers Must Die" preview it contains). I've posted details and the TAS video itself on my website, so I'm not going to repost the same content here. Apologies for the horrible web design. Binaries aren't currently available because I have selfishly used a proprietary commercial hooking library. Perhaps I'll eventually rewrite it to remove that limitation.

Anyway, this isn't as much an achievement as it is a basic example demonstrating that TASing some PC games can be done in a clean, deterministic, legal way that doesn't incur enormous overhead from emulation. Granted, there are a lot of 'if's with this model - the TAS extension would have to manage all resources that are allocated/destroyed by the game during the course of a replay, such as heap objects, handles, video surfaces, etc. - however, as this example shows, that may not even be necessary. Another thing: there are a few popular game-oriented virtual machines, such as Game Maker and Multimedia Fusion. Creating a generic TAS system for such a VM would allow TASing all games created with that system.

This approach to TASing PC games has several advantages over the more generic idea of using hardware-level virtualization software, such as VMware or QEMU. The first and obvious advantage is speed - there is no virtualization overhead. Save-states are large and contain memory that's not directly related to the game (loaded OS files, filesystem cache, etc. - most often even free memory is dumped, as the VM doesn't understand the OS's memory management). Replays themselves could be unnecessarily large, as they'd have to contain detailed input from all input devices. Timing is another problem: the VM software can't understand the game's input frames, so instead of discrete "frames", each with its own input state, you'd have a continuous time range across which you place input events, trying to guess which points in that range affect the game's previous "logic" frame - which I imagine would be extremely tedious unless it were somehow magically automated. And don't forget about input delay - it takes a while for input to propagate from the hardware all the way to the game logic code, passing through multiple layers of BIOS/kernel/game code.

So, I really think that my approach is The Way To Go. Obviously, there need to be some restrictions. My suggestions:
1) The parts of the TAS extension software directly related to TAS must be entirely open-source (using closed-source but freely-available 3rd-party libraries, including compiler runtimes, should be allowed)
2) The TAS extension software must be buildable using freely-available tools
3) The TAS extension source code should have a high level of readability and maintainability (clean, commented code)
4) If reverse-engineering (decompilation, disassembly) was involved in the development of the TAS extension, it must be accompanied by written permission from the game's author/copyright holder (Note: using a hex editor or looking at the import table is not commonly considered reverse-engineering. IANAL.)

Thoughts?
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
I'll put it on my list :P
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Lex: I was talking about the idea of using a rerecording PC emulator running various console emulators for TAS-ing. It's not practical; it's a lot less work to reverse-engineer a closed-source emulator and add TAS capabilities.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Yes, but you would need to first determine those points (which would involve either reverse-engineering or source modification of the emulators anyway), and then you'd need to bolt something onto the VM software itself to operate on those input frames, rather than whatever it uses for recording. Also, all of this is an extreme oversimplification - a keystroke goes through MANY layers of code before it reaches the emulator (BIOS -> OS kernel-space -> OS userspace -> Windows messages -> emulator input handling code, to name a few), so factoring in these delays would be an additional problem.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Beep. I have a separate tiny project of (truly) TASing a PC game without using emulation (not Worms Armageddon). Might post about it soon. (This isn't really related anyway.) On topic: I don't think this is a good idea, because video/input frames on VMs wouldn't correspond to those on emulators. It'd be hell to time the keystrokes correctly to correspond to emulator input frames.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Lex's movie/screenshot links to lex.clansfx.co.uk have been restored (as they've been broken for some time). Also, the mission records page has moved to the Worms Knowledge Base. It's not impossible that in the not-too-far future we'll make a new, updated demo version of W:A, with the capability to play replays, which would mean that you wouldn't need to own the game to watch/validate the replays.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Warp wrote:
CyberShadow wrote:
You're saying that an algorithm that enters an infinite loop is broken while an algorithm that takes zillions of years to run is not.
Isn't that rather obvious? If the algorithm enters an infinite loop and never finds a solution, it is broken. If the algorithm always finds a solution, then it's not broken. Computational complexity has nothing to do with whether an algorithm is correct or broken. I don't even understand what is it that you are arguing here. It's like you are comparing a working DFS implementation (one which eventually finds a solution) with a broken BFS implementation (one which may end up in an infinite loop). What exactly is it that you are claiming? That BFS always has the possibility of entering an infinite loop while DFS doesn't? Or something else?
This is why I said this is a semantic argument. If an algorithm won't solve my problem within my lifetime, it's just as "broken" to me. Your original claim that keeping the entire search tree in memory is an innate property of BFS rests solely on your claim that, on graphs, it would otherwise enter an infinite loop (which isn't even true, as cwitty pointed out).
Warp wrote:
That depends on the algorithm in question. Some graph searching algorithms can be performed in linear time using DFS, others have higher computational complexities. For example the problem "find a path from node A to node B in an undirected graph" can be solved in linear time using DFS. (Why? Because you can raise a flag in each node every time you recess in the DFS so that you don't consider that node anymore. Thus you will perform at most n steps during the search, where n is the total amount of nodes.)
How is that different from "keeping the entire search tree in memory" (aside from not having to store the nodes' "parents")?
Warp wrote:
Of course that's a rather simple problem. More complicated problems (such as "find the shortest path from node A to node B") may require more complex algorithms. It really depends on the nature of the problem.
I don't understand how this is related. Please provide specific examples and show how they're related to this discussion.
Warp wrote:
That assumes that both algorithms perform approximately the same amount of work in total. However, DFS might perform enormously more work than BFS in certain cases.
There are no "certain cases" here other than solving Kwirk levels, and I'm sorry if you misunderstood that I was speaking generally. We already have an "upper bound" - the length of an already-known solution. As better solutions are found, the maximum search depth is reduced further.
Warp wrote:
Note that I'm not really advocating either one. I'm just pointing out their benefits and drawbacks.
I'm glad that we're done with abstract notions vulnerable to semantic disagreements.
Warp wrote:
The most optimal solution would be, as mentioned, to devise some kind of heuristic which would cut off most of the search tree right away (even if the overall computational complexity remains the same).
Yes, that would help, but it wouldn't need to be THE "optimal solution" (just one of several optimizations), and so far it's a "would be nice if". Also you probably didn't mean to use the term "heuristic", as it implies that it might not lead to the optimal result.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
DarkKobold wrote:
Anyway, I tried to read those articles on BFS vs. DFS, and this thread. And, they still don't make sense. Any chance you could explain them in more common terms?
I don't think I could generally explain it better than Wikipedia with its nice illustrations:
http://en.wikipedia.org/wiki/Breadth-first_search
http://en.wikipedia.org/wiki/Depth-first_search

In our case, the nodes represent unique states of the level (the position of the players, blocks and turnstiles, as well as filled holes). The lines connecting the nodes represent the player's actions which transition the game from one state to another (for example, moving down onto an empty square creates a new state with a different player vertical coordinate than the "parent" state). The search problem represents trying all move combinations until the game reaches the state where all players have exited the level.
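To make that a bit more concrete, here's a minimal Python sketch of BFS over abstract game states. This isn't my solver's actual code; `successors` and `is_solved` are hypothetical helpers standing in for the game rules.

```python
from collections import deque

def bfs_solve(start, successors, is_solved):
    # parent maps each discovered state to (previous state, move taken);
    # it doubles as the "visited" set that prevents re-examining states.
    parent = {start: (None, None)}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if is_solved(state):
            # Walk the parent links back to reconstruct the move sequence.
            moves = []
            while parent[state][0] is not None:
                prev, move = parent[state]
                moves.append(move)
                state = prev
            return list(reversed(moves))
        for move, nxt in successors(state):
            if nxt not in parent:  # skip states already reached earlier
                parent[nxt] = (state, move)
                queue.append(nxt)
    return None  # exhausted all reachable states without exiting the level
```

Because states are expanded in order of distance from the start, the first solved state popped off the queue is reached via a shortest move sequence.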
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
This is becoming tiresome.
Warp wrote:
It is for a working BFS. Obviously if a BFS ends up in an infinite loop, it's not working. It's a buggy implementation.
Warp wrote:
A DFS only needs to store the current search branch in order to avoid infinite loops. It has no requirement to store the entire search tree for it to work.
You're saying that an algorithm that enters an infinite loop is broken, while an algorithm that takes zillions of years to run is not. The complexity of DFS (with no knowledge of other nodes) over a graph grows exponentially with the number of loops in the graph. For a complete graph with N nodes, the complexity is N!. DFS would be preferable only in the unfathomable case of a very large search space (large enough for memory to be a problem) with very few loops (as each one increases the execution time exponentially).
Warp wrote:
If you are going to store all the found states anyways, then why use DFS at all? BFS finds the optimal solution faster. As I said, DFS which never discards its history has the worst of both worlds (at least when the task is to find an optimal solution).
I already explained this when I first mentioned DFS. DFS over a search space with more nodes than available fast memory would probably be more cache-efficient than BFS, as it is more likely to have a much smaller "working set" (nodes it needs to look up). Even though it'd have to search deeper than BFS, its total execution time could still be considerably shorter. I've begun rewriting my solver to use DFS + caching using splay trees, but since DFS vs. BFS performance seems to be such a hot topic, I think I'll take a detour and implement caching first, then benchmark it with BFS and DFS.

P.S. After implementing the optimizations suggested by Tub and ZeXr0, my solver managed to solve 3-7 in 321 steps, which is quite an improvement over Nitrodon's 360-step solution! Can't wait to see how much the other solutions can be improved.
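Back on the DFS + caching idea, here's roughly what I mean, as a minimal sketch: a depth-limited DFS with a visited-state table. A plain dictionary stands in for the splay trees, and `successors`/`is_solved` are again hypothetical helpers.

```python
def dfs_solve(start, successors, is_solved, depth_limit):
    # best_depth remembers the shallowest depth at which each state was seen;
    # reaching a state again at an equal or greater depth can't lead to a
    # shorter solution, so that branch is cut.
    best_depth = {start: 0}
    best = {"moves": None}

    def visit(state, path):
        if is_solved(state):
            if best["moves"] is None or len(path) < len(best["moves"]):
                best["moves"] = list(path)
            return
        if len(path) >= depth_limit:
            return  # can't beat the best known solution from here
        for move, nxt in successors(state):
            depth = len(path) + 1
            if best_depth.get(nxt, depth + 1) <= depth:
                continue  # already reached this state at least as cheaply
            best_depth[nxt] = depth
            path.append(move)
            visit(nxt, path)
            path.pop()

    visit(start, [])
    return best["moves"]
```

In practice you'd also tighten the depth limit every time a solution is found, and the cache would be bounded (that's where the "move cold nodes to slower memory" part comes in).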
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Warp wrote:
Obviously you cannot perform a BFS if you don't store the paths which have been checked so far because, as you say, you would easily end up in an infinite loop. Rather obviously you need to keep the full search tree which has been constructed so far in order to avoid going to branches which would form a loop. The difference between BFS and DFS is that the former can also check if a parallel search branch has already checked some state, while a DFS cannot (unless it also keeps the entire search tree in memory and doesn't discard anything). (The other difference is, as mentioned, that with BFS the first found solution is also the optimal one.)
This is turning into an argument about semantics. You say that keeping the entire search tree in memory is an innate property of BFS (which is not true for BFS over a tree) and not of DFS. I say that there is no discernible difference, because there is no practical difference between infinite and astronomical execution time: for a practical search over a graph, you need to store the search tree for both.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Warp: personally I don't see where you're coming from with this distinction between DFS and BFS. You can perform both a BFS and a DFS with no knowledge of previously visited nodes - for BFS you'd just keep a queue of nodes to visit, while for DFS you'd keep the current "stack". Both will have the same complexity for a full visit of a tree, but in the case of a graph, a BFS would probably get stuck in a loop, while DFS would have an insane complexity, so it doesn't really matter. I mentioned DFS because, in the case of limited RAM, DFS could be faster, because it is more likely to work with a smaller set of nodes than BFS, for which a depth iteration spans the entire width of the tree.

DarkKobold: that would be awesome, but I think it's best if I optimize my code some more first :)
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Warp wrote:
For example, if the optimal solution is 10 steps long, but making the wrong choice at the beginning results in a million other solutions of about one million steps, the depth-first solution might make that wrong choice at first, go through the million suboptimal solutions before it comes back to the beginning and makes the correct choice and finds the 10-step solution. A breadth-first approach would quickly find the optimal solution.
That's not a problem if the problem and implementation allow cheap "reparenting" of a subtree (which is the case with my solver).
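By "cheap reparenting" I mean something along these lines (a sketch only - the record layout is hypothetical, not my solver's actual one): when a cheaper path to an already-known state is found, only its parent link and cost are rewired, rather than redoing the work in its subtree.

```python
def relax(table, state, new_parent, new_frame):
    # table maps each known state to its parent and the earliest frame at
    # which it can be reached.  Returns True if the state was "reparented",
    # in which case the caller re-queues it so the improvement propagates
    # to its descendants.
    record = table.get(state)
    if record is None or new_frame < record["frame"]:
        table[state] = {"parent": new_parent, "frame": new_frame}
        return True
    return False
```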
Warp wrote:
Another problem with the depth-first approach is that it might take (after a small preceding detour) the same path than a previous sub-search already checked, and it has no way of knowing that there's no solution there. So it will check that path again, to no avail. (In order to avoid this, it would be necessary to store in memory all the paths which have already been checked, which is what the depth-first solution was trying to avoid to begin with.) A breadth-first search can check that no search branch merges with another branch.
I don't understand - how would it do that, without knowledge of all visited states? Are you talking about detecting branch merging only at the same depth (which doesn't really achieve much)?
Warp wrote:
This is not to say that the breadth-first approach is optimal and fast. Its problem is, of course, that it quickly consumes enormous amounts of memory if the optimal solution is even slightly long, and especially if the search space branches a lot. In the worst case the breadth-first approach cannot find the solution at all because it runs out of physical memory (while a depth-first approach could find the solution eventually).
As I've mentioned earlier in this thread, the approach isn't really very different if you track visited states to prevent duplicate work (and you NEED to, in order to prevent exponential complexity) - and, of course, with cheap "reparenting". Breadth-first search requires you to maintain a queue of nodes to examine, but this queue consumes hardly any memory compared to the set of examined states.
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Tub wrote:
Instead of checking every possible walking direction, identify every tile from which an interaction can be started. Use dijkstra or something to determine if they're reachable without interactions, and if they are, the number of steps to reach them.
That's a good point. I briefly thought of this back when the main bottleneck was execution speed, but haven't thought about it since. In more abstract terms, the idea is to divide the problem into "atomic subproblems", which would remove intermediary nodes in a subtree. I'll look into this.
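Something like this, I imagine. Since every walking step costs the same, a plain BFS flood fill is enough instead of full Dijkstra (`is_walkable` below is a hypothetical helper that treats walls, blocks and holes as impassable):

```python
from collections import deque

def walking_distances(is_walkable, start):
    # Flood-fill over tiles reachable by walking only (no pushing, no
    # turnstile rotations); returns tile -> number of steps to reach it.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in dist and is_walkable(nxt):
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist

# Only the tiles from which an interaction (block push, turnstile rotation)
# can be started then need to become children of the current search node;
# all the intermediate walking states collapse into these step counts.
```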
Tub wrote:
On another note, have you employed manual optimizations already? Adding checks like "the long block must not be sunk anywhere but at this specific location" or "if more than two small blocks are stuck in a corner, it's game over" could cut some branches a generic algorithm would not.
ZeXr0 wrote:
You may want to add some condition for solving the puzzle. Like Maximum number of state, every object that should be move. In which order if there's one. Try to remove as many useless combinaison as possible. It might be faster that way.
I have some optimizations, such as "fake walls" (free space that shouldn't be visited or occupied by a block) or "boring holes" (holes that should never be filled). You'd need to be careful in determining "lose conditions" for levels, as there might be a clever solution you haven't thought of. I'll see if it's possible to add more optimizations for levels.
moozooh wrote:
I suppose you could also shorten the solution from the end of the level; that is, moving the last reference point to a guaranteed optimal position closer to the beginning. Though I guess you've already done that.
That would require knowing the exact state of the rest of the level (blocks, rotators, filled holes), so it's not really feasible.
DarkKobold wrote:
Perhaps, after you finish this project, you may be interested in another?
If I find the time, sure - this community seems welcoming enough :)
Nitrodon wrote:
There are 5 possible steps (including "switch character"), so 2 bytes would be wasteful if he were storing a separate state for each step. Conversely, storing a location would take 9 or 10 bits, and storing an action and direction with it would still easily fit within 2 bytes. Hence, I assume he's already doing as you proposed.
Currently the 2 bytes are a bitfield storing the action (one of left/right/up/down/switch - 3 bits) and the earliest frame at which this state can be reached (15 bits). This is necessary because, even with breadth-first search, it's possible to reach a state and then later find a shorter path to that same state.
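Something along these lines (the field widths below are only illustrative, not my solver's actual layout):

```python
ACTION_BITS = 3
LEFT, RIGHT, UP, DOWN, SWITCH = range(5)  # five actions fit in 3 bits

def pack(action, frame):
    # Low bits: the action that led to this state; high bits: the earliest
    # frame at which the state can be reached.
    return (frame << ACTION_BITS) | action

def unpack(word):
    return word & ((1 << ACTION_BITS) - 1), word >> ACTION_BITS

assert unpack(pack(SWITCH, 1234)) == (SWITCH, 1234)
```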
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Halamantariel wrote:
Maybe a stupid comment, but if you want to play with more than 4gigs of RAM, shouldn't you use 64-bit pointers instead of 32-bit?
Precisely, that's why I use 32-bit array indexes instead of 64-bit pointers.
bzb95 wrote:
is their anyway you could do like nasa did once and distribute the testing between more than one machine, without them both checking the same thing in the same order? I have a computer that does virtually nothing but it has no where near 10 gigs of ram. Good luck with your bot though.
The problem with parallelizing the solver is that all instances need access to the set of states already examined; otherwise, each instance would repeat the work of the others whenever it reached the same state. For example, consider that for level 3-2 my solver had to examine 97377428 nodes to find a solution in 180 steps. With no knowledge of previously-examined states, the number of states it'd have to examine would be on the order of D^N, where D is the average number of valid moves from each state and N is the number of moves from start to finish. Even with a very conservative value of 2 valid moves per state, we're looking at astronomical values.
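To put a rough number on "astronomical":

```python
D, N = 2, 180          # conservative branching factor, moves in the 3-2 solution
print(D ** N)          # about 1.5e54 states without duplicate detection
print(97377428)        # states actually examined with duplicate detection
```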
ccfreak2k wrote:
You could have each worker "check out" blocks of solutions to calculate. There's a distributed WEP cracker that does this: each worker checks out a set of keys, then tries to decrypt using each key. If the working one isn't found, it checks out another block.
The thing is, problems like cracking a key are fully parallelizable (they don't share any data between processes). In programming-language terms, the function that verifies a candidate solution is "pure", in that it does not refer to external data (data not on the stack). As I explained above, solving Kwirk requires access to the set of examined states to prevent duplicate work.
Derakon wrote:
That's better than arbitrary depth, but you still have no guarantee that you'll find the fastest solution.
With a depth-first search, you simply need to limit the search to the depth of the best known solution, then search the entire breadth of the tree. For iterative-deepening depth-first search this isn't a problem either, unless you increase the depth by more than one unit per iteration. Iterative-deepening DFS probably doesn't justify itself for this problem anyway.
Nach wrote:
The algorithm should also ideally be threaded. I can easily get 10 cores working together on something.
Yep, it's threaded, but CPU performance isn't the bottleneck :)
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Thanks for the replies, everyone :)

I'm currently using a breadth-first search algorithm with a fixed-size hash table. One state takes 10 bytes: 2 bytes (a bitfield) to store the action required to get to that state from its parent state plus the earliest frame to get to that state, 4 bytes for the parent state index, and 4 bytes for the index of the next state with the same hash (in case of hash collisions). So memory consumption is about 10*STATE_COUNT + 4*(2^HASH_BITS).

I'm thinking of rewriting it to use depth-first search (which can be limited to the number of steps of a known solution) or iterative-deepening DFS, with something like splay trees. Depth-first because breadth-first search traverses the entire breadth of the state tree (and requires access to a great number of nodes), while with depth-first search the repeated nodes it encounters will be more "localized", so there should be significantly fewer "cache misses". The program would have to move nodes that aren't accessed frequently out to slower memory. Making the algorithm distributed would probably make sense only if network access to read or write a value in another machine's RAM had lower latency than the HDD.

Nitrodon: thanks for the info - actually, my timing for filling a hole was wrong for some reason (18 frames instead of 26). I wonder if that will affect any solutions (my solver counts frames, not moves).
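For the curious, here's the memory accounting sketched out; STATE_COUNT and HASH_BITS below are example values, not the ones I actually use.

```python
import struct

# One state record: 2-byte bitfield (action + earliest frame),
# 4-byte parent state index, 4-byte index of the next state in the
# same hash bucket (chaining).
STATE_FORMAT = "<HII"
STATE_SIZE = struct.calcsize(STATE_FORMAT)   # 10 bytes

STATE_COUNT = 100_000_000   # example value
HASH_BITS = 28              # example value

memory = STATE_SIZE * STATE_COUNT + 4 * (2 ** HASH_BITS)
print(memory / 2**30, "GiB")  # ~0.9 GiB of states + 1 GiB of bucket heads
```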
Post subject: Algorithm superiority
Experienced Forum User, Published Author, Player (6)
Joined: 10/25/2009
Posts: 22
Hi everyone,

I have somewhat of a fascination with writing game bots (well, game cheats in general, but bots provide the most challenge). You could say that this hobby somewhat intersects with tool-assisted speed-runs. A while ago I found this TAS video of Kwirk, and I noticed that the solutions in the video were solved by humans, and thus could be non-optimal. I've written a solver for Kwirk (rewrote it several times, learned a lot about search in the process), and - indeed, out of the 19 levels my solver could stomach, it found better solutions for 5 levels (with the record gap being solving 2-3 in 98 steps vs. Nitrodon's 104-step solution). I plan to continue working on my solver and attempt to solve the remaining levels. So, personally I think there should be more effort put into algorithmically solving some games (especially puzzles).

Anyway, the main constraint in solving Kwirk is memory, since the solver needs to memorize all states it has previously visited. I plan to redesign my solver so that it can swap out nodes that aren't accessed frequently OSLT - even so, the amount of available RAM would drastically affect its performance. I was wondering if there is anyone in this community with access to machines with lots (>10 GB) of RAM who wouldn't mind running my Kwirk solver on them.