Post subject: Programming: serializing game state
Joined: 7/2/2007
Posts: 3960
I've been working on Pyrel, a roguelike implemented in Python, off and on for a bit over a year now. And it's now about time for me to start working on implementing saving and loading of the game. Yeah, you'd have thought I'd get to this point earlier; oh well. I've run into a bit of a snarl and would appreciate insight from the many experienced programmers here. I feel confident that I can write code that will serialize/deserialize game objects and the formal relationships between them (i.e. this object contains this other object, etc.). Where I run into problems is with function pointers, and worse, lambdas. For example, I have some game objects that are simply timers: they count down until their timer hits 0, then they invoke a function. So their game state looks something like this:
Language: python

"id": 10024 # Auto-incrementing; unique across objects
"duration": 10
"func": [bound method Foo.foo of [__main__.Foo instance at 0x1004a73f8]]
or, with lambdas:
Language: python

"func": [function [lambda] at 0x1004a1de8]
(Pardon, had to replace < with [ in the code because otherwise it broke when displayed)
In the former case, I should be able to map from 0x1004a73f8 back to a game object (since the serialization process needs to touch every object anyway, it's straightforward to get a mapping of object addresses to those objects), get the object's ID, and generate a serialization that says "Okay, you need to call the "foo" method of the object with ID 9782345". Then the deserialization process would have access to the recreated object with ID 9782345, and thus can look up its "foo" attribute, to recreate the function pointer. It'd be pretty dang hacky, but I think I could do it; I'd like to hear comments on this though.
In the latter case, as far as I can tell, I'm completely boned. Lambdas contain both bytecode and context. Bytecode is verboten as far as I'm concerned -- as soon as you have any bytecode (or any text that you compile into code, etc.) in your savefile, you leave yourself open to malicious users distributing savefiles that do nasty things to unsuspecting players. Would this ever happen in practice? Probably not. But it's bad design anyway; for the same reason, Pyrel's datafiles are JSON instead of pure Python, even though the latter would be easier to load. And the lambda's execution context is almost as bad -- a single lambda could easily contain basically the entire program context.
Am I missing anything here? I've just about convinced myself that I have to ban lambdas from any part of the program that needs to be serialized, but I'd love to be wrong. And I'm not especially happy with my proposed approach to [de]serialization of normal function pointers, but I can't think of a cleaner solution. Feedback is welcome.
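For what it's worth, the bound-method case may not need memory addresses at all, since a bound method exposes its instance as `__self__` and its plain function as `__func__`. The following is only a sketch under assumptions: the `Foo` class, the `id` attribute, and the `objects_by_id` mapping are hypothetical stand-ins, not Pyrel's actual code.

```python
import json


class Foo:
    """Hypothetical game object with a stable, serializable ID."""

    def __init__(self, obj_id):
        self.id = obj_id

    def foo(self):
        return "foo called on %d" % self.id


def serialize_bound_method(func):
    # Record the owner's ID and the method name instead of an address.
    return {"object": func.__self__.id, "method": func.__func__.__name__}


def deserialize_bound_method(record, objects_by_id):
    # objects_by_id is assumed to be rebuilt earlier in the load pass.
    target = objects_by_id[record["object"]]
    return getattr(target, record["method"])


thing = Foo(9782345)
saved = json.dumps(serialize_bound_method(thing.foo))

# Simulate loading: the object is recreated first, then the function pointer.
restored_objects = {9782345: Foo(9782345)}
func = deserialize_bound_method(json.loads(saved), restored_objects)
result = func()
```

One caveat: `getattr` will happily return any attribute the savefile names, so a real loader would probably want to check the name against a list of methods that are allowed to be restored.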
Pyrel - an open-source rewrite of the Angband roguelike game in Python.
Editor
Joined: 3/31/2010
Posts: 1466
Location: Not playing Puyo Tetris
If I was doing a roguelike, I would have save files in XML format. That allows me to separate things neatly into groups.
When TAS does Quake 1, SDA will declare war. The Prince doth arrive he doth please.
Joined: 7/2/2007
Posts: 3960
No offense intended, but that's a bit like someone asking for help with a math problem and being told they should try using a pencil. The file format will likely be JSON, in any case, since that's what is used for the other datafiles that Pyrel uses.
Patashu
He/Him
Joined: 10/2/2005
Posts: 4043
What if you require every lambda used to be registered with the Official Bureau of Lambda Organization, including
1) providing/pointing to a method to return a lambda with the same bytecode
2) exposing the context for serialization
I'm a bit hazy on how this would work, or if this is even a thing that makes sense, but clearly as you say you can't serialize and deserialize arbitrary bytecode, so you can only create bytecode from methods coded into the game and determine which bytecode you make by what the serialized version of the lambda says needs to be made.
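One way the registration idea could look, as a rough sketch: each lambda is built by a named factory, and the savefile stores only the factory name plus the plain values the lambda closes over. Everything here (`LAMBDA_FACTORIES`, `register_factory`, `make_heal`) is a made-up name for illustration.

```python
import json

# Hypothetical registry: name -> factory that rebuilds the closure
# from plain, JSON-safe context values.
LAMBDA_FACTORIES = {}


def register_factory(name):
    def decorator(factory):
        LAMBDA_FACTORIES[name] = factory
        return factory
    return decorator


@register_factory("heal")
def make_heal(amount):
    # The only context this lambda captures is `amount`.
    return lambda hp: hp + amount


def save_lambda(name, **context):
    return json.dumps({"factory": name, "context": context})


def load_lambda(text):
    record = json.loads(text)
    factory = LAMBDA_FACTORIES[record["factory"]]  # KeyError if unregistered
    return factory(**record["context"])


saved = save_lambda("heal", amount=5)
heal = load_lambda(saved)
healed = heal(10)
```

The savefile never contains bytecode; it can only select among factories that already exist in the game's source.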
My Chiptune music, made in Famitracker: http://soundcloud.com/patashu My twitch. I stream mostly shmups & rhythm games http://twitch.tv/patashu My youtube, again shmups and rhythm games and misc stuff: http://youtube.com/user/patashu
Banned User
Joined: 3/10/2004
Posts: 7698
Location: Finland
hegyak wrote:
If I was doing a roguelike, I would have save files in XML format. That allows me to separate things neatly into groups.
Yes, because XML is the only standardized format in existence that allows separating things neatly into groups. (At the cost of making the data take 10 times more space than it needs to.) The only possible reason to use XML would be if you want the savedata to be easily read and parsed by third-party programs. However, even for that purpose there are much better and more compact standardized formats.
Former player
Joined: 2/19/2007
Posts: 424
Location: UK
I'm not experienced with this, but what you say makes sense, Derakon. I think you will have to avoid lambdas if you want the kind of automatic serialization you are thinking of. Of course, if you do things the old-fashioned way, with manual serialization and deserialization, you can use as many lambdas as you want, but instead have to spend lots of maintenance time making sure all relevant variables are being saved and restored correctly. By the way, your problem with "<" is probably due to not checking "Disable HTML code in this post".
Player (42)
Joined: 12/27/2008
Posts: 873
Location: Germany
You could use enums for each function pointer that can be called, and serialize the enums instead. I think this is less hacky than the solution you propose, but suffers from similar problems, since you have to maintain the enums together with the functions. I understand you want to use Python's serialization API for all this, so it seems to me that the best solution is to forgo function pointers and use Java-style dependency injection. Basically you make everything that can be called in the game object derive from an abstract base class, and inject an instance of that abstract class into the game object class. In this case, you'd be serializing an object of a class, with the same effect as a function pointer. I also agree that you should avoid lambdas; using them is not worth the security risk. But you seem to have some intense coupling if there are lambdas that capture the whole application context.
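A minimal sketch of the injected-object approach, assuming hypothetical names throughout (`TimerAction`, `ExpireBuff`, and a `Timer` that is not Pyrel's real timer class). The callable object carries its own state as ordinary fields, so it can be written out as a class name plus data.

```python
import json
from abc import ABC, abstractmethod


class TimerAction(ABC):
    """Hypothetical base class standing in for a function pointer."""

    @abstractmethod
    def __call__(self, game):
        ...

    def to_dict(self):
        # Class name plus instance fields; the fields must be JSON-safe.
        return {"kind": type(self).__name__, "fields": dict(vars(self))}


class ExpireBuff(TimerAction):
    def __init__(self, buff_name):
        self.buff_name = buff_name

    def __call__(self, game):
        game["buffs"].remove(self.buff_name)


class Timer:
    def __init__(self, duration, action):
        self.duration = duration
        self.action = action  # injected TimerAction, not a raw function

    def tick(self, game):
        self.duration -= 1
        if self.duration == 0:
            self.action(game)

    def to_dict(self):
        return {"duration": self.duration, "action": self.action.to_dict()}


game = {"buffs": ["haste"]}
timer = Timer(2, ExpireBuff("haste"))
saved = json.dumps(timer.to_dict())
timer.tick(game)
timer.tick(game)  # duration reaches 0 here, so the action fires
```

Loading would still need some mapping from the `"kind"` string back to a class, which is where the maintenance cost mentioned above comes back in.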
Joined: 7/2/2007
Posts: 3960
I actually don't want to use Python's serialization library (pickle), precisely because pickle lets you serialize code. As soon as you have "pickle.load()" in your code, you have a potential security vulnerability. Since players may distribute savefiles between each other, it is theoretically possible for someone to distribute a malicious savefile that would, when loaded, do Bad Things. This is almost certainly paranoid, but I consider it a good design principle to avoid known security holes. :)
I'm willing to use Python's json library, though, since loading JSON objects is perfectly safe. In fact the main reason Pyrel uses json instead of, say, YAML, is because Python has the json library built-in, so it's one less dependency that prospective contributors have to install.
As for intense coupling (ahem), it's more that there is a single object in the game state that contains all other game objects (it's the game map, and also serves as a database of sorts). Any lambda that has that object in its context perforce has the entire game in its context, whether or not it uses it. Unless I misunderstand how lambda contexts are set up.
Enumerating the allowed serializable function pointers seems kind of brittle; every time you introduce a new function that you want to be able to serialize, you have to add it to the enum. I expect in practice that means you'd have a file in the project that knows about a huge proportion of the functions in the codebase. Making serializable function pointers into their own separate objects is a bit better (since you don't have that centralized list of things-that-can-be-serialized), but then you have to deal with giving the function access to member variables in its parent. Doable, but probably pretty fiddly. Maybe a decorator function could do the job, though?
Basically it sounds like I can either have an "automagic" hacky system, or a more rigorously formalized system; the latter will be cleaner / more explicit, but requires significant extra work for anything that wants to be serializable. Hm.
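On the pickle concern from earlier in this post, here is a small, deliberately harmless sketch of the mechanism. A pickled object can name a callable and its arguments via `__reduce__`, and `pickle.loads()` calls it while loading; `json.loads()` only ever builds dicts, lists, strings, numbers, booleans and `None`. The `Sneaky` class is hypothetical, and `sorted` is just a benign stand-in for whatever a hostile file might name.

```python
import json
import pickle


class Sneaky:
    # pickle stores whatever __reduce__ returns: a callable plus its
    # arguments. Unpickling then calls that callable with those arguments.
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Sneaky())
loaded = pickle.loads(payload)  # sorted([3, 1, 2]) runs during the load

# The JSON equivalent is inert data: nothing is called while parsing.
data = json.loads('{"func": "sorted", "args": [[3, 1, 2]]}')
```

Note that `loaded` is not a `Sneaky` instance at all; it is whatever the named callable returned.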
Joined: 1/26/2009
Posts: 558
Location: Canada - Québec
If you want to add a better level of security, you can simply write a hash function that generates a string from the serialized object, then add this string to your favorite file format. Then, when you want to restore the serialized object, you should be able to validate that the file is fine. Though, some hacker could still modify the internal game state (with an external program) and ask Pyrel to generate the savestate, so a malicious game function could be triggered on the first execution... (or simply do all the fuzzy math manually to get the right hashcode). But that's still very tricky, and I believe this is a perfectly good enough solution if you don't want to write custom rules everywhere that might require heavy maintenance.
Joined: 7/2/2007
Posts: 3960
Alas, Pyrel is open source, so as you say, the hacker could just make a modified version of the game that inserts their malicious code into the savefile and then generates an appropriate hash/checksum.
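To make that concrete, a sketch of what a checksum buys and what it doesn't. The savefile layout (`"body"` plus `"sha256"`) is invented for this example; only `hashlib` and `json` from the standard library are used.

```python
import hashlib
import json


def add_checksum(state):
    body = json.dumps(state, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return json.dumps({"body": body, "sha256": digest})


def verify(savefile):
    record = json.loads(savefile)
    digest = hashlib.sha256(record["body"].encode("utf-8")).hexdigest()
    return digest == record["sha256"]


honest = add_checksum({"gold": 10})

# Editing the body without touching the checksum is detected...
record = json.loads(honest)
record["body"] = json.dumps({"gold": 9999}, sort_keys=True)
tampered = json.dumps(record)

# ...but anyone with the source can simply rerun add_checksum.
forged = add_checksum({"gold": 9999})
```

So the checksum catches accidental corruption and lazy edits, but it does not make the contents of the file any more trustworthy.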
Player (42)
Joined: 12/27/2008
Posts: 873
Location: Germany
Derakon wrote:
Enumerating the allowed serializable function pointers seems kind of brittle; every time you introduce a new function that you want to be able to serialize, you have to add it to the enum. I expect in practice that means you'd have a file in the project that knows about a huge proportion of the functions in the codebase. Making serializable function pointers into their own separate objects is a bit better (since you don't have that centralized list of things-that-can-be-serialized), but then you have to deal with giving the function access to member variables in its parent. Doable, but probably pretty fiddly. Maybe a decorator function could do the job, though?
Well, the enum solution has the advantage of being simpler, and as you said it can put everything that's serializable in a single file. I think that's convenient to have, but I love coding in plain C and I'm not your average OO proponent, so take what I say with a huge grain of salt :) Anyway, I agree that it doesn't scale well. If you want persistence for a large number of things, you'll likely have to drop function pointers and resort to classes at some point. I'm not into design patterns, so I don't know what a decorator is, but I'd solve these access problems by passing references/pointers to the variables the function needs to modify, or declare them protected and make these functions methods of a subclass.
Joined: 7/2/2007
Posts: 3960
Decorator functions are functions that modify other functions; there's some nice syntactic sugar in Python for applying them. For example:
Language: python

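import threading

def callInNewThread(func):
    # Return a wrapper that runs func in a new thread on every call.
    def wrappedFunc(*args, **kwargs):
        threading.Thread(target=func, args=args, kwargs=kwargs).start()
    return wrappedFunc

# The decorator must be defined before it is applied.
@callInNewThread
def doSomething():
    ...  # body elided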
callInNewThread is a function that accepts functions as input and returns functions as output. In this case, it takes doSomething() as its input, and returns as its output a function that calls doSomething() in a new thread. As far as serializing, I'm not entirely certain how it'd work but I guess the decorator could simply act to register the function as a serializable function? The actual operation of the function need not be modified; the decorator would just note it down.
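Something like the following is what I have in mind for the registering decorator, purely as a sketch: `serializable`, `SERIALIZABLE_FUNCS`, and `serial_name` are hypothetical names, and the stable name is assumed to be the module plus the qualified function name.

```python
# Hypothetical registry mapping a stable name to the function itself.
SERIALIZABLE_FUNCS = {}


def serializable(func):
    # Record the function under a stable name and return it unchanged,
    # so its behavior is not modified in any way.
    name = func.__module__ + "." + func.__qualname__
    SERIALIZABLE_FUNCS[name] = func
    func.serial_name = name
    return func


@serializable
def explode(radius):
    return "boom within %d tiles" % radius


def save_func(func):
    # Raises AttributeError for anything that was never registered,
    # which includes every lambda.
    return func.serial_name


def load_func(name):
    # Only names already present in the registry can be restored.
    return SERIALIZABLE_FUNCS[name]


restored = load_func(save_func(explode))
message = restored(3)
```

This avoids a hand-maintained central list, since each function registers itself where it is defined. Methods would still need the owning object's ID saved alongside the name.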
Banned User
Joined: 3/10/2004
Posts: 7698
Location: Finland
BadPotato wrote:
If you want to add a better level of security, you can simply write a hash function that generates a string from the serialized object, then add this string to your favorite file format.
Nothing would stop a hacker from generating such a valid checksum for his malicious savedata. (Sure, if the program is compiled, rather than interpreted, and it uses a custom hashing function, it becomes more difficult to find out how the checksum is calculated. However, it's not impossible. If the CPU can read the code, a hacker can too. It may not be trivial, but it's far from impossible. Such a solution would basically be "security by obfuscation" which is not a good idea.)
Joined: 7/2/2007
Posts: 3960
Warp wrote:
(Such a solution would basically be "security by obfuscation" which is not a good idea.)
Well, it's not a good idea to solely rely on security through obscurity; however, that can make a useful component of making your security as a whole harder to crack. It just needs to not be the only component. Sort of like how it's reasonable to set up your SSH server on some weird random port just because most attackers will assume it's on port 22; that one minor bit of obfuscation can save you from a lot of attacks. But yes, if you're relying solely on security through obscurity then you're doing it wrong.
Joined: 1/26/2009
Posts: 558
Location: Canada - Québec
Yeah, at some point encrypted data isn't good enough. Well, I had a quick look at your Bitbucket repo, and it seems that you already use a "command pattern" to handle most of your stuff. Why not use more of these command objects instead of function pointers? Then I guess you can control which kinds of commands (or classes inherited from them) are allowed when you load a savestate. Basically, only functions that deal with the current game should be accepted, right? I don't think it would make any sense if the game started adding/removing random values in the scoreboard or messing with other I/O files.
sack_bot
He/Him
Player (112)
Joined: 11/27/2011
Posts: 394
Location: Massachusetts
I know my opinion doesn't really matter, but I would come up with a toBinary function for each object and then on loading recreate the classes by sending that data to a constructor. A format idea is like
{className (fixed or variable string),dataLength (32 bit unsigned int), data (data the length of dataLength)},{className2....},...
TL;DR: Use a custom binary type that acts like a pickle without allowing code in.
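A rough sketch of that record layout using Python's `struct` module. The format above leaves the string encoding open, so this assumes a variable-length UTF-8 class name prefixed by a 16-bit length, followed by the 32-bit unsigned data length and the data; `pack_record` and `unpack_records` are made-up helper names.

```python
import struct


def pack_record(class_name, data):
    name = class_name.encode("utf-8")
    # Assumed layout: 16-bit name length, name, 32-bit data length, data.
    return (struct.pack("<H", len(name)) + name
            + struct.pack("<I", len(data)) + data)


def unpack_records(blob):
    records = []
    offset = 0
    while offset < len(blob):
        (name_len,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        (data_len,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        data = blob[offset:offset + data_len]
        offset += data_len
        records.append((name, data))
    return records


blob = (pack_record("Potion", b"\x05")
        + pack_record("Sword", struct.pack("<hh", 3, 7)))
records = unpack_records(blob)
```

To keep code out, the loader would map each class name to a constructor through a fixed table rather than evaluating the name. This sketch also does no validation of truncated or oversized lengths, which a real loader would need.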
Message me here for my discord. Current Project: Psycho Waluigi Project on wait list: None?