Post subject: How to determine the formula for a random number generator?
Player (99)
Joined: 3/20/2008
Posts: 466
Location: Montreal, Quebec, Canada
Suppose you find the address(es) that govern the RNG, and know how to advance the RNG one RN at a time, indefinitely. This would effectively allow you to have a sample size of consecutive RNs as large as you want. Further assume that later RNs are generated entirely from previous RNs (say R_n+3 = (R_n+2 + R_n+1 + R_n)/3 or something similar) and not dictated by something more complex such as the character's position on the screen. Is there a way to reverse engineer a huge data sample to find the actual formula that generates the RNs? As a specific example, the random number generator for Fire Emblems 6 to 8 is as follows:
function nextrng(r1, r2, r3)
  return AND(XOR(SHIFT(r3, 5), SHIFT(r2, -11), SHIFT(r1, -1), SHIFT(r2, 15)),0xFFFF)
end
It took me a while to even understand what the formula meant, but I basically convert the 3 RNs to a binary value, perform the binary operations (shift, xor, and), and convert the number back to decimal format to yield the correct result. Is there a way a formula like that could be determined from a sample set of say, 100,000 consecutive RNs? Is there some kind of pattern recognition program you could run? Do you have to disassemble the ROM?
Joined: 7/2/2007
Posts: 3960
If you know where the random number is stored in memory, the most straightforward way to understand the formula is to read the assembly that changes that value (by, yes, disassembling the ROM). Theoretically you could reverse-engineer the formula by examining the values that the generator spits out, but since RNGs are designed to seem random, this would be a tricky project -- you'd have precious little in the way of patterns to use to figure out what was going on.
Pyrel - an open-source rewrite of the Angband roguelike game in Python.
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
One thing to bear in mind is that most programmers who need a random number are either going to be using some off the shelf algorithm that's well known, or use a library call which is the same. If you know when the game is made, or what language/platform it's for, you can generally narrow down what viable candidates are for an algorithm. If you can narrow it down to a dozen possibilities or less, you can for the simpler algorithms just see which if any of the algorithms has its output sequence match one of them. During the 80s through the 90s, the most popular algorithm was the LCG, or something close to it, which was included in every major library. If the game was made in the time frame such a library would be used, it's highly likely that's what the game is using. The only question is what constants is it using it with. You may be able to find that by looking at the code in question, or just running through a list of popular constants (a partial list appears on the WP article I linked to above).
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
Post subject: Re: How to determine the formula for a random number generator?
ALAKTORN
He/Him
Player (105)
Joined: 10/19/2009
Posts: 2527
Location: Italy
Vykan12 wrote:
Is there a way a formula like that could be determined from a sample set of say, 100,000 consecutive RNs? Is there some kind of pattern recognition program you could run? Do you have to disassemble the ROM?
100,000 are way too many. You should be fine after about 20, if you’re not, then I don’t think you’re gonna figure it out anyway. You can try using http://www.wolframalpha.com/ for pattern recognition. For AWDS me and my partner have been wanting to disassemble the RNG, but we’re not sure how to do that or if it’s even possible on DeSmuME. All that was figured out was through testing and mathematics, he’s a bit of a maths genius.
Post subject: Re: How to determine the formula for a random number generator?
Banned User
Joined: 3/10/2004
Posts: 7698
Location: Finland
Vykan12 wrote:
Is there a way a formula like that could be determined from a sample set of say, 100,000 consecutive RNs?
My intuition is that in the general case that's an impossible task. (And by that I don't mean "extremely difficult" or "it would take millions of years for all the computers in the world to calculate it". No, I literally mean impossible.) If you suspect that the program might be using a certain RNG, such as a Linear Congruential Generator, then it's not impossible, but it could ostensibly require a lot of time and, moreover, the answer may be ambiguous (ie. it's theoretically possible that different LCG's give that same particular pattern, but differ when you go further.) A brute force method may well require millions of years, depending on the size of the seed (for 8-bit seeds it would probably take a few seconds, for 32-bit seeds it could take thousands of years...) However, even then, there's not one single standard way of implementing a LCG, so you might actually never find it this way. By far the easiest way is to disassemble and find the actual implementation of the RNG.
Post subject: Re: How to determine the formula for a random number generator?
Editor, Skilled player (1982)
Joined: 6/15/2005
Posts: 3261
Warp wrote:
If you suspect that the program might be using a certain RNG, such as a Linear Congruential Generator, then it's not impossible, but it could ostensibly require a lot of time and, moreover, the answer may be ambiguous
Actually, if you assume that LCG is used, and it is well-chosen (it runs over all possible numbers mod whatever), then the LCG's formula can be discovered just by looking at 3 consecutive numbers and using some number theory. It would not be ambiguous, and can easily be checked for correctness using the next few values. LCGs in video games/computing very often use a modulus which is a power of 2. Note: You need to look at the entire number used in the LCG which is stored in memory, not just the upper few bits that the game spits out. Other than LCG and arguably LFSR, other types of pseudorandom number generators are generally too difficult to reverse engineer from their output values (most of them being homemade bit-manipulation functions). In this case, there is no better way to figure out what the formula is than disassembly, however difficult doing that is. Or, if the sequence is always the same, skip the formula altogether and just list all the numbers in sequence, or as many as you need for your purposes.
ALAKTORN
He/Him
Player (105)
Joined: 10/19/2009
Posts: 2527
Location: Italy
If you’re stuck having to math it out, here’s a tip my partner showed me today: translate things into a different perspective. 0 288999488 718069888 793069760 774832384 605423936 981885312 135006656 271778304 59198016 1066721920 250071744 This series of base 10 numbers probably doesn’t mean shit to you (it doesn’t to me). But if you translate it into binary and space it out accordingly… 00 0000000000000000 000 00000 000000 = 0 00 0100010011100111 001 00001 000000 = 288999488 00 1010101100110011 100 00010 000000 = 718069888 00 1011110100010101 001 00011 000000 = 793069760 00 1011100010111100 000 00100 000000 = 774832384 00 1001000001011000 001 00101 000000 = 605423936 00 1110101000011001 100 00110 000000 = 981885312 00 0010000000110000 001 00111 000000 = 135006656 00 0100000011001100 000 01000 000000 = 271778304 00 0000111000011101 001 01001 000000 = 59198016 00 1111111001010011 100 01010 000000 = 1066721920 00 0011101110011111 001 01011 000000 = 250071744 You can start seeing patterns where you wouldn’t have before. This is actually an on-going work me and my partner are doing with AWDS’s RNG, everything in the chunk of bits in the second column is still an unknown, but we’ve figured out all the other columns at least. We tried disassembling but we suck at it.
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
ALAKTORN wrote:
We tried disassembling but we suck at it.
If you can find any numeric constants used in the code, even if you can't figure out what the algorithm is, you'll probably get much closer to figure out what's going on.
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
ALAKTORN
He/Him
Player (105)
Joined: 10/19/2009
Posts: 2527
Location: Italy
Nach wrote:
ALAKTORN wrote:
We tried disassembling but we suck at it.
If you can find any numeric constants used in the code, even if you can't figure out what the algorithm is, you'll probably get much closer to figure out what's going on.
that is true but how are we gonna get the constant?
I don’t even know what we’re talking about.
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
Every random number generator (that I know of anyway) does math with certain "constants" (variables that never change). Even if you have no clue what the code with the address you found is doing, if you can find the numbers involved, you're halfway to figuring out what the algorithm is.
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
ALAKTORN
He/Him
Player (105)
Joined: 10/19/2009
Posts: 2527
Location: Italy
How would you find the constants?
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
ALAKTORN wrote:
How would you find the constants?
You'd have to use whatever debugger the emulator provides to see which piece of code is modifying the memory where you found the random number is being stored. If your emulator debugger provides breakpoint on memory modification, you're golden. Then when you find that piece of code which is continually modifying it, there should either be inline numbers involved, or other areas of memory that are being accessed along with the one you know of. In the former case, you have your constants. In the latter, look up the memory in those locations, and you have your constants.
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
Player (99)
Joined: 3/20/2008
Posts: 466
Location: Montreal, Quebec, Canada
Well, looks like I'll have a go at Nintendo DS disassembly and see what I can come up with. Anyone have some experience with DS disassembly (esp. which tools to use) or disassembly in general that can provide some general tips & guidelines?
Joined: 12/31/2009
Posts: 174
Vykan12 wrote:
Well, looks like I'll have a go at Nintendo DS disassembly and see what I can come up with. Anyone have some experience with DS disassembly (esp. which tools to use) or disassembly in general that can provide some general tips & guidelines?
Just wondering, which game? I made some scripts to fish out information in Desmume for that kind of stuff.
Player (99)
Joined: 3/20/2008
Posts: 466
Location: Montreal, Quebec, Canada
Zanoab wrote:
Just wondering, which game? I made some scripts to fish out information in Desmume for that kind of stuff.
FE: Shadow Dragon and FE: Heroes of Light & Shadow, the latter of which is more interesting. For FE12, I've found that the RNG appears to change every frame, but is cycling between the same 4 outcomes over and over. Beyond that, I haven't even been able to identify the RNG's address in memory after a couple hours trying.
Site Admin, Skilled player (1242)
Joined: 4/17/2010
Posts: 11373
Location: RU
Warning: When making decisions, I try to collect as much data as possible before actually deciding. I try to abstract away and see the principles behind real world events and people's opinions. I try to generalize them and turn into something clear and reusable. I hate depending on unpredictable and having to make lottery guesses. Any problem can be solved by systems thinking and acting.
Banned User
Joined: 3/10/2004
Posts: 7698
Location: Finland
The common way of discovering such a thing as the RNG used by a program is to first find the memory location of the seed(s) used (which can often be found using memory watch or other similar features, although by the nature of random numbers it might be a bit laborious to distinguish it from everything else that's going on), and then you use a disassembler/debugger to find out which part of the machine code modifies that particular memory location. Usually you have found your RNG. (Although it's possible for more than one part of the code to modify the same memory location. Most typically one part will set it up to something, like the system's clock or something else in older consoles, and then another will be the actual RNG algorithm. I suppose it may even be possible for other parts of the code to modify it.)
STL
Joined: 12/5/2010
Posts: 5
But how to you disassemble it to get information? I know the address of the RNG seed and if I go into the DeSMume disassembler and jump to this address, all I get is 1 line of random assembler code. It shows a different line with the same advance and sometimes even UNDEFINED.
Joined: 12/31/2009
Posts: 174
STL wrote:
But how to you disassemble it to get information? I know the address of the RNG seed and if I go into the DeSMume disassembler and jump to this address, all I get is 1 line of random assembler code. It shows a different line with the same advance and sometimes even UNDEFINED.
That is because the RNG seed isn't the code itself. The code will most likely be elsewhere and you will need a lua script to find it. Let me grab the script I use for you. EDIT: Here is my script: https://pastebin.com/wUuWxyUy Change "address" to the RNG seed's address and "length" to the size in bytes. It will log all the times the RNG seed is read/written into a file (dump.log). You will want to point the disassembler to instruction addresses it lists. Keep in mind that only do it when the RNG seed is used or the code might not be loaded.
STL
Joined: 12/5/2010
Posts: 5
Many thanks :D I already tried using memory.register* as I read about it on the onsite guide here but it gave me this error: "memory.registerwrite failed: function is not available in this build. script stopped." I thought deSmuME doesn't support it, but maybe it's just my deSmuME version (0.9.10 x86) or the lua dll?
Joined: 12/31/2009
Posts: 174
STL wrote:
Many thanks :D I already tried using memory.register* as I read about it on the onsite guide here but it gave me this error: "memory.registerwrite failed: function is not available in this build. script stopped." I thought deSmuME doesn't support it, but maybe it's just my deSmuME version (0.9.10 x86) or the lua dll?
You need to use the dev build. The zip with all the desmume stuff used to have a desmume_dev.exe in it but I'm not sure if they stopped including it.
ALAKTORN
He/Him
Player (105)
Joined: 10/19/2009
Posts: 2527
Location: Italy
Thank you Zanoab. Thanks to your script we were able to make this: Link to video
Post subject: Re: How to determine the formula for a random number generator?
Joined: 10/20/2006
Posts: 1248
In practice, them using LCG very often means you're going to have an incredibly easy time figuring out the algorithm. Most of the time, you don't even need to disassemble those. For LCG, the formula for advancing through seeds is (oldSeed*x+y)%z. (Will keep calling each iteration "seed", even though that might not be technically correct) Most of the time, the %z part is just dropping the highest bits and will occur naturally in older systems. F.e., z would be 65536 for 16bit values. 1) Find the address they store the last seed at 2) Find a way to advance this seed by exactly one iteration. 3) Set this address to 0, advance once. (0*x+y)%z will give you y. 4) Set this address to 1, advance once. (1*x+y)%z will either give you x+y or (x+y)%z (They might also use XOR) And you're usually just about done at this point. When analyzing the RNG of older games, I usually first try this procedure. Only if it fails, I'll disassemble. It can often save you lots of time. (If you already have a disassembler open, it might still be faster to just disassemble though) 5) For each random event, they'll usually take either a portion of the last seed or the seed they have just advanced to (they'll either advance the seed before or after getting the actual random number they'll use), and take that portion % the number of possible outcomes for the random event in question. By "portion", I mean any number of consecutive bits of the seed. You should be able to work things out from there. Maybe without disassembling again. (Though disassembling would generally save time with this step) Please go easy on me when pointing out everything that's wrong with this post. I know straight-up disassembling is actually the proper way, and I don't recommend anyone should follow my advice.
Post subject: Re: How to determine the formula for a random number generator?
ALAKTORN
He/Him
Player (105)
Joined: 10/19/2009
Posts: 2527
Location: Italy
Kuwaga wrote:
3) Set this address to 0, advance once. (0*x+y)%z will give you y. 4) Set this address to 1, advance once. (1*x+y)%z will either give you x+y or (x+y)%z
This is how me and my friends figured out part of the AWDS and the whole Tenchu RNGs. It’s definitely a great way of doing it if you know how to advance the RNG a single step, which in some games could be hard to do. In AWDS you just move the cursor 1 diagonal, in Tenchu you just wait 2 or so frames with nothing else going on on-screen. I’ve never tried this with SM64DS but I probably will at some point. Mega Man ZX should also be easy to figure out with this process.
Post subject: Re: How to determine the formula for a random number generator?
Banned User
Joined: 3/10/2004
Posts: 7698
Location: Finland
Kuwaga wrote:
When analyzing the RNG of older games, I usually first try this procedure. Only if it fails, I'll disassemble. It can often save you lots of time.
Out of curiosity, do you have any examples of actual games and the LCG values they use, that you have found this way (or in general)?