1 2
5 6
GJTASer2018
He/Him
Joined: 1/24/2018
Posts: 313
Location: Stafford, NY
mklip2001 wrote:
I don't have any ability to study this game technically or TAS it, but I'm definitely interested in any progress you'd make! Someone else who used to post in this thread might be more helpful though.
It might also be worth asking Bisqwit for help - he's good at this kind of disassembly to retrieve passwords/passcodes. It doesn't look like he's done anything with this particular game, but he's done plenty with other NES games. Especially since bruteforcing isn't as infeasible as it was 6-7 years ago, it might be worth a shot trying to get his help. :)
c-square wrote:
Yes, standard runs are needed and very appreciated here too
Dylon Stejakoski wrote:
Me and the boys starting over our games of choice for the infinityieth time in a row because of just-found optimizations
^ Why I don't have any submissions despite being on the forums for years now...
Alyosha
He/Him
Editor, Emulator Coder, Expert player (3840)
Joined: 11/30/2014
Posts: 2845
Location: US
Good to see someone taking an interest in this. One of the big loose ends here is how the second bonus world is supposed to end. When I originally put together the code I made the level end in an arbitrary way that matched the ending of the first bonus world (being in a certain spot., in this case the upper right corner of the last room.) For the first bonus world this makes sense since there is an opening in the wall there, indicating an exit. But, for the second bonus world, there is no such indication. If you look at the second bonus world map, there is no indication of where an exit should be either. So what's happening? Is the level unfinished? Am I missing a subroutine call in my decrypted code? Figuring this out might make working backwards more plausible.
Henny022
They/Them
Joined: 7/6/2022
Posts: 6
Location: Erlangen
small update on my progress. I implemented my avx512 solution. I couldn't test its validity yet, because the test case file for validity tests was not in the repo, so I need to go make my own (unless someone still has that lying around somewhere and can give it to me). But unless I'm stupid it should be correct now. (I used the tm_8_test as a reference, I'm assuming that one to be correct) But I did some speed comparisons and got good results. Below are some speed measurements for the normal tm_8_test, tm_avx2_test and my new tm_avx512_test. also comparing different compilers (icpc 2022.1.0, icpx 2022.1.0, gcc 12.1.0, all latest versions I have available on the system I'm using). times are for 1,000,000,000 iterations of each algorithm (running on a Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz). Results look good. I could get some nice improvements with avx512, especially with algorithms 2 and 5 (the snake shifts as I call them because when I made some drawings on how the bits shift around to understand what they do, it looked like a snake). I was able to improve these a lot (well, still need to doublecheck they are correct). Also, no clue why g++ segfaults in the avx512 version.
=== 8 ========
--- icpc ----
File error
9.66668s
9.4408s
118.568s
9.19606s
9.29057s
118.434s
10.7579s
3.18474s
--- icpx ----
File error
9.62064s
9.65191s
12.1744s
9.65188s
9.65251s
12.8067s
9.63132s
3.20349s
--- g++ -----
File error
9.63486s
9.72641s
92.4006s
9.72545s
9.75109s
87.0335s
9.29817s
2.92814s
=== avx2 =====
--- icpc ----
File error
6.74517s
6.67073s
17.2636s
6.67197s
6.66976s
17.2553s
6.7253s
1.11651s
--- icpx ----
File error
7.15043s
6.98434s
14.3042s
6.67s
6.96987s
14.3078s
7.17228s
0.104574s
--- g++ -----
File error
6.70568s
6.82548s
16.9376s
6.84379s
6.80145s
16.9673s
7.02715s
1.11546s
=== avx512 ===
--- icpc ----
File error
5.76196s
5.69119s
6.57826s
5.93968s
5.70522s
6.61671s
5.75089s
0.836571s
--- icpx ----
File error
5.82707s
5.84054s
5.86947s
5.83748s
5.87174s
6.06394s
5.94894s
0.0522875s
--- g++ -----
Segmentation fault (core dumped)
My repo is at https://github.com/Henny022/treasure-master-hack if anyone wants to look a the code. Next step is to verify the correctness (unless someone can tell me a better way, I'll do that by comparing to the tm_8_test implementation. Then make a binary to bruteforce keys, including the code decryption checks. Then parallelize using MPI and see if we can find anything.
Henny022
They/Them
Joined: 7/6/2022
Posts: 6
Location: Erlangen
small update. after fixing my implementations of alg2 and alg5 (of course they were not quite right), these are the times for the tests.
=== avx512 ===
--- icpc ----
All tests passed!
1000000000
5.53738s
5.53569s
6.40238s
5.49683s
5.53581s
6.40107s
5.52177s
0.836553s
--- icpx ----
All tests passed!
1000000000
5.64209s
5.65338s
6.4749s
5.65798s
5.65272s
6.4709s
5.74766s
0.0522909s
--- g++ -----
All tests passed!
1000000000
5.66931s
5.60702s
6.34123s
5.68652s
5.65095s
6.25263s
5.63692s
0.483818s
also gcc does not segfault anymore I guess
Henny022
They/Them
Joined: 7/6/2022
Posts: 6
Location: Erlangen
Another update on my progress. I finished the MPI bruteforce program and did some runs. It processes 40bit on 8 compute nodes is a bit above 3 hours. Meaning if I could convince my professor to let me use the entire cluster, it would take 50-60 years to process all 64bit. I'm sure my implementation can be improved a bit more, but that wont help us to far. So if we want to crack this, we need a smarter approach. I think our best hope is investigating patterns in the collisions and maybe use that to cut down the search space... I'll see that I upload the results of that one run I did somewhere (its 2.5MB of text) in case someone wants to take a look through that. I also need to do some verification somehow to see if my results are really correct.
Active player (413)
Joined: 10/4/2015
Posts: 99
A couple things: 1) I can't file issues on Henry's github fork, so there is no way to directly communicate there. 2) There is a lot of talk here about brute force but I don't get it. Reverse engineering is likely a more fruitful path. This is 1989 cryptography on a toy computer. It almost certainly has weaknesses that can be exploited, including the ability to just run the algorithm in reverse, from the desired plaintext message. Consider: the developers would have developed this game already knowing the 8 byte machine code they wanted, and then they would run that through an encryption algorithm. Playing the game runs the decryption algorithm. As long as you know the 8 byte machine code you want to generate, it should be as simple as running the decryption algorithm in reverse. i.e. easy. The fact that the level can by loaded, even if the exit appeared in a suspiciously wrong place... that implies something about the machine code. I.e. that it can't be that complicated. It's bootstrapping the game, and maybe is starting at a slightly wrong offset, or something else like that. The fact you can load garbage levels implies the same thing. Ultimately what those 8 bytes are doing are likely just triggering a level load at an offset in memory or something else straight forward. I would consider looking at the plaintext machine code for the first prize world. It's almost certainly going to be nearly identical to the second. If I was designing the game, why wouldn't it be? Maybe it would be helpful for someone to post the plaintext machine codes for both Prize world 1 (official), and prize world 2 (suspected). Now, mind you, I'm posting this in 2023, and a lot of the historical documents, like the reddit thread, don't seem to exist any more.
Alyosha
He/Him
Editor, Emulator Coder, Expert player (3840)
Joined: 11/30/2014
Posts: 2845
Location: US
You can still find the code and some analysis archived here: https://code.google.com/archive/p/treasure-master-hack/wikis Certainly a good place to start would be figuring out a proper ending for the second prize world (if one exists.) A feasible path forward would be to map out the ROM and see if there are any subroutines that aren't called anywhere else in the game that might be used to create an ending for the second prize world. Reverse engineering the algorithm has been done, but there are a lot of steps and a lot of one-way-ish functions so going backwards is also hard.
Emulator Coder, Player (75)
Joined: 10/4/2005
Posts: 197
Meerkov wrote:
As long as you know the 8 byte machine code you want to generate, it should be as simple as running the decryption algorithm in reverse. i.e. easy.
The 8 bytes you mention are only what you as a player have access to enter into the password screen. One of the first steps in the cryptographic process is to expand those 8 bytes into 128 bytes, which are then run through multiple stages which are mostly one-way. The resulting values are then used as an XOR to the encrypted machine code for the level, which is 83 bytes long. Due to the XOR this is basically a one-time pad encryption making it extremely difficult to determine what the expected output would be. Alyosha made a good effort to work out what that machine code might look like, but there were plenty of unknowns and guesses that we can't be sure how correct it was.
Alyosha wrote:
You can still find the code and some analysis archived here: https://code.google.com/archive/p/treasure-master-hack/wikis
That analysis was also moved to my github repo/wiki: https://github.com/micro500/treasure-master-hack/wiki The repo is a bit out of date and I definitely need to clean it up. I also have a decent amount of local work that isn't on github yet. I'm planning to tackle that cleanup in the next few weeks and make some more progress on this project,
Henny022
They/Them
Joined: 7/6/2022
Posts: 6
Location: Erlangen
The main reason I tried bruteforcing is because it's the easy way, and the hope was that with some cleanup of the code and the university supercomputer that I was able to use, maybe we'd have a chance. Finding some way to solving this problem in reverse obviously would be the best solution, but due to the nature of how all this works, we don't really have anything we could reverse. This is because this password algorithm can be split into two parts (at least thats how I view it): 1. is a hash function. it takes the 13 character input, converted to the 8 byte starting value and produces a 128 byte hash 2. that hash is used to decrypt the level load function using perfect encryption The hash function is not the best, full of collisions, but good enough to not be trivial to reverse. But even if we reverse it, that won't get us anywhere without a correct hash we can apply the reverse function to. We don't have that. And due to the nature of perfect encryption, it could be pretty much anything. All we really have is a single checksum (if I remember correctly), and some heuristics about the resulting code. I think our best hope is to find some weakness in the hash function we can exploit, some pattern that would drastically reduce the seach space. Maybe combined with a few known good bytes of the output, if we can get something we are sure about. I hope I got all the numbers right, it's been a few days, but the endresult stays the same regardless. By now I no longer have access to that supercomputer. If in the next few months we get something where we'd be sure we could use it to solve this, I might be able to convince the professor to let me use it again. But as my results showed, even with that, we'd need to find something big before that could possibly have a chance.
Alyosha
He/Him
Editor, Emulator Coder, Expert player (3840)
Joined: 11/30/2014
Posts: 2845
Location: US
micro500 wrote:
The repo is a bit out of date and I definitely need to clean it up. I also have a decent amount of local work that isn't on github yet. I'm planning to tackle that cleanup in the next few weeks and make some more progress on this project,
Excited to hear you are still working on this. Good luck!
Henny022
They/Them
Joined: 7/6/2022
Posts: 6
Location: Erlangen
small update from me, no progress, but still working on this there are a few things I wanna do 1. make a reference implementation using a C++ soft cpu for the 6502 - this would be really close to the ASM, making it easy to write and verify it's correctness - it should be easy enough to read to serve as a nice reference and documentation on what we are trying to do - while not fast enough to solve the problem, it should be fast enough to serve as a reference to verify other implementations work correctly 2. make a test framework to test implementations - nothing fancy, but should be simple to test new implementations - similar to the existing stuff in the repo, but better structured - include some performance measuring stuff (gprof, likwid, something like that) 3. use these to do some tests on the existing stuff (especially wanna analyze the cache behaviour of the rng tables) 4. try find some heuristics to cut down the search space 5. hopefully find an implementation that is fast enough to get us the solution on top of that, I was thinking about FPGAs. there was some stuff done there before, but not much as far as I'm aware. One crazy idea I had was, make a custom RISC-V extension for this, put a softcore for that on a fpga and use that. hopefully easier to get a full implementation done this way, including some communication code to scale this somewhat, while still being fast enough to get us some results I did enable issues and stuff on my fork just now, since that was mentioned before
1 2
5 6