Posts for Henny022


Henny022
They/Them
Experienced Forum User
Joined: 7/6/2022
Posts: 6
Location: Erlangen
small update from me, no progress, but still working on this there are a few things I wanna do 1. make a reference implementation using a C++ soft cpu for the 6502 - this would be really close to the ASM, making it easy to write and verify it's correctness - it should be easy enough to read to serve as a nice reference and documentation on what we are trying to do - while not fast enough to solve the problem, it should be fast enough to serve as a reference to verify other implementations work correctly 2. make a test framework to test implementations - nothing fancy, but should be simple to test new implementations - similar to the existing stuff in the repo, but better structured - include some performance measuring stuff (gprof, likwid, something like that) 3. use these to do some tests on the existing stuff (especially wanna analyze the cache behaviour of the rng tables) 4. try find some heuristics to cut down the search space 5. hopefully find an implementation that is fast enough to get us the solution on top of that, I was thinking about FPGAs. there was some stuff done there before, but not much as far as I'm aware. One crazy idea I had was, make a custom RISC-V extension for this, put a softcore for that on a fpga and use that. hopefully easier to get a full implementation done this way, including some communication code to scale this somewhat, while still being fast enough to get us some results I did enable issues and stuff on my fork just now, since that was mentioned before
Henny022
They/Them
Experienced Forum User
Joined: 7/6/2022
Posts: 6
Location: Erlangen
The main reason I tried bruteforcing is because it's the easy way, and the hope was that with some cleanup of the code and the university supercomputer that I was able to use, maybe we'd have a chance. Finding some way to solving this problem in reverse obviously would be the best solution, but due to the nature of how all this works, we don't really have anything we could reverse. This is because this password algorithm can be split into two parts (at least thats how I view it): 1. is a hash function. it takes the 13 character input, converted to the 8 byte starting value and produces a 128 byte hash 2. that hash is used to decrypt the level load function using perfect encryption The hash function is not the best, full of collisions, but good enough to not be trivial to reverse. But even if we reverse it, that won't get us anywhere without a correct hash we can apply the reverse function to. We don't have that. And due to the nature of perfect encryption, it could be pretty much anything. All we really have is a single checksum (if I remember correctly), and some heuristics about the resulting code. I think our best hope is to find some weakness in the hash function we can exploit, some pattern that would drastically reduce the seach space. Maybe combined with a few known good bytes of the output, if we can get something we are sure about. I hope I got all the numbers right, it's been a few days, but the endresult stays the same regardless. By now I no longer have access to that supercomputer. If in the next few months we get something where we'd be sure we could use it to solve this, I might be able to convince the professor to let me use it again. But as my results showed, even with that, we'd need to find something big before that could possibly have a chance.
Henny022
They/Them
Experienced Forum User
Joined: 7/6/2022
Posts: 6
Location: Erlangen
Another update on my progress. I finished the MPI bruteforce program and did some runs. It processes 40bit on 8 compute nodes is a bit above 3 hours. Meaning if I could convince my professor to let me use the entire cluster, it would take 50-60 years to process all 64bit. I'm sure my implementation can be improved a bit more, but that wont help us to far. So if we want to crack this, we need a smarter approach. I think our best hope is investigating patterns in the collisions and maybe use that to cut down the search space... I'll see that I upload the results of that one run I did somewhere (its 2.5MB of text) in case someone wants to take a look through that. I also need to do some verification somehow to see if my results are really correct.
Henny022
They/Them
Experienced Forum User
Joined: 7/6/2022
Posts: 6
Location: Erlangen
small update. after fixing my implementations of alg2 and alg5 (of course they were not quite right), these are the times for the tests.
=== avx512 ===
--- icpc ----
All tests passed!
1000000000
5.53738s
5.53569s
6.40238s
5.49683s
5.53581s
6.40107s
5.52177s
0.836553s
--- icpx ----
All tests passed!
1000000000
5.64209s
5.65338s
6.4749s
5.65798s
5.65272s
6.4709s
5.74766s
0.0522909s
--- g++ -----
All tests passed!
1000000000
5.66931s
5.60702s
6.34123s
5.68652s
5.65095s
6.25263s
5.63692s
0.483818s
also gcc does not segfault anymore I guess
Henny022
They/Them
Experienced Forum User
Joined: 7/6/2022
Posts: 6
Location: Erlangen
small update on my progress. I implemented my avx512 solution. I couldn't test its validity yet, because the test case file for validity tests was not in the repo, so I need to go make my own (unless someone still has that lying around somewhere and can give it to me). But unless I'm stupid it should be correct now. (I used the tm_8_test as a reference, I'm assuming that one to be correct) But I did some speed comparisons and got good results. Below are some speed measurements for the normal tm_8_test, tm_avx2_test and my new tm_avx512_test. also comparing different compilers (icpc 2022.1.0, icpx 2022.1.0, gcc 12.1.0, all latest versions I have available on the system I'm using). times are for 1,000,000,000 iterations of each algorithm (running on a Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz). Results look good. I could get some nice improvements with avx512, especially with algorithms 2 and 5 (the snake shifts as I call them because when I made some drawings on how the bits shift around to understand what they do, it looked like a snake). I was able to improve these a lot (well, still need to doublecheck they are correct). Also, no clue why g++ segfaults in the avx512 version.
=== 8 ========
--- icpc ----
File error
9.66668s
9.4408s
118.568s
9.19606s
9.29057s
118.434s
10.7579s
3.18474s
--- icpx ----
File error
9.62064s
9.65191s
12.1744s
9.65188s
9.65251s
12.8067s
9.63132s
3.20349s
--- g++ -----
File error
9.63486s
9.72641s
92.4006s
9.72545s
9.75109s
87.0335s
9.29817s
2.92814s
=== avx2 =====
--- icpc ----
File error
6.74517s
6.67073s
17.2636s
6.67197s
6.66976s
17.2553s
6.7253s
1.11651s
--- icpx ----
File error
7.15043s
6.98434s
14.3042s
6.67s
6.96987s
14.3078s
7.17228s
0.104574s
--- g++ -----
File error
6.70568s
6.82548s
16.9376s
6.84379s
6.80145s
16.9673s
7.02715s
1.11546s
=== avx512 ===
--- icpc ----
File error
5.76196s
5.69119s
6.57826s
5.93968s
5.70522s
6.61671s
5.75089s
0.836571s
--- icpx ----
File error
5.82707s
5.84054s
5.86947s
5.83748s
5.87174s
6.06394s
5.94894s
0.0522875s
--- g++ -----
Segmentation fault (core dumped)
My repo is at https://github.com/Henny022/treasure-master-hack if anyone wants to look a the code. Next step is to verify the correctness (unless someone can tell me a better way, I'll do that by comparing to the tm_8_test implementation. Then make a binary to bruteforce keys, including the code decryption checks. Then parallelize using MPI and see if we can find anything.
Henny022
They/Them
Experienced Forum User
Joined: 7/6/2022
Posts: 6
Location: Erlangen
I was made aware of the search for the second password for this game and decided to give it a shot, since I currently have access to my university supercomputer. Maybe I can use that to finally solve this. I have read this topic, the page on tcrf.net about this and been looking at the code on micro500's github. I was hoping to be able to get some more details here. Where should I pick up with this? The repo contains a few different approaches and not too much documentation about them. So far I've been mainly looking at the bruteforce code. From my understanding so far, I would use tm_avx_full or something similar to find potential codes and then check everything that finds with process_code. tm_avx_full has an IV array for byte0 - byte3. My understanding is I would keep changing these values until I find something? byte4 is always 0 in the code, is that proven to be the case, or is that another number I would adjust while searching? My plan going forward would be to move the project to using cmake to be able to test different compilers and stuff. I also want to do a few modifications to the code to use more modern language features and maybe make the code a bit more safe (no more new :). I also got some ideas for some improvements to the implementation, including a potential avx512 implementation that I wanna test. And the MPI parallelize the thing and run it on the supercomputer if I manage to finish all that before I loose access. Any additional info on this problem and the past work that has been done, as well as any further help would be very appreciated. PS: Nec-nec-nec-necropooooost