small update on my progress.
I implemented my avx512 solution.
I couldn't test its validity yet, because the test case file for validity tests was not in the repo, so I need to go make my own (unless someone still has that lying around somewhere and can give it to me).
But unless I'm stupid it should be correct now. (I used the tm_8_test as a reference, I'm assuming that one to be correct)
But I did some speed comparisons and got good results.
Below are some speed measurements for the normal tm_8_test, tm_avx2_test and my new tm_avx512_test.
also comparing different compilers (icpc 2022.1.0, icpx 2022.1.0, gcc 12.1.0, all latest versions I have available on the system I'm using).
times are for 1,000,000,000 iterations of each algorithm (running on a Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz).
Results look good.
I could get some nice improvements with avx512, especially with algorithms 2 and 5 (the snake shifts as I call them because when I made some drawings on how the bits shift around to understand what they do, it looked like a snake). I was able to improve these a lot (well, still need to doublecheck they are correct).
Also, no clue why g++ segfaults in the avx512 version.
=== 8 ========
--- icpc ----
File error
9.66668s
9.4408s
118.568s
9.19606s
9.29057s
118.434s
10.7579s
3.18474s
--- icpx ----
File error
9.62064s
9.65191s
12.1744s
9.65188s
9.65251s
12.8067s
9.63132s
3.20349s
--- g++ -----
File error
9.63486s
9.72641s
92.4006s
9.72545s
9.75109s
87.0335s
9.29817s
2.92814s
=== avx2 =====
--- icpc ----
File error
6.74517s
6.67073s
17.2636s
6.67197s
6.66976s
17.2553s
6.7253s
1.11651s
--- icpx ----
File error
7.15043s
6.98434s
14.3042s
6.67s
6.96987s
14.3078s
7.17228s
0.104574s
--- g++ -----
File error
6.70568s
6.82548s
16.9376s
6.84379s
6.80145s
16.9673s
7.02715s
1.11546s
=== avx512 ===
--- icpc ----
File error
5.76196s
5.69119s
6.57826s
5.93968s
5.70522s
6.61671s
5.75089s
0.836571s
--- icpx ----
File error
5.82707s
5.84054s
5.86947s
5.83748s
5.87174s
6.06394s
5.94894s
0.0522875s
--- g++ -----
Segmentation fault (core dumped)
My repo is at
https://github.com/Henny022/treasure-master-hack if anyone wants to look a the code.
Next step is to verify the correctness (unless someone can tell me a better way, I'll do that by comparing to the tm_8_test implementation.
Then make a binary to bruteforce keys, including the code decryption checks.
Then parallelize using MPI and see if we can find anything.