Thanks very much!
If I wanted 64-bit multiplication for implementing UMULL (the lower 32 bits of the 64-bit result are written to RdLo, the upper 32 bits to RdHi), I can still use your approach, right? For the upper 32 bits, would I just skip the % and right-shift reslow/reshigh at the end?
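For reference, this is how I understand the shift and mask steps when only arithmetic is available (I am assuming a Lua without bit operators, since the approach relies on % and math.floor). The helper names `shr` and `mask` are made up for illustration:

```lua
-- Hypothetical helpers: emulate bit operations on non-negative integers
-- using only arithmetic (values must stay below 2^53 to remain exact).
local function shr(x, n)   -- logical right shift by n bits
  return math.floor(x / 2^n)
end

local function mask(x, n)  -- keep only the lowest n bits
  return x % 2^n
end

local B = 0xFFFF5AAA
print(string.format("%X", shr(B, 16)))   --> FFFF (upper 16 bits)
print(string.format("%X", mask(B, 16)))  --> 5AAA (lower 16 bits)
```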
Edit: I'm going to do the 32-bit multiplication by hand to see how this works.
Language: lua
--[[
0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111 A 4,294,967,295
x 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 0101 1010 1010 1010 B 4,294,924,970
1111 1111 1111 1111 0101 1010 1010 1001 0000 0000 0000 0000 1010 0101 0101 0110 Res 18,446,562,280,628,856,150 (unsigned; -181,793,080,695,466 if read as signed)
RdHi should be
0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 0101 1010 1010 1001
Lower 32 bits, and thus Answer and RdLo should be
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1010 0101 0101 0110 Answer 42,326
---------------------------------------------------------------------------------
Break B into 2 parts
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 B1 Upper 16 65,535
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0101 1010 1010 1010 B2 Lower 16 23,210
A x B1
0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111 A 4,294,967,295
x 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 B1 65,535
0000 0000 0000 0000 1111 1111 1111 1110 1111 1111 1111 1111 0000 0000 0000 0001 C1 281,470,681,677,825
Do bits 16 to 31 in C1 correspond to bits 48 to 63 in "Answer"? That is:
RdHi_1 = C1 Mod 4,294,967,296 shift right 16
0000 0000 0000 0000 1111 1111 1111 1110 1111 1111 1111 1111 0000 0000 0000 0001 C1 281,470,681,677,825
Mod 4,294,967,296
0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 0000 0000 0000 0001 4,294,901,761
Shift 16 right
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 RdHi_1
C1 Mod 4,294,967,296 (this time for the lower 32 bits)
0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 0000 0000 0000 0001 4,294,901,761
Shift left 16 times
0000 0000 0000 0000 1111 1111 1111 1111 0000 0000 0000 0001 0000 0000 0000 0000
Mod 4,294,967,296
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 C1 65,536
A x B2
0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111 A 4,294,967,295
x 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0101 1010 1010 1010 B2 23,210
0000 0000 0000 0000 0101 1010 1010 1001 1111 1111 1111 1111 1010 0101 0101 0110 C2 99,686,190,916,950
Do bits 32 to 47 in C2 correspond to bits 32 to 47 in "Answer"? That is:
RdHi_2 = C2 shift right 32
0000 0000 0000 0000 0101 1010 1010 1001 1111 1111 1111 1111 1010 0101 0101 0110 C2 99,686,190,916,950
Shift right 32
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0101 1010 1010 1001 RdHi_2 23,209
Combining C1 + C2
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 C1 65,536
+ 0000 0000 0000 0000 0101 1010 1010 1001 1111 1111 1111 1111 1010 0101 0101 0110 C2 99,686,190,916,950
0000 0000 0000 0000 0101 1010 1010 1010 0000 0000 0000 0000 1010 0101 0101 0110 D 99,686,190,982,486
D Mod 4,294,967,296 gives RdLo (lower 32 bits)
RdHi_1 x 65,536 + RdHi_2 gives RdHi (upper 32 bits), assuming the bit positions asked about above are right
]]--
function MUL(A, B)
  -- The lower 32 bits of the 64-bit result are written to RdLo, the upper 32 bits to RdHi.
  -- A and B are assumed to be unsigned 32-bit values, so every intermediate stays below 2^53.
  local reslow = A * (B % 0x10000) -- A multiplied with lower 16 bits of B
  local reshigh = A * (math.floor(B / 0x10000) % 0x10000) -- A multiplied with higher 16 bits of B (shifted down)
  -- reshigh has weight 2^16: its low 16 bits fall below bit 32, the rest lands in RdHi
  local low = (reshigh % 0x10000) * 0x10000 + reslow -- everything below bit 32, plus any carry
  local RdLo = low % 0x100000000 -- cut off to 32 bits
  local RdHi = math.floor(reshigh / 0x10000) + math.floor(low / 0x100000000) -- shift right, add the carry
  return RdHi, RdLo
end
Are my comments about bits 16 to 31 and bits 32 to 47 correct?
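One way to check the bit positions is to split both operands into 16-bit halves, so that the weight of each partial product (2^0, 2^16 or 2^32) is explicit. The sketch below is not the approach discussed above, just a schoolbook cross-check. The name `umull_check` is made up for illustration, and it assumes both inputs are unsigned 32-bit values held in ordinary Lua numbers:

```lua
-- Hypothetical cross-check: schoolbook 32x32 -> 64 multiply using 16-bit halves.
-- Assumes a and b are integers in [0, 2^32); all intermediates stay below 2^53.
local function umull_check(a, b)
  local a0, a1 = a % 0x10000, math.floor(a / 0x10000)
  local b0, b1 = b % 0x10000, math.floor(b / 0x10000)

  local p00 = a0 * b0             -- weight 2^0
  local mid = a0 * b1 + a1 * b0   -- weight 2^16 (bit k lands at bit k+16)
  local p11 = a1 * b1             -- weight 2^32

  -- Everything below bit 32, including the low 16 bits of mid shifted up
  local low = p00 + (mid % 0x10000) * 0x10000
  local rdlo = low % 0x100000000
  -- Remaining bits of mid, plus p11, plus the carry out of the low word
  local rdhi = p11 + math.floor(mid / 0x10000) + math.floor(low / 0x100000000)
  return rdhi, rdlo
end

local hi, lo = umull_check(0xFFFFFFFF, 0xFFFF5AAA)
print(string.format("RdHi = %08X, RdLo = %08X", hi, lo))
```

For the example values this prints RdHi = FFFF5AA9 and RdLo = 0000A556, which matches the expected result in the hand calculation. A product with weight 2^16 has its bits 16 to 31 land in bits 32 to 47 of the 64-bit result, and its bits 32 and above land at bit 48 and above, so the carry out of the low word has to be added into RdHi as well.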