-
Notifications
You must be signed in to change notification settings - Fork 11
Suggestions for improving code performance #17
Copy link
Copy link
Open
Description
uint64_t
encode_ten_thousands(uint64_t hi, uint64_t lo)
{
uint64_t merged = hi | (lo << 32);
/* Truncate division by 100: 10486 / 2**20 ~= 1/100. */
uint64_t top = ((merged * 10486ULL) >> 20) & ((0x7FULL << 32) | 0x7FULL);
/* Trailing 2 digits in the 1e4 chunks. */
uint64_t bot = merged - 100ULL * top;
uint64_t hundreds;
uint64_t tens;
/*
* We now have 4 radix-100 digits in little-endian order, each
* in its own 16 bit area.
*/
hundreds = (bot << 16) + top;
/* Divide and mod by 10 all 4 radix-100 digits in parallel. */
tens = (hundreds * 103ULL) >> 10;
tens &= (0xFULL << 48) | (0xFULL << 32) | (0xFULL << 16) | 0xFULL;
//tens += (hundreds - 10ULL * tens) << 8;
tens = (hundreds << 8) - 2559ull * tens;
return tens;
}Explanation:
decimal: a*10+b=x
BCD: a+256*b=a+256*(x-a*10)=a+256*x-2560*x=256*x-2559*a=256*x-2559*(x/10)
I haven't conducted performance tests, but the above implementation has fewer instructions and fewer instruction dependencies, so there should be some performance improvement.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels