Optimized ABS function in Assembly


In mathematics, the absolute function is defined as:

abs Optimized ABS function in Assembly algorithms assembly language c / c++ code compiler implementation interpreter / compiler math object pascal optimization programming languages tricks

This function is simple, and quite in popular usages. The absolute value is considered as one of the fundamental operations in computer programs, especially if you disassemble the code into assembly language.

One can easily define the abs function that implements the above math equation in a straightforward manner, like this:

def abs(x):
  if x >= 0: return x
  return -x

This is slow because it involves the branch prediction, especially when the sign of is mis-predict, which causes a lot more dozens of cycles. Let’s see how modern compilers deal with that.

The Microsoft Visual Studio 2005 (C++) generates the following three lines of assembly code for 32-bit integer abs function.

cdq
xor eax, edx
sub eax, edx

The above three lines of code removes the branch and is much faster. The cdq copies the sign of the register eax to register edx. For example, if it is a positive number, edx will be zero, otherwise, edx will be 0xFFFFFF which denotes -1. The xor operation with the origin number will change nothing if it is a positive number (any number xor 0 will not change). However, when eax is negative, eax xor 0xFFFFFF yields (not eax). The final step is to subtract edx from eax. Again, if eax is positive, edx is zero, and the final value is still the same. For negative values, (~ eax) – (-1) =eax which is the value wanted.

The above shows a clever usage of the fast instruction cdq which is quite fast in almost all modern CPUs, unlike SAR eax, 31 which is slow on Pentium IV. It is also said that the above three lines of code is patented by some guy that works for SUN in US. Maybe Microsoft violates the patent?

The alternative approach, quite similar, but swapping the second and the third, slightly different, is shown below.

cdq
add eax, edx
xor eax, edx

Again, if eax is positve, edx will be zero, and the result will be eax because adding zero and xor zero doesn’t change the value.  If edx is 0xFFFFFF, the second instruction is in fact to subtract 1, which makes eax – 1. The third yields ~(eax – 1) which becomes -eax.

The above equivalent C code can be translated to:

1
2
3
4
int abs(int x) {
   int mask = (x >> (sizeof(int) * CHAR_BIT - 1));
   return (x + mask) ^ mask;
}
int abs(int x) {
   int mask = (x >> (sizeof(int) * CHAR_BIT - 1));
   return (x + mask) ^ mask;
}

The Delphi XE3, generates the same 32-bit assembly code for 32-bit integer type.
delphiabs Optimized ABS function in Assembly algorithms assembly language c / c++ code compiler implementation interpreter / compiler math object pascal optimization programming languages tricks

For 32-bit FPU code on floating types such as Single, Double or Extended on 32-bit, it generates quite simple code using FPU instruction fabs without branches as well. Remember, the FPU contains 80-bit floating storage which is enough for Extended.

delphiabs-float Optimized ABS function in Assembly algorithms assembly language c / c++ code compiler implementation interpreter / compiler math object pascal optimization programming languages tricks

For 64-bit integers, the data type is int64 in Delphi, in 32-bit exectuables, Delphi XE3 needs to store the number in EDX:EAX registers, and therefore, the jmp is used. It looks slower.

delphiabs-int64 Optimized ABS function in Assembly algorithms assembly language c / c++ code compiler implementation interpreter / compiler math object pascal optimization programming languages tricks

How about 64-bit ? Delphi XE3 needs the branch to do 64-bit integer absolute value as well but it can handle it once in a single 64-bit register rax.

delphi64abs Optimized ABS function in Assembly algorithms assembly language c / c++ code compiler implementation interpreter / compiler math object pascal optimization programming languages tricks

For Extended data types, 64-bit executable generated by Delphi XE3 require doing masks on 128-bit xmm registers.

The compiler magic will use the corresponding abs function for correct data types at different platforms. The below is the source code related to abs functions defined in System unit.

delphi-abs Optimized ABS function in Assembly algorithms assembly language c / c++ code compiler implementation interpreter / compiler math object pascal optimization programming languages tricks

For Extended data types, it generates the following assembly code using a function call for 64-bit using Delphi XE3.

delphiabs64 Optimized ABS function in Assembly algorithms assembly language c / c++ code compiler implementation interpreter / compiler math object pascal optimization programming languages tricks

It looks like other platforms (such as IOS, MAC), there are currently no better low-level code optimization on abs value for extended floating values.

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
loading...
992 words
Last Post: Delphi XE3, {$EXCESSPRECISION OFF}
Next Post: Access Violation Bug Migrating to 64-bit

The Permanent URL is: Optimized ABS function in Assembly

Leave a Reply