Using CPU Intrinsics __builtin_ctz Function to Compute the Number of Trailing Zeros for a Integer (Binary)


Consider the following C/C++ function to count the number of trailing zeros for a integer (in binary form):

1
2
3
4
5
6
static inline int count_trailing_zeros(int value) {
  int n = 0;
  while ((value & (1 << n)) == 0) {
    n ++;
  }
  return n;
static inline int count_trailing_zeros(int value) {
  int n = 0;
  while ((value & (1 << n)) == 0) {
    n ++;
  }
  return n;

The above original code works as intended, finding the first set bit in the "value". However, we can improve the performance a bit by using built-in functions (the CPU Intrinsics) for counting trailing zeros, if your compiler supports them. These are often hardware-accelerated and thus faster than a loop.

Here is the refactored code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include <stdlib.h>
#ifdef __GNUC__
  #define ctz __builtin_ctz
#else
  static inline int ctz(int x) {
    int i = 0;
    while ((x & 1) == 0) {
      x >>= 1;
      i++;
    }
    return i;
  }
#endif
 
static inline int count_trailing_zeros(int v) {
  if (v == 0) {
    // Handle error
    abort(); // Or return an error code
  }
  return ctz(v);
}
#include <stdlib.h>
#ifdef __GNUC__
  #define ctz __builtin_ctz
#else
  static inline int ctz(int x) {
    int i = 0;
    while ((x & 1) == 0) {
      x >>= 1;
      i++;
    }
    return i;
  }
#endif

static inline int count_trailing_zeros(int v) {
  if (v == 0) {
    // Handle error
    abort(); // Or return an error code
  }
  return ctz(v);
}

This code uses the GCC built-in function __builtin_ctz, which returns the number of trailing 0-bits in a word, starting from the least significant bit. It is the bit-wise equivalent of finding the index of the least significant 1-bit.

If the compiler is not GCC (or compatible), the code falls back to a custom implementation of ctz, which is similar to your original function. The #ifdef __GNUC__ line checks whether the current compiler is GCC or compatible.

Also, the function now checks whether cpu_mask is 0. If so, it calls abort(), because the original code would go into an infinite loop in that case. Depending on the requirements of your program, you might want to handle this case differently (e.g., return an error code).

--EOF (The Ultimate Computing & Technology Blog) --

GD Star Rating
loading...
357 words
Last Post: What is the CPU Intrinsics?
Next Post: What is the advantages of using Azure Bicep rather than ARM?

The Permanent URL is: Using CPU Intrinsics __builtin_ctz Function to Compute the Number of Trailing Zeros for a Integer (Binary)

Leave a Reply