GetThreadID and GetProcessID from TIB

I was reading the FastMM4 source code when this short function caught my eye.

function GetThreadID: Cardinal;
{$ifdef 32Bit}
asm
  mov eax, FS:[$24]
end;
{$else}
begin
  Result := GetCurrentThreadID;
end;
{$endif}

Judging from the {$else} branch, reading the value at FS:[$24] should be equivalent to calling GetCurrentThreadID. A C++ version using inline assembly is similar.

unsigned int GetThreadID(void) {
  __asm {
    mov eax, fs:[0x24]
  }
}
 
unsigned int GetProcessID(void) {
  __asm {
    mov eax, fs:[0x20]
  }
}

I then asked for an explanation on stackoverflow.com and, as always, got an answer. On 32-bit x86 Windows, the FS segment register points to the TIB (Thread Information Block) of the current thread. The 32-bit value at FS:[$24] is the thread ID, and the value at FS:[$20] is the process ID. Under the x86 calling conventions, a 32-bit integer return value (signed or unsigned) is passed back in the eax register, so a single mov instruction is the whole function body.
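To make the two offsets easier to picture, here is a minimal C++ sketch of the first fields of the 32-bit block. The struct name Tib32, its field names, and the helper read_dword are my own illustration (not Windows declarations), and the layout beyond what is described above follows the commonly documented 32-bit layout, so treat it as an assumption. Pointers are modelled as 32-bit integers so the offsets come out the same on any compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the start of the 32-bit TIB; for illustration only.
struct Tib32 {
    std::uint32_t exception_list;          // 0x00
    std::uint32_t stack_base;              // 0x04
    std::uint32_t stack_limit;             // 0x08
    std::uint32_t sub_system_tib;          // 0x0C
    std::uint32_t fiber_data;              // 0x10
    std::uint32_t arbitrary_user_pointer;  // 0x14
    std::uint32_t self;                    // 0x18: linear address of this block
    std::uint32_t environment_pointer;     // 0x1C
    std::uint32_t process_id;              // 0x20: what "mov eax, fs:[0x20]" reads
    std::uint32_t thread_id;               // 0x24: what "mov eax, fs:[0x24]" reads
};

// Compile-time checks that the model matches the offsets used in the article.
static_assert(offsetof(Tib32, process_id) == 0x20, "process ID must sit at 0x20");
static_assert(offsetof(Tib32, thread_id) == 0x24, "thread ID must sit at 0x24");

// Emulates a 32-bit load at a byte offset from the start of the block,
// using ordinary memory instead of the FS segment.
inline std::uint32_t read_dword(const Tib32& tib, std::size_t offset) {
    std::uint32_t value;
    std::memcpy(&value, reinterpret_cast<const unsigned char*>(&tib) + offset,
                sizeof value);
    return value;
}
```

Filling a Tib32 with known IDs and calling read_dword(tib, 0x24) returns the thread ID field, which is all the assembly version does, only relative to FS.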

It is unlikely that getting the thread ID or process ID will ever be the bottleneck of an application, since neither ID changes during the lifetime of the thread or process (you can read the values once and store them for later use). But I'd like to know, in the extreme case, how fast this is compared to the standard functions declared in the Windows unit.

function GetCurrentProcessId; external kernel32 name 'GetCurrentProcessId';
function GetCurrentThreadId; external kernel32 name 'GetCurrentThreadId';

So we can easily come up with a console application that compares the performance (written in Delphi XE3).

program Test;

{$APPTYPE CONSOLE}
uses
  Windows;

function GetThreadID: Cardinal; register; assembler;
asm
  mov eax, FS:[$24]
end;

function GetProcessID: Cardinal; register; assembler;
asm
  mov eax, FS:[$20]
end;

var
  i: integer;
  x: Cardinal;
  t: DWORD;

begin
  Writeln(GetThreadID = Windows.GetCurrentThreadId);
  Writeln(GetProcessID = Windows.GetCurrentProcessId);
  /////////////   thread id /////////////////////
  t := GetTickCount;
  for i := 1 to 1000000000 do
  begin
    x := GetThreadID;
  end;
  Writeln(GetTickCount - t);
  t := GetTickCount;
  for i := 1 to 1000000000 do
  begin
    x := Windows.GetCurrentThreadId;
  end;
  Writeln(GetTickCount - t);
  ///////////    process id ///////////////////////////////
  t := GetTickCount;
  for i := 1 to 1000000000 do
  begin
    x := GetProcessID;
  end;
  Writeln(GetTickCount - t);
  t := GetTickCount;
  for i := 1 to 1000000000 do
  begin
    x := Windows.GetCurrentProcessId;
  end;
  Writeln(GetTickCount - t);

  Readln;
end.

And we have the following results.
[Screenshot: console output of the test program]

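We can see that the assembly versions are at least twice as fast. The difference most likely comes from call overhead: the functions declared in the Windows unit are imported from kernel32.dll, so every call has to go through the import table into the DLL, while the assembly versions are called directly inside the executable.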
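The effect of calling through a pointer can be tried out with a small portable C++ sketch. This is not the Delphi benchmark above: get_id is a hypothetical stand-in that returns a fixed value, time_calls is my own helper, and the volatile function pointer only imitates a call that goes through an import-table entry.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for an ID getter; it returns a fixed value.
std::uint32_t get_id() { return 42; }

using IdFn = std::uint32_t (*)();

// Calls fn `iterations` times through a volatile pointer, so the pointer is
// reloaded and the call stays indirect on every iteration. The results are
// added to `sink`; the elapsed time is returned in milliseconds.
long long time_calls(IdFn fn, std::uint64_t iterations, std::uint64_t& sink) {
    volatile IdFn target = fn;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink += target();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)
        .count();
}
```

Timing the same loop with a direct call to get_id and comparing the two numbers gives a rough feel for the indirection cost. The absolute values depend on the compiler and CPU.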