Since Delphi XE7, the RTL has provided TParallel.For, which lets you apply the same operation to many data items concurrently. This is in the spirit of SIMD (Single Instruction, Multiple Data), although the work is spread across threads rather than CPU vector lanes. For example, if you have a large piece of data and want to perform one operation (a single instruction) on every element, you might want to know whether there is any performance gain from replacing a simple For loop with TParallel.For.
To use TParallel.For, you need to add the System.Threading unit to your uses clause.
The following code sets the number of elements in the array (a large piece of data, held in a global dynamic array), the data type (4-byte Single or 8-byte Double), and a procedure type that makes it easy to pass in different operations for performance comparison.
const
  N = 100;

type
  DataType = Single;
  DataFunc = procedure(var num: DataType);

var
  data: array of DataType;
Here are some basic operations; you might want to profile your own as well.
procedure DataFunc_Sin(var num: DataType); inline;
begin
  num := Sin(num);
end;

procedure DataFunc_Cos(var num: DataType); inline;
begin
  num := Cos(num);
end;

// Ignores num; stands in for a slow operation
procedure DataFunc_Sleep(var num: DataType); inline;
begin
  Sleep(1);
end;

procedure DataFunc_SinCos(var num: DataType); inline;
begin
  num := Sin(num) + Cos(num);
end;

// 100 rounds of Sin + Cos per element
procedure DataFunc_XXX100(var num: DataType); inline;
var
  i: Integer;
begin
  for i := 1 to 100 do
    num := Sin(num) + Cos(num);
end;

// 10 rounds of Sin + Cos per element
procedure DataFunc_XXX10(var num: DataType); inline;
var
  i: Integer;
begin
  for i := 1 to 10 do
    num := Sin(num) + Cos(num);
end;
Then, the serial (normal) implementation:
procedure TestSerial(fun: DataFunc);
var
  i: Integer;
begin
  for i := 0 to High(data) do
    fun(data[i]);
end;
And the parallel implementation using TParallel.For:
procedure TestParallel(fun: DataFunc);
begin
  TParallel.&For(0, High(data),
    procedure(i: Integer)
    begin
      fun(data[i]);
    end
  );
end;
We can then write a compare function that uses QueryPerformanceCounter (from the Winapi.Windows unit) to do the timing.
procedure Compare(fun: DataFunc);
var
  c1, c2, f: Int64;
begin
  // f is the counter frequency in ticks per second; the values printed below are raw ticks
  QueryPerformanceFrequency(f);

  // Parallel
  ZeroMemory(@data[0], Length(data) * SizeOf(DataType));
  QueryPerformanceCounter(c1);
  TestParallel(fun);
  QueryPerformanceCounter(c2);
  Writeln('p=', (c2 - c1));

  // Serial
  ZeroMemory(@data[0], Length(data) * SizeOf(DataType));
  QueryPerformanceCounter(c1);
  TestSerial(fun);
  QueryPerformanceCounter(c2);
  Writeln('s=', (c2 - c1));
end;
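The counter values printed above are raw ticks, so they are only comparable on the same machine. As a minimal sketch (not part of the original benchmark), the frequency returned by QueryPerformanceFrequency can be used to turn a tick delta into milliseconds. The helper name TicksToMilliseconds is made up for this illustration.

```delphi
program TicksToMs;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

// Hypothetical helper: converts a QueryPerformanceCounter tick delta
// to milliseconds, given the counter frequency in ticks per second.
function TicksToMilliseconds(ticks, freq: Int64): Double;
begin
  Result := ticks * 1000.0 / freq;
end;

var
  c1, c2, f: Int64;
begin
  QueryPerformanceFrequency(f);
  QueryPerformanceCounter(c1);
  Sleep(10);  // something to measure
  QueryPerformanceCounter(c2);
  Writeln('elapsed ms = ', TicksToMilliseconds(c2 - c1, f):0:3);
end.
```

Reporting milliseconds instead of ticks does not change which version wins, but it makes results from different machines easier to read side by side.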
Then, finally, the main program looks like this.
SetLength(data, N);
Writeln('Cos');
Compare(DataFunc_Cos);
Writeln('Sin');
Compare(DataFunc_Sin);
Writeln('Sleep');
Compare(DataFunc_Sleep);
Writeln('XXX100');
Compare(DataFunc_XXX100);
Writeln('XXX10');
Compare(DataFunc_XXX10);
Writeln('SinCos');
Compare(DataFunc_SinCos);
This gives us 4 sets of results: Single or Double, each built for 32-bit and 64-bit.
Performance Comparisons
We set the number of array elements to 100 and carry out 4 runs, recording the timing for each combination of Single/Double and 32/64 bit, all in RELEASE mode. Interestingly, the only cases where TParallel.For outperforms the traditional For loop are those where each individual operation is time-consuming. We use Sleep(1) to emulate such an expensive operation.
For easy/trivial computation, the serial implementation may be a lot faster: handing each iteration to a worker thread has a cost of its own, and a plain loop lets the modern CPU take full advantage of high-speed caching, prefetching, etc.
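One common way to reduce the per-iteration cost for trivial work is to give each parallel iteration a whole block of elements and process that block with an ordinary For loop. The sketch below illustrates the idea under stated assumptions: the ChunkSize constant and the ProcessChunked procedure are made up for this example, the value 25 is arbitrary, and none of this was measured in the comparison above.

```delphi
program ChunkedParallel;
{$APPTYPE CONSOLE}

uses
  System.Math, System.Threading;

const
  N = 100;
  ChunkSize = 25;  // hypothetical value; tune it by measurement

var
  data: array of Single;

// Parallelises over chunks; each chunk is processed serially.
procedure ProcessChunked;
var
  numChunks: Integer;
begin
  // Round up so that a final partial chunk is still covered
  numChunks := (Length(data) + ChunkSize - 1) div ChunkSize;
  TParallel.&For(0, numChunks - 1,
    procedure(chunk: Integer)
    var
      j, last: Integer;
    begin
      // Index of the last element in this chunk, clamped to the array end
      last := Min((chunk + 1) * ChunkSize, Length(data)) - 1;
      for j := chunk * ChunkSize to last do
        data[j] := Sin(data[j]) + Cos(data[j]);
    end
  );
end;

begin
  SetLength(data, N);  // elements start at zero
  ProcessChunked;
  Writeln(data[0]:0:3, ' ', data[N - 1]:0:3);
end.
```

Whether this beats the plain serial loop still depends on how expensive the per-element work is, so it is worth timing it with the same Compare approach before relying on it.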
–EOF (The Ultimate Computing & Technology Blog) —