Refactoring Re-sampling Algorithm using O(n) Hashtable


Background

I was asked by my manager to optimize a piece of code that had been sitting in the repository for about 4 years. It turned out that this particular piece of code was written by the manager himself 4 years ago, at the very beginning, and he did not remember writing it (no comments!). It took the two of us some time to understand why it worked and how it worked.

The function re-samples a set of points given a specified distance. In other words, we need to obtain a subset of the input in which any two points are at least the specified distance apart. Of course, another requirement is that the subset should contain as many points as possible (otherwise you could just return any single point!).


(Figure: a clutter of points)

The Original Solution

The original solution in C# is elegant, using LINQ; however, it becomes slow when the input set is huge. For example, one actual input had 1 million points, and the algorithm took around 20 minutes to complete. What actually happens is that the software appears to hang/freeze while doing the intensive computation in the background.

public List<Point3F> Resampling(double resolution, List<Point3F> tempPtList)
{
    var samplePtList = new List<Point3F>();
    Point3F currentPt;
    double T = resolution * resolution; // compare squared distances, no need for Math.Sqrt
    while (tempPtList.Count != 0)
    {
        // take the first remaining point as the next sample
        currentPt = new Point3F(tempPtList[0].X, tempPtList[0].Y, tempPtList[0].Z);
        // keep only the points that are at least 'resolution' away from the current point
        tempPtList = tempPtList.Where(x => (Math.Pow(x.X - currentPt.X, 2) + Math.Pow(x.Y - currentPt.Y, 2) +
                     Math.Pow(x.Z - currentPt.Z, 2)) >= T).ToList();
        samplePtList.Add(currentPt);
    }
    return samplePtList;
}

This is an O(N^2) solution because the LINQ filter (Where) is O(N) and it runs once per sampled point. The idea is quite interesting: at each iteration, the set excludes all points that are within the resolution distance of the candidate point (currentPt), and the candidate is added to the result.

O(n) Hashtable Implementation

The Hashtable is a powerful weapon. Insertion/update in a hashtable is O(1) on average, and looking up an element also takes O(1) time. That can cut the algorithm complexity by at least one order of magnitude, e.g. from O(N^2) to O(N). We virtually create a 3D uniform grid in space in which each cell has a size equal to the resolution. It is then not hard to imagine that _MOST_ close points will fall into the same cell. I say _MOST_ because in a few cases two points are close enough yet land in neighbouring cells: for example, with a resolution of 1, the points (0.4, 0, 0) and (0.6, 0, 0) are only 0.2 apart, but Math.Round maps them to the different cells (0, 0, 0) and (1, 0, 0). A second round of filtering removes such points, and that should be fast anyway: even a brute-force check is fast enough on a small set of points.

public List<Point3F> Resampling(double resolution, List<Point3F> tempPtList)
{
    var samplePtList = new List<Point3F>();
    var added = new Hashtable();

    for (var i = 0; i < tempPtList.Count; i++)
    {
        var cur = tempPtList[i];
        // virtually get the grid cell location in space
        int x = (int)Math.Round(cur.X / resolution);
        int y = (int)Math.Round(cur.Y / resolution);
        int z = (int)Math.Round(cur.Z / resolution);
        var key = x + "," + y + "," + z;   // hash key
        if (added.ContainsKey(key)) continue; // a point from this cell has already been added
        var currentPt = new Point3F(cur.X, cur.Y, cur.Z);
        samplePtList.Add(currentPt);
        added.Add(key, key); // record this cell
    }

    var T = resolution * resolution;
    for (var i = 0; i < samplePtList.Count; i++) // second round of filtering
    {
        var cur = samplePtList[i];
        // remove the remaining neighbour points that fell into adjacent cells
        samplePtList.RemoveAll(
            x => (x != cur) &&
                 ((x.X - cur.X) * (x.X - cur.X) +
                  (x.Y - cur.Y) * (x.Y - cur.Y) +
                  (x.Z - cur.Z) * (x.Z - cur.Z)) < T);
    }

    return samplePtList;
}
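
For completeness, here is a minimal usage sketch. It assumes a Point3F type that exposes X, Y and Z and the three-argument constructor used above; the point values are made up purely for illustration:

var points = new List<Point3F>
{
    new Point3F(0.0, 0.0, 0.0),
    new Point3F(0.1, 0.0, 0.0),   // closer than the resolution to the first point
    new Point3F(5.0, 5.0, 5.0)
};
var sampled = Resampling(1.0, points);
Console.WriteLine(sampled.Count);   // expect 2 points to survive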

Outcome

On an input set of the same size, the lightning-fast algorithm finishes in less than 1 second, and the manager is pleased.

Update: use a HashSet instead of a Hashtable to improve memory efficiency. The Hashtable above can simply be replaced by a HashSet<string>.
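
For example, the first loop could look like this with a HashSet<string> (a minimal sketch; the second filtering pass and the rest of the method stay the same):

var added = new HashSet<string>();
for (var i = 0; i < tempPtList.Count; i++)
{
    var cur = tempPtList[i];
    int x = (int)Math.Round(cur.X / resolution);
    int y = (int)Math.Round(cur.Y / resolution);
    int z = (int)Math.Round(cur.Z / resolution);
    var key = x + "," + y + "," + z;
    // HashSet.Add returns false if the key is already present, so the
    // ContainsKey check and the dummy value are no longer needed
    if (!added.Add(key)) continue;
    samplePtList.Add(new Point3F(cur.X, cur.Y, cur.Z));
}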

–EOF (The Ultimate Computing & Technology Blog) —
