O(NLogN) to Compute the Median in a Stream using Two Priority Queues


You’re given a list of n integers arr[0..(n-1)]. You must compute a list output[0..(n-1)] such that, for each index i (between 0 and n-1, inclusive), output[i] is equal to the median of the elements arr[0..i] (rounded down to the nearest integer).

The median of a list of integers is defined as follows. If the integers were to be sorted, then:
If there are an odd number of integers, then the median is equal to the middle integer in the sorted order.
Otherwise, if there are an even number of integers, then the median is equal to the average of the two middle-most integers in the sorted order.

1
int[] findMedian(int[] arr)
int[] findMedian(int[] arr)

Input
n is in the range [1, 1,000,000].
Each value arr[i] is in the range [1, 1,000,000].

Output
Return a list of n integers output[0..(n-1)], as described above.

Example 1
n = 4
arr = [5, 15, 1, 3]
output = [5, 10, 5, 4]
The median of [5] is 5, the median of [5, 15] is (5 + 15) / 2 = 10, the median of [5, 15, 1] is 5, and the median of [5, 15, 1, 3] is (3 + 5) / 2 = 4.

Example 2
n = 2
arr = [1, 2]
output = [1, 1]
The median of [1] is 1, the median of [1, 2] is (1 + 2) / 2 = 1.5 (which should be rounded down to 1).

Compute the Median by Bruteforce by Sorting

We can sort the stream every iteration when we want to compute the median. The complexity will be O(N^2LogN).

1
2
3
4
5
6
7
8
9
10
11
12
vector<int> findMedian(const vector<int> &arr) {
  vector<int> ans;
  for (auto i = 0; i < arr.size(); ++ i) {
    sort(begin(arr), begin(arr) + i + 1);
    if ((i + 1) & 1 == 1) {
      ans.push_back(arr[i / 2]);
    } else {
      ans.push_back((arr[i / 2] + arr[i / 2 + 1]) / 2);
    }
  }
  return ans;
}
vector<int> findMedian(const vector<int> &arr) {
  vector<int> ans;
  for (auto i = 0; i < arr.size(); ++ i) {
    sort(begin(arr), begin(arr) + i + 1);
    if ((i + 1) & 1 == 1) {
      ans.push_back(arr[i / 2]);
    } else {
      ans.push_back((arr[i / 2] + arr[i / 2 + 1]) / 2);
    }
  }
  return ans;
}

Using Two Priority Queues to Compute the Median Stream

One better algorithm is to use two priority queues to compute the median dynamically. We can split the stream into two halfs, the top and the bottom. To compute the median, we are interested in the middle two or one numbers.

Let the top half be a priority queue which is sorted from small to large and the bottom half be a priority queue from large to small. The operations to retreive the top of the queue will be O(1) constant.

When we push a new element into the stream, we can push it to the bottom half, then, pop a element (maximum of the bottom half) into top half. The complexity to push a new element into a priority queue costs O(LogN).

We need to maintain two priority queues such that the top half is at most 1 element more than the bottom half. Then if number of the elements in both priority queues are equal, we simply retreive the top of the both queues and compute its average as the median. Otherwise, we retrieve the top from the top half priority queue and it is the median.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
vector<int> findMedian(const vector<int> &arr) {
  vector<int> ans;
  priority_queue<int> higher;
  priority_queue<int, vector<int>, std::greater<int>> lower;
  for (const auto &n: arr) {
    lower.push(n);
    higher.push(lower.top());
    lower.pop();
    while (higher.size() > lower.size() + 1) {
      lower.push(higher.top());
      higher.pop();
    }
    if (higher.size() == lower.size()) {
      ans.push_back((lower.top() + higher.top()) / 2);
    } else {
      ans.push_back(higher.top());
    }    
  }
  return ans;
}
vector<int> findMedian(const vector<int> &arr) {
  vector<int> ans;
  priority_queue<int> higher;
  priority_queue<int, vector<int>, std::greater<int>> lower;
  for (const auto &n: arr) {
    lower.push(n);
    higher.push(lower.top());
    lower.pop();
    while (higher.size() > lower.size() + 1) {
      lower.push(higher.top());
      higher.pop();
    }
    if (higher.size() == lower.size()) {
      ans.push_back((lower.top() + higher.top()) / 2);
    } else {
      ans.push_back(higher.top());
    }    
  }
  return ans;
}

The complexity is O(NLogN) as there are N elements in the stream and on average it taks O(LogN) to maintain two priority queues.

–EOF (The Ultimate Computing & Technology Blog) —

GD Star Rating
loading...
711 words
Last Post: Two Lines of C++ Code to Convert an Array into Sorted Unique Elements
Next Post: Sorting and Simulation Algorithms to Compute the ATM Queue

The Permanent URL is: O(NLogN) to Compute the Median in a Stream using Two Priority Queues

Leave a Reply