I was looking for a quick way to calculate the median of a list of numbers and came across this:
function array_median($array) {
  // perhaps all non numeric values should filtered out of $array here?
  $iCount = count($array);
  if ($iCount == 0) {
    return null;
  }
  // if we're down here it must mean $array
  // has at least 1 item in the array.
  $middle_index = floor($iCount / 2);
  sort($array, SORT_NUMERIC);
  $median = $array[$middle_index]; // assume an odd # of items
  // Handle the even case by averaging the middle 2 items
  if ($iCount % 2 == 0) {
    $median = ($median + $array[$middle_index - 1]) / 2;
  }
  return $median;
}
This approach using sort() makes sense and is certainly the obvious approach. However, I was curious if a median heap would be faster.  What was surprising was that when I implemented a simple median heap it is consistently significantly slower than the above method.
My simple MedianHeap class:
class MedianHeap{
private $lowerHeap;
private $higherHeap;
private $numbers = [];
public function __construct($numbers = null)
{
    $this->lowerHeap = new SplMaxHeap();
    $this->higherHeap = new SplMinHeap();
    if (count($numbers)) {
        $this->insertArray($numbers);   
    }
}
public function insertArray ($numbers) {
    foreach($numbers as $number) {
        $this->insert($number);
    }
}
public function insert($number)
{
    $this->numbers[] = $number;
    if ($this->lowerHeap->count() == 0 || $number < $this->lowerHeap->top()) {
        $this->lowerHeap->insert($number);
    } else {
        $this->higherHeap->insert($number);
    }
    $this->balance();
}
protected function balance()
{
    $biggerHeap = $this->lowerHeap->count() > $this->higherHeap->count() ? $this->lowerHeap : $this->higherHeap;
    $smallerHeap = $this->lowerHeap->count() > $this->higherHeap->count() ? $this->higherHeap : $this->lowerHeap;
    if ($biggerHeap->count() - $smallerHeap->count() >= 2) {
        $smallerHeap->insert($biggerHeap->extract());
    }
}
public function getMedian()
{
    if (!count($this->numbers)) {
        return null;
    }
    $biggerHeap = $this->lowerHeap->count() > $this->higherHeap->count() ? $this->lowerHeap : $this->higherHeap;
    $smallerHeap = $this->lowerHeap->count() > $this->higherHeap->count() ? $this->higherHeap : $this->lowerHeap;
    if ($biggerHeap->count() == $smallerHeap->count()) {
        return ($biggerHeap->top() + $smallerHeap->top())/2;
    } else {
        return $biggerHeap->top();
    }
}
}
And then the code to benchmark:
$array = [];
for($i=0; $i<100000; $i++) {
     $array[] = mt_rand(1,100000) / mt_rand(1,10000);
}
$t = microtime(true);
echo array_median($array);
echo PHP_EOL . 'Sort Median: ' . (microtime(true) - $t) . ' seconds';
echo PHP_EOL;
$t = microtime(true);
$list = new MedianHeap($array);
echo $list->getMedian();
echo PHP_EOL . 'Heap Median: '. (microtime(true) - $t) . ' seconds';
Is there something in PHP that makes using heaps for this inefficient somehow or is there something wrong with my implemenation?
