I have two pieces of code (doing the same job) that take in an array of datetimes and produce clusters of datetimes that are one hour apart.
The first piece is:
import itertools
from itertools import groupby
from operator import itemgetter

def findClustersOfRuns(data):
    runClusters = []
    # group consecutive timestamp pairs by their gap (in hours)
    for k, g in groupby(itertools.izip(data[0:-1], data[1:]),
                        lambda (i, x): (i - x).total_seconds() / 3600):
        runClusters.append(map(itemgetter(1), g))
    return runClusters
The second piece is:
import itertools

def findClustersOfRuns(data):
    if len(data) <= 1:
        return []
    current_group = [data[0]]
    delta = 3600
    results = []
    for current, next in itertools.izip(data, data[1:]):
        if abs((next - current).total_seconds()) > delta:
            # Here, `current` is the last item of the previous subsequence
            # and `next` is the first item of the next subsequence.
            if len(current_group) >= 2:
                results.append(current_group)
            current_group = [next]
            continue
        current_group.append(next)
    # don't drop the trailing cluster once the loop ends
    if len(current_group) >= 2:
        results.append(current_group)
    return results
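To make the expected output concrete, here is a tiny made-up input and what the second version returns (a sketch; the timestamps are illustrative):

import datetime

sample = [
    datetime.datetime(2016, 10, 1, 8, 0),
    datetime.datetime(2016, 10, 1, 9, 0),
    datetime.datetime(2016, 10, 1, 10, 0),
    # a 5-hour gap here should start a new cluster
    datetime.datetime(2016, 10, 1, 15, 0),
    datetime.datetime(2016, 10, 1, 16, 0),
]
print(findClustersOfRuns(sample))
# expected: two clusters, [8:00, 9:00, 10:00] and [15:00, 16:00]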
The first piece takes 5 minutes to execute while the second takes a few seconds, and I am trying to understand why.
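A minimal sketch of how either version can be timed on the same array (data is the array described below; findClustersOfRuns is whichever version is currently defined):

import time

start = time.time()
clusters = findClustersOfRuns(data)
print('elapsed: %.2f seconds' % (time.time() - start))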
The data over which I am running the code has size:
data.shape
(13989L,)
The data contents are as follows:
data
array([datetime.datetime(2016, 10, 1, 8, 0),
       datetime.datetime(2016, 10, 1, 9, 0),
       datetime.datetime(2016, 10, 1, 10, 0), ...,
       datetime.datetime(2019, 1, 3, 9, 0),
       datetime.datetime(2019, 1, 3, 10, 0),
       datetime.datetime(2019, 1, 3, 11, 0)], dtype=object)
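A comparable array can be generated for reproduction like this (a sketch; the gap positions are illustrative, not from the real data):

import datetime
import numpy as np

start = datetime.datetime(2016, 10, 1, 8, 0)
hourly = [start + datetime.timedelta(hours=i) for i in range(20000)]
# knock out a few entries every 500 hours to create gaps between clusters
kept = [t for i, t in enumerate(hourly) if i % 500 > 3]
data = np.array(kept[:13989], dtype=object)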
How do I improve the first piece of code to make it run as fast as the second?