I have a system log file for which I would like to count, for each event, how often each subsequent event follows it. The log is in long format: each event is recorded as a row with a timestamp. To count subsequent events, I therefore have to look at the event that follows each row, separately for each user.
My data frame looks like this (in this example, the event would be new and the subsequent event to be counted would be analysis; I hope you get the gist):
    created_at_date    id   event    username
0   2017-11-13 09:17:59 1   new      user1
1   2017-11-13 09:18:00 1   analysis user1
2   2017-11-13 09:18:20 1   feedback user1
3   2017-11-13 09:26:58 NaN id_new   user2
The desired output is a matrix counting, for each event, how often each subsequent event occurs after it.
                               Event
             login  id_new  new   analysis   feedback   check   edit
subsequent event                                
login        0      0       1     0         0           2      0
id_new       0      0       1     0         0           0      0 
new          0      0       0     3         0           0      0
analysis     0      0       1     1         3           0      1 
feedback     1      0       1     0         0           2      0 
check        0      0       0     0         0           0      0 
edit         0      0       0     0         0           0      0 
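For context, this is roughly the computation I am after, as a vectorized sketch (assuming the event column is called event, that transitions should not cross users, and reusing the names list defined below; I have not checked that it reproduces the table above exactly):

# sketch only: pair each event with the next event of the same user,
# then cross-tabulate event against subsequent event
nxt = df.groupby('username')['event'].shift(-1)        # next event per user; NaN at the end of each user
counts = pd.crosstab(index=nxt, columns=df['event'])   # rows: subsequent event, columns: event
counts = counts.reindex(index=names, columns=names, fill_value=0)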
As such, I first created an empty matrix:
import numpy as np
import pandas as pd

a = np.zeros(shape=(7, 7))
names = ['new', 'analysis', 'feedback', 'id_new', 'login', 'check', 'edit']
df_empty = pd.DataFrame(a, index=names, columns=names)
and then tried to fill it with a solution adapted from here. However, simply iterating over the rows, I get an error: ValueError: cannot label index with a null key
from itertools import tee, islice, chain, izip

def previous_and_next(some_iterable):
    # pair every element with the element that follows it;
    # the last element gets paired with None
    items, nexts = tee(some_iterable, 2)
    nexts = chain(islice(nexts, 1, None), [None])
    return izip(items, nexts)
for item, nxt in previous_and_next(df['event']):
    df_empty.loc[item, nxt] = df_empty.loc[item, nxt] + 1

df_empty
Any idea where this error is coming from?
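For what it's worth, here is a quick check of what the helper yields on a short list (rewritten with the built-in zip only so that it runs on Python 3); I suspect the trailing None is involved, but I don't understand why it leads to this particular ValueError:

from itertools import tee, islice, chain

def pairs(some_iterable):
    # same pairing logic as previous_and_next, just with the built-in zip
    items, nexts = tee(some_iterable, 2)
    nexts = chain(islice(nexts, 1, None), [None])
    return zip(items, nexts)

print(list(pairs(['new', 'analysis', 'feedback'])))
# [('new', 'analysis'), ('analysis', 'feedback'), ('feedback', None)]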
A reproducible example:
import pandas as pd

def data():
    data = {'created_at_date': pd.Series(['2017-11-13 09:17:59', '2017-11-13 09:18:00', '2017-11-13 09:18:20', 
                                      '2017-11-13 09:26:58', '2017-11-13 09:28:23', '2017-11-13 09:28:24', 
                                      '2017-11-13 09:40:16', '2017-11-13 09:45:37', '2017-11-13 09:46:03',
                                      '2017-11-13 09:46:38', '2017-11-13 09:46:39', '2017-11-13 09:46:48',
                                      '2017-11-13 09:58:19', '2017-11-13 10:04:44', '2017-11-13 10:04:56',
                                     '2018-04-13 10:02:36',  '2018-04-13 10:02:37', '2018-04-13 10:02:41',
                                     '2018-04-13 10:54:22', '2017-11-17 11:15:03', '2017-11-17 11:15:40',
                                     '2018-04-13 13:38:33']),
        'username': pd.Series(['user1', 'user1', 'user1', 
                               'user2', 'user2', 'user2', 
                               'None', 'None', 'user3',
                               'user3', 'user3', 'user3',
                               'None', 'user3', 'user3',
                               'user4', 'user4', 'user4',
                               'None', 'user1','user1', 'None']),
        'id': pd.Series([1, 1, 1,
                         'NaN', 2, 2,
                         'NaN', 'NaN', 'NaN',
                         3, 3, 3,
                         'NaN', 'NaN', 'NaN',
                         4, 4, 'NaN',
                         'NaN', 5, 5,'NaN']),
       'event': pd.Series(['new', 'analysis', 'feedback', 
                          'id_new', 'new', 'analysis', 
                          'login', 'login', 'new',
                          'analysis', 'feedback', 'login',
                          'check', 'new', 'new', 
                          'analysis','edit', 'login', 
                          'check', 'analysis','feedback',
                          'login'])}
    df = pd.DataFrame(data)
    df['created_at_date'] = pd.to_datetime(df['created_at_date'])
    return df
df = data()
Additionally: how would I iterate over the data grouped by user, so that transitions are not counted across users? (A sketch of what I mean follows.)
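Something along these lines is what I have in mind, although I am not sure the None guard is the right way to handle the end of each group:

# sketch of the grouped version: count transitions per user,
# skipping the artificial (last_event, None) pair at the end of each group
for username, group in df.groupby('username'):
    for item, nxt in previous_and_next(group['event']):
        if nxt is not None:  # this guard is a guess, not a confirmed fix
            df_empty.loc[item, nxt] += 1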
EDIT: @coldspeed's answer offers an alternative approach, but it does not explain the error I got with the method above, so any additional advice would be welcome!
