I have a dataframe with a sample attached. I want to give loyalty labels for all the users in my dataframe.
The labels and their conditions are:
- first_time_visitor- any user with session = 0 (see the first line in loyalty column)
- frequent_visitor- any user with session > 0 and count_session_products > 0
- first_time_customer- first time checkout:confirmation appears in the type column
- repeat_customer- second time of checkout:confirmation appears in the type column
- loyal_customer- third time of checkout:confirmation appears in the type column
I am having trouble calculating first_time_customer, repeat_customer and loyal_customer labels.
I have had a couple of ideas only, one being to use first_valid_index() or argmax() to find the index and somehow use that in a condition to create the first_time_customer label.
(df_customer_1.type.values == 'checkout:confirmation').argmax()
df_customer_1[df_customer_1.type == 'checkout:confirmation'].first_valid_index()
But I am not sure how to implement these conditions.
In the end I would like the loyalty column in the picture to look like:
first_time_visitor
frequent_visitor
first_time_customer
repeat_customer
frequent_visitor
loyal_customer
Can anyone guide me as to how I can achieve this? I do not have any code to show for it so far therefore you would probably have to come up with something. We can try both pandas or a typical pythonic way.
This is what I have so far, sample from the data:
data = {
'user_id': [
    '9EPWZVMNP6D6KWX', '9EPWZVMNP6D6KWX', '9EPWZVMNP6D6KWX',
    '9EPWZVMNP6D6KWX', '9EPWZVMNP6D6KWX', '9EPWZVMNP6D6KWX'
],
'timestamp': [
    1612139269, 1612139665, 1612139579,
    1612141096, 1612143046, 1612143729
],
'type': ['productDetails', 'productDetails', 'checkout:confirmation', 'checkout:confirmation', 'productList', 'checkout:confirmation'],
'session': [0,1,2,3,4,5],
'count_session_products': [4, 1, 0, 4, 2, 2],
'loyalty' : [0,0,0,0,0,0]}
test_df = pd.DataFrame(data)
test_df
which gives me:
| user_id | timestamp | type | session | prods | loyalty | 
|---|---|---|---|---|---|
| 9EPWZVMNP6D6KWX | 1612139269 | productDetails | 0 | 4 | 0 | 
| 9EPWZVMNP6D6KWX | 1612139665 | productDetails | 1 | 1 | 0 | 
| 9EPWZVMNP6D6KWX | 1612139579 | checkout:confirmation | 2 | 0 | 0 | 
| 9EPWZVMNP6D6KWX | 1612141096 | checkout:confirmation | 3 | 4 | 0 | 
| 9EPWZVMNP6D6KWX | 1612143046 | productList | 4 | 2 | 0 | 
| 9EPWZVMNP6D6KWX | 1612143729 | checkout:confirmation | 5 | 2 | 0 | 
First condition for frequent_visiotr and first_time_visitor:
test_df['loyalty'] = np.where((test_df['session'] > 0) & ((test_df['type'] != 'checkout:confirmation')), 'frequent_visitor', None)
test_df.loc[test_df['session'] == 0, 'loyalty'] = 'first_time_visitor'
which gives me:
| user_id | timestamp | type | session | prods | loyalty | 
|---|---|---|---|---|---|
| 9EPWZVMNP6D6KWX | 1612139269 | productDetails | 0 | 4 | first_time_visitor | 
| 9EPWZVMNP6D6KWX | 1612139665 | productDetails | 1 | 1 | frequent_visitor | 
| 9EPWZVMNP6D6KWX | 1612139579 | checkout:confirmation | 2 | 0 | 0 | 
| 9EPWZVMNP6D6KWX | 1612141096 | checkout:confirmation | 3 | 4 | 0 | 
| 9EPWZVMNP6D6KWX | 1612143046 | productList | 4 | 2 | frequent_visitor | 
| 9EPWZVMNP6D6KWX | 1612143729 | checkout:confirmation | 5 | 2 | 0 | 
Now this is where I am stuck, I need conditions for first_time_customer, repeat_customer and loyal_customer. The rules for these conditions are described above. I would like the end dataframe to look like this:
| user_id | timestamp | type | session | prods | loyalty | 
|---|---|---|---|---|---|
| 9EPWZVMNP6D6KWX | 1612139269 | productDetails | 0 | 4 | first_time_visitor | 
| 9EPWZVMNP6D6KWX | 1612139665 | productDetails | 1 | 1 | frequent_visitor | 
| 9EPWZVMNP6D6KWX | 1612139579 | checkout:confirmation | 2 | 0 | first_time_customer | 
| 9EPWZVMNP6D6KWX | 1612141096 | checkout:confirmation | 3 | 4 | repeat_customer | 
| 9EPWZVMNP6D6KWX | 1612143046 | productList | 4 | 2 | frequent_visitor | 
| 9EPWZVMNP6D6KWX | 1612143729 | checkout:confirmation | 5 | 2 | loyal_customer | 

 
     
    