I have the following pandas dataframe df:
Book_Category | Book_Title                      | Revenue
Thriller      | You don't know what I have done | 200
Romance       | Last Summer I loved you         | 100
I am trying to create a new DataFrame that sums Revenue for each word in Book_Title (upper and lower case should be treated as the same word).
This is the desired result, df2:
Book_Title_word | Revenue
you             | 300
I               | 300
don't           | 200
know            | 200
what            | 200
have            | 200
done            | 200
last            | 100
summer          | 100
loved           | 100
Because the words "I" and "you" appear in both titles, their revenues are summed (200 + 100 = 300).
Is this feasible in Python?
Thank you very much
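A minimal sketch of one way to do this, assuming pandas 1.0 or later (for `DataFrame.explode` and the `ignore_index` argument of `sort_values`), and assuming words are separated by whitespace, with punctuation such as the apostrophe in "don't" kept as part of the word:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Book_Category": ["Thriller", "Romance"],
    "Book_Title": ["You don't know what I have done", "Last Summer I loved you"],
    "Revenue": [200, 100],
})

# Lower-case each title, split it on whitespace, and give every word its own row
words = df.assign(
    Book_Title_word=df["Book_Title"].str.lower().str.split()
).explode("Book_Title_word")

# Sum the revenue of every title in which each word appears, largest first
df2 = (
    words.groupby("Book_Title_word", as_index=False)["Revenue"]
    .sum()
    .sort_values("Revenue", ascending=False, ignore_index=True)
)
print(df2)
```

Because the words are lower-cased before grouping, "I" is reported as "i". A word repeated within a single title is counted once per occurrence.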
UPDATE:
Because I am working with larger numbers, when I use the answer provided by A-Za-z the summed revenue is displayed in scientific notation format ('2.155051e-01').
Book_Category | Book_Title  | Revenue | Quantity
A             | ...what ... | 3459283 | 45757
B             | what ...    | 4376899 | 35657
C             | .....what   | 4567856 | 7689
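import pandas as pd
# One row per (Revenue, word) pair: split each title on spaces and stack the words
df_new = pd.DataFrame(df['Book_Title'].str.split(' ').tolist(), index=df['Revenue']).stack().reset_index()[[0, 'Revenue']]
df_new.columns = ['Book_Title_word', 'Revenue']
# Lower-case the words so that e.g. "What" and "what" are grouped together
df_new['Book_Title_word'] = df_new['Book_Title_word'].str.lower()
# Sum the revenue per word, largest first
df2 = df_new.groupby('Book_Title_word').sum().sort_values(by='Revenue', ascending=False)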
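The split/stack snippet above keeps only Revenue. A sketch of a variant that also sums Quantity, using `explode` instead of `stack`; the Revenue and Quantity values come from the update, but the titles are hypothetical stand-ins because the real ones are elided:

```python
import pandas as pd

# Revenue and Quantity are taken from the update; the titles are
# hypothetical stand-ins because the real ones are elided ("...what ...").
df = pd.DataFrame({
    "Book_Category": ["A", "B", "C"],
    "Book_Title": ["What now", "what", "so what"],
    "Revenue": [3459283, 4376899, 4567856],
    "Quantity": [45757, 35657, 7689],
})

# One row per (title, word) pair; the numeric columns are repeated for each word
words = df.assign(
    Book_Title_word=df["Book_Title"].str.lower().str.split()
).explode("Book_Title_word")

# Sum both numeric columns per word, largest revenue first
df2 = (
    words.groupby("Book_Title_word")[["Revenue", "Quantity"]]
    .sum()
    .sort_values("Revenue", ascending=False)
)
print(df2)
```

Selecting the columns explicitly after `groupby` keeps text columns such as Book_Category out of the sum.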
Book_Title_word | Revenue
what            | 2.160651e-01
Setting a display format for floats fixed the issue:
pd.set_option('display.float_format', lambda x: '%.3f' % x) 
(taken from an answer to "Format / Suppress Scientific Notation from Python Pandas Aggregation Results")
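`pd.set_option` changes the float format for the rest of the session. If the fixed-point format is only wanted for one printout, `pd.option_context` applies the same `display.float_format` option inside a `with` block and restores the previous setting afterwards. A minimal sketch; the DataFrame below is a hypothetical aggregated result, not the asker's data:

```python
import pandas as pd

# Hypothetical aggregated result with float sums
df2 = pd.DataFrame(
    {"Revenue": [12404038.0, 3459283.0]},
    index=pd.Index(["what", "now"], name="Book_Title_word"),
)

# Use three fixed decimals only inside this block; the global setting is untouched
with pd.option_context("display.float_format", "{:.3f}".format):
    text = df2.to_string()
print(text)
```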