Let me explain. My df looks like this:
id  text                            c1
1   Hello world how are you people   1
2   Hello people I am fine people    1
3   Good Morning people             -1
4   Good Evening                    -1
c1 contains only two values: 1 or -1.
Now I want a dataframe (output) like this:
Word     Totalcount  Points  PercentageOfPointAndTotalCount
hello    2            2       100
world    1            1       100
how      1            1       100
are      1            1       100
you      1            1       100
people   3            1       33.33
I        1            1       100
am       1            1       100
fine     1            1       100
Good     2           -2      -100
Morning  1           -1      -100
Evening  1           -1      -100
Here, Totalcount is the number of rows in the text column that contain each word. A word repeated within one row counts once, which is why people has a Totalcount of 3.
Points is the sum of c1 over the rows that contain each word. Example: people is in two rows where c1 is 1 and one row where c1 is -1, so its Points value is 1 (2 - 1 = 1).
PercentageOfPointAndTotalCount = Points / Totalcount * 100
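The definitions above can be sketched in pandas roughly as follows. This is only one possible approach, not a definitive solution. The function name `word_points` and its parameters are made up for illustration. The sketch assumes words are separated by whitespace, that a word is counted at most once per row (inferred from people having a Totalcount of 3), that case is kept as-is, and that the text column has no missing values.

```python
import pandas as pd

# Sample data copied from the example above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "Hello world how are you people",
        "Hello people I am fine people",
        "Good Morning people",
        "Good Evening",
    ],
    "c1": [1, 1, -1, -1],
})


def word_points(frame, text_col="text", score_col="c1"):
    """Hypothetical helper: per-word row count, summed score and percentage."""
    # Split on whitespace and drop repeats within a row, keeping word order,
    # so each word is counted at most once per row (assumption).
    unique_words = frame[text_col].str.split().apply(
        lambda ws: list(dict.fromkeys(ws))
    )
    # One output row per (original row, word) pair.
    words = frame.assign(Word=unique_words).explode("Word")
    # Count the rows and sum the scores for every word, in first-seen order.
    out = (
        words.groupby("Word", sort=False)[score_col]
        .agg(Totalcount="count", Points="sum")
        .reset_index()
    )
    out["PercentageOfPointAndTotalCount"] = (
        out["Points"] / out["Totalcount"] * 100
    ).round(2)
    return out


result = word_points(df)
print(result)
```

For a frame whose columns are named differently, the column names can be passed in, for example `word_points(df, text_col="comment_text", score_col="target")`. If words should be matched case-insensitively, the text could be lowercased with `.str.lower()` before splitting.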
print(df)
      id comment_text  target
0  59848  Hello world    -1.0
1  59849  Hello world    -1.0