This is a continuation of my previous post on normalizing columns of a Pandas DataFrame with a specific condition for negative values.
The DataFrame I'm using is the following:
import numpy as np
import pandas as pd
df = pd.DataFrame({'key' : [111, 222, 333, 444, 555, 666, 777, 888, 999],
                   'score1' : [-1, 0, 2, -1, 7, 0, 15, 0, 1], 
                   'score2' : [2, 2, -1, 10, 0, 5, -1, 1, 0]})
print(df)
   key  score1  score2
0  111      -1       2
1  222       0       2
2  333       2      -1
3  444      -1      10
4  555       7       0
5  666       0       5
6  777      15      -1
7  888       0       1
8  999       1       0
The possible values for the score1 and score2 Series are -1 and all positive integers (including 0). My goal was to normalize both columns the following way:
- If the value is equal to -1, then return a missing NaN value
- Else, normalize the remaining positive integers on a range between 0 and 1.
I'm extremely happy with the solution from ezrael. That being said, I continued working on my problem to see if I could come up with an alternate solution. Here's my try:
- I'm defining the following function:
def normalize(x):
    if x == -1:
        return np.nan
    else:
        return x/x.max()
- I'm creating the new norm1 Series by applying the above function to the score1 Series:
df['norm1'] = df['score1'].apply(normalize)
Unfortunately, this raises the following AttributeError: 'int' object has no attribute 'max'.
I converted the score1 Series to float64, but that did not fix the problem; the error just became: 'float' object has no attribute 'max'.
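Both error messages are consistent with how Series.apply works: it calls the function once per element, so x inside normalize is a single number rather than the whole Series. A minimal sketch to check this, assuming the same score1 values as above (the helper name is_series is made up for illustration):

```python
import pandas as pd

scores = pd.Series([-1, 0, 2, -1, 7, 0, 15, 0, 1], name='score1')

def is_series(x):
    # Hypothetical helper: reports whether apply handed over the whole Series
    return isinstance(x, pd.Series)

# apply calls is_series once per element of the Series
got_series = scores.apply(is_series)
print(got_series.any())

# The maximum is available on the Series itself, not on its elements
print(scores.max())
```

If no call ever receives a Series, then x.max() inside normalize is being looked up on a plain number, which would explain the AttributeError.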
I also did a quick test by replacing the second return statement with return x/15 (15 being the maximum value of the score1 Series) and it worked:
   key  score1  score2     norm1
0  111    -1.0       2       NaN
1  222     0.0       2  0.000000
2  333     2.0      -1  0.133333
3  444    -1.0      10       NaN
4  555     7.0       0  0.466667
5  666     0.0       5  0.000000
6  777    15.0      -1  1.000000
7  888     0.0       1  0.000000
8  999     1.0       0  0.066667
But this is not a viable solution. I want to be able to divide by the maximum value of the Series instead of hard-coding it. WHY is my solution not working and HOW do I fix my code?
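One possible way to avoid the hard-coded value, sketched under the assumption that the maximum should be taken over the whole column: compute it once on the Series and pass it into the function through the args parameter of Series.apply. The max_value parameter and the norm2 column are names introduced here for illustration only. A vectorized variant using Series.where is shown for score2 as an alternative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': [111, 222, 333, 444, 555, 666, 777, 888, 999],
                   'score1': [-1, 0, 2, -1, 7, 0, 15, 0, 1],
                   'score2': [2, 2, -1, 10, 0, 5, -1, 1, 0]})

def normalize(x, max_value):
    # x is a single element; the maximum is passed in explicitly
    if x == -1:
        return np.nan
    return x / max_value

# Option 1: keep apply, but compute the maximum once on the Series
df['norm1'] = df['score1'].apply(normalize, args=(df['score1'].max(),))

# Option 2: vectorized, no apply; -1 becomes NaN, the rest is divided by the max
valid = df['score2'].where(df['score2'] != -1)
df['norm2'] = valid / valid.max()

print(df)
```

Both options divide by the column maximum only, as in the x/15 test above, so they rely on the smallest valid score being 0 and the maximum being greater than 0.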
 
     
     
    