I have the code below. It surprises me that it works for the columns but not for the rows.
import numpy as np
import pandas as pd
from numpy import size  # size() and np are used below but were never imported
def summarizing_data_variables(df):
    numberRows=size(df['ID'])
    numberColumns=size(df.columns)
    summaryVariables=np.empty([numberColumns,2], dtype =  np.dtype('a50'))    
    cont=-1    
    for column in df.columns:
        cont=cont+1
        summaryVariables[cont][0]=column
        summaryVariables[cont][1]=size(df[df[column].isin([0])][column])/(1.0*numberRows)
    print summaryVariables
def summarizing_data_users(df):
    print "Sumarizing users..."   
    numberRows=size(df['ID'])
    numberColumns=size(df.columns)      
    summaryVariables=np.empty([numberRows,2], dtype =  np.dtype('a50'))    
    cont=-1
    for row in df['ID']:
        cont=cont+1
        summaryVariables[cont][0]=row
        dft=df[df['ID']==row]
        proportionZeros=(size(dft[dft.isin([0])])-1)/(1.0*(numberColumns-1)) # The -1 is used so that the ID column is not counted
        summaryVariables[cont][1]=proportionZeros
    print summaryVariables
if __name__ == '__main__':
    df = pd.DataFrame([[1, 2, 3], [2, 5, 0.0],[3,4,5]])
    df.columns=['ID','var1','var2']
    print df
    summarizing_data_variables(df)
    summarizing_data_users(df) 
The output is this:
   ID  var1  var2
0   1     2     3
1   2     5     0
2   3     4     5
[['ID' '0.0']
 ['var1' '0.0']
 ['var2' '0.333333333333']]
Sumarizing users...
[['1' '1.0']
 ['2' '1.0']
 ['3' '1.0']]
For the users, I was expecting this:
Sumarizing users...
[['1' '0.0']
 ['2' '0.5']
 ['3' '0.0']]
It seems that the problem is in this line:
dft[dft.isin([0])]
Unlike in the first case, it does not restrict dft to the entries where the mask is True.
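A small check on the example data seems to confirm this. It is only a sketch: the names `masked`, `masked_shape`, `total_cells` and `zeros_in_row` are illustrative and not part of the code above. Indexing a DataFrame with a boolean DataFrame keeps the original shape and puts NaN where the mask is False, so `size` still counts every cell.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 5, 0.0], [3, 4, 5]],
                  columns=['ID', 'var1', 'var2'])

# Select the single row for the user with ID 2, as the loop does.
dft = df[df['ID'] == 2]

# A boolean DataFrame used as an indexer keeps the original shape;
# entries where the mask is False become NaN instead of being dropped.
masked = dft[dft.isin([0])]
masked_shape = masked.shape        # still one row and three columns
total_cells = np.size(masked)      # counts every cell, NaN included

# Counting the True entries of the mask gives the number of zeros instead.
zeros_in_row = int(dft.isin([0]).sum(axis=1).iloc[0])

print(masked_shape, total_cells, zeros_in_row)
```

With three cells counted for every row, `(3 - 1) / (3 - 1)` gives the 1.0 that shows up for every user in the output.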
Can you help me with this? (1) How can I correct the users (rows) part, i.e. the second function above? (2) Is this the most efficient way to do it? My database is very big.
EDIT:
In the function summarizing_data_variables(df) I try to evaluate the proportion of zeros in each column. In the example above, the variable ID has no zeros (so the proportion is zero), the variable var1 has no zeros (so the proportion is also zero), and the variable var2 has a zero in the second row (so the proportion is 1/3). I keep these values in a 2D numpy.array where the first column is the label of the DataFrame column and the second column is the evaluated proportion.
In the function summarizing_data_users I want to do the same, but for each row. However, it is NOT working.
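To make the intended result concrete, here is a minimal sketch of the same two computations written without explicit loops, on the example data only. The names `zeros_per_column`, `values` and `zeros_per_user` are illustrative, and I am assuming that taking the mean of a boolean mask is an acceptable way to express "proportion of zeros". Whether this is fast enough for a very big table is untested.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 5, 0.0], [3, 4, 5]],
                  columns=['ID', 'var1', 'var2'])

# Proportion of zeros per column: the mean of a True/False mask
# taken down the rows is the fraction of True entries.
zeros_per_column = (df == 0).mean(axis=0)

# Proportion of zeros per user: move ID into the index so it is not
# counted, then take the mean of the mask across the remaining columns.
values = df.set_index('ID')
zeros_per_user = (values == 0).mean(axis=1)

print(zeros_per_column)
print(zeros_per_user)
```

On the example this gives 0, 0 and 1/3 for the columns, and 0.0, 0.5 and 0.0 for users 1, 2 and 3, which are the values I was expecting.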