I want to compute the correlation coefficient with one row removed from the dataframe at a time. After getting all of these correlation coefficients, I need to remove the row whose removal causes the highest increase in the correlation coefficient.
The code below shows my solution:
import pandas as pd
import numpy as np
#Access the data
file='tc_yolanda2.csv'
df = pd.read_csv(file)
x = df['dist']
y = df['mps']
#compute the correlation coefficient
def correlation_coefficient_4u(a,b):
    correl_mat = np.corrcoef(a,b)
    correlation = correl_mat[0,1]
    return correlation
c = correlation_coefficient_4u(x,y)
print('Correlation coeffcient is:',c)
# Attempt: drop one row per iteration and recompute the correlation
lenght = len(df)
print(lenght)
a = 0
while lenght != 0:
    df.drop([a], inplace=True)
    c = correlation_coefficient_4u(df.dist,df.mps)
    a += 1
    print(round(c,4))
It successfully generated 50 correlation coefficients, but it also produced many warnings and an error, such as:
RuntimeWarning: Degrees of freedom <= 0 for slice
RuntimeWarning: divide by zero encountered in double_scalars
RuntimeWarning: invalid value encountered in multiply
RuntimeWarning: Mean of empty slice.
RuntimeWarning: invalid value encountered in true_divide
ValueError: labels [50] not contained in axis
My next problem is how to get rid of these errors, and how to locate the index of the correlation coefficient with the highest negative value, so that I can remove that row permanently and repeat the procedure above.
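To make the goal concrete, here is a minimal sketch of a leave-one-out version of the procedure. It rests on a few assumptions: the helper names `leave_one_out_corr` and `prune_rows` are hypothetical, "highest negative value" is taken to mean the most negative coefficient (swap `idxmin` for `idxmax` if the opposite is intended), and the small `sample` frame just reuses the first eight rows of my data. Unlike my loop, it never drops rows in place while scanning, so each coefficient is computed with exactly one row missing.

```python
import numpy as np
import pandas as pd


def leave_one_out_corr(df, xcol="dist", ycol="mps"):
    """Correlation of xcol and ycol with each row left out in turn.

    Returns a Series indexed by the label of the omitted row.
    The input dataframe is not modified.
    """
    result = {}
    for label in df.index:
        reduced = df.drop(label)  # new frame without this one row
        result[label] = np.corrcoef(reduced[xcol], reduced[ycol])[0, 1]
    return pd.Series(result, name="corr_without_row")


def prune_rows(df, n_remove, xcol="dist", ycol="mps"):
    """Repeat the leave-one-out scan n_remove times, each time permanently
    dropping the row whose omission gives the most negative coefficient
    (an assumed reading of the selection rule)."""
    df = df.copy()
    removed = []
    for _ in range(n_remove):
        # With fewer than 4 rows, every leave-one-out subset has at most
        # 2 points, where the correlation is trivially +/-1 or undefined.
        if len(df) < 4:
            break
        loo = leave_one_out_corr(df, xcol, ycol)
        label = loo.idxmin()
        removed.append((label, loo[label]))
        df = df.drop(label)
    return df, removed


# First eight rows of the data in this question, for illustration only.
sample = pd.DataFrame({
    "dist": [441.6, 385.4, 470.7, 402.2, 361.6, 458.6, 453.9, 425.2],
    "mps": [2, 7, 1, 0, 0, 3, 6, 4],
})

loo = leave_one_out_corr(sample)
print(loo.round(4))
pruned, removed = prune_rows(sample, n_remove=2)
print(removed)
```

Because the loop iterates over `df.index` rather than a counter, it cannot ask for a label such as 50 that is not in the axis, and the size guard avoids the empty-slice and divide-by-zero warnings that come from computing a correlation on one or zero rows.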
By the way, this is a summary of my data:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 2 columns):
dist    50 non-null float64
mps     50 non-null int64
dtypes: float64(1), int64(1)
memory usage: 880.0 bytes
None
And this is the data, followed by the full output of the script:
dist  mps
0   441.6    2
1   385.4    7
2   470.7    1
3   402.2    0
4   361.6    0
5   458.6    3
6   453.9    6
7   425.2    4
8   336.6    8
9   265.4    5
10  207.0    5
11  140.5   28
12  229.9    4
13  175.2    6
14  244.5    2
15  455.7    4
16  396.4   12
17  261.8    7
18  291.5    9
19  233.9    2
20  167.8    9
21   88.9   15
22  110.1   25
23   97.1   15
24  160.4   10
25  344.0    0
26  381.6   21
27  391.9    3
28  314.7    2
29  320.7   14
30  252.9   10
31  323.1   12
32  256.0    6
33  281.6    5
34  280.4    5
35  339.8   10
36  301.9   12
37  381.8    0
38  320.2   10
39  347.6    8
40  301.0    4
41  369.7    6
42  378.4    4
43  446.8    4
44  397.4    3
45  454.2    2
46  475.1    0
47  427.0    8
48  463.4    8
49  464.6    2
Correlation coeffcient is: -0.529328951782
49
-0.5209
-0.5227
-0.5091
-0.4998
-0.4975
-0.4879
-0.4903
-0.4838
-0.4845
-0.4908
-0.5085
-0.4541
-0.4736
-0.4962
-0.5273
-0.5189
-0.5452
-0.5494
-0.5485
-0.5882
-0.5999
-0.5711
-0.4321
-0.3251
-0.296
-0.3214
-0.4595
-0.4516
-0.5018
-0.5
-0.4524
-0.431
-0.4514
-0.4955
-0.5603
-0.5263
-0.385
-0.4764
-0.3229
-0.194
-0.3029
-0.1961
-0.2572
-0.2572
-0.6454
-0.7041
-0.5241
-1.0
Warning (from warnings module):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 3159
    c = cov(x, y, rowvar)
RuntimeWarning: Degrees of freedom <= 0 for slice
Warning (from warnings module):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 3093
    c *= 1. / np.float64(fact)
RuntimeWarning: divide by zero encountered in double_scalars
Warning (from warnings module):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 3093
    c *= 1. / np.float64(fact)
RuntimeWarning: invalid value encountered in multiply
nan
Warning (from warnings module):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 1110
    avg = a.mean(axis)
RuntimeWarning: Mean of empty slice.
Warning (from warnings module):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\_methods.py", line 73
    ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
nan
Traceback (most recent call last):
  File "C:/Users/User/Desktop/CARDS 2017 Research Study/Python/methodology.py", line 28, in <module>
    df.drop([a], inplace=True)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 2530, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 2562, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 3744, in drop
    labels[mask])
ValueError: labels [50] not contained in axis
 
    