I have a 4-by-3 matrix, X, and wish to form the 3-by-3 Pearson correlation matrix, C, obtained by computing correlations between all 3 possible column combinations of X. However, entries of C that correspond to correlations that aren't statistically significant should be set to zero.
I know how to get pair-wise correlations and significance values using pearsonr in scipy.stats. For example,
import numpy as np
from scipy.stats import pearsonr
X = np.array([[1, 1, -2], [0, 0, 0], [0, .2, 1], [5, 3, 4]])
pearsonr(X[:, 0], X[:, 1])
returns (0.9915008164289165, 0.00849918357108348), a correlation of about .9915 between columns one and two of X, with p-value .0085.
I could easily get my desired matrix using nested loops:
- Pre-populate C as a 3-by-3 matrix of zeros.
- Each pass of the nested loop will correspond to two columns of X. The entry of C corresponding to this pair of columns will be set to the pairwise correlation provided the p-value is less than or equal to my threshold, say .01.
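For concreteness, the loop version I have in mind would look roughly like the sketch below. The names `threshold` and `n_cols` are just placeholders I'm introducing here, and the loop visits every ordered pair of columns, including each column with itself, so the diagonal ends up as 1.

```python
import numpy as np
from scipy.stats import pearsonr

X = np.array([[1, 1, -2], [0, 0, 0], [0, .2, 1], [5, 3, 4]])
threshold = 0.01  # placeholder name for the significance cutoff

n_cols = X.shape[1]
C = np.zeros((n_cols, n_cols))  # start with all zeros

for i in range(n_cols):
    for j in range(n_cols):
        # pearsonr returns the correlation and its two-sided p-value
        r, p = pearsonr(X[:, i], X[:, j])
        # keep the correlation only when it is significant
        if p <= threshold:
            C[i, j] = r

print(C)
```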
I'm wondering if there's a simpler way. I know in Pandas, I can create the correlation matrix, C, in basically one line:
import pandas as pd
df = pd.DataFrame(data=X)
C_frame = df.corr(method='pearson')
C = C_frame.to_numpy()
Is there a way to get the matrix or data frame of p-values, P, without a loop? If so, how could I set each entry of C to zero should the corresponding p-value in P exceed my threshold?
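One loop-free direction I have considered, offered only as a sketch: compute all correlations at once with `np.corrcoef`, then derive the two-sided p-values from the usual t statistic, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which I am assuming gives the same p-values as `pearsonr`. The names `P`, `C_sig`, and `threshold` are my own placeholders.

```python
import numpy as np
from scipy import stats

X = np.array([[1, 1, -2], [0, 0, 0], [0, .2, 1], [5, 3, 4]])
threshold = 0.01  # placeholder name for the significance cutoff

n = X.shape[0]                     # number of observations per column
C = np.corrcoef(X, rowvar=False)   # columns are the variables

# t statistic for each correlation; the diagonal (r = 1) divides by
# zero and becomes inf, which yields a p-value of 0, so the warning
# is silenced on purpose.
with np.errstate(divide="ignore"):
    t = C * np.sqrt((n - 2) / (1 - C**2))

# two-sided p-values from the t distribution with n - 2 degrees of freedom
P = 2 * stats.t.sf(np.abs(t), df=n - 2)

# zero out every correlation whose p-value exceeds the threshold
C_sig = np.where(P <= threshold, C, 0.0)

print(P)
print(C_sig)
```

If this is sound, `P[0, 1]` should agree with the p-value that `pearsonr(X[:, 0], X[:, 1])` reports.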