I would like to generate random floating-point numbers, including NaN, in a DataFrame with np.random.randn
        boboo
        
- What would you like the distribution of NaNs to be? E.g. NaN with probability p and uniform random with probability 1-p? – Jónás Balázs Dec 30 '18 at 22:37
- I would like 2/7 NaNs. – boboo Dec 30 '18 at 22:42
- I suggest checking this [question](https://stackoverflow.com/questions/34962104/pandas-how-can-i-use-the-apply-function-for-a-single-column) – Jónás Balázs Dec 30 '18 at 22:44
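"2/7 NaNs" can mean either a 2/7 chance per value or exactly 2/7 of the values. For the exact reading, one option (a sketch that is not taken from either answer) is to pick the NaN positions without replacement. It assumes NumPy's Generator API (`np.random.default_rng`), whose `standard_normal` draws from the same distribution as `np.random.randn`; the shape, seed and column names are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the example is reproducible

n_rows, n_cols = 7, 3
values = rng.standard_normal((n_rows, n_cols))  # same distribution as np.random.randn

# Choose exactly 2/7 of all cells (rounded), without replacement, and set them to NaN.
n_nan = round(values.size * 2 / 7)
flat_idx = rng.choice(values.size, size=n_nan, replace=False)
values.flat[flat_idx] = np.nan

df = pd.DataFrame(values, columns=["a", "b", "c"])
print(df)
print(df.isna().sum().sum())  # total number of NaN cells
```

With 21 cells, exactly 6 of them end up as NaN on every run, whatever the seed.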
2 Answers
You can generate an array of random floats, then create a mask with np.random.choice, using its p argument to set the probability that each element becomes NaN.
Something like:
import numpy as np
a = np.random.randn(20)
mask = np.random.choice([1, 0], a.shape, p=[.1, .9]).astype(bool)
a[mask] = np.nan
Result:
array([ 1.2769248 ,  0.5949608 , -1.38006737,  0.3582266 , -1.852884  ,
        0.81121663, -1.45830948,  0.03117856,  0.54509948,  1.22019729,
        1.71643753,         nan, -0.32470862, -0.77604474,  0.76698089,
       -0.47863251,         nan, -0.33308071, -0.32026717,  1.8493752 ])
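Applied to a DataFrame with the 2/7 ratio from the comments, the same masking idea might look like the sketch below. It assumes NumPy's Generator API (`np.random.default_rng`) in place of the legacy `np.random` functions; the seed, shape and column names are arbitrary. Because each cell is masked independently, the NaN fraction is 2/7 only on average.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

p_nan = 2 / 7  # probability that any given cell becomes NaN
shape = (10, 3)

values = rng.standard_normal(shape)  # same distribution as np.random.randn
# Boolean mask: True with probability p_nan, independently for each cell.
mask = rng.choice([True, False], size=shape, p=[p_nan, 1 - p_nan])
values[mask] = np.nan

df = pd.DataFrame(values, columns=["a", "b", "c"])
print(df)
```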
 
    
    
        Mark
        
            
            
        If you are working on a DataFrame, you can use apply.
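import numpy as np
import pandas as pd

df = pd.DataFrame()
df['a'] = np.zeros(10)  # or get data from somewhere else
p = 2/7
# Each row becomes NaN with probability p, otherwise a uniform draw in [0, 1).
# apply returns a new Series, so assign it back to the column.
df['a'] = df['a'].apply(lambda x: np.nan if np.random.rand() < p else np.random.rand())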
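apply calls the lambda once per row in Python. A vectorized alternative (not part of the original answer) draws all the random numbers at once and uses `Series.mask`, which replaces values with NaN wherever the condition is True. This sketch assumes the column already holds `np.random.randn`-style data, generated here with `np.random.default_rng`; the seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
p = 2 / 7

df = pd.DataFrame({"a": rng.standard_normal(10)})
original = df["a"].copy()

# One uniform draw per row; rows whose draw falls below p become NaN.
nan_mask = rng.random(len(df)) < p
df["a"] = df["a"].mask(nan_mask)
print(df)
```

Rows where the mask is False keep their original values, so this only injects NaNs rather than regenerating the data.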
 
    
    
        Jónás Balázs
        