As the title says, it seems tedious to set random_state for every randomness-related pandas function. Is there a way to set it only once so that the random state applies to all of those functions?
        Mr.cysl
        
- 
                    https://stackoverflow.com/questions/11526975/set-random-seed-programwide-in-python – BENY Sep 17 '18 at 20:39
- 
                    This arg is optional, no? – Oliver Charlesworth Sep 17 '18 at 20:40
- 
                    @Wen Does this work with pandas? – Mr.cysl Sep 17 '18 at 20:40
- 
                    @OliverCharlesworth Yes it is. But I am trying to make sure I could reproduce what I am doing, so I need to set random_state for every (applicable) function. – Mr.cysl Sep 17 '18 at 20:42
1 Answer
Pandas functions get their random source by calling pd.core.common._random_state, which accepts a single state argument, defaulting to None. From its docs:
Parameters
----------
state : int, np.random.RandomState, None.
    If receives an int, passes to np.random.RandomState() as seed.
    If receives an np.random.RandomState object, just returns object.
    If receives `None`, returns np.random.
    If receives anything else, raises an informative ValueError.
    Default None.
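The dispatch that docstring describes can be sketched roughly as follows. This is an illustrative re-implementation, not pandas' actual source: the name `resolve_random_state` and the error message are made up for the example, and the private helper's name and accepted types may differ between pandas versions.

```python
import numpy as np


def resolve_random_state(state=None):
    """Hypothetical stand-in mirroring the docstring's four cases."""
    if isinstance(state, (int, np.integer)):
        # An int is used as the seed of a fresh, independent generator.
        return np.random.RandomState(state)
    if isinstance(state, np.random.RandomState):
        # An existing generator is passed through unchanged.
        return state
    if state is None:
        # No state given: fall back to the np.random module, i.e. the global state.
        return np.random
    raise ValueError("random_state must be an int, a numpy RandomState, or None")


rs = np.random.RandomState(0)
print(resolve_random_state(None) is np.random)  # True
print(resolve_random_state(rs) is rs)           # True
```

The third case is the one that matters for the question: a call that omits random_state draws from whatever the global NumPy generator currently holds.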
So if it gets None, which is the default value for the caller's random_state, it returns the np.random module itself:
In [247]: pd.core.common._random_state(None)
Out[247]: <module 'numpy.random' from 'C:\\Python\\lib\\site-packages\\numpy\\random\\__init__.py'>
and it will use the global numpy state. So:
In [262]: np.random.seed(3)
In [263]: pd.Series(range(10)).sample(3).tolist()
Out[263]: [5, 4, 1]
In [264]: pd.DataFrame({0: range(10)}).sample(3)[0].tolist()
Out[264]: [3, 8, 2]
In [265]: np.random.seed(3)
In [266]: pd.Series(range(10)).sample(3).tolist()
Out[266]: [5, 4, 1]
In [267]: pd.DataFrame({0: range(10)}).sample(3)[0].tolist()
Out[267]: [3, 8, 2]
If any method doesn't respect this, it's a bug.
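As a script rather than an interactive session, the same check might look like the sketch below. The helper `draw_samples` is hypothetical and exists only for this example. The sketch compares two runs with each other instead of hard-coding sampled values, since the exact values may depend on the installed NumPy and pandas versions.

```python
import numpy as np
import pandas as pd


def draw_samples():
    # Hypothetical helper: both calls omit random_state,
    # so they fall back to NumPy's global state.
    from_series = pd.Series(range(10)).sample(3).tolist()
    from_frame = pd.DataFrame({0: range(10)}).sample(3)[0].tolist()
    return from_series, from_frame


np.random.seed(3)          # seed once, at the top of the program
first = draw_samples()

np.random.seed(3)          # reseeding replays the same sequence
second = draw_samples()

print(first == second)     # True

# Alternative: share one explicit generator instead of the global state.
rng_a = np.random.RandomState(3)
rng_b = np.random.RandomState(3)
explicit_a = pd.Series(range(10)).sample(3, random_state=rng_a).tolist()
explicit_b = pd.Series(range(10)).sample(3, random_state=rng_b).tolist()
print(explicit_a == explicit_b)  # True
```

Seeding the global state is the least typing, but any other code that draws from np.random in between will shift the sequence. Passing one explicit RandomState object around avoids that, at the cost of threading it through every call.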
 
    
    
        DSM
        
- 
                    So whenever I set numpy's random seed and do not pass any sort of random_state to pandas operations, my code will still be deterministic based on np.random.seed. Is that right? – Mr.cysl Sep 17 '18 at 20:50
- 
                    Thanks!! Also, is there a connection between `np.random.seed` and `random.seed`? – Mr.cysl Sep 17 '18 at 20:53