Does Pandas have an equivalent of R's NA (meaning "not available")? If not, what is the convention for representing a missing value, as opposed to NaN, which represents a mathematically undefined value such as the result of 0/0?
Currently there is no NA value available in Pandas or NumPy. From the section "Working with missing data" in the Pandas manual (http://pandas.pydata.org/pandas-docs/stable/missing_data.html):
The choice of using NaN internally to denote missing data was largely for simplicity and performance reasons. It differs from the MaskedArray approach of, for example, scikits.timeseries. We are hopeful that NumPy will soon be able to provide a native NA type solution (similar to R) performant enough to be used in pandas.
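Because NaN does double duty, pandas cannot tell a value that was never recorded apart from one produced by an undefined computation. The sketch below is illustrative only: it builds one Series with an explicitly missing entry and one where 0/0 produced the NaN, and both look identical to `isna()`.

```python
import numpy as np
import pandas as pd

# A genuinely missing observation, recorded with np.nan
recorded = pd.Series([1.0, np.nan, 3.0])

# An undefined computation: 0.0 / 0.0 also yields NaN in floating point
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for 0/0
    computed = pd.Series(np.array([1.0, 0.0, 3.0]) / np.array([1.0, 0.0, 1.0]))

# pandas treats both NaNs the same way: "missing" and "undefined" are indistinguishable
print(recorded.isna().tolist())  # [False, True, False]
print(computed.isna().tolist())  # [False, True, False]
```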
Also, this part of the documentation (http://pandas.pydata.org/pandas-docs/stable/gotchas.html#nan-integer-na-values-and-na-type-promotions) provides more details on the trade-offs in this choice of NA representation.
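One of the trade-offs described there is type promotion: since NaN is a float, a column that has to hold a missing value can no longer keep an integer or boolean dtype. A minimal sketch with NumPy-backed dtypes, using `reindex` to introduce a label that has no data:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
flags = pd.Series([True, False, True])

# Label 3 has no data, so pandas has to fill it with NaN
ints_with_gap = ints.reindex([0, 1, 2, 3])
flags_with_gap = flags.reindex([0, 1, 2, 3])

print(ints.dtype, "->", ints_with_gap.dtype)    # int64 -> float64
print(flags.dtype, "->", flags_with_gap.dtype)  # bool -> object
```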
See also [NumPy or Pandas: Keeping array type as integer while having a NaN value](http://stackoverflow.com/questions/11548005/numpy-or-pandas-keeping-array-type-as-integer-while-having-a-nan-value/11548224#11548224) – smci Dec 06 '14 at 02:32
You can use NaN from NumPy:
import numpy as np
np.nan
or simply
float('NaN')
The pandas docs mostly use the np.nan version: http://pandas.pydata.org/pandas-docs/dev/missing_data.html
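Both spellings produce the same IEEE 754 NaN. One practical consequence, shown in this small sketch, is that NaN never compares equal to anything, itself included, so missing values must be detected with a dedicated check such as `math.isnan`, `np.isnan` or `pd.isna` rather than with `==`:

```python
import math

import numpy as np
import pandas as pd

a = np.nan
b = float('NaN')  # the same kind of NaN, built without NumPy

# NaN is never equal to anything, not even itself, so == cannot find it
print(a == a, a == b)  # False False

# Use a dedicated check instead
print(math.isnan(b), np.isnan(a), pd.isna(a))  # True True True

# In a float column, np.nan, float('NaN') and None are all stored as NaN
s = pd.Series([1.0, a, b, None])
print(s.dtype, s.isna().tolist())  # float64 [False, True, True, True]
```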