How to preserve Labels when SPSS file (.sav) imported into pandas via rpy?

Question

I'm looking to work with SPSS files (.sav) using pandas. I don't have the SPSS program itself, so here's what a typical file looks like when converted to .csv:

From looking into what the first two rows signify (I don't know SPSS), it seems that the first row contains the Labels, while the second row contains the VarNames.

When I bring the file into pandas thus:

import pandas.rpy.common as com

def savtocsv(filename):
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    w = com.convert_robj(w)
    return w

and then do a head(), the first row (Label) is missing:

How can labels be maintained?

Ref: Is there a Python module to open SPSS files?
Python: 2.7.10
Pandas: 0.17.1

ayhan · Accepted Answer · 2016-03-29T23:26:17.980

Labels in a .sav file are stored in the variable.labels attribute of the object returned by the read.spss function.

You can get the variable labels with the following:

import pandas.rpy.common as com

def get_labels(filename):
    w = com.robj.r('attr(foreign::read.spss("%s"), "variable.labels")' % filename)
    w = com.convert_robj(w)
    return w
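
Once the labels are available on the Python side, it can help to pair them with the variable names before touching the dataframe. The following is only a sketch in plain Python: build_label_lookup is a hypothetical helper, the sample names and labels are made up, and it assumes the labels arrive in the same order as the variables and that a variable without a label has a blank string.

```python
def build_label_lookup(var_names, var_labels):
    """Map each variable name to its label, or to the name itself if the label is blank."""
    if len(var_names) != len(var_labels):
        raise ValueError("expected exactly one label per variable")
    return {
        name: (label.strip() or name)  # blank label: fall back to the VarName
        for name, label in zip(var_names, var_labels)
    }


# Hypothetical names and labels, not taken from a real .sav file.
lookup = build_label_lookup(["q1", "q2"], ["Age of respondent", "  "])
```

Falling back to the variable name keeps every column identifiable even when the file has unlabeled variables.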

If you want to set the labels as the column names of your dataframe:

import pandas.rpy.common as com

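def savtocsv(filename):
    # Read the .sav file once and keep the result as an R data.frame.
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Fetch the labels from the R object (a second R call) before converting it.
    cols = list(com.robj.r("attr")(w, "variable.labels"))
    w = com.convert_robj(w)
    # Use the labels as column headers in place of the VarNames.
    w.columns = cols
    return w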
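
If both the Labels and the VarNames should survive, one option is to keep them as two levels of column headers instead of overwriting one with the other. This is a sketch using only pandas, not part of the rpy workflow above: apply_labels is a hypothetical helper, the dataframe and labels are invented stand-ins for the converted file, and blank labels are assumed to fall back to the variable name.

```python
import pandas as pd


def apply_labels(df, labels):
    """Return a copy of df whose columns carry both the Label and the VarName.

    `labels` maps variable name -> label; a missing or empty label
    falls back to the variable name.
    """
    out = df.copy()
    pairs = [(labels.get(name) or name, name) for name in df.columns]
    out.columns = pd.MultiIndex.from_tuples(pairs, names=["Label", "VarName"])
    return out


# Hypothetical data standing in for the converted .sav file.
df = pd.DataFrame({"q1": [1, 2], "q2": [3, 4]})
labels = {"q1": "Age of respondent", "q2": ""}
labeled = apply_labels(df, labels)
```

The Label level is placed first to mirror the layout described in the question, where the Labels row sits above the VarNames row. Writing the result with to_csv(index=False) should then produce one header row per level.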

edited Mar 29 '16 at 23:26

answered Mar 29 '16 at 22:14

Great, that seems to do what I need, thanks. I guess I can then somehow wedge these in with pandas as the column headers, replacing the VarName values. But is it possible to do the conversion **and** include the labels in one go (one call to `com.robj.r()`), to save having to do further manipulation in pandas? – Pyderman Mar 29 '16 at 22:29
It is possible to read the file once and get the attributes from the returned object, but I think it will require another R call. Please see the update. – ayhan Mar 29 '16 at 23:25
