SettingWithCopyWarning using pandas apply

Question

Trying to figure out why the below function is returning the dreaded SettingWithCopyWarning... Here is my function that intends to modify the dataframe df by reference.

import numpy as np

def remove_outliers_by_group(df, cols):
    """
    Removes outliers based on median and median deviation computed using cols
    :param df: The dataframe reference
    :param cols: The columns to compute the median and median dev of
    :return:
    """
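    flattened = df[cols].to_numpy().reshape(-1)  # as_matrix() was removed in pandas 1.0
    median = np.nanmedian(flattened)
    # Median absolute deviation: take the absolute value of the deviations, not of the raw values
    median_dev = np.nanmedian(np.abs(flattened - median))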
    for col in cols:
        df[col] = df[col].apply(lambda x: np.nan if get_absolute_median_z_score(x, median, median_dev) >= 2 else x)

And the offending line is df[col] = df[col].apply(lambda x: np.nan if get_absolute_median_z_score(x, median, median_dev) >= 2 else x), as per this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df[col] = df[col].apply(lambda x: np.nan if get_absolute_median_z_score(x, median, median_dev) >= 2 else x)

What I don't understand is that I see this pattern all over the place, using something like df['a'] = df['a'].apply(lambda x: ...), so I can't imagine all of them are doing it wrong.

Am I doing it wrong? What is the best way to do this? I want to modify the original dataframe.

Thanks for your help.

It is not due to the apply method but to the fact that you reassign a column of your dataframe. You can use `copy()` or simply disable the warning. – Thomas Grsp, Aug 10 '17 at 14:05
So am I modifying the original dataframe in that line? That is what I want. Or am I creating a new dataframe and not modifying the passed `df`? (I don't want this.) – coolboyjules, Aug 10 '17 at 14:09
In fact, you are modifying the original dataframe. I'll give you more insight in an answer. – Thomas Grsp, Aug 10 '17 at 14:24

score 20 · Answer 1 · edited Aug 26 '22 at 18:41


Check whether df was created from another data frame, for example by selecting rows or columns from it. If it was, create it as an explicit copy:

df = df_test.copy()

This makes sure df is a copy and not a view.
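A minimal sketch of where the copy usually belongs, assuming a hypothetical parent frame df_test with made-up columns group and x (neither appears in the question). It only illustrates the idea that the copy is taken where df is derived, not inside the function that later assigns to it:

```python
import pandas as pd

# Hypothetical parent frame; the column names and values are illustrative only.
df_test = pd.DataFrame({"group": ["a", "a", "b"], "x": [1.0, 2.0, 100.0]})

# Filtering returns a new object that pandas may still track as a slice of df_test.
# Calling .copy() makes df an independent frame, so later assignments are unambiguous.
df = df_test[df_test["group"] == "a"].copy()

# The pattern from the question, now applied to an explicit copy.
df["x"] = df["x"].apply(lambda v: v * 10)

print(df["x"].tolist())       # [10.0, 20.0]
print(df_test["x"].tolist())  # [1.0, 2.0, 100.0]
```

Note that the change lands on df only. If the goal is to change df_test itself, the assignment has to be made on df_test rather than on something derived from it.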

To learn more about this warning, see the video below:

https://www.youtube.com/watch?v=4R4WsDJ-KVc

edited Aug 26 '22 at 18:41

wisbucky


answered Apr 01 '18 at 21:57

LazyNearestNeigbour


1

Thanks, that is actually right; I solved my warning with a copy. In my case I had `df = df_original[['col1', 'col2']]`, and adding `.copy()` there fixed it. After that, `df['col1'] = df['col1'].apply(lambda x: x)` no longer generates the warning. – steco Jun 15 '18 at 13:33

Thomas Grsp · Answer 2 · 2017-08-10T14:21:46.377

19

The problem is due to the reassignment, not to the fact that you use apply.

SettingWithCopyWarning is a warning that chained indexing has been detected in an assignment. It does not necessarily mean anything has gone wrong.

To avoid the warning, use .loc as advised, like this:

df.loc[:, col] = df[col].apply(...)
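As a rough sketch, this is how the .loc form might look inside the function from the question. The question does not show get_absolute_median_z_score, so the version below is a hypothetical stand-in (absolute deviation from the median divided by the median deviation), and the sample frame is made up. The sketch also uses to_numpy() and the absolute value of the deviations, which are assumptions about what the question's code intended:

```python
import numpy as np
import pandas as pd


def get_absolute_median_z_score(x, median, median_dev):
    # Hypothetical stand-in: the question does not show this helper.
    return abs(x - median) / median_dev


def remove_outliers_by_group(df, cols):
    """Replace outliers in cols with NaN, modifying df in place."""
    flattened = df[cols].to_numpy().reshape(-1)
    median = np.nanmedian(flattened)
    median_dev = np.nanmedian(np.abs(flattened - median))
    for col in cols:
        # Same assignment as in the question, written with .loc.
        df.loc[:, col] = df[col].apply(
            lambda x: np.nan
            if get_absolute_median_z_score(x, median, median_dev) >= 2
            else x
        )


# Made-up data: the value 50.0 is far from the pooled median of 2.0.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 50.0], "b": [2.0, 3.0, 1.0, 2.0]})
remove_outliers_by_group(df, ["a", "b"])
print(df)
```

Because df here is built directly rather than sliced from another frame, the function changes the caller's frame and returns nothing, which is the behaviour the question asks for.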

edited Aug 10 '17 at 14:21

answered Aug 10 '17 at 14:04

Thomas Grsp


1

My knowledge ends here; maybe read the docs about copy in pandas. In case you want to disable the warning, you can use `pd.options.mode.chained_assignment = None` – Thomas Grsp Aug 10 '17 at 14:32
24

@coolboyjules Sometimes you can get the warning even on a line that uses `loc` (as here) because the DataFrame you are working with (`df`) is already ambiguously a copy or a view before it goes into your function, so the line you'd need to change is somewhere in the code above (usually by adding a `.copy()` to some other operation). It's annoying, but there it is. – Ajean Aug 10 '17 at 15:59
2

This answer didn't solve my issue. Instead, I found another answer here (https://stackoverflow.com/a/60885847/8046546), which resolves the warning by calling .reset_index(drop=True) on the dataframe beforehand. – Mapotofu Jan 19 '21 at 15:47
