I am trying to create a new DataFrame by first splitting the original one in two:
df1 - contains only the rows of the original frame whose selected column has values from a given list
df2 - contains only the rows of the original frame whose selected column has other values, with those values then replaced by a new given value.
The function should return the new DataFrame as the concatenation of df1 and df2.
Assigning to a column directly on the original frame works fine:
import pandas as pd

l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})
print(df)
  cat  val
0   a    1
1   b    2
2   c    3
3   d    4
4   a    5
5   b    6
df['cat'] = df['cat'].apply(lambda x: 'other')
print(df)
     cat  val
0  other    1
1  other    2
2  other    3
3  other    4
4  other    5
5  other    6
Yet when I define a function:
def create_df(df, select, vals, other):
    df1 = df.loc[df[select].isin(vals)]
    df2 = df.loc[~df[select].isin(vals)]
    df2[select] = df2[select].apply(lambda x: other)
    result = pd.concat([df1, df2])
    return result
and call it:
df3 = create_df(df, 'cat', ['a','b'], 'xxx')
print(df3)
the result is exactly what I need:
   cat  val
0    a    1
1    b    2
4    a    5
5    b    6
2  xxx    3
3  xxx    4
But for some reason, in this case I also get a warning:
..\usr\conda\lib\site-packages\ipykernel\__main__.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
So how is this case (assigning a value to a column inside a function) different from the first one, where I assign the value outside a function?
What is the right way to change a column's values?
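For reference, here is a minimal sketch of a variant that should avoid the warning. It assumes the cause is that df2 is a selection taken from df rather than an independent frame, so being inside a function is probably not what matters. The name create_df_copy is made up for this illustration; .copy() and assigning a scalar to a column are standard pandas operations.

```python
import pandas as pd


def create_df_copy(df, select, vals, other):
    """Hypothetical variant of create_df that works on an explicit copy."""
    mask = df[select].isin(vals)
    # Rows whose value is in vals are kept unchanged.
    df1 = df.loc[mask]
    # Take an explicit copy so the assignment below targets an
    # independent frame instead of a selection of the original df.
    df2 = df.loc[~mask].copy()
    # Assigning a scalar broadcasts it to every row; no apply needed.
    df2[select] = other
    return pd.concat([df1, df2])


df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd', 'a', 'b'],
                   'val': [1, 2, 3, 4, 5, 6]})
df3 = create_df_copy(df, 'cat', ['a', 'b'], 'xxx')
print(df3)
```

With this version the original df is left untouched, and the row order of the result matches the output shown above (matching rows first, then the replaced ones).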
 
    