I have a dataset similar to this one:
    Mother ID ChildID    ethnicity
0     1       1          White Other
1     2       2          Indian
2     3       3          Black
3     4       4          Other
4     4       5          Other
5     5       6          Mixed-White and Black
To simplify my dataset and make it more relevant to the classification I am performing, I want to group the ethnicities into three categories, as follows:
- White: 'White British' and 'White Other'
- South Asian: 'Pakistani', 'Indian' and 'Bangladeshi'
- Other: 'Other', 'Black', 'Mixed-White and Black' and 'Mixed-White and South Asian'
So I want the above dataset to be transformed to:
    Mother ID ChildID    ethnicity
0     1       1          White
1     2       2          South Asian
2     3       3          Other
3     4       4          Other
4     4       5          Other
5     5       6          Other
To do this I have run the following code, similar to the one provided in this answer:
    col         = 'ethnicity'
    conditions  = [ (df[col] in ('White British', 'White Other')),
                   (df[col] in ('Indian', 'Pakistani', 'Bangladeshi')),
                   (df[col] in ('Other', 'Black', 'Mixed-White and Black', 'Mixed-White and South Asian'))]
    choices     = ['White', 'South Asian', 'Other']
        
    df["ethnicity"] = np.select(conditions, choices, default=np.nan)
    
But when I run this, I get the following error:

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any idea why I am getting this error? Am I handling the string comparison incorrectly? I use a similar technique to manipulate other features in my dataset, and it works fine there.
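
For reference, here is a minimal sketch of what seems to be going on, assuming the sample data shown above. Python's `in` operator tests the whole Series against the tuple and needs a single True/False, which appears to be what triggers the error. `Series.isin` performs the membership test element-wise instead. The `raised` flag and the `"Unknown"` default are placeholders introduced only for this illustration, not part of my real code.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Mother ID": [1, 2, 3, 4, 4, 5],
    "ChildID": [1, 2, 3, 4, 5, 6],
    "ethnicity": ["White Other", "Indian", "Black",
                  "Other", "Other", "Mixed-White and Black"],
})
col = "ethnicity"

# `in` compares the whole Series with each tuple element and then asks for
# one boolean result, which a Series cannot provide.
try:
    df[col] in ("White British", "White Other")
    raised = False  # hypothetical flag, only used to record the outcome
except ValueError:
    raised = True

# Element-wise membership test: each condition is a boolean Series.
conditions = [
    df[col].isin(["White British", "White Other"]),
    df[col].isin(["Indian", "Pakistani", "Bangladeshi"]),
    df[col].isin(["Other", "Black", "Mixed-White and Black",
                  "Mixed-White and South Asian"]),
]
choices = ["White", "South Asian", "Other"]

# Assumption: a string default keeps the result a plain string array;
# "Unknown" is a placeholder label for values matching none of the lists.
df[col] = np.select(conditions, choices, default="Unknown")
print(df)
```

With the sample rows, every value falls into one of the three lists, so the `"Unknown"` default is never used here.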
 
     
    