I need to preprocess a column for ML, but the genre feature often contains more than one genre per string, listed in alphabetical order (so my idea of using .startswith to pick out the first genre isn't working). I'm new to Python, and this function is the only approach I've figured out, but it produces too many 'Other' values, since most of the movies in the database have more than one genre. Could you kindly suggest a better solution?
cols_to_check = ['Action', 'Drama', 'Comedy', 'Romance', 'History', 'War']

def update_genre(row):
    x = row['genre']
    if x == 'Action':
        row["Genre"] = 'Action'
    elif x == 'Comedy':
        row["Genre"] = 'Comedy'
    elif x == 'Drama':
        row["Genre"] = 'Drama'
    elif x == 'Romance':
        row["Genre"] = 'Romance'
    elif x == 'War':
        row["Genre"] = 'War'
    else:
        row["Genre"] = 'Other'
    return row

df["Genre"] = 0
df = df.apply(update_genre, axis=1)
Above is what I've tried. I'd like to extract a genre from each string, whether it appears as a standalone genre or as part of a longer, comma-separated string. I suppose my skills aren't sufficient yet.
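To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution: split each string on commas and keep the first entry of cols_to_check that appears in it. The helper name pick_genre and the small sample frame are hypothetical, and treating the order of cols_to_check as a priority order is an assumption.

```python
import pandas as pd

# Hypothetical sample standing in for the real df; only the 'genre' column matters.
df = pd.DataFrame({"genre": ["Drama", "Comedy, Drama", "War, Action, Adventure",
                             "Horror, Comedy, Music", "Sci-Fi, Thriller"]})

# Assumption: the list order is the priority order (the first match wins).
cols_to_check = ["Action", "Drama", "Comedy", "Romance", "History", "War"]

def pick_genre(genres, priority=cols_to_check):
    """Return the first priority genre found in a comma-separated string, else 'Other'."""
    if not isinstance(genres, str):  # e.g. NaN for a missing value
        return "Other"
    parts = {g.strip() for g in genres.split(",")}
    for name in priority:
        if name in parts:
            return name
    return "Other"

df["Genre"] = df["genre"].apply(pick_genre)
print(df)
```

Applying the function to the single column avoids the row-wise df.apply(..., axis=1), which is usually slower. Rows with several listed genres are still reduced to one label, so the order of cols_to_check decides which one survives.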
The data (value counts of the genre column) looks like this:
Drama                         8498
Comedy                        5420
Comedy, Drama                 2654
Drama, Romance                2529
Comedy, Romance               1777
                              ... 
War, Action, Adventure           1
Romance, Thriller, Western       1
Action, Thriller, Western        1
Horror, Comedy, Music            1
Comedy, Sci-Fi, Sport            1
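Since most rows list several genres, an alternative that keeps all of them is a multi-hot encoding: one 0/1 column per genre. The following is a hedged sketch using pandas' Series.str.get_dummies. The sample frame is hypothetical, and restricting the result to cols_to_check is an assumption about which genres matter.

```python
import pandas as pd

# Hypothetical sample mimicking the strings in the value counts above.
df = pd.DataFrame({"genre": ["Drama", "Comedy, Drama", "Drama, Romance",
                             "War, Action, Adventure"]})

# Normalise ", " to "," and then build one indicator column per genre.
dummies = (df["genre"]
           .str.replace(r"\s*,\s*", ",", regex=True)
           .str.get_dummies(sep=","))

# Keep only the genres of interest; genres absent from the data become all-zero columns.
cols_to_check = ["Action", "Drama", "Comedy", "Romance", "History", "War"]
features = dummies.reindex(columns=cols_to_check, fill_value=0)
print(features)
```

With this layout a movie tagged "Comedy, Drama" gets a 1 in both columns instead of falling into 'Other', which may suit an ML model better than a single label.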
 
     
    