I have a dataframe that includes personal data, including a column called First Name. Due to the way the data was entered, some of the rows in this column contain two individuals, i.e. "Dan and Sally". I am able to filter these rows out based on the other criteria I need. What I would like to do is duplicate everything in those rows and split them so I get one for "Dan" and one for "Sally". Simplified, something like:
df1 = pd.DataFrame({'First Name':['Steve','Jim','Bob','Dan and Sally',
                                  'Mary','Jane','Bill and Kathy'],
                    'Last Name':['Jobs','Croce','Dylan','Blapper',
                                 'Carpenter','Goodall','Johnson']})
This results in:
>>> df1
       First Name  Last Name
0           Steve       Jobs
1             Jim      Croce
2             Bob      Dylan
3   Dan and Sally    Blapper
4            Mary  Carpenter
5            Jane    Goodall
6  Bill and Kathy    Johnson
Then, I can filter down using:
doubles = df1[df1['First Name'].str.contains(' and ')]
       First Name Last Name
3   Dan and Sally   Blapper
6  Bill and Kathy   Johnson
What I now want to do is duplicate the entire row for each of these, but have the First Name field in the first row be everything before ' and ' and the second one be everything after it:
First Name   Last Name
       Dan     Blapper
     Sally     Blapper
      Bill     Johnson
     Kathy     Johnson
These entries would replace the existing "and" entries in the original dataframe, so I have only true individuals (recognizing that I'm going to be duplicating contact info between two members of a household).
I thought I could use repeat and then filter them pairwise, but it seems like there's probably an obvious solution I'm missing.
 
     
    