I am attempting to construct dataframes using large amount of data stored in txt files. I did not construct the data, however, so I am having to work with the frustrating formatting contained within. I couldn't get my code to work within the large data (and almost crashed my computer doing so), so set up a smaller dataframe like so:
    'Value'             ID_1                ID_2
0   11122222            ABC42123            33333
1   21219299            YOF21233            88821
2   00022011            ERE00091            23124
3   75643311;21233332   ROB21288            99421
4   12412421            POW94277            12231;33221
5   54221721            IRS21231;YOU28137   13123
My frustration lies in the use of semicolons in the data. The data is meant to represent IDs, but multiple IDs have been assigned to multiple variables. I want to repeat these rows so that I can search through the data for individual IDs and have a datatable that looks like so:
    'Value'             ID_1                ID_2
0   11122222            ABC42123            33333
1   21219299            YOF21233            88821
2   00022011            ERE00091            23124
3   75643311            ROB21288            99421
4   21233332            ROB21288            99421
5   12412421            POW94277            12231
6   12412421            POW94277            33221
7   54221721            YOU28137            13123
8   54221721            IRS21231            13123
Reindexing is not a problem, so long as the different IDs stay linked to each other and to their correct values.
Unfortunately, all my attempts to split the data have, so far, ended in abject failure. I have managed to set up a function that repeats data containing a semicolon, parse that through my function for each column, but then fail to split the data afterwards.
def delete_dup(df,column):
for a in column:
    location = df.loc[df.duplicated(subset= column, keep=False)]
    for x in location:
        semicolon = df.loc[df[column].str.contains(';', regex=True)]
        duplicate = semicolon.duplicated(subset= column, keep='first')
        tiny_df = semicolon.loc[duplicate]
        split_up = tiny_df[column].str.split(';')
        return pd.concat([df, split_up])
  'Value'              ID_1              ID_2           0
  11122222             ABC42123          33333          NaN
  21219299             YOF21233          88821          NaN
  00022011             ERE00091          23124          NaN
  75643311;21233332    ROB21288          99421          NaN
  12412421             POW94277          12231;33221    NaN
  54221721             IRS21231;YOU28137 13123          NaN
  75643311;21233332    ROB21288          99421          NaN
  54221721             IRS21231;YOU28137 13123          NaN
  12412421             POW94277          12231;33221    NaN
  NaN                  NaN               NaN            [75643311, 21233332]
I feel like this is the closest I've come and it's still nowhere near what I want. Any "If" statements I try to do on dataframes are met with the "ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." error, which is so frustrating to read. Any ideas on how to make pandas do what I want?
 
     
    