I want to join these two dataframes:
import pandas as pd

df1 = pd.DataFrame({"text": ["example one", "example word one", "example two", "example sentance one"],
                    "label_1": ["O O", "O W O", "O O", "O S O"]})
df2 = pd.DataFrame({"text": ["example one", "example word one", "total example", "example sentance one"],
                    "label_2": ["O N", "O O N", "O O", "O O N"]})
                   text label_1
0           example one     O O
1      example word one   O W O
2           example two     O O
3  example sentance one   O S O
                   text label_2
0           example one     O N
1      example word one   O O N
2         total example     O O
3  example sentance one   O O N
My desired output is this:
                   text label_1 label_2
0           example one     O O     O N
1      example word one   O W O   O O N
2           example two     O O     NaN
3  example sentance one   O S O   O O N
5         total example     NaN     O O
The text column should contain unique values, and the final dataframe should have label_1 and label_2 columns holding the matching values for each text. NaN is acceptable in the result, but ideally each NaN would be replaced with a string of O values, one per word in that row's text. For example, if a row's text has 3 words, NaN should be replaced with O O O.
I tried pd.concat, but it stacks the rows of both dataframes instead of matching them on text:
df = pd.concat([df1, df2])
                   text label_1 label_2
0           example one     O O     NaN
1      example word one   O W O     NaN
2           example two     O O     NaN
3  example sentance one   O S O     NaN
0           example one     NaN     O N
1      example word one     NaN   O O N
2         total example     NaN     O O
3  example sentance one     NaN   O O N
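For reference, here is a minimal sketch of one approach that might produce the desired result: an outer merge on text, followed by filling each NaN with one O per word. It assumes text is unique within each dataframe (duplicates would multiply rows in the merge) and that words are separated by whitespace. The names merged and placeholder are illustrative only. The row order and index of an outer merge may differ from the desired output shown above.

```python
import pandas as pd

df1 = pd.DataFrame({"text": ["example one", "example word one", "example two", "example sentance one"],
                    "label_1": ["O O", "O W O", "O O", "O S O"]})
df2 = pd.DataFrame({"text": ["example one", "example word one", "total example", "example sentance one"],
                    "label_2": ["O N", "O O N", "O O", "O O N"]})

# Outer merge keeps every distinct text from both dataframes;
# a label missing on one side becomes NaN.
merged = df1.merge(df2, on="text", how="outer")

# Build a fallback label per row: one "O" for each whitespace-separated word.
word_counts = merged["text"].str.split().str.len()
placeholder = word_counts.map(lambda n: " ".join(["O"] * n))

# fillna with a Series aligns on the index, so each NaN gets its own row's placeholder.
for col in ["label_1", "label_2"]:
    merged[col] = merged[col].fillna(placeholder)

print(merged)
```

If the NaN values should be kept instead, the loop at the end can simply be dropped.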
 
    