I have a large dataframe of names, where each row also records the source the name came from. The same name can appear more than once with different sources.
Reproducible data (by Anshul Jain):
First_Name  Last_Name   Source
      Matt      Jones       XX
     James      Smith       YY
     Smith     Weston       AA
    Weston  Supermare       CC
      Matt      Jones       YY
    Weston  Supermare       FF
# copy the rows above to the clipboard, then load them with:
import pandas as pd
df = pd.read_clipboard(sep=r'\s+')
The data looks as follows:
+------------+-----------+--------+
| First Name | Last Name | Source |
+------------+-----------+--------+
| Matt       | Jones     | XX     |
| James      | Smith     | YY     |
| Smith      | Weston    | AA     |
| Weston     | Supermare | CC     |
| Matt       | Jones     | YY     |
| Weston     | Supermare | FF     |
+------------+-----------+--------+
I need it to look like this:
+------------+-----------+--------+
| First Name | Last Name | Source |
+------------+-----------+--------+
| Matt       | Jones     | XX, YY |
| James      | Smith     | YY     |
| Smith      | Weston    | AA     |
| Weston     | Supermare | CC, FF |
+------------+-----------+--------+
I can get the deduplication process to work using:
Conn_df = Conn_df.drop_duplicates(subset=['First Name', 'Last Name'])
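However, drop_duplicates keeps only the first row for each name and discards the other sources. Before I deduplicate, I need to collect all the sources for the same First Name / Last Name pair into a single row, separated by commas.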
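For reference, here is a minimal sketch of the kind of result I am after, using a groupby and a string join. It assumes the columns are named 'First Name', 'Last Name' and 'Source' as in the tables above, that every Source value is a non-null string, and it builds a small stand-in frame (the name `result` is just for illustration) rather than my real Conn_df. I have not checked how it behaves on the full data.

```python
import pandas as pd

# Small stand-in for the real dataframe, using the column names from the tables above.
df = pd.DataFrame({
    "First Name": ["Matt", "James", "Smith", "Weston", "Matt", "Weston"],
    "Last Name": ["Jones", "Smith", "Weston", "Supermare", "Jones", "Supermare"],
    "Source": ["XX", "YY", "AA", "CC", "YY", "FF"],
})

# Group rows that share the same name and join their sources into one string.
# sort=False keeps the groups in order of first appearance;
# as_index=False keeps the name columns as ordinary columns.
result = (
    df.groupby(["First Name", "Last Name"], sort=False, as_index=False)
      .agg({"Source": ", ".join})
)

print(result)
```

Because the grouping already collapses each name to one row, a separate drop_duplicates call should not be needed after this step.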
 
     
     
    