I have a large dataframe with two columns. One has a small set of values that repeat, while the other contains only unique values, so several values in column 2 correspond to a single value in column 1.
In its current form, the data lists each unique value in column 2 as its own row, which means the values in column 1 are repeated.
I want to transform (essentially flip) the data so that I can see which column 2 values fall under each unique value in column 1.
For example, the df is:
| Contig | Gene |
|---|---|
| C20 | G1 |
| C10 | G2 |
| C40 | G3 |
| C20 | G4 |
| C40 | G5 |
| C30 | G6 |
And I want:
| Contig | Gene |
|---|---|
| C10 | G2 |
| C20 | G1, G4 |
| C30 | G6 |
| C40 | G3, G5 |
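
For what it's worth, the operation described here is usually called "group by and aggregate". Below is a minimal sketch of it, assuming the dataframe is a pandas `DataFrame` in Python (the post does not say which language is in use, so that is an assumption) and that the column names match the example tables:

```python
import pandas as pd

# Example data copied from the first table (assumed to be a pandas DataFrame)
df = pd.DataFrame({
    "Contig": ["C20", "C10", "C40", "C20", "C40", "C30"],
    "Gene": ["G1", "G2", "G3", "G4", "G5", "G6"],
})

# Group rows by Contig, then join each group's Gene values into one string
grouped = (
    df.groupby("Contig")["Gene"]
      .agg(", ".join)
      .reset_index()
)

print(grouped)
```

`groupby` sorts the group keys by default, which is why the contigs would come out in the order C10, C20, C30, C40. Swapping `", ".join` for `list` should keep the genes as Python lists instead of a single string.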
Getting just the number of unique genes for each contig would also be fine:
| Contig | Gene(s) |
|---|---|
| C10 | 1 |
| C20 | 2 |
| C30 | 1 |
| C40 | 2 |
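
The count version can be sketched the same way, again assuming a pandas `DataFrame` with the column names from the example. The output column name `Gene(s)` is taken from the table above:

```python
import pandas as pd

# Example data copied from the first table (assumed to be a pandas DataFrame)
df = pd.DataFrame({
    "Contig": ["C20", "C10", "C40", "C20", "C40", "C30"],
    "Gene": ["G1", "G2", "G3", "G4", "G5", "G6"],
})

# Count the distinct Gene values within each Contig group
counts = (
    df.groupby("Contig")["Gene"]
      .nunique()
      .reset_index(name="Gene(s)")
)

print(counts)
```

Since every gene is unique in this data, `size()` or `count()` would likely give the same numbers; `nunique()` is just the safer choice if duplicates could ever appear.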
I hope this makes sense. I've been struggling to find the right keywords to describe this problem and really don't know where to begin, although I get the feeling I should maybe turn the data into a list.
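
To illustrate the "turn the data into a list" idea, here is a rough sketch in plain Python that builds a list of genes per contig without any dataframe library. The `rows` list and the `genes_by_contig` name are made up for illustration; with real data the rows would have to come from the dataframe itself.

```python
from collections import defaultdict

# (Contig, Gene) pairs copied from the example table; hypothetical stand-in for the real data
rows = [
    ("C20", "G1"), ("C10", "G2"), ("C40", "G3"),
    ("C20", "G4"), ("C40", "G5"), ("C30", "G6"),
]

# Map each contig to the list of genes seen for it
genes_by_contig = defaultdict(list)
for contig, gene in rows:
    genes_by_contig[contig].append(gene)

# Print one line per contig, with the gene list and its length
for contig in sorted(genes_by_contig):
    genes = genes_by_contig[contig]
    print(contig, ", ".join(genes), len(genes))
```

This is the same grouping idea written out by hand: one dictionary key per unique contig, with the matching genes collected in a list.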