I want to keep the column (="person" in the dataframe) after performing a join in a data.table. I was able to get something close to the desired output but it required switching between data.table and dplyr, because of my limited experience with data.table :
Here the dataframe :
df<-structure(list(person = c("p1", "p1", "p1", "p1", "p1", "p1", 
"p1", "p2", "p2", "p2", "p3", "p3", "p3", "p4", "p4", "p4", "p5", 
"p5", "p5", "p6", "p6", "p6", "p7", "p7", "p7"), hp_char = c("hp1", 
"hp2", "hp3", "hp4", "hp5", "hp6", "hp7", "hp8", "hp9", "hp10", 
"hp1", "hp2", "hp3", "hp5", "hp6", "hp7", "hp8", "hp9", "hp10", 
"hp3", "hp4", "hp5", "hp1", "hp2", "hp3")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -25L), .Names = c("person", 
"hp_char"), spec = structure(list(cols = structure(list(person = structure(list(), class = c("collector_character", 
"collector")), hp_char = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("person", "hp_char")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
I am doing a self-join  to get the number of instances of co-occurence of any two "hp_id" as follows (similar to what is  elaborated in  this question). I am  keeping the "person" in by=.(...) to see who were involved the cooccurence combinations (e.g. hp1 and hp2 co-ocurred in individuals p1,p3 and p7)  :
df_by2<- setDT(df)[df, on = "person", allow = TRUE][
    hp_char < i.hp_char, .N, by = .(person ,HP_ID1 = hp_char, HP_ID2 = i.hp_char)]
However because of including "person" in by =.(person,..., the count (= N) is separated according to combination of "person","hp_id" and "hp_id2". So I switched to dplyr to get close to what I want as follows. 
dfx<- df_by2 %>% group_by(HP_ID1,HP_ID2) %>% mutate (counts=length(person)) %>% spread(person,person) %>% select (-N) %>% unique() %>% filter(counts>1) %>% unite(person,p1:p7, sep="") %>% mutate (involved_id=gsub('?NA', ' ', person)) %>% select (-person)
This is the output I get:
# A tibble: 12 x 4
   HP_ID1 HP_ID2 counts   involved_id
    <chr>  <chr>  <int>      <chr>
 1    hp1    hp2      3 p1 p3   p7
 2    hp1    hp3      3 p1 p3   p7
 3   hp10    hp8      2   p2  p5  
 4   hp10    hp9      2   p2  p5  
 5    hp2    hp3      3 p1 p3   p7
 6    hp3    hp4      2  p1    p6 
 7    hp3    hp5      2  p1    p6 
 8    hp4    hp5      2  p1    p6 
 9    hp5    hp6      2  p1  p4   
10    hp5    hp7      2  p1  p4   
11    hp6    hp7      2  p1  p4   
12    hp8    hp9      2   p2  p5 
This is close but the desired output (with properly formatted albeit untidy "involved_id" column) is:
# A tibble: 12 x 4
   HP_ID1 HP_ID2 counts   involved_id
    <chr>  <chr>  <int>      <chr>
 1    hp1    hp2      3 p1, p3, p7
 2    hp1    hp3      3 p1, p3, p7
 3   hp10    hp8      2     p2, p5
 4   hp10    hp9      2     p2, p5
 5    hp2    hp3      3 p1, p3, p7
 6    hp3    hp4      2     p1, p6
 7    hp3    hp5      2     p1, p6
 8    hp4    hp5      2     p1, p6
 9    hp5    hp6      2     p1, p4
10    hp5    hp7      2     p1, p4
11    hp6    hp7      2     p1, p4
12    hp8    hp9      2     p2, p5
All of this is very cumbersome and I was wondering if there is a simpler approach to this. I have just recently came across data.table and enjoying learning it. Any help using data.table is highly appreciated.