I need to run a multiple join query with sqldf. I have an original code written in SQL that has to be replicated in R, and so I'm using sqldf. My problem is that among the nested joins, there is a FULL JOIN, and FULL JOIN is still not supported in sqldf . So I'm looking for a workaround.
The nested joins I have are a full join followed by a left join. My solution was to bring the full join out of the query, run it, and with the resulting data set, run the next left join. The original code says:
select
  i1.ID as id1, 
  i2.ID as id2,
  i3.type
  from Blad1 i1 
  full join Blad2 ii3 on i1.ID_pers = ii3.ID_pers_1
  left join Blad1 i2 on ii3.ID_pers_2 = i2.Id_pers 
  left join Blad1 i3 on i1.ID_pers = i3.Id_pers
  where i1.Type = 'OCCUPATION'
and my solution was to run the full join with the dplyr function full_join:
fulljoin <- full_join(Blad1, Blad2, by = c("ID_pers" = "ID_pers_1"))
and then, the query:
B <- sqldf ("select
             i1.ID as id1, 
             i2.ID as id2,
             i3.type
             from fulljoin i1
             left join INDIVI i2 on i1.ID_pers_2 = i2.Id_pers
             left join Blad1 i3 on i1.ID_pers = i3.Id_pers
             where i1.Type = 'OCCUPATION'")
The two data sets are:
Blad1
ID  ID_D      ID_pers   SOURCE           TYPE        VALUE
1   STANDARD    1   PARISH RECORD   ARRIVAL_FROM    Bijmeer
2   STANDARD    2   PARISH RECORD   OCCUPATION      Almere
3   STANDARD    2   PARISH RECORD   ARRIVAL_FROM    WISKUNDE
4   STANDARD    3   PARISH RECORD   OCCUPATION      BILDERDIJK
5   STANDARD    4   PARISH RECORD   ARRIVAL_FROM    Enschede
Blad2
ID      ID_D    ID_pers_1   ID_pers_2   RELATION
933104  STANDARD    4        2021643    Father
1478    STANDARD    4        1          Child
1664    STANDARD    2118979  2021107     Child
6918    STANDARD    2481830  2          Child
7088    STANDARD    2        2011817     Child
1268853 STANDARD    1        3          Mother
The problem is that my solution is not working properly as the original clause after the left join uses table Blad 2
(ii3.ID_pers_2 = i2.Id_pers), 
whereas my solution, for that same clause I can only do it with the Full Join resulting table
(i1.ID_pers_2 = i2.Id_pers)
These two conditions are not the same, so the outcome of both left joins are different, and so the nested queries, and so the final result.
Any tips, welcome.
Thank you for reading.
 
    