I have a data frame consisting of apx. 20 Mio. lines! The table looks as follows:
Output_I
    cusip_id    price
    uidiso      100.5
    undnns      90.2
    xsodeid     45.5
    uidiso      99.5
    xsodeid     45.1
    undnns      90.0
Now I have a second data frame consisting of the cusip_id plus additional information:
ouput_II
    cusip_id    ISIN
    uidiso      xs987346325
    undnns      ch438763282
    xsodeid     xs937349494
I'd like to merge output_I with output_II based on cusip_id in order to get the following results:
    output_III
    cusip_id    price   ISIN
    uidiso  100.5   xs987346325
    undnns  90.2    ch438763282
    xsodeid 45.5    xs937349494
    uidiso  99.5    xs987346325
    xsodeid 45.1    xs937349494
    undnns  90.0    ch438763282
The challenge is the size of output_I which consists of apx. 20 mio. lines. I have tried following codes:
library(dplyr)
output_III= left_join(output_I, output_II, by="cusip_id")
library(data.table)
ldt = data.table::data.table(output_I)
rdt = data.table::data.table(output_II, key = c("cusip_id", "ISIN")
output_III= merge(ldt, rdt)
Is there probably a way to use dplyr and group_by?
Or is this not possible with such a big data frame?
I appreciate your feedback.
 
    