I have one data frame of length 5923 and the second data frame of length 68709. The first data frame looks like this and the second data frame looks like this
Their common column is the first column "people_id".
So far i've done this:
#
# This R function merges two tables and returns the resulting table in a new data frame.
    # inputs
# 1. tbl1 loaded from a csv file.
# 2. tbl2 is output from an query containing people_id and repository_id
# There can be multiple repository ids associated to each people id
#
mergetbl <- function(tbl1, tbl2)
{
  # tbl1 -- from csv file
  # tbl2 -- from sql query
  # 1. create an empty data frame
  # 2. go through tbl1 row by row
  # 3. for each row in tbl1, look at the current people_id in tbl2 and extract all associated repository_id
  # 4. duplicate the same row in tbl1 the same number of times there are associated repository ids
  # 5. merge duplicate rows with the column repository ids
  # 6. merge duplicate rows into new data frame
  # 7. repeat from 2. until last row in tbl1
  newtbl = data.frame(people_id=numeric(),repoCommitted=numeric(),isAuthor=numeric(),repoAuthor=numeric(),commonRepo=numeric())
  ntbl1rows<-nrow(tbl1)
  tbl2patched<-tbl2[complete.cases(tbl2),]
  for(n in 1:ntbl1rows)
  {
    ndup<-nrow(tbl2patched[tbl2patched$people_id==tbl1$people[n],])
    duprow<- tbl1[rep(n,ndup),]
    newtbl<-rbind(newtbl,duprow)
  }
}
Im stuck at step 5 where it merges the column from tbl2patched of "repository_id" to the newtbl where the id matches. The first data frame looks like this:
    people  committers  repositoryCommitter authors repositoryAuthor
 1  1       921         183                896      178
 2  2       240         18                 209      22
 3  3       3           2                  28       11
 4  4       6548        23                 6272     29
 5  5       3557        146                3453     146
and so on... until 5923 rows return.
The second data frame:
    people_id repository_id
    1           1
    1           2
    1           6
    1           7
    1           10
and so on till 68709 rows.
The output should look like this: This is what the sample looks like:
    people_id committers   repoCommitter authors   repoAuthors  commonRepo
1    1        921          183            896       178           1
2    1        921          183            896       178           2
3    1        921          183            896       178           6
4    1        921          183            896       178           7
5    1        921          183            896       178           10
 
    