1

Assume there are two tables a and b.

Table a:

ID AGE
 1  20
 2  empty
 3  40
 4  empty

Table b:

ID AGE
 2  25
 4  45
 5  60

How to merge the two table in R so that the resulting table becomes:

ID AGE
 1  20
 2  25
 3  40
 4  45
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
xyin
  • 417
  • 2
  • 7
  • 19

3 Answers3

2

Assuming you have NA on every position in the first table where you want to use the second table's age numbers you can use rbind and na.omit.

Example

x <- data.frame(ID=c(1,2,3,4), AGE=c(20,NA,40,NA))
y <- data.frame(ID=c(2,4,5), AGE=c(25,45,60))
na.omit(rbind(x,y))

Results in what you're after (although unordered and I assume you just forgot ID 5)

ID AGE
1  20
3  40
2  25
4  45
5  60

EDIT

If you want to merge two different data.frames's and keep the columns its a different thing. You can use merge to achieve this.

Here are two data frames with different columns:

x <- data.frame(ID=c(1,2,3,4), AGE=c(20,NA,40,NA), COUNTY=c(1,2,3,4))
y <- data.frame(ID=c(2,4,5), AGE=c(25,45,60), STATE=c('CA','CA','IL'))

Add them together into one data.frame

res <- merge(x, y, by='ID', all=T)

giving us

ID AGE.x COUNTY AGE.y STATE
 1    20      1    NA  <NA>
 2    NA      2    25    CA
 3    40      3    NA  <NA>
 4    NA      4    45    CA
 5    NA     NA    60    IL

Then massage it into the form we want

idx <- which(is.na(res$AGE.x))                     # find missing rows in x
res$AGE.x[idx] <- res$AGE.y[idx]                   # replace them with y's values
names(res)[agrep('AGE\\.x', names(res))] <- 'AGE'  # rename merged column AGE.x to AGE
subset(res, select=-AGE.y)                         # dump the AGE.y column

Which gives us

ID AGE COUNTY STATE
 1  20      1  <NA>
 2  25      2    CA
 3  40      3  <NA>
 4  45      4    CA
 5  60     NA    IL
while
  • 3,602
  • 4
  • 33
  • 42
  • Thanks for the idea! Another question is: If table a and table b both have another column which is not shared in common, is there anyway to do the merge, fill the empty cells, and keep both extra columns in the resulting table? For example, table a has an extra column of STATE, table b has an extra column of COUNTY? Thanks! – xyin Apr 17 '15 at 15:43
  • Sure but then this wouldn't work. Ill add an edit with another solution to that problem. – while Apr 17 '15 at 15:52
2

You could try

library(data.table)
setkey(setDT(a), ID)[b, AGE:= i.AGE][]
#   ID AGE
#1:  1  20
#2:  2  25
#3:  3  40
#4:  4  45

data

a <- data.frame(ID=c(1,2,3,4), AGE=c(20,NA,40,NA))
b <- data.frame(ID=c(2,4,5), AGE=c(25,45,60))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Provide setting key functionality to `setDT`directly. – Arun Apr 17 '15 at 15:54
  • @Arun You meant something like `setDT(a, key=ID)`, but not yet implemented ? – akrun Apr 17 '15 at 15:56
  • Right. `setDT(dat, col1, col2, ...)` and probably `setDTv(dat, key = c("col1", "col2"))`... – Arun Apr 17 '15 at 16:00
  • 2
    @Arun That looks more convenient – akrun Apr 17 '15 at 16:04
  • @akrun Thanks for the help! I am just wondering about the meaning of "AGE := i.AGE". I didn't find corresponding explanations in the help file of data.table package. – xyin Apr 20 '15 at 17:41
  • @xyin May be this link helps http://stackoverflow.com/questions/22976527/assigning-a-subset-of-data-table-rows-and-columns-by-join – akrun Apr 20 '15 at 17:49
1

The package in the other answer will work. Here is a dirty hack if you don't want to use the package:

x$AGE[is.na(x$AGE)] <- y$AGE[y$ID %in% x$ID]
> x
  ID AGE
1  1  20
2  2  25
3  3  40
4  4  45

But, I would use the package to avoid the clunky code.

Pierre L
  • 28,203
  • 6
  • 47
  • 69