I want to match 2 controls to every case under two conditions:

- the age difference should be within ±2;
- the income difference should be within ±2.
If more than 2 controls qualify for a case, I just need to select 2 of them at random. Then, how do I generate a new variable that links each case to the controls matched to it? For example, Control1 and Control2 matched to Case1 are coded as group 1, and Control1 and Control2 matched to Case2 are coded as group 2.
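A first step is to list, for every case, which controls satisfy both conditions. The sketch below does this in base R with a cross join (`merge(..., by = NULL)` returns the Cartesian product) followed by a filter. It assumes "within ±2" means an absolute difference of at most 2 (inclusive), and it rebuilds the question's data as a plain `data.frame` so it runs without tibble.

```r
# Same values as `dat` in the question, as a plain data.frame
dat <- data.frame(
  id     = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
  group  = c(rep("case", 4), rep("control", 10))
)

cases    <- dat[dat$group == "case", ]
controls <- dat[dat$group == "control", ]

# Cross join: every case paired with every control
pairs <- merge(cases, controls, by = NULL, suffixes = c(".case", ".ctrl"))

# Keep pairs within +/-2 on both age and income (inclusive bounds assumed)
eligible <- pairs[abs(pairs$age.case - pairs$age.ctrl) <= 2 &
                  abs(pairs$income.case - pairs$income.ctrl) <= 2,
                  c("id.case", "id.ctrl")]
eligible <- eligible[order(eligible$id.case, eligible$id.ctrl), ]
rownames(eligible) <- NULL
print(eligible)
```

With this data, case 1 has four candidates (111, 222, 333, 555), cases 2 and 4 have three each, and case 3 (age 44, income 11) has none, so case 1 and case 4 compete for 111, 222 and 555.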
DATA
```r
dat <- structure(list(id = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666,
                             777, 888, 999, 1000),
                      age = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
                      income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
                      group = c("case", "case", "case", "case", "control", "control",
                                "control", "control", "control", "control", "control",
                                "control", "control", "control")),
                 row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))
```
EXPECTED OUTPUT
| id | age | income | group | index |
|---|---|---|---|---|
| 1 | 10 | 35 | case | 1 |
| 2 | 20 | 72 | case | 2 |
| 3 | 44 | 11 | case | 3 |
| 4 | 11 | 35 | case | 4 |
| 111 | 12 | 37 | control | 1 |
| 222 | 11 | 36 | control | 1 |
| 333 | 8 | 33 | control | 4 |
| 555 | 11 | 34 | control | 4 |
| 777 | 21 | 70 | control | 2 |
| 1000 | 18 | 70 | control | 2 |
This is similar to my previous question, but I want the output to have an extra variable, `index`, that identifies which case each control is matched to: if a case and a control share the same index, that control is matched with that case.
The question is how I can create the index, preferably with an approach based on the previous question.
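One way to build `index` is a greedy loop: give every case its own index (its row position among the cases), then for each case pick up to 2 of the still-unused eligible controls at random and copy the case's index onto them. This is a plain base-R sketch, not necessarily the approach from the previous question. It assumes matching without replacement (each control is used at most once, as the expected output suggests), that a case with fewer than 2 eligible controls keeps whatever is available, and that unmatched controls are dropped. Cases with the fewest candidates are processed first, a common heuristic that lowers the chance an early case uses up the only controls a later case could have had.

```r
set.seed(42)  # make the random choice of controls reproducible

# Same values as `dat` in the question, as a plain data.frame
dat <- data.frame(
  id     = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  income = c(35, 72, 11, 35, 37, 36, 33, 70, 34, 74, 70, 44, 76, 70),
  group  = c(rep("case", 4), rep("control", 10))
)

cases    <- dat[dat$group == "case", ]
controls <- dat[dat$group == "control", ]

cases$index    <- seq_len(nrow(cases))  # one index per case, in row order
controls$index <- NA_integer_           # NA = not (yet) matched

# TRUE for controls within +/-2 on age and income of case i (inclusive)
within2 <- function(i) {
  abs(controls$age - cases$age[i]) <= 2 &
    abs(controls$income - cases$income[i]) <= 2
}

# Match cases with the fewest eligible controls first
n_elig <- sapply(seq_len(nrow(cases)), function(i) sum(within2(i)))

for (i in order(n_elig)) {
  ok <- which(is.na(controls$index) & within2(i))  # unused eligible controls
  if (length(ok) == 0) next                        # nothing left for this case
  # Pick up to 2 at random; indexing via sample.int avoids sample()'s
  # special case for a single number
  pick <- ok[sample.int(length(ok), min(2L, length(ok)))]
  controls$index[pick] <- cases$index[i]
}

# Cases first, then matched controls, each sorted by id
result <- rbind(cases, controls[!is.na(controls$index), ])
result <- result[order(result$group, result$id), ]
rownames(result) <- NULL
print(result)
```

With this data, cases 1, 2 and 4 each receive two controls and case 3 receives none; which controls are chosen depends on the seed. Greedy matching like this is not guaranteed to be optimal in general, so with more data it is worth checking how many cases end up with fewer than 2 controls.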