I have a data set of patient diagnostic (ICD-9) codes. Each code is 3 to 5 digits long: the first three digits give the diagnosis classification, and the 4th and 5th digits refine it. For example:
zz<-"     dx1   dx2   dx3
1  64251 82381  8100
2   8052  8730 51881
3  64421   431 81601
4   3041 29690  9920
5  72888  8782 59080
6   7245 60886  8479
7    291  4659  4739
8  30410 30400 95901
9   2929 30500  8208
10  7840  6268  8052"
df<-read.table(text=zz, header=TRUE)
Each row of codes represents multiple diagnoses of the same individual. I have written a series of ifelse statements to create a new variable with the codes I’m interested in so they are mapped to numbers representing different diagnoses of interest:
df$x<-ifelse(grepl("^291", df$dx1),1, ifelse(grepl("^292", df$dx1),1,
       ifelse(grepl("^3040", df$dx1),2,ifelse(grepl("^3047", df$dx1),2,
       ifelse(grepl("^3051", df$dx1),3,ifelse(grepl("^98984", df$dx1),3,0))))))
Where I run into trouble is when I want to check for these select codes across each of the columns containing diagnostic codes. I attempted to write a function for this:
df$alldx<-apply(df[,c(1:3)],MARGIN = 2, function(dx){
  ifelse(grepl("^291", dx),1, ifelse(grepl("^292", dx),1,
  ifelse(grepl("^3040", dx),2,ifelse(grepl("^3047", dx),2,
  ifelse(grepl("^3051", dx),3,ifelse(grepl("^98984", dx),3,0)))))) 
})
The problem is I only want to count an individual once if they have one of the codes of interest; in the case of multiple code matches, then that person’s code should be whichever diagnosis was given first. I feel like there must be a way to do this, but it’s well beyond my coding abilities!
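To make the desired behaviour concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes that "given first" means column order (dx1, then dx2, then dx3) and that 0 means no code of interest was found. The helper name `code_to_group` is made up for illustration.

```r
# Sample data copied from the question.
zz <- "     dx1   dx2   dx3
1  64251 82381  8100
2   8052  8730 51881
3  64421   431 81601
4   3041 29690  9920
5  72888  8782 59080
6   7245 60886  8479
7    291  4659  4739
8  30410 30400 95901
9   2929 30500  8208
10  7840  6268  8052"
df <- read.table(text = zz, header = TRUE)

# Hypothetical helper: map a vector of codes to a group number (0 = no match).
# The prefixes and group numbers are the ones from the nested ifelse above.
code_to_group <- function(dx) {
  dx <- as.character(dx)
  ifelse(grepl("^(291|292)", dx), 1,
  ifelse(grepl("^(3040|3047)", dx), 2,
  ifelse(grepl("^(3051|98984)", dx), 3, 0)))
}

# Classify each diagnosis column separately.
# sapply returns a matrix with one row per patient and one column per dx column.
groups <- sapply(df[, c("dx1", "dx2", "dx3")], code_to_group)

# For each patient (row), keep the first nonzero group in column order.
# Assumption: dx1 was recorded before dx2, and dx2 before dx3.
df$alldx <- unname(apply(groups, 1, function(g) {
  hit <- g[g != 0]
  if (length(hit) > 0) hit[1] else 0
}))
```

With the sample data, rows 7, 8 and 9 get 1, 2 and 1 respectively (row 8 matches on dx2, since 30410 does not start with 3040), and every other row gets 0, so each individual is counted at most once.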
 
    