I have a dataset in long format with multiple start and end dates for patients with unique id. Dates represent hospital admission and discharge. Some patients have multiple overlapping stays in hospital, some have stays that don't overlap, and other cases have start (admission) and end dates (discharge) on same day.
Building on a post that used a lagged start date and the cummax function, I wish to do 3 things:
- For cases with overlapping
startandenddates, combine/merge cases, keeping the earlieststartdate and lastenddate. - For cases with
startandenddates that are the same date, maintain that observation (don't merge). - Create new variable
surgdaysthat is calculated from max days in surgical unit (surg), for both merged and non-merged cases.
I have data like this:
id start end surg
1 A 2013-01-01 2013-01-05 0
2 A 2013-01-08 2013-01-12 1
3 A 2013-01-10 2013-01-14 6
4 A 2013-01-20 2013-01-20 3
5 B 2013-01-15 2013-01-25 4
6 B 2013-01-20 2013-01-22 5
7 B 2013-01-28 2013-01-28 0
What I've tried:
library(dplyr)
data %>%
arrange(data, id, start) %>%
group_by(id) %>%
mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
cummax(as.numeric(end)))[-n()])) %>%
group_by(id, indx) %>%
summarise(start = first(start), end = last(end), surgdays = max(surg))
What I get:
id indx start end surgdays
1 A 0 2013-01-01 2013-01-05 0
2 A 1 2013-01-08 2013-01-14 7
3 A 2 2013-01-20 2013-01-20 3
The problem: the number of rows examined with this code is limited to the number of columns in my dataset. For example, with 4 variables/columns, it worked with data from only first 4 rows (including merging two rows with overlapping dates) then stopped...even though there are 7 rows in example (and thousands of rows in actual dataset).
Similarly, when I try same code with 70 columns (and hundreds of rows), it combines overlapping dates but only based on the first 70 rows. My initial thought was to create as many placeholder columns as there are observations in my dataset but this is clunky workaround.
What I'd like to get:
id indx start end surgdays
1 A 0 2013-01-01 2013-01-05 0
2 A 1 2013-01-08 2013-01-14 7
3 A 2 2013-01-20 2013-01-20 3
4 B 0 2013-01-15 2013-01-22 9
5 B 1 2013-01-28 2013-01-28 0