I have some code that looks like this:
library(dplyr)    # provides tibble(), rowwise() and mutate()
library(stringi)  # provides stri_rand_strings()

df_values <- data.frame(value = stri_rand_strings(n = 500,
                                                  length = 30))
df_keys <- tibble(key = sample(x = 1:500,
                               size = 25000,
                               replace = TRUE))

# start timer
start_time <- Sys.time()

df_keys |>
  rowwise() |>
  mutate(value = df_values$value[key])

# end timer
end_time <- Sys.time()
end_time - start_time
My real code takes a very long time to run, and I can't figure out why. The reproducible example above takes only 0.3003931 seconds. For my real code, I subset the tibble with head(n) and got the following times:
| n | time in secs |
|---|---|
| 50 | 1.993536 |
| 100 | 3.731 |
| 200 | 6.550074 |
| 300 | 9.500864 |
| 500 | 15.68515 |
| 1,000 | 32.19306 |
| ... | seems to be linear |
| 20,000 | maybe 10 minutes |
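
For reference, here is a minimal sketch of how timings like these could be collected. It runs on the example data from above as a stand-in, since the real data is not shown; the helper name `time_lookup` and the fixed seed are assumptions made only for this illustration.

```r
library(dplyr)
library(stringi)

set.seed(1)  # assumption: fixed seed, only so the run is repeatable

# Example data with the same shape as above (stand-in for the real data)
df_values <- data.frame(value = stri_rand_strings(n = 500, length = 30))
df_keys <- tibble(key = sample(x = 1:500, size = 25000, replace = TRUE))

# Hypothetical helper: elapsed seconds for the row-wise lookup on the first n rows
time_lookup <- function(n) {
  elapsed <- system.time(
    result <- df_keys |>
      head(n) |>
      rowwise() |>
      mutate(value = df_values$value[key]) |>
      ungroup()
  )[["elapsed"]]
  stopifnot(nrow(result) == n)  # sanity check: one output row per input row
  elapsed
}

sizes <- c(50, 100, 200, 300, 500, 1000)
timings <- data.frame(n = sizes, secs = sapply(sizes, time_lookup))
timings
```

Plotting `secs` against `n` would show whether the growth really is linear.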
Does anyone have an idea what could be wrong with my code? I suspect the indexing part, df_values$value[key], but my original df_values is also a data.frame with 500 observations.
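
One way to test that suspicion might be to time the same lookup with and without rowwise(), because `[` accepts the whole key column at once. The sketch below again uses the example data rather than the real data, so the difference it shows may not carry over.

```r
library(dplyr)
library(stringi)

# Example data with the same shape as above (stand-in for the real data)
df_values <- data.frame(value = stri_rand_strings(n = 500, length = 30))
df_keys <- tibble(key = sample(x = 1:500, size = 25000, replace = TRUE))

# Row-wise version, as in the question: the index expression runs once per row
t_rowwise <- system.time(
  res_rowwise <- df_keys |>
    rowwise() |>
    mutate(value = df_values$value[key]) |>
    ungroup()
)[["elapsed"]]

# Same lookup without rowwise(): the index expression runs once for the whole column
t_vector <- system.time(
  res_vector <- df_keys |>
    mutate(value = df_values$value[key])
)[["elapsed"]]

# Both approaches should produce identical values
stopifnot(identical(res_rowwise$value, res_vector$value))

c(rowwise = t_rowwise, vectorised = t_vector)
```

If the two timings differ a lot, the per-row overhead of rowwise() is a more likely cause than the indexing itself.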