I have a very large data frame (think 30-50 million records), hence I'm using data.table for this problem. I am much more familiar with dplyr than with data.table.
Let's consider the following small example. Note that there are many more columns in my actual data set.
library(data.table)
library(magrittr)
library(stringi)
set.seed(42)
format_pct <- function(x){
paste0(formatC(x * 100, digits = 1, format = 'f'), "%")
}
df <- data.frame(x = c(1, NA, 2, 4, NA),
y = c(0, 1, NA, 2, 5),
huge_numsf = sample.int(500000:1000000, size = 5),
huge_numsg = sample.int(500000:1000000, size = 5),
percent_a = format_pct(runif(5)),
percent_b = format_pct(runif(5)))
> df
x y huge_numsf huge_numsg percent_a percent_b
1 1 0 457404 259548 45.8% 94.0%
2 NA 1 468537 368294 71.9% 97.8%
3 2 NA 143070 67334 93.5% 11.7%
4 4 2 415222 328495 25.5% 47.5%
5 NA 5 320871 352530 46.2% 56.0%
I would like to apply prettyNum() to all columns except x, y, and columns with the string 'percent'.
If this data frame weren't large, I would do
df[,colnames(df)[
!(colnames(df) %in% c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")]))
]] <-
apply(X = df[,colnames(df)[
!(colnames(df) %in% c("x", "y", colnames(df)[stri_detect_fixed(colnames(df), "percent")]))
]],
MARGIN = 2,
FUN = prettyNum,
big.mark = ",")
> df
x y huge_numsf huge_numsg percent_a percent_b
1 1 0 457,404 259,548 45.8% 94.0%
2 NA 1 468,537 368,294 71.9% 97.8%
3 2 NA 143,070 67,334 93.5% 11.7%
4 4 2 415,222 328,495 25.5% 47.5%
5 NA 5 320,871 352,530 46.2% 56.0%
Assume now that we've made df a data.table; i.e.:
df <- data.frame(x = c(1, NA, 2, 4, NA),
y = c(0, 1, NA, 2, 5),
huge_numsf = sample.int(500000:1000000, size = 5),
huge_numsg = sample.int(500000:1000000, size = 5),
percent_a = format_pct(runif(5)),
percent_b = format_pct(runif(5))) %>%
data.table(.)
Is there a way to do the above using data.table syntax?