How to read .xlsm files in a folder, a folder that is present in different folders using R

Question

I have directory with a list of folders which contains a folder named "ABC" . This "ABC" has '.xlsm' files. I want to use a R code to read '.xlsm' files in the folder "ABC", which under different folders in a directory.

Thank you for your help

DanY · Answer 1 · 2018-08-08T16:15:20.440

If you already know the paths to each file, then simply use read_excel from the readxl package:

library(readxl)
mydata <- read_excel("ABC/myfile.xlsm")

If you first need to get the paths to each file, you can use a system command (I'm on Ubuntu 18.04) to find all of the paths and store them in a vector. You can then import them one at a time:

myshellcommand <- "find /path/to/top/directory -path '*/ABC/*' -type d"
mypaths <- system(command = myshellcommand, intern = TRUE)

score 0 · Answer 2 · answered Aug 08 '18 at 16:17

Because of your directory requirements, one method for finding all of the files can be a double list.files:

ld <- list.files(pattern="^ABC$", include.dirs=TRUE, recursive=TRUE, full.names=TRUE)
lf <- list.files(ld, pattern="\\.xlsm$", ignore.case=TRUE, recursive=TRUE, full.names=TRUE)

To read them all into a list (good ref for dealing with a list-of-frames: http://stackoverflow.com/a/24376207/3358272):

lstdf <- sapply(lf, read_excel, simplify=FALSE)

This defaults to opening the first sheet in each workbook. Other options in readxl::read_excel that might be useful: sheet=, range=, skip=, n_max=.

Konrad · Answer 3 · 2018-08-08T16:34:55.570

Given a list of *.xlsm files in your working directory you can do the following:

list.files(
    path = getwd(),
    pattern = glob2rx(pattern = "*.xlsm"),
    full.names = TRUE,
    recursive = TRUE
) -> files_to_read

lst_dta <- lapply(
    X = files_to_read,
    FUN = function(x) {
        cat("Reading:", x, fill = TRUE)
        openxlsx::read.xlsx(xlsxFile = x)
    }
)

Results

Given two files, each with two columns A, B and C, D the generated list corresponds to:

>> lst_dta
[[1]]
  C D
1 3 4

[[2]]
  A B
1 1 2

Notes

This will read all .xlsm files found in the directory tree starting from getwd().
openxlsx is efficient due to the use of Rcpp. If you are going to be handling a substantial amount of MS Excel files this package is worth exploring, IMHO.

Edit

As pointed out by @r2evans in comments, you may want to read *.xlsm files that reside only within ABC folder ignoring *.xlsm files outside the ABC folder. You could filter your files vector in the following manner:

grep(pattern = "ABC", x = files_to_read, value = TRUE)

Unlikely, if you have *.xlsm files that have ABC string in names and are saved outside ABC folder you may get extra matches.

Given the premise that only the XLSM files within "ABC" directories are of interest, this may identify extra files if other non-ABC-based files exist. — r2evans, Aug 08 '18 at 16:23

How to read .xlsm files in a folder, a folder that is present in different folders using R

3 Answers3

Results

Notes

Edit