I have created an R Markdown document that checks a dataset for errors. It prints statements that describe each error and the row numbers that need to be corrected (it would check the errors in df below). I have created another dataframe (df.index in the example below) to track the rows that need to be corrected for each column of df. Essentially, I need to add a column that stores a list of the rows that need to be corrected for each column in df. Then, as I run more error checks, I will need to append to the list in a given row of the newly created summary dataframe, and add new lists to other rows of its rows column.
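For reference, the structure described above (one vector of row numbers per column of `df`) is what R calls a list column. A minimal base-R sketch, filled with the expected values shown later in this question, might look like this; using `lengths()`, `vapply()` with `toString()`, and `sprintf()` to feed the report's print statements is an assumption, not part of the original workflow.

```r
# Summary of columns to correct; number/name mirror df.index in the question
summary <- data.frame(
  number = c("1.1.", "1.2.", "1.3."),
  name = c("name", "number", "category")
)

# A list column: element k holds the row numbers to correct for summary$number[k]
summary$rows <- list(c(4L, 6L, 7L, 5L), c(6L, 7L, 2L, 3L), 4L)

# Each element is an ordinary integer vector, so base functions work on it
flagged_counts <- lengths(summary$rows)                   # rows flagged per column
row_text <- vapply(summary$rows, toString, character(1))  # "4, 6, 7, 5", ...

# One possible print statement per column for the report
cat(sprintf("Column %s (%s): rows %s\n", summary$number, summary$name, row_text),
    sep = "")
```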
I have looked through dozens of Stack Overflow posts on lists but cannot find a good answer. Below is what I have tried, shown with a minimal example. The code works and gives me the output I want. However, it is extremely verbose and will probably be hard for others on my project team to read and understand.
Minimal Example
Data
library(dplyr)

# Dataframe that contains the dataset that I'm checking for errors.
df <- structure(
  list(
    `1.1.` = c("Andrew", "Max", "Sylvia", NA, "1", NA, NA, "Jason"),
    `1.2.` = c(1, 2, 2, NA, NA, 5, 3, NA),
    `1.3.` = c("cool", "amazing", "wonderful", "okay", NA, "sweet",
               "chocolate", "fine")
  ),
  class = "data.frame",
  row.names = c(NA, -8L)
)

# Dataframe that contains the column numbers and names, which will be used to
# create a summary of what rows need to be changed for each column.
df.index <- structure(
  list(
    number = c("1.1.", "1.2.", "1.3."),
    name = c("name", "number", "category")
  ),
  class = "data.frame",
  row.names = c(NA, -3L)
)
What I have tried
obs <- "1.1."
na.index <- which(is.na(df$`1.1.`))
summary <- df.index %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index), NA))

# Check to see if there are any numeric values in this character column.
# Adding 6 just to have a duplicate for this example.
na.index2 <- c(which(!is.na(as.numeric(as.character(df$`1.1.`)))), 6)

# Append new list from na.index2 to the existing list in row 1 (or 1.1.),
# and keep only the unique values, excluding NAs.
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(unique(na.omit(
    unlist(append(rows, list(na.index2)))
  ))), NA))

# Column 1.2. in df.
obs <- "1.2."
na.index3 <- which(df$`1.2.` > 2)
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index3), rows))

na.index4 <- which(df$`1.2.` == 2)
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(unique(na.omit(
    unlist(append(rows[2], list(na.index4)))
  ))), rows))

# Column 1.3. in df.
obs <- "1.3."
na.index5 <- which(df$`1.3.` == "okay")
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index5), rows))
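Each check in the code above reduces to `which()` over a logical vector, so one way to make the checks easier for teammates to read might be to give the recurring ones descriptive names. The function names below are hypothetical, and wrapping `as.numeric()` in `suppressWarnings()` is an assumption about the desired behavior (it silences the "NAs introduced by coercion" warning that non-numeric text triggers).

```r
# Hypothetical helper: rows where the value is missing
rows_missing <- function(x) which(is.na(x))

# Hypothetical helper: rows of a text column whose value parses as a number;
# suppressWarnings() hides the coercion warning for non-numeric text
rows_numeric_text <- function(x) {
  which(!is.na(suppressWarnings(as.numeric(as.character(x)))))
}

# Hypothetical helper: rows whose value is one of the flagged values
# (%in% returns FALSE, not NA, for missing values, so NAs are never flagged)
rows_equal_to <- function(x, values) which(x %in% values)

# The 1.1. and 1.3. columns of df from the question
names_col <- c("Andrew", "Max", "Sylvia", NA, "1", NA, NA, "Jason")
words_col <- c("cool", "amazing", "wonderful", "okay", NA, "sweet",
               "chocolate", "fine")

missing_rows <- rows_missing(names_col)        # 4, 6, 7
numeric_rows <- rows_numeric_text(names_col)   # 5
okay_rows <- rows_equal_to(words_col, "okay")  # 4
```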
Output (which is also the expected output)
summary
  number     name       rows
1   1.1.     name 4, 6, 7, 5
2   1.2.   number 6, 7, 2, 3
3   1.3. category          4
My code returns all of the correct rows, but there has to be a much simpler way to do this: one that does not require creating obs or hard-coding the row number (e.g., rows[2]) when appending to a list.
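One possible way to drop both `obs` and the hard-coded `rows[2]` might be to look up the summary row by its column label with `match()`, then append with `c()` and `unique()`. If the list column starts with empty vectors instead of `NA`, `na.omit()` and `unlist()` are no longer needed. The helper name `add_rows()` and the empty starting vectors are assumptions for this sketch, not part of the original code.

```r
# Hypothetical helper: append row numbers to the entry for one column of df,
# located by its label in summary$number rather than by position
add_rows <- function(summary, col, idx) {
  i <- match(col, summary$number)
  if (is.na(i)) stop("unknown column label: ", col)
  # Keep each row number only once, in order of first appearance
  summary$rows[[i]] <- unique(c(summary$rows[[i]], idx))
  summary
}

summary <- data.frame(
  number = c("1.1.", "1.2.", "1.3."),
  name = c("name", "number", "category")
)
# Start every column with an empty vector of row numbers (assumption)
summary$rows <- rep(list(integer(0)), nrow(summary))

# Row numbers taken from the question's checks on df
summary <- add_rows(summary, "1.2.", c(6L, 7L))      # df$`1.2.` > 2
summary <- add_rows(summary, "1.2.", c(2L, 3L, 7L))  # 7 is a deliberate duplicate
summary <- add_rows(summary, "1.3.", 4L)             # df$`1.3.` == "okay"
```

Because the summary is the first argument, the helper should also fit into the existing pipes, e.g. `summary <- summary %>% add_rows("1.2.", na.index3)`.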
Not every column gets the same error checks. So I'm hoping for an easy way to add a list to the rows column in summary as I go through the checks for each column (like 1.2., 1.3., etc.), and to append additional row numbers to an existing list (as shown here).
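Alternatively, if all checks can run before the summary is built, another option might be to collect each check's row numbers in a named list keyed by column label and fill the rows column with one `lapply()` call at the end. This is a different technique from the incremental appending in the question; `checks` is a hypothetical name, and the artificial duplicate `6` is left out (column 1.1. still comes out as 4, 6, 7, 5 because row 6 is already flagged as missing).

```r
df <- data.frame(
  `1.1.` = c("Andrew", "Max", "Sylvia", NA, "1", NA, NA, "Jason"),
  `1.2.` = c(1, 2, 2, NA, NA, 5, 3, NA),
  `1.3.` = c("cool", "amazing", "wonderful", "okay", NA, "sweet",
             "chocolate", "fine"),
  check.names = FALSE,       # keep the labels 1.1., 1.2., 1.3. as-is
  stringsAsFactors = FALSE
)
df.index <- data.frame(
  number = c("1.1.", "1.2.", "1.3."),
  name = c("name", "number", "category"),
  stringsAsFactors = FALSE
)

# Hypothetical: one entry per column label, each a list of any number of checks
checks <- list(
  `1.1.` = list(
    which(is.na(df$`1.1.`)),
    which(!is.na(suppressWarnings(as.numeric(df$`1.1.`))))
  ),
  `1.2.` = list(which(df$`1.2.` > 2), which(df$`1.2.` == 2)),
  `1.3.` = list(which(df$`1.3.` == "okay"))
)

summary <- df.index
# Combine all checks for a column into unique row numbers; a column with no
# checks gets integer(0) because as.integer(NULL) is an empty integer vector
summary$rows <- lapply(summary$number, function(col) {
  as.integer(unique(unlist(checks[[col]], use.names = FALSE)))
})
```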