Suppose I have a dataframe with multiple boolean columns representing certain conditions:
using DataFrames

df = DataFrame(
    id = ["A", "B", "C", "D"],
    cond1 = [true, false, false, false],
    cond2 = [false, false, false, false],
    cond3 = [true, false, true, false]
)
| | id | cond1 | cond2 | cond3 |
|---|---|---|---|---|
| 1 | A | 1 | 0 | 1 |
| 2 | B | 0 | 0 | 0 |
| 3 | C | 0 | 0 | 1 |
| 4 | D | 0 | 0 | 0 |
Now suppose I want to identify the rows where any of these conditions is true, i.e. "A" and "C". It is easy to do this explicitly:
df[:, :all] = df.cond1 .| df.cond2 .| df.cond3
But how can this be done when there is an arbitrary number of conditions? For example, with something like:
df[:, :all] = any.([ df[:, Symbol("cond$i")] for i in 1:3 ])
The above fails with `DimensionMismatch("tried to assign 3 elements to 4 destinations")` because `any` is being applied column-wise rather than row-wise: it returns one value per column (3) instead of one per row (4). So the real question is: how can `any` be applied row-wise across multiple Boolean columns in a dataframe?
The ideal output should be:
| | id | cond1 | cond2 | cond3 | all |
|---|---|---|---|---|---|
| 1 | A | 1 | 0 | 1 | 1 |
| 2 | B | 0 | 0 | 0 | 0 |
| 3 | C | 0 | 0 | 1 | 1 |
| 4 | D | 0 | 0 | 0 | 0 |
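
To make the column-wise versus row-wise distinction concrete, here is a minimal sketch using plain `Bool` vectors as stand-ins for `df.cond1`, `df.cond2` and `df.cond3`. The names `cols`, `per_column` and `per_row` are only illustrative, and the elementwise reduction is one possible way to get a per-row result, not necessarily the idiomatic DataFrames.jl approach.

```julia
# Plain Bool vectors standing in for df.cond1, df.cond2, df.cond3 (assumption:
# the dataframe columns behave like ordinary Vector{Bool} here).
cond1 = [true, false, false, false]
cond2 = [false, false, false, false]
cond3 = [true, false, true, false]
cols = [cond1, cond2, cond3]

# Broadcasting `any` over the vector of columns calls `any` once per column,
# so the result has one entry per column (3), not one per row (4).
per_column = any.(cols)   # [true, false, true]

# Combining the columns elementwise with `.|` keeps the row dimension,
# giving one entry per row (4).
per_row = reduce((a, b) -> a .| b, cols)   # [true, false, true, false]
```

The three-element `per_column` is what triggers the `DimensionMismatch` when assigned to a four-row column, whereas `per_row` has the same length as the dataframe.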