Apply "any" or "all" function row-wise to arbitrary number of Boolean columns in Julia DataFrames.jl

Question

Suppose I have a dataframe with multiple boolean columns representing certain conditions:

df = DataFrame(
         id = ["A", "B", "C", "D"], 
         cond1 = [true, false, false, false], 
         cond2 = [false, false, false, false], 
         cond3 = [true, false, true, false]
)

	id	cond1	cond3
1	A	1	1
2	B	0	0
3	C	0	1
4	D	0	0

Now suppose I want to identify rows where any of these conditions are true, ie "A" and "C". It is easy to do this explicitly:

df[:, :all] = df.cond1 .| df.cond2 .| df.cond3

But how can this be done when there are an arbitrary number of conditions, for example something like:

df[:, :all] = any.([ df[:, Symbol("cond$i")] for i in 1:3 ])

The above fails with DimensionMismatch("tried to assign 3 elements to 4 destinations") because the any function is being applied column-wise, rather than row-wise. So the real question is: how to apply any row-wise to multiple Boolean columns in a dataframe?

The ideal output should be:

	id	cond1	cond3	all
1	A	1	1	1
2	B	0	0	0
3	C	0	1	1
4	D	0	0	0

score 0 · Accepted Answer · answered Feb 21 '22 at 15:40

Here is one way to do it:

julia> df = DataFrame(
                id = ["A", "B", "C", "D", "E"],
                cond1 = [true, false, false, false, true],
                cond2 = [false, false, false, false, true],
                cond3 = [true, false, true, false, true]
       )
5×4 DataFrame
 Row │ id      cond1  cond2  cond3
     │ String  Bool   Bool   Bool
─────┼─────────────────────────────
   1 │ A        true  false   true
   2 │ B       false  false  false
   3 │ C       false  false   true
   4 │ D       false  false  false
   5 │ E        true   true   true

julia> transform(df, AsTable(r"cond") .=> ByRow.([maximum, minimum]) .=> [:any, :all])
5×6 DataFrame
 Row │ id      cond1  cond2  cond3  any    all
     │ String  Bool   Bool   Bool   Bool   Bool
─────┼───────────────────────────────────────────
   1 │ A        true  false   true   true  false
   2 │ B       false  false  false  false  false
   3 │ C       false  false   true   true  false
   4 │ D       false  false  false  false  false
   5 │ E        true   true   true   true   true

Note that it is quite fast even for very wide tables:

julia> df = DataFrame(rand(Bool, 10_000, 10_000), :auto);

julia> @time transform(df, AsTable(r"x") .=> ByRow.([maximum, minimum]) .=> [:any, :all]);
  0.059275 seconds (135.41 k allocations: 103.038 MiB)

In the examples I have used a regex column selector, but of course you can use any row selector you like.

Thanks. One way I just thought of which also works is: ```df[:, :all] = vec(any(Matrix(select(df, r"cond")), dims=2))```, but it is perhaps not as idiomatic as your solution. — jsinai, Feb 21 '22 at 15:45
It works, but for large data frames it will be slower and allocate much more memory (in my large test - it is 8x smaller and allocates 200x more memory). — Bogumił Kamiński, Feb 21 '22 at 15:55

Apply "any" or "all" function row-wise to arbitrary number of Boolean columns in Julia DataFrames.jl

1 Answers1