Context
I'm working on a DataFrame df with many columns of numerical values:
df
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
0.5          |     -6.2    |  ...  | 79.8
-26.1        |     6200.0  |  ...  | -65.2
150.0        |     3.14    |  ...  | 1.008
Separately, I have a list of columns, list_cols:
list_cols = ['lorem ipsum', 'dolor sic', ... ]  # arbitrary length; len(list_cols) <= len(df.columns), and every entry is a valid column of df
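For reproducibility, a minimal sketch of this setup using the three sample rows above. The column named "other" is hypothetical and stands in for the hundreds of remaining columns:

```python
import pandas as pd

# Hypothetical stand-in for the real df: the three sample rows,
# with an assumed column "other" in place of the hundreds of remaining columns.
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],
})

# Columns to test; every entry must be an existing column of df.
list_cols = ["lorem ipsum", "dolor sic"]
assert set(list_cols) <= set(df.columns)

print(df)
```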
I want to obtain 2 dataframes:
- one containing all rows where the value is < 0 for at least one column of list_cols (i.e. an OR); let's call it negative_values_matches
- one containing the remaining rows; let's call it positive_values_matches
Expected result example
For list_cols = ['lorem ipsum', 'dolor sic'], I expect the following split, where every row of negative_values_matches has at least one strictly negative value in list_cols:
negative_values_matches
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
0.5          |     -6.2    |  ...  | 79.8
-26.1        |     6200.0  |  ...  | -65.2
positive_values_matches
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
150.0        |     3.14    |  ...  | 1.008
I don't want to write this kind of code myself:
negative_values_matches = df[ (criterion1 | criterion2 | ... | criterionn)]
positive_values_matches = df[~(criterion1 | criterion2 | ... | criterionn)]
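(where criterionk is a boolean condition on column k, for instance (df[col_k] < 0); the parentheses are intentional, since | binds more tightly than comparison operators in Python, so Pandas conditions combined with | must each be wrapped in parentheses)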
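One way to sketch this chain programmatically is to build one criterion per column and fold them together with functools.reduce and operator.or_ (a swapped-in technique, not taken from the posts listed below). The sample data and the "other" column are hypothetical; the criterion used is (df[col_k] < 0):

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical sample data mirroring the rows shown above.
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],  # assumed stand-in for the remaining columns
})
list_cols = ["lorem ipsum", "dolor sic"]

# One boolean Series per column: criterion_k = (df[col_k] < 0)
criteria = [df[col] < 0 for col in list_cols]

# Equivalent to criterion1 | criterion2 | ... | criterionn.
# The all-False initial value keeps reduce() from failing on an empty list_cols.
mask = reduce(operator.or_, criteria, pd.Series(False, index=df.index))

negative_values_matches = df[mask]
positive_values_matches = df[~mask]

print(negative_values_matches)
print(positive_values_matches)
```

With an empty list_cols, the all-False start value sends every row to positive_values_matches instead of raising a TypeError.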
The idea is to have a programmatic approach. I'm mainly looking for an array of booleans, so that I can then use Boolean indexing (see the Pandas documentation).
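Since the goal is a single array of booleans, a possibly simpler sketch (again on hypothetical sample data) compares all of list_cols at once and then reduces each row with any(axis=1), which is an OR across those columns:

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown above.
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],  # assumed stand-in column
})
list_cols = ["lorem ipsum", "dolor sic"]

# Element-wise "< 0" on the selected columns, then True for a row
# if at least one of those columns is strictly negative.
mask = df[list_cols].lt(0).any(axis=1)

negative_values_matches = df[mask]
positive_values_matches = df[~mask]

# The same mask as a plain NumPy boolean array, also usable for indexing.
mask_array = mask.to_numpy()
```

Note that NaN compares as False with lt, so a row whose only candidate values are NaN lands in positive_values_matches. Replacing any with all would give the AND version.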
As far as I can tell, these posts are not exactly what I am talking about:
- Filtering DataFrame on multiple conditions in Pandas
- Drop rows on multiple conditions in pandas dataframe
- Pandas: np.where with multiple conditions on dataframes
- Pandas DataFrame : How to select rows on multiple conditions? This one is a little closer to what I am looking for. However, it relies on building a query string, which might not work with "exotic" column names such as names containing spaces (or at least I don't know how to make it work)
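On the string-based route: pandas (0.25 and later) lets query and eval expressions refer to column names containing spaces by wrapping them in backticks. A sketch with hypothetical sample data; it assumes no column name itself contains a backtick:

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown above.
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],  # assumed stand-in column
})
list_cols = ["lorem ipsum", "dolor sic"]

# Builds "`lorem ipsum` < 0 or `dolor sic` < 0"; backticks quote names with spaces.
expr = " or ".join(f"`{col}` < 0" for col in list_cols)

# eval returns one boolean per row, which can be used for Boolean indexing.
mask = df.eval(expr)

negative_values_matches = df[mask]
positive_values_matches = df[~mask]
```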
I can't figure out how to chain the boolean evaluations on my DataFrame together with the OR operator and obtain the correct row split.
What can I do?
