I'm trying to subset (retrieve a set of rows from) a pandas DataFrame by using DataFrame.filter with a regex string to identify the columns of interest, and then subsetting on the values in those columns.
For example, this is my mock data frame:
id  status  status_drug_use  drugA  drugA_use     drugB  drugB_use
0   1       analgesic        0      None          1      hypertensive
1   0       analgesic        1      analgesic     1      hypertensive
2   0       analgesic        1      hypertensive  0      None
3   1       analgesic        0      None          1      analgesic
I would like all rows in which the value in drugA_use or drugB_use matches the value in status_drug_use. For the example above, this would return two rows:
id  status  status_drug_use  drugA  drugA_use     drugB  drugB_use
1   0       analgesic        1      analgesic     1      hypertensive
3   1       analgesic        0      None          1      analgesic
There are a few column name conventions to stick with:
- status_drug_use is always there.
- The matching columns (drugA_use and drugB_use) always follow the template <ANYTHING>_use.
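To make the goal concrete, here is a minimal sketch of the logic I have in mind, not a confirmed solution. It assumes the None entries are real missing values rather than the string "None", and the names use_cols, mask and result are just illustrative. Note that a pattern like `_use$` also matches status_drug_use itself, so that column is dropped before the comparison.

```python
import pandas as pd

# Mock data frame from the example above (None assumed to be a missing value).
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "status": [1, 0, 0, 1],
    "status_drug_use": ["analgesic"] * 4,
    "drugA": [0, 1, 1, 0],
    "drugA_use": [None, "analgesic", "hypertensive", None],
    "drugB": [1, 1, 0, 1],
    "drugB_use": ["hypertensive", "hypertensive", None, "analgesic"],
})

# Pick every column ending in "_use", then drop the reference column,
# which matches the same pattern.
use_cols = df.filter(regex=r"_use$").drop(columns="status_drug_use")

# Compare each candidate column with status_drug_use row by row
# and keep rows where at least one column matches.
mask = use_cols.eq(df["status_drug_use"], axis=0).any(axis=1)
result = df[mask]
print(result)
```

With the mock data this should keep the rows with id 1 and 3.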
Alteration
There is a second scenario, in which I would like to compare a user-defined string (e.g. analgesic) against the two columns drugA_use and drugB_use. This differs from the first case in that the content of status_drug_use is not used.
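For this second scenario, a rough sketch might look like the following. The helper name rows_matching and its parameters are hypothetical, and it again assumes None is a real missing value and that status_drug_use should be excluded from the columns being searched.

```python
import pandas as pd

# Same mock data frame as in the example (None assumed to be a missing value).
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "status": [1, 0, 0, 1],
    "status_drug_use": ["analgesic"] * 4,
    "drugA": [0, 1, 1, 0],
    "drugA_use": [None, "analgesic", "hypertensive", None],
    "drugB": [1, 1, 0, 1],
    "drugB_use": ["hypertensive", "hypertensive", None, "analgesic"],
})


def rows_matching(frame, target, exclude="status_drug_use"):
    """Return rows where any <ANYTHING>_use column equals `target`.

    Hypothetical helper: `exclude` names the reference column, which is
    left out of the comparison even though it also ends in "_use".
    """
    use_cols = frame.filter(regex=r"_use$").drop(columns=exclude)
    # Scalar comparison broadcasts across every selected column.
    mask = use_cols.eq(target).any(axis=1)
    return frame[mask]


analgesic_rows = rows_matching(df, "analgesic")
hypertensive_rows = rows_matching(df, "hypertensive")
print(analgesic_rows)
```

Here the string is compared directly, so the result no longer depends on what status_drug_use contains.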