Right now, I'm just converting the df from a Snowpark df to a Pandas one and then removing columns with all 0s:
df = df.to_pandas()
df.loc[:, (df != 0).any(axis=0)]
Any way to do this without first converting to pandas?
Right now, I'm just converting the df from a Snowpark df to a Pandas one and then removing columns with all 0s:
df = df.to_pandas()
df.loc[:, (df != 0).any(axis=0)]
Any way to do this without first converting to pandas?
It is possible to achieve this effect in Snowpark:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, sum, sql_expr
def main(session: snowpark.Session): 
    df = session.create_dataframe([[0, 1, 2, 0], [0, 5, 0, 0], [0, 9, 9, 0], [0, 7, 1, 0]], \
                                  schema=["q", "x", "y", "z"])
    return df
/*
Q   X   Y   Z
0   1   2   0
0   5   0   0
0   9   9   0
0   7   1   0
*/
Using conditional aggregation to build column list that is later dropped:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, sum, sql_expr
def main(session: snowpark.Session): 
    df = session.create_dataframe([[0, 1, 2, 0], [0, 5, 0, 0], [0, 9, 9, 0], [0, 7, 1, 0]], \
                                  schema=["q", "x", "y", "z"])
    
    dropcols = []
    for colname in df.columns:
        if df.agg(sum(sql_expr("CASE WHEN {colname}=0 THEN 0 ELSE 1 END". \
                               format(colname=colname)))).collect()[0][0]==0:
            dropcols.append(colname)
            
    return df.drop(dropcols)
/*
X   Y
1   2
5   0
9   9
7   1
*/
Output:
Compact version:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, sum, sql_expr
def main(session: snowpark.Session): 
    df = session.create_dataframe([[0, 1, 2, 0], [0, 5, 0, 0], [0, 9, 9, 0], [0, 7, 1, 0]], \
                                  schema=["q", "x", "y", "z"])
    
    dropcols = [c for c in df.columns if df.agg(sum(sql_expr("CASE WHEN {c}=0 THEN 0 ELSE 1 END". \
                                                             format(c=c)))).collect()[0][0]==0 ]
    return df.drop(dropcols)