I have Parquet directories named like this:
parquetNames = ['NAME1', 'NAME1_MS', 'NAME2', 'NAME2_MQ']
I want to load only the Parquet data in NAME1 and NAME2, but I'm having trouble combining a negative lookahead with alternation. If I use:
s3BaseDir+'NAME*'
then, as expected, all the Parquet directories are loaded. Based on answers here and here, I tried a negative lookahead with alternation to exclude either of the full substrings "_MS" or "_MQ":
s3BaseDir+'NAME*(?!{_MS,_MQ})'
But I'm getting
AnalysisException: 'Path does not exist'.
It seems it's taking the more complex pattern literally.
Are negative lookaheads supported in the path passed to PySpark's spark.read.parquet? Can they be combined with alternation too? If so, how?
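For comparison, here is a small plain-Python sketch (no Spark involved) of the two kinds of matching being mixed up. It assumes the directory names are ordinary strings, and it uses Python's fnmatch only as a stand-in for glob-style matching of 'NAME*'. One thing it highlights: in a regex, alternation is written with "|", while {_MS,_MQ} is glob brace syntax, so the pattern above mixes the two notations.

```python
import fnmatch
import re

# Directory names from the question, written as plain strings.
parquet_names = ["NAME1", "NAME1_MS", "NAME2", "NAME2_MQ"]

# Glob-style match (fnmatch used as a stand-in): "NAME*" keeps all four names.
glob_matched = [n for n in parquet_names if fnmatch.fnmatchcase(n, "NAME*")]

# Regex with a negative lookahead: reject any name ending in "_MS" or "_MQ".
# Alternation inside the lookahead uses "|", not the glob braces "{_MS,_MQ}".
keep = re.compile(r"^(?!.*(?:_MS|_MQ)$)NAME")
regex_matched = [n for n in parquet_names if keep.match(n)]

print(glob_matched)   # ['NAME1', 'NAME1_MS', 'NAME2', 'NAME2_MQ']
print(regex_matched)  # ['NAME1', 'NAME2']
```

A list filtered this way could, in principle, be joined onto s3BaseDir and passed as separate paths, since spark.read.parquet accepts more than one path (spark.read.parquet(*paths)).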