I am getting a very strange error in PySpark, and also in a Synapse data flow.
I am reading a JSON file with the query below, but I get a duplicate column error even though there is no duplicate column. I can read the file with other tools, a JSON validator, and the data flow, but not with PySpark.
The PySpark query is as follows:
df = (
    spark.read.option("multiline", "true")
    .options(encoding="UTF-8")
    .load(
        "abfss://<Container>]@<DIR>.dfs.core.windows.net/export28.json", format="json"
    )
)
This is the stack trace I get:
AnalysisException: Found duplicate column(s) in the data schema:
amendationcommentkey,amendationreasonkey,amendationregulatoryproofkey
Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 204, in load
    return self._df(self._jreader.load(path))
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: Found duplicate column(s) in the data schema:
amendationcommentkey,amendationreasonkey,amendationregulatoryproofkey
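
As a follow-up, here is a minimal sketch of how I could try to narrow this down, assuming the "duplicates" come from JSON field names that differ only in letter case (Spark compares column names case-insensitively by default). It enables the standard spark.sql.caseSensitive setting and prints the inferred schema; the path placeholders are the same as above and I have not yet confirmed this against the actual file:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumption: the flagged columns differ only in letter case in the JSON.
# With case-sensitive resolution, Spark keeps them as distinct fields
# instead of rejecting the schema as containing duplicates.
spark.conf.set("spark.sql.caseSensitive", "true")

df = (
    spark.read.option("multiline", "true")
    .options(encoding="UTF-8")
    .load(
        "abfss://<Container>@<DIR>.dfs.core.windows.net/export28.json", format="json"
    )
)

# Inspect the inferred schema to see which field names clash only by case.
df.printSchema()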