I am trying to cast a StringType column to an ArrayType of JSON for a dataframe generated from CSV.
I am using pyspark on Spark2.
The CSV file I am dealing with is as follows -
date,attribute2,count,attribute3
2017-09-03,'attribute1_value1',2,'[{"key":"value","key2":2},{"key":"value","key2":2},{"key":"value","key2":2}]'
2017-09-04,'attribute1_value2',2,'[{"key":"value","key2":20},{"key":"value","key2":25},{"key":"value","key2":27}]'
As shown above, it contains one attribute, "attribute3", as a literal string, which is technically a list of dictionaries (JSON) with an exact length of 2.
(This is the output of the distinct function.)
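For reference, this is roughly how I load the file (the path and read options below are placeholders for my actual setup):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# placeholder path; header=True so the first line becomes the column names
dataframe = spark.read.csv("data.csv", header=True)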
Snippet from printSchema() -
attribute3: string (nullable = true)
I am trying to cast "attribute3" to ArrayType as follows -
from pyspark.sql.types import ArrayType

temp = dataframe.withColumn(
    "attribute3_modified",
    dataframe["attribute3"].cast(ArrayType())
)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() takes at least 2 arguments (1 given)
Indeed, ArrayType expects a datatype as its argument. I tried passing "json", but it did not work.
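For example, the following at least constructs an ArrayType without the TypeError (the element types here are just guesses on my part), but as far as I can tell, passing one of these to cast() still does not give me parsed JSON:

from pyspark.sql.types import ArrayType, StringType, MapType

# ArrayType needs an element DataType as its argument, e.g.:
ArrayType(StringType())                           # array of strings
ArrayType(MapType(StringType(), StringType()))    # array of string-to-string maps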
Desired Output -
In the end, I need to convert attribute3 to an ArrayType() or a plain Python list. (I am trying to avoid the use of eval.)
How do I convert it to ArrayType, so that I can treat it as a list of JSON objects?
Am I missing anything here?
(The documentation does not address this problem in a straightforward way.)
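For what it is worth, my current guess is that something like from_json with an explicit element schema is the intended route, but I am not sure this is correct; the schema, the quote-stripping, and the Spark version requirement (I believe an array schema needs Spark 2.2+) are all assumptions on my part:

from pyspark.sql.functions import from_json, regexp_replace
from pyspark.sql.types import ArrayType, StructType, StructField, StringType, IntegerType

# guessed schema for the JSON array; key2 assumed to be an integer
json_schema = ArrayType(StructType([
    StructField("key", StringType()),
    StructField("key2", IntegerType())
]))

temp = dataframe.withColumn(
    "attribute3_modified",
    from_json(
        # the surrounding single quotes from the CSV may need stripping first
        regexp_replace(dataframe["attribute3"], "^'|'$", ""),
        json_schema
    )
)

Is something along these lines the right direction, or is there a cleaner way?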