I have created a DataFrame like below:
from pyspark.sql import Row

# sc and sqlContext are the SparkContext / SQLContext provided by the PySpark shell
l = [('Ankit', '25', 'Ankit', 'Ankit'), ('Jalfaizy', '2.2', 'Jalfaizy', 'aa'),
     ('saurabh', '230', 'saurabh', 'bb'), ('Bala', '26', 'aa', 'bb')]
rdd = sc.parallelize(l)
people = rdd.map(lambda x: Row(name=x[0], ages=x[1], lname=x[2], mname=x[3]))
schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.show()
+----+--------+-----+--------+
|ages|   lname|mname|    name|
+----+--------+-----+--------+
|  25|   Ankit|Ankit|   Ankit|
| 2.2|Jalfaizy|   aa|Jalfaizy|
| 230| saurabh|   bb| saurabh|
|  26|      aa|   bb|    Bala|
+----+--------+-----+--------+
I want to find the average length of each column, i.e. the total number of characters in a column divided by the number of rows. Below is my expected output:
+----+--------+-----+--------+
|ages|   lname|mname|    name|
+----+--------+-----+--------+
| 2.5|     5.5| 2.75|     6.0|
+----+--------+-----+--------+
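
The closest I can think of is a sketch like the following, assuming pyspark.sql.functions.length and avg can be combined per column (the name avg_lengths is just mine); I'm not sure this is the right approach:

from pyspark.sql import functions as F

# For each column, take length() of the string value and average it over all rows,
# which should equal total characters in the column / number of rows
avg_lengths = schemaPeople.select(
    [F.avg(F.length(F.col(c))).alias(c) for c in schemaPeople.columns]
)
avg_lengths.show()

Is this the idiomatic way to do it, or is there a better built-in for computing this across all columns at once?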