From the PySpark docs for wholeTextFiles():
Read a directory of text files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. Each file is read as a single record and returned in a key-value pair, where the key is the path of each file, the value is the content of each file.
So your code:
files = sc.wholeTextFiles("file:///data/*/*/")
creates an RDD whose records have the form:
(file_name, file_contents)
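The record shape is easy to see in a plain-Python model of what `wholeTextFiles()` returns for a single local directory. This is only an illustration that runs without Spark. The helper `whole_text_files` is hypothetical, not Spark's implementation, and it ignores glob patterns, partitioning and the `file://` URI scheme. It also uses plain OS paths as keys, whereas Spark uses each file's Hadoop path URI.

```python
import os
import tempfile


def whole_text_files(directory):
    """Hypothetical plain-Python model of sc.wholeTextFiles() for one local
    directory: one (path, content) pair per file, never split by line."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                # The whole file becomes a single value in the pair.
                pairs.append((path, f.read()))
    return pairs


if __name__ == "__main__":
    # Made-up sample files in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        for name, text in [("a.txt", "hello\nworld\n"), ("b.txt", "spark\n")]:
            with open(os.path.join(d, name), "w", encoding="utf-8") as f:
                f.write(text)
        for path, content in whole_text_files(d):
            print(os.path.basename(path), repr(content))
```

The two-line file a.txt still yields a single pair whose value contains both lines.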
Getting the file contents is then just a map that takes the second element of each tuple:
message = files.map(lambda x: x[1])
message is now another RDD that contains only the file contents.
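In PySpark, `files.values()` is an equivalent way to keep only the second element of each pair (and `files.keys()` keeps only the paths). The sketch below mirrors that logic on an ordinary Python list so it runs without Spark; the paths and contents are made-up examples shaped like the `(file_name, file_contents)` records.

```python
# Hypothetical (path, content) pairs, shaped like the records of `files`.
files = [
    ("file:/data/x/1/a.txt", "first file\n"),
    ("file:/data/x/2/b.txt", "second file\n"),
]

# Mirrors files.map(lambda x: x[1]): keep only the second tuple element.
message = [x[1] for x in files]

# Mirrors files.values(): tuple unpacking instead of indexing.
message_alt = [content for _path, content in files]

print(message)
```

Unpacking into named variables is a matter of taste; it makes clear which element is kept without relying on the index.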
More information about wholeTextFiles() and how it differs from textFile() can be found in this post.
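The core difference can be modeled in plain Python: textFile() yields one record per line (without the file path), while wholeTextFiles() yields one (path, content) record per file. Both helpers below are hypothetical illustrations, not Spark code; they ignore partitioning, and `str.splitlines()` only approximates how Hadoop splits lines.

```python
def text_file_records(files):
    """Hypothetical model of sc.textFile(): one record per line, path dropped.
    str.splitlines() only approximates Hadoop's line splitting."""
    records = []
    for _path, content in files:
        records.extend(content.splitlines())
    return records


def whole_text_file_records(files):
    """Hypothetical model of sc.wholeTextFiles(): one (path, content) record per file."""
    return [(path, content) for path, content in files]


if __name__ == "__main__":
    # Made-up input: two small files given as (path, content) pairs.
    files = [("a.txt", "line 1\nline 2\n"), ("b.txt", "line 3\n")]
    print(len(text_file_records(files)), text_file_records(files))
    print(len(whole_text_file_records(files)), whole_text_file_records(files))
```

Three lines across two files give three records in the textFile() model but only two in the wholeTextFiles() model. Because each file becomes a single record, the PySpark docs note that small files are preferred for wholeTextFiles(); large files are allowed but may perform poorly.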