My PySpark version is 2.2.0, and I've run into a strange problem, which I've simplified as follows. The file structure:
|root
|-- cast_to_float.py
|-- tests
    |-- test.py
In cast_to_float.py, my code:
from pyspark.sql.types import FloatType
from pyspark.sql.functions import udf
def cast_to_float(y, column_name):
    return y.withColumn(column_name, y[column_name].cast(FloatType()))
def cast_to_float_1(y, column_name):
    to_float = udf(cast2float1, FloatType())
    return y.withColumn(column_name, to_float(column_name))
def cast2float1(a):
    return 1.0
In test.py:
from pyspark.sql import SparkSession
import os
import sys
parentPath = os.path.abspath('..')
if parentPath not in sys.path:
    sys.path.insert(0, parentPath)
from cast_to_float import *
spark = SparkSession.builder.appName("tests").getOrCreate()
df = spark.createDataFrame([
            (1, 1),
            (2, 2),
            (3, 3),
        ], ["ID", "VALUE"])
df1 = cast_to_float(df, 'ID')
df2 = cast_to_float_1(df, 'ID')
df1.show()
df1.printSchema()
df2.printSchema()
df2.show()
Then I run the test from the tests folder and get the following output; the error is raised by the last line (df2.show()):
+---+-----+
| ID|VALUE|
+---+-----+
|1.0|    1|
|2.0|    2|
|3.0|    3|
+---+-----+
root
 |-- ID: float (nullable = true)
 |-- VALUE: long (nullable = true)
root
 |-- ID: float (nullable = true)
 |-- VALUE: long (nullable = true)
    Py4JJavaError                             Traceback (most recent call last)
<ipython-input-4-86eb5df2f917> in <module>()
     19 df1.printSchema()
     20 df2.printSchema()
---> 21 df2.show()
...
Py4JJavaError: An error occurred while calling o257.showString.
...
ModuleNotFoundError: No module named 'cast_to_float'
...
It seems cast_to_float is imported; otherwise I couldn't even get df1.
If I put test.py in the same directory as cast_to_float.py and run it from that directory, everything works. Any ideas? Thanks!
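My guess is that the driver imports cast_to_float fine (which is why df1 works), but the UDF behind df2 runs on worker processes that don't see the sys.path entry I added, so the pickled function can't re-import its module there. If that's right, maybe the module file has to be shipped to the executors explicitly. An untested sketch of what I mean (addPyFile is a real SparkContext method, but I haven't verified this is the right fix):

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tests").getOrCreate()
# Distribute cast_to_float.py to the executors so the UDF can
# re-import it when it is deserialized on the workers.
spark.sparkContext.addPyFile(os.path.abspath('../cast_to_float.py'))

from cast_to_float import *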
I used @user8371915's __file__ method, and found it's OK if I run it from the root folder.
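For reference, this is my understanding of that __file__-based setup (my own reconstruction, not the exact code from that answer): resolve the project root relative to test.py itself instead of the current working directory:

import os
import sys

# Project root = parent of the tests/ directory containing this file,
# independent of where the script is launched from.
parentPath = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if parentPath not in sys.path:
    sys.path.insert(0, parentPath)

from cast_to_float import *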