The piece of code below tries to do the following:
For each customer_code in sdf1, check if this customer code appears in sdf2. If it does, replace the df1.actual_related_customer with the df2.actual_related_customer. 
This code is not working because I access my rows in df2 wrongly. How can I achieve the above goal? (if you have another suggestion than indices, shoot!) 
sdf1 = sqlCtx.createDataFrame(
    [
        ('customer1', 'customer_code1', 'other'),
        ('customer2', 'customer_code2', 'other'),
        ('customer3', 'customer_code3', 'other'),
        ('customer4', 'customer_code4', 'other')
    ],
    ('actual_related_customer', 'customer_code', 'other')
)
sdf2 = sqlCtx.createDataFrame(
    [
        ('Peter', 'customer_code1'),
        ('Deran', 'customer_code5'),
        ('Christopher', 'customer_code3'),
        ('Nick', 'customer_code4')
    ],
    ('actual_related_customer', 'customer_code')
)
def right_customer(x,y):
    for row in sdf2.collect() :
        if x == row['customer_code'] :
            return row['actual_related_customer']
    return y
fun1 = udf(right_customer, StringType())
test = sdf1.withColumn(
    "actual_related_customer",
    fun1(sdf1.customer_code, sdf1.actual_related_customer)
)
And my desired output would look like:
desired_output = sqlCtx.createDataFrame(
    [
        ('Peter', 'customer_code1', 'other'),
        ('customer2', 'customer_code2', 'other'),
        ('Christopher', 'customer_code3', 'other'),
        ('Nick', 'customer_code4', 'other')
    ],
    ('actual_related_customer', 'customer_code', 'other')
)
 
     
    