I want to join 2 pyspark dfs, where df_template has all the columns and rows that I need in the output and df_proc has data in it for some (but not all) of the row/column combinations in df_template. The code I'm using is:
df_blend = df_template.join(df_proc, ["metro_area"],"left").select(df_template["*"])
But all this returns is the original df_template:
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501|       null|       null|       null|       null|       null|       null|
| A10000502|       null|       null|       null|       null|       null|       null|
| A10000503|       null|       null|       null|       null|       null|       null|
| A10000504|       null|       null|       null|       null|       null|       null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
This seems pretty basic but I just can't figure out how to get my desired result, any suggestions ??? Here is what I want the output to look like ...
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501|       1455|         26|         19|         65|         38|       null|
| A10000502|        654|       1876|       1950|        886|       null|       null|
| A10000503|       null|       null|       null|       null|       null|       null|
| A10000504|        774|        854|       1012|        271|       null|       null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
For reference, here are the original dataframes. df_template
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000501|       null|       null|       null|       null|       null|       null|
| A10000502|       null|       null|       null|       null|       null|       null|
| A10000503|       null|       null|       null|       null|       null|       null|
| A10000504|       null|       null|       null|       null|       null|       null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
df_proc
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
|metro_area| option_001| option_002| option_003| option_004| option_005| option_006|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+
| A10000502|        654|       1876|       1950|        886|       null|       null|
| A10000504|        774|        854|       1012|        271|       null|       null|
| Al0000501|       1455|         26|         19|         65|         38|       null|
+----------+-----------+-----------+-----------+-----------+-----------+-----------+