How can I read multiple CSV files that have different columns and different file paths, and combine them into a single dataframe?
Here is an example with just two paths; in reality there are many paths, varying by frn and filename:
"wasbs://containername@accountname.blob.core.windows.net/blobname/frn=xyz123/filename=file1/myfile34.csv"
"wasbs://containername@accountname.blob.core.windows.net/blobname/frn=abc567/filename=file2/myfile84.csv"
Columns should be concatenated (the union of all columns), filled with NA where a file has no value for a column. I also want to add new columns, frn and filename, taken from each file's path and attached to that file's dataframe. Each filename folder contains exactly one CSV file, but there are multiple filename folders under each frn folder.
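For reference, the frn and filename values can be parsed out of each blob path with a regular expression. A minimal sketch, using one of the example paths above:

```python
import re

def parse_keys(path):
    """Extract the frn and filename partition values from a blob path."""
    frn = re.search(r"frn=([^/]+)", path).group(1)
    filename = re.search(r"filename=([^/]+)", path).group(1)
    return frn, filename

path = "wasbs://containername@accountname.blob.core.windows.net/blobname/frn=xyz123/filename=file1/myfile34.csv"
print(parse_keys(path))  # ('xyz123', 'file1')
```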
For example:
myfile34.csv
a b frn filename
0 3 4 xyz123 file1
1 4 5 xyz123 file1
2 5 6 xyz123 file1
myfile84.csv
a c frn filename
0 1 3 abc567 file2
1 2 4 abc567 file2
2 3 5 abc567 file2
final df
a b c frn filename
0 3 4 NA xyz123 file1
1 4 5 NA xyz123 file1
2 5 6 NA xyz123 file1
3 1 NA 3 abc567 file2
4 2 NA 4 abc567 file2
5 3 NA 5 abc567 file2
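The column alignment and NA fill in the final df above is exactly what pandas-style concatenation does when the frames have different columns. A small sketch recreating the two example files in memory (the frn/filename columns are added by hand here, standing in for values parsed from the path):

```python
import pandas as pd

# Recreate myfile34.csv with its path-derived keys attached
df1 = pd.DataFrame({"a": [3, 4, 5], "b": [4, 5, 6]})
df1["frn"], df1["filename"] = "xyz123", "file1"

# Recreate myfile84.csv with its path-derived keys attached
df2 = pd.DataFrame({"a": [1, 2, 3], "c": [3, 4, 5]})
df2["frn"], df2["filename"] = "abc567", "file2"

# concat aligns on column names and fills missing columns with NaN
final = pd.concat([df1, df2], ignore_index=True)
print(final)
```

The same `concat` call works on koalas dataframes (`ks.concat`), since koalas mirrors the pandas API.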
I tried this:
import databricks.koalas as ks
path = "wasbs://containername@accountname.blob.core.windows.net/blobname/*/*/"
df = ks.read_csv(path, sep=",", header='infer')
But how can I concatenate these different CSV files and create the new columns (frn and filename) as well?
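One way to get both the concatenation and the new columns is to loop over the individual file paths (collected however suits your setup, e.g. `dbutils.fs.ls` on Databricks), read each CSV on its own, attach the keys parsed from its path, and concatenate once at the end. A pandas sketch, with `StringIO` and abbreviated paths standing in for the real blob reads:

```python
import re
from io import StringIO
import pandas as pd

# Hypothetical stand-ins for the real blob files; on Databricks the
# paths would come from listing the container and the content from
# reading each file.
files = {
    "wasbs://.../blobname/frn=xyz123/filename=file1/myfile34.csv": "a,b\n3,4\n4,5\n5,6\n",
    "wasbs://.../blobname/frn=abc567/filename=file2/myfile84.csv": "a,c\n1,3\n2,4\n3,5\n",
}

frames = []
for path, content in files.items():
    df = pd.read_csv(StringIO(content))  # replace with a read of `path`
    # Attach the partition keys parsed from this file's path
    df["frn"] = re.search(r"frn=([^/]+)", path).group(1)
    df["filename"] = re.search(r"filename=([^/]+)", path).group(1)
    frames.append(df)

# One concat at the end; columns absent from a file come out as NaN
final = pd.concat(frames, ignore_index=True)
```

Note also that because the folders follow the `key=value` naming convention, Spark's partition discovery may surface `frn` and `filename` as columns automatically when reading from the base directory; the per-file loop above is still useful when the files' own schemas differ.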

