I am working on creating an automated script to download files from a FTP and store them into BigQuery.
Problem is that BigQuery accepts only .csv files. For this reason I am looking for ways to handle also .xlsx and .xls files, conditional on that I am planning to put this batch code in the Cloud.
I am mentioning the latter as one way of going about to convert .xlsx files to .csv is to use something like:
import pandas as pd
data_xls = pd.read_excel('file_on_ftp.xlsx')
data_xls.to_csv('csvfile.csv', encoding='utf-8', index=False)
However, this will create local files somewhere in the temporary storage. Obviously with Cloud Functions I have to then monitor whether the file has been deleted afterwards or not, making it not reliable when one cloud function would Crash.
Are there therefore better ways to handle .xlsx loading into BigQuery? Or is this the way to go?