I have a large dictionary that I want to iterate through to build a pyarrow table. The values of the dictionary are tuples of varying types that need to be unpacked and stored in separate columns of the final table, and the keys also need to be stored as a column. I know the schema ahead of time.

The method below constructs the table row by row. Is there a faster approach? For context, I want to parse a large dictionary into a pyarrow table and write it out to a parquet file. RAM usage is less of a concern than CPU time, and I'd prefer not to drop down to the Arrow C++ API.
import pyarrow as pa
import random
import string
import time
# Build a sample dictionary: key -> (small int, single letter)
large_dict = dict()
for i in range(int(1e6)):
    large_dict[i] = (random.randint(0, 5), random.choice(string.ascii_letters))

# Known schema: the key plus one column per tuple element
schema = pa.schema({
        "key"  : pa.uint32(),
        "col1" : pa.uint8(),
        "col2" : pa.string()
    })
start = time.time()

# Current approach: build a one-row table per dictionary entry,
# then concatenate all of them at the end.
tables = []
for key, item in large_dict.items():
    val1, val2 = item
    tables.append(
            pa.Table.from_pydict({
                    "key"  : [key],
                    "col1" : [val1],
                    "col2" : [val2]
                }, schema = schema)
            )
table = pa.concat_tables(tables)
end = time.time()
print(end - start)  # 22.6 seconds on my machine
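
Would a column-wise construction like the sketch below be the right direction? This is untested for speed; it just transposes the tuple values into per-column lists and builds the table with a single from_pydict call.

# Sketch of a column-wise alternative (not benchmarked): transpose the
# tuples into per-column sequences and build the table in one call.
keys = list(large_dict.keys())
col1, col2 = zip(*large_dict.values())   # unpack the tuples into two sequences

table2 = pa.Table.from_pydict({
        "key"  : keys,
        "col1" : list(col1),
        "col2" : list(col2)
    }, schema = schema)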