I have a huge amount of JSON data: lots of smaller .log files (JSON Lines format) combined to a total of 8 GB, where each line is a separate JSON object. I want to read this data into a pandas DataFrame, but I am only interested in the entries for one specific object, which would drastically reduce the amount of data to read. Can this filtering be done with pandas or plain Python before building the DataFrame?
My current code is as follows:
import pandas as pd
import glob

df = pd.concat(
    [pd.read_json(f, encoding="ISO-8859-1", lines=True) for f in glob.glob("logs/sample1/*.log")],
    ignore_index=True,
)
As you might imagine, this is computationally heavy and takes a long time to complete. Is there a way to filter the data before reading it into a DataFrame?
Sample of the data:
{"Name": "1","variable": "value","X": {"nested_var": 5000,"nested_var2": 2000}}
{"Name": "2","variable": "value","X": {"nested_var": 1222,"nested_var2": 8465}}
{"Name": "2","variable": "value","X": {"nested_var": 123,"nested_var2": 865}}
{"Name": "1","variable": "value","X": {"nested_var": 5500,"nested_var2": 2070}}
{"Name": "2","variable": "value","X": {"nested_var": 985,"nested_var2": 85}}
{"Name": "2","variable": "value","X": {"nested_var": 45,"nested_var2": 77}}
I want to read only the lines where "Name" is "1".