I am trying to load JSON files that are too big for json.load. I have spent a while looking into ijson and many Stack Overflow posts, and used the following code, mostly stolen from https://stackoverflow.com/a/58148422/11357695:
import ijson

def extract_json(filename):
    # Stream each element of the top-level 'records' array into a list
    listJ = []
    with open(filename, 'rb') as input_file:
        # 'records.item' matches every element of the array under the 'records' key
        jsonobj = ijson.items(input_file, 'records.item', use_float=True)
        for j in jsonobj:
            listJ.append(j)
    return listJ
My JSON file is read in as a dict with 6 keys, one of which is 'records'. The above function only reproduces the contents of the 'records' key's value. I looked into this a bit more and concluded that ijson.items selects items by prefix ('records.item' here), so it's not surprising that it only reproduces that key's value. But I'd like to get everything.
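To see what these prefixes actually look like, here is a minimal sketch of ijson.parse on a toy document (the toy structure is my own invention, just to illustrate the (prefix, event, value) triples):

import io
import ijson

# A stand-in for the real file, just to show the event stream
buf = io.BytesIO(b'{"version": 1, "records": [{"x": 1}]}')
for prefix, event, value in ijson.parse(buf):
    print(prefix, event, value)
# The triples come out as:
# ('', 'start_map', None), ('', 'map_key', 'version'), ('version', 'number', 1),
# ('', 'map_key', 'records'), ('records', 'start_array', None),
# ('records.item', 'start_map', None), ('records.item', 'map_key', 'x'),
# ('records.item.x', 'number', 1), ('records.item', 'end_map', None),
# ('records', 'end_array', None), ('', 'end_map', None)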
To achieve this, I looked at using ijson.parse to give me a list of prefixes. When I fed all of the prefixes produced by the parser (the generator object in the function below) into ijson.items() in a loop, I got a MemoryError pretty quickly from the ijson.items() statement. Earlier iterations of the code also raised IncompleteJSONError, which does not appear with the current version; however, if I remove the except ijson.IncompleteJSONError clause I get a MemoryError:
def loadBigJsonBAD(filename):
    # First pass: record the prefix of every parse event
    with open(filename, 'rb') as input_file:
        parser = ijson.parse(input_file)
        prefixes = []
        for prefix, event, value in parser:
            prefixes.append(prefix)
    # Second pass: try to extract the items under every collected prefix
    listJnew = []
    with open(filename, 'rb') as input_file:
        for prefix in prefixes:
            jsonobjn = ijson.items(input_file, prefix, use_float=True)
            try:
                for jn in jsonobjn:
                    listJnew.append(jn)
            except ijson.IncompleteJSONError:
                continue
    return listJnew
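For what it's worth, the prefix list from the first pass holds one entry per parse event, so the same prefix appears many times. On the toy document from above (assuming I'm collecting them correctly):

import io
import ijson

buf = io.BytesIO(b'{"version": 1, "records": [{"x": 1}]}')
prefixes = [prefix for prefix, event, value in ijson.parse(buf)]
print(prefixes)
# ['', '', 'version', '', 'records', 'records.item',
#  'records.item', 'records.item.x', 'records.item', 'records', '']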
I then tried filtering out every prefix that starts with 'records', to see if this would at least give me the rest of the dictionary. It actually worked perfectly, and produced a list whose first object is the same as the object returned by json.load (which worked in this case because I was using a small file to test the code):
def loadBigJson(filename):
    # First pass: collect prefixes, skipping anything under the 'records' key
    with open(filename, 'rb') as input_file:
        parser = ijson.parse(input_file)
        prefixes = []
        for prefix, event, value in parser:
            if not prefix.startswith('records'):
                prefixes.append(prefix)
    # Second pass: extract the items under the remaining prefixes
    listJnew = []
    with open(filename, 'rb') as input_file:
        for prefix in prefixes:
            jsonobjn = ijson.items(input_file, prefix, use_float=True)
            try:
                for jn in jsonobjn:
                    listJnew.append(jn)
            except ijson.IncompleteJSONError:
                continue
    return listJnew
When this is tested:
import json

path_json = r'C:\Users\u03132tk\.spyder-py3\antismashDB\GCF_010669165.1\GCF_010669165.1.json'

extractedJson = extract_json(path_json)        # extracts the 'records' key's value
with open(path_json, 'r') as f:
    loadedJson = json.load(f)                  # reads the entire JSON file
loadedJsonExtracted = loadedJson['records']    # what I compare extractedJson against
bigJson = loadBigJson(path_json)               # a list whose single object is the same as loadedJson

print(bigJson[0] == loadedJson)                       # True
print(bigJson[0]['records'] == loadedJsonExtracted)   # True
print(bigJson[0]['records'] == extractedJson)         # True
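One thing I noticed while poking at this, in case it's relevant: the empty prefix '' survives the filter, and on its own it seems to match the whole document (again a toy check, so I may be misreading it):

import io
import ijson

buf = io.BytesIO(b'{"version": 1, "records": [{"x": 1}]}')
print(list(ijson.items(buf, '')))
# [{'version': 1, 'records': [{'x': 1}]}] -- the root object in one piece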
This is great, but it highlights that I don't really understand what's going on. Why is the 'records.item' prefix necessary for the extract_json function (I tried the other keys in the JSON dictionary and got no hits), but counterproductive for loadBigJson? What is generating the errors, and why does an except ijson.IncompleteJSONError clause prevent a MemoryError?
As you can tell, I'm pretty unfamiliar with working with JSON, so any general tips/clarifications would also be great.
Thanks for reading the novel, even if you don't have an answer!
Tim