I get a very large JSON response (several GB) from curl and want to process it with jq.
The rows I want to extract are nested inside a wrapper document with this structure:
{
  "results": [
    {
      "columns": ["n"],
      "data": [
        // these are the objects I want
        {"row": [{"key1": "row1", "key2": "row1"}], "meta": [{"key": "value"}]},
        {"row": [{"key1": "row2", "key2": "row2"}], "meta": [{"key": "value"}]}
        // ... millions of rows
      ]
    }
  ],
  "errors": []
}
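(To reproduce this without the original endpoint, a file of the same shape can be generated with jq itself; big.json and the row count are just placeholders I picked:)

jq -n '{results: [{columns: ["n"], data: [range(0; 100000) | {row: [{key1: "row\(.)", key2: "row\(.)"}], meta: [{key: "value"}]}]}], errors: []}' > big.json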
I want to extract the row data with jq. This is simple:
curl XYZ | jq -r -c '.results[0].data[].row[]'
Result:
{"key1": "row1", "key2": "row1"}
{"key1": "row2", "key2": "row2"}
However, this prints nothing until curl has delivered the complete document.
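The buffering is easy to see by throttling the input: assuming pv is available and big.json stands in for the curl output, the command below stays silent until the very last byte has arrived:

pv -L 10k big.json | jq -r -c '.results[0].data[].row[]'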
I played with the --stream option, which is made for dealing with exactly this, but the following command also waits until the full object has been returned by curl:
curl XYZ | jq -n --stream 'fromstream(1|truncate_stream(inputs)) | .[].data[].row[]'
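As far as I understand it, 1|truncate_stream only strips the top-level "results" key, so fromstream still has to buffer the entire array underneath before it can emit anything. Dumping the raw events shows what --stream produces, [path, value] leaves plus [path] closing markers:

curl XYZ | jq -cn --stream 'inputs' | head -4

which for the sample document starts with:

[["results",0,"columns",0],"n"]
[["results",0,"columns",0]]
[["results",0,"data",0,"row",0,"key1"],"row1"]
[["results",0,"data",0,"row",0,"key2"],"row1"]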
Is there a way to 'jump' to the data field and start parsing the rows one by one, without waiting for the enclosing arrays and objects to close?