I was following this answer, but after some discussion with its writer, it seems it only gives a solution for the orient='records' data format.
This is the difference:
# orient='records'
[
    {"Product":"Desktop Computer","Price":700},
    {"Product":"Tablet","Price":250},
    {"Product":"iPhone","Price":800},
    {"Product":"Laptop","Price":1200}
]
# orient='index'
{
    "0":{"Product":"Desktop Computer","Price":700},
    "1":{"Product":"Tablet","Price":250},
    "2":{"Product":"iPhone","Price":800},
    "3":{"Product":"Laptop","Price":1200}
}
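For reference, here is a minimal sketch that reproduces both formats from the example data (the file names are just placeholders):

import pandas as pd

# Hypothetical reconstruction of the example data
df = pd.DataFrame({
    'Product': ['Desktop Computer', 'Tablet', 'iPhone', 'Laptop'],
    'Price': [700, 250, 800, 1200],
})

df.to_json('records.json', orient='records')  # list-like: a JSON array of row objects
df.to_json('index.json', orient='index')      # dict-like: a JSON object keyed by the index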
I have the index format because my data comes from an SQL database read into a dataframe, and the index field is needed to identify every record.
My JSON file is 2.5 GB; it was exported from the dataframe with the orient='index' format:
df.to_json('test.json', orient='index')
This means that the whole file is actually one huge string, not a list-like collection of records:
{"0":{"Product":"Desktop Computer","Price":700},"1":{"Product":"Tablet","Price":250},"2":{"Product":"iPhone","Price":800},"3":{"Product":"Laptop","Price":1200}}
This means I can't use any line- or chunk-based iterative solution like this:
df = pd.read_json('test.json', orient='index', lines=True, chunksize=5)
According to the documentation, lines=True can only be used if the records are in a list-like format, which is why pandas.DataFrame.to_json does not even accept this argument unless orient='records'. The restriction on chunksize= stems from this as well; the documentation says:
"This can only be passed if lines=True. If this is None, the file will be read into memory all at once."
And exactly this is the reason for the question: trying to read such a huge .json file gives back:
df = pd.read_json('test.json', orient='index')
File "C:\Users\Username\AppData\Local\Programs\Python\Python37\lib\site-
packages\pandas\io\json\_json.py", line 1100,
in _parse_no_numpy                                                                                          
loads(json, precise_float=self.precise_float),
MemoryError 
I was thinking about adding the index values as a first column as well; that way they wouldn't be lost with the records format. Or I could maybe even store an index list separately. My only fear is that this would decrease the search performance later on.
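A minimal sketch of that idea, assuming the index can simply be moved into a regular column (the file name and chunk size are placeholders):

import pandas as pd

# Move the index into an ordinary column so it survives orient='records'
df.reset_index().to_json('test_records.jsonl', orient='records', lines=True)

# The file can then be read back iteratively and the index restored;
# reset_index() names the new column 'index' unless the index has a name
reader = pd.read_json('test_records.jsonl', orient='records', lines=True, chunksize=10000)
for chunk in reader:
    chunk = chunk.set_index('index')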
Is there any solution for handling this situation strictly with the .json file, without any other database or big-data technology?
Update #1
As requested, here is the actual structure of my data. The SQL table:
          Serial           Date                   PatientID     Type Gender  YearWeek
0         425571118001461E 2011-06-30 20:59:30    186092        3    1.0     2011-w26
1         425571118001461E 2011-06-30 20:55:30    186092        3    1.0     2011-w26
2         425571118001461E 2013-08-28 09:29:30    186092        3    1.0     2013-w35
3         425571118001461E 2013-08-28 07:44:30    186092        3    1.0     2013-w35
4         425571118001461E 2013-08-27 20:44:30    186092        3    1.0     2013-w35
...                    ...                 ...       ...      ...    ...         ...
32290281  4183116300254921 2020-04-09 08:07:50    217553        8    2.0     2020-w15
32290282  4183116300254921 2020-04-08 10:29:50    217553        8    2.0     2020-w15
32290283  4141119420031548 2020-04-20 10:18:02    217555       12    2.0     2020-w17
32290284  4141119420043226 2020-04-20 12:33:11    217560       12    NaN     2020-w17
32290285  4141119420000825 2020-04-20 17:31:44    217568       12    1.0     2020-w17
The pandas pivot table is almost the same as in the example, but with 50,000 rows and 4,000 columns:
df = df.pivot_table(index='PatientID', values='Serial', columns='YearWeek', aggfunc=len, fill_value=0)
YearWeek  1969-w01  1969-w02  1969-w03  1969-w04  1969-w05  ...  2138-w17  2138-w18  2138-w19  2138-w20  2138-w21
PatientID
0                0         0         0         0         0  ...         0         0         0         0         0
455              1         0         3         0         0  ...         0         0         0         0         0
40036            0         0         0         0         0  ...         0         0         0         0         0
40070            0         0         0         0         0  ...         0         0         0         0         0
40082            0         0         0         0         0  ...         0         0         0         0         0
...            ...       ...       ...       ...       ...  ...       ...       ...       ...       ...       ...
217559           0         0         0         0         0  ...         0         0         0         0         0
217560           0         0         0         0         0  ...         0         0         0         0         0
217561           0         0         0         0         0  ...         0         0         0         0         0
217563           0         0         0         0         0  ...         0         0         0         0         0
217568           0         1         0         2         0  ...         0         0         0         0         0
And this is how it is saved as an index-formatted JSON:
{
    "0":{"1969-w01":0,"1969-w02":0,"1969-w03":0,"1969-w04":0, ...},
    "455":{"1969-w01":1,"1969-w02":0,"1969-w03":3,"1969-w04":0, ...},
    "40036":{"1969-w01":0,"1969-w02":0,"1969-w03":0,"1969-w04":0, ...},
    ...
    "217568":{"1969-w01":0,"1969-w02":1,"1969-w03":0,"1969-w04":2, ...}
}
Only I could not pass the lines=True argument, so it is actually crammed into one huge string, making it a one-liner JSON:
{"0":{"1969-w01":0,"1969-w02":0,"1969-w03":0,"1969-w04":0, ...},"455":{"1969-w01":1,"1969-w02":0,"1969-w03":3,"1969-w04":0, ...},"40036":{"1969-w01":0,"1969-w02":0,"1969-w03":0,"1969-w04":0, ...}, ... "217568":{"1969-w01":0,"1969-w02":1,"1969-w03":0,"1969-w04":2, ...}}