I am using PyTorch's Dataset class and DataLoader to load data. The class and the loader look like the following.
import json

import torch
from torch.utils.data import Dataset

class JsonDataset(Dataset):  # renamed so it doesn't shadow torch.utils.data.Dataset
    def __init__(self):
        # Loads the entire (large) JSON file into memory up front
        with open(path_to_large_json_file) as f:
            self.input_and_label = json.load(f)
        self.dataset_size = len(self.input_and_label)

    def __getitem__(self, index):
        # Convert to tensors
        input_data = torch.LongTensor(self.input_and_label[index][0])  # Input X
        label_data = torch.LongTensor(self.input_and_label[index][1])  # Label y
        return input_data, label_data

    def __len__(self):
        return self.dataset_size

And the iterator is created like this:

train_loader = torch.utils.data.DataLoader(
    JsonDataset(),
    batch_size=8,      # This is expected to be large; 8 is for trial -- didn't work
    shuffle=True,
    pin_memory=False,  # True
)
The data file is a large JSON file, but the error I get is a CUDA (GPU) out-of-memory error:
<RuntimeError: CUDA out of memory. Tried to allocate... ... ... >
Note:
The large JSON file's content is a list of pairs of number lists, like:
    [0, 1, 0, 0, ..... 4000 numbers]  <-- this is the input_data
    [0, 2, 2, ... 50 numbers]         <-- this is the label
So batch size 8 (that is, 8 such pairs), or even 800, should not matter much.
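Rough arithmetic backs this up (a sketch, using the 4000- and 50-element lengths from the sample above and assuming int64 storage, which is what LongTensor uses):

```python
# Rough per-batch memory estimate for the pairs described above.
# torch.LongTensor holds int64 values: 8 bytes per element.
BYTES_PER_INT64 = 8
INPUT_LEN = 4000  # numbers per input sequence
LABEL_LEN = 50    # numbers per label sequence

def batch_bytes(batch_size):
    return batch_size * (INPUT_LEN + LABEL_LEN) * BYTES_PER_INT64

print(batch_bytes(8))    # 259200 bytes, about 0.25 MB
print(batch_bytes(800))  # 25920000 bytes, about 25 MB
```

So even a batch of 800 pairs should be far below GPU memory limits.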
Can someone please help me: how can I get the iterator without loading the whole file at once? Any other solution is also welcome. Thank you very much for your support.
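To clarify what I mean by "without loading the large file at once": I imagine something like indexing the byte offset of each record in a JSON Lines version of the file, then seeking and parsing one line per item. A stdlib-only sketch (the file name and contents are made up for illustration; in a real `__getitem__` each pair would be wrapped in `torch.LongTensor` as above):

```python
import json
import os
import tempfile

# Hypothetical JSON Lines file: one [input, label] pair per line.
path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
with open(path, "w") as f:
    f.write(json.dumps([[0, 1, 0, 0], [0, 2, 2]]) + "\n")
    f.write(json.dumps([[1, 1, 1, 1], [3, 3, 3]]) + "\n")

# Pass 1: record the byte offset of each line (cheap, no JSON parsing).
offsets = []
with open(path, "rb") as f:
    while True:
        pos = f.tell()
        if not f.readline():
            break
        offsets.append(pos)

# Per-item access: seek to the stored offset and parse only that line.
def get_item(index):
    with open(path, "rb") as f:
        f.seek(offsets[index])
        pair = json.loads(f.readline())
    return pair[0], pair[1]  # input_data, label_data

print(get_item(1))  # ([1, 1, 1, 1], [3, 3, 3])
```

Is something along these lines the right approach for a Dataset, or is there a better way?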