I am working with large binary files (approx. 2 GB each) that contain raw data. These files have a well defined structure: each file is an array of events, and each event is an array of data banks. Each event and each data bank has its own structure (header, data type, etc.).
From these files, all I have to do is extract whatever data I might need and then analyze and play with it. I might not need all of the data; sometimes I just extract XType data, other times just YType, etc.
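To give an idea of the kind of layout I mean, here is a rough sketch (the field names and types are invented for illustration; the real format has its own headers):

#include <cstdint>

// Invented on-disk headers, only to illustrate "array of events, each event
// an array of data banks"; the real file has its own header fields.
#pragma pack(push, 1)
struct EventHeader {
    uint32_t event_id;
    uint32_t num_banks;   // how many data banks follow in this event
};

struct BankHeader {
    uint32_t data_type;   // e.g. XType, YType, ...
    uint32_t data_size;   // bytes of payload that follow this header
};
#pragma pack(pop)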
I don't want to shoot myself in the foot, so I am asking for guidance/best practice on how to deal with this. I can think of 2 possibilities:
Option 1
- Define a DataBank class; this will contain the actual data (std::vector<T>) and whatever structure this has.
- Define an Event class; this has a std::vector<DataBank> plus whatever structure.
- Define a MyFile class; this is a std::vector<Event> plus whatever structure.
The constructor of MyFile will take a std::string (the name of the file) and will do all the heavy lifting of reading the binary file into the classes above.
Then, whatever I need from the binary file will just be a method of the MyFile class; I can loop through Events, I can loop through DataBanks, everything I could need is already in this "unpacked" object.
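Roughly, I imagine the classes looking something like this (names and members are just placeholders; XData stands in for whatever the extracted X-type data ends up being):

#include <string>
#include <vector>

struct XData { /* placeholder for the extracted X-type data */ };

struct DataBank {
    // bank header, data type, etc. would go here
    std::vector<double> data;   // element type just for illustration
};

struct Event {
    // event header would go here
    std::vector<DataBank> banks;
};

class MyFile {
public:
    explicit MyFile(const std::string& filename);  // does all the reading into events_
    std::vector<XData> getXData() const;           // extract only the X-type data
    int getNumEvents() const { return static_cast<int>(events_.size()); }
private:
    std::vector<Event> events_;
};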
The workflow here would be like:
int main() {
    MyFile data_file("data.bin");
    std::vector<XData> my_data = data_file.getXData();
    // Play with my_data, and never again use the data_file object
    // ...
    return 0;
}
Option 2
- Write functions that take a std::string (the name of the file) as an argument and extract whatever I need from it, e.g. std::vector<XData> getXData(std::string), int getNumEvents(std::string), etc.; see the sketch below.
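For example, getXData might look roughly like this, assuming a simple bank header with a type tag and a payload size (both invented here), so that banks I don't need can be skipped with seekg instead of being loaded:

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct XData { /* placeholder for the extracted X-type data */ };

// Invented minimal bank header, just to show the skip-what-I-don't-need idea.
struct BankHeader {
    uint32_t data_type;
    uint32_t data_size;   // payload size in bytes
};

std::vector<XData> getXData(const std::string& filename)
{
    constexpr uint32_t kXType = 1;   // made-up tag value for X-type banks
    std::vector<XData> result;
    std::ifstream in(filename, std::ios::binary);

    BankHeader hdr;
    while (in.read(reinterpret_cast<char*>(&hdr), sizeof(hdr))) {
        if (hdr.data_type == kXType) {
            std::vector<char> payload(hdr.data_size);
            in.read(payload.data(), static_cast<std::streamsize>(payload.size()));
            // ... decode payload into XData and push_back onto result ...
        } else {
            in.seekg(hdr.data_size, std::ios::cur);   // skip banks I don't care about
        }
    }
    return result;
}

Every such function would have to repeat this header-walking logic, which is the duplication I am worried about below.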
The workflow here would be like:
int main() {
    std::vector<XData> my_data = getXData("data.bin");
    // Play with my_data, and I didn't create a massive object
    // ...
    return 0;
}
Pros and Cons that I see
Option 1 seems like the cleaner option: I would only "unpack" the binary file once, in the MyFile constructor. But I will have created a huge object that contains all the data from a 2 GB file, most of which I will never use. If I need to analyze 20 files (2 GB each), will I need 40 GB of RAM? I don't understand how such large objects are handled; will this affect performance?
Option 2 seems faster: I just extract whatever data I need and that's it; I won't "unpack" the entire binary file only to later extract the data I care about, and I only create objects for the data I will actually play with. The problem is that I will have to deal with the binary file structure in every function; if that structure ever changes, updating them all will be a pain.
As you can see from my question, I don't have much experience dealing with large structures and files. I would appreciate any advice.