I have a very large text file (45 GB). Each line contains two space-separated 64-bit unsigned integers, as shown below:
4624996948753406865 10214715013130414417
4305027007407867230 4569406367070518418
10817905656952544704 3697712211731468838
...
I want to read the file and perform some operations on the numbers.
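Parsing a single line into its two numbers can be done without building intermediate strings. The following is a minimal sketch, assuming C++17 and lines that contain only digits and spaces. `parse_pair` is a hypothetical helper name, not part of any library. It uses std::from_chars, which does not allocate and reports values that overflow 64 bits.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical helper: parses "<a> <b>" into two unsigned 64-bit integers.
// Returns std::nullopt if either number is missing or does not fit in 64 bits.
std::optional<std::pair<std::uint64_t, std::uint64_t>> parse_pair(std::string_view line)
{
    std::uint64_t a = 0, b = 0;
    const char* p = line.data();
    const char* end = line.data() + line.size();

    auto r1 = std::from_chars(p, end, a);
    if (r1.ec != std::errc{}) return std::nullopt;

    p = r1.ptr;
    while (p < end && *p == ' ') ++p;  // skip the separating space(s)

    auto r2 = std::from_chars(p, end, b);
    if (r2.ec != std::errc{}) return std::nullopt;
    return std::make_pair(a, b);
}
```

For instance, `parse_pair("4624996948753406865 10214")` yields the pair (4624996948753406865, 10214), which illustrates that a truncated line still parses "successfully" and has to be detected by other means.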
My code in C++:
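#include <fstream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
using namespace std;

// My own processing routine (declared only; its definition is not shown here).
void do_some_operation(const vector<string>& arr);

void process_data(const string& str)
{
    vector<string> arr;
    // Split the block into number tokens on spaces and newlines.
    boost::split(arr, str, boost::is_any_of(" \n"));
    do_some_operation(arr);
}
int main()
{
    const size_t read_bytes = 45 * 1024 * 1024;  // 45 MB per block
    const char* fname = "input.txt";
    ifstream fin(fname, ios::in | ios::binary);
    vector<char> memblock(read_bytes);  // allocate once and reuse for every block
    // The last read may be short, so keep going while gcount() reports data.
    while (fin.read(memblock.data(), read_bytes) || fin.gcount() > 0)
    {
        // The buffer is not null-terminated: build the string from the byte count.
        string str(memblock.data(), static_cast<size_t>(fin.gcount()));
        process_data(str);
    }
    return 0;
}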
I am relatively new to C++. When I run this code, I face the following problems:
- Because the file is read in fixed-size byte blocks, the last line of a block is sometimes an unfinished line from the original file ("4624996948753406865 10214" instead of the full line "4624996948753406865 10214715013130414417").
- This code runs very slowly: processing one block takes around 6 seconds on a 64-bit Intel Core i7 920 system with 6 GB of RAM. Are there any optimization techniques I can use to improve the runtime?
- Is it necessary to include "\n" along with the space character in the boost::split call?
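One common way around the split-line problem is to carry the unfinished tail of each block over to the next block and parse only complete lines. The sketch below assumes well-formed input (exactly two decimal numbers per line) and a positive block size; `for_each_pair_blockwise` is a hypothetical name. It also scans digits directly instead of calling boost::split, which allocates a std::string for every token. Because the scanner treats every non-digit byte (' ', '\n', '\r') as a separator, whether "\n" is listed as a delimiter stops mattering.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in blocks of `block_size` bytes and calls
// f(a, b) for every "a b" line. Bytes after the last '\n' of a block are kept
// in `carry` and completed by the next block, so no line is parsed half-read.
template <class F>
bool for_each_pair_blockwise(const char* path, std::size_t block_size, F&& f)
{
    std::ifstream fin(path, std::ios::in | std::ios::binary);
    if (!fin || block_size == 0) return false;
    std::vector<char> block(block_size);
    std::string carry;  // unfinished line left over from the previous block

    // Parses complete lines in [p, end); any non-digit byte is a separator.
    auto parse_lines = [&f](const char* p, const char* end) {
        std::uint64_t nums[2];
        int n = 0;
        while (p < end) {
            if (*p >= '0' && *p <= '9') {
                std::uint64_t v = 0;
                while (p < end && *p >= '0' && *p <= '9')
                    v = v * 10 + static_cast<std::uint64_t>(*p++ - '0');
                nums[n++] = v;
                if (n == 2) { f(nums[0], nums[1]); n = 0; }
            } else {
                ++p;  // skip ' ', '\n', '\r'
            }
        }
    };

    while (fin.read(block.data(), static_cast<std::streamsize>(block_size)) || fin.gcount() > 0) {
        const std::size_t got = static_cast<std::size_t>(fin.gcount());
        const char* data = block.data();

        // Position just past the last '\n' in this block (0 if there is none).
        std::size_t last_nl = got;
        while (last_nl > 0 && data[last_nl - 1] != '\n') --last_nl;
        if (last_nl == 0) {  // no line ends here: keep accumulating
            carry.append(data, got);
            continue;
        }

        std::size_t first_nl = 0;
        while (data[first_nl] != '\n') ++first_nl;
        carry.append(data, first_nl + 1);                   // finish the line split across blocks
        parse_lines(carry.data(), carry.data() + carry.size());
        parse_lines(data + first_nl + 1, data + last_nl);   // complete lines, parsed in place
        carry.assign(data + last_nl, got - last_nl);        // unfinished tail for the next block
    }
    parse_lines(carry.data(), carry.data() + carry.size());  // final line may lack '\n'
    return true;
}
```

Called as `for_each_pair_blockwise("input.txt", 45 * 1024 * 1024, [](std::uint64_t a, std::uint64_t b) { /* ... */ });`, it would visit each pair exactly once. Only the short line that straddles a block boundary is copied; everything else is parsed straight out of the buffer.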
I have read about memory-mapped files (mmap) in C++, but I am not sure whether they are the right approach here. If so, please point me to some links.
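For reference, a memory-mapped version might look like the sketch below. It assumes a POSIX system (Linux or macOS) and a 64-bit process, so that a 45 GB file fits in the virtual address space; `for_each_pair_mmap` is a hypothetical name, and the input is assumed to be well-formed. Because the whole file is mapped as one contiguous range, there are no block boundaries and therefore no split lines. The madvise call with MADV_SEQUENTIAL is only a read-ahead hint to the kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps `path` read-only and calls f(a, b) for every
// pair of whitespace-separated unsigned integers in the file.
template <class F>
bool for_each_pair_mmap(const char* path, F&& f)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return false; }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) { close(fd); return true; }  // mmap of length 0 is invalid

    void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return false;
    madvise(addr, size, MADV_SEQUENTIAL);  // hint: the file is scanned front to back

    const char* p = static_cast<const char*>(addr);
    const char* end = p + size;
    std::uint64_t nums[2];
    int n = 0;
    while (p < end) {
        if (*p >= '0' && *p <= '9') {
            std::uint64_t v = 0;
            while (p < end && *p >= '0' && *p <= '9')
                v = v * 10 + static_cast<std::uint64_t>(*p++ - '0');
            nums[n++] = v;
            if (n == 2) { f(nums[0], nums[1]); n = 0; }
        } else {
            ++p;  // spaces and newlines separate the numbers
        }
    }
    munmap(addr, size);
    return true;
}
```

Whether this is faster than large buffered reads depends on the OS and the disk; for a purely sequential scan the two approaches are often close, since the disk is usually the bottleneck.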