I'm building CSV text files containing hundreds of millions of lines. Each call to the record function formats one line of text and appends it to a stringstream buffer. Periodically, depending on the arguments passed to the record function, the buffered lines are either written to the file or discarded. I estimate that roughly 75% of the buffered lines end up being written to the file.
In other words, I repeatedly build up a batch of text lines, decide whether to write the batch to the file or throw it away, and then start over.
Below is a simplified example of my code. Assume that CONDITION1 and CONDITION2 are just simple boolean expressions involving x, y, and z; they don't take significant time to evaluate. The code is really slow, and I can see a couple of reasons: the use of stringstreams in general, and the repeated calls to stringstream::str() and stringstream::str(const string&) in particular.
Question: how could I make this faster?
Note: I assume that using a std::string to hold the buffered text would be faster, but I'm concerned about the extra conversions needed to build the text from double variables such as x. (In the real code, about 10 different double variables are written per line, separated by commas.)
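    #include <fstream>
    #include <sstream>
    #include <string>

    std::ofstream outf;     // open outf before the first call to record()
    std::stringstream ss;   // holds the lines until they are written or discarded

    void record(const double x, const bool y, const int z) {
        // Format one CSV line and append it to the buffer.
        ss << x << ", " << (y ? "YES, " : "NO, ") << z << "\n";

        if (CONDITION1) {               // time to write or discard the buffer
            if (CONDITION2)             // the buffered lines should be kept
                outf << ss.str();
            ss.str(std::string());      // empty the buffer either way
        }
    }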