This program:
#include <iostream>
#include <cstdlib>
#include <string>
int main(int argc, const char *argv[])
{
   using ::std::cerr;
   using ::std::cout;
   using ::std::endl;
   if (argc < 2 || argc > 3) {
      cerr << "Usage: " << argv[0] << " [<count>] <message>\n";
      return 1;
   }
   unsigned long count = 10000;
   if (argc > 2) {
      char *endptr = 0;
      count = ::std::strtoul(argv[1], &endptr, 10);
      if ((argv[1][0] == '\0') || (*endptr != '\0')) {
         cerr << "Usage: " << argv[0] << " [<count>] <message>\n";
         return 1;
      }
   }
   const ::std::string msg((argc < 3) ? argv[1] : argv[2]);
   for (unsigned long i = 0; i < count; ++i) {
      cout << i << ": " << msg << '\n';
   }
   return 0;
}
when timed like so:
$ time ./joe 10000000 fred >/dev/null
real  0m15.410s
user  0m10.551s
sys   0m0.166s
takes 15.4 seconds of real time to execute. Replace the output line with this:
      cout << i << ": " << msg << endl;
and you end up with something like this:
$ time ./joe 10000000 fred >/dev/null
real  0m39.115s
user  0m16.482s
sys   0m15.803s
As you can see, the run time more than doubles, and the program goes from spending minimal time in the OS to spending nearly half of its CPU time there.
Both versions of the program produce identical output, and the standard guarantees that they do so on every platform.
Given this, why do people persist in using endl as a synonym for '\n'?
Edit: In case it isn't obvious, this question is intended to be a leading question and is here for instructional purposes. I know why the performance penalty exists.