There are already several answers here. I'll summarize and also address interactions between them.
Typically, std::cout and std::cerr are funneled into a single stream of text, so locking them in common results in the most usable program.
If you ignore the issue, cout and cerr by default alias their stdio counterparts, which are thread-safe under POSIX, as far as the standard I/O functions go (C++14 §27.4.1/4, a stronger guarantee than C alone). If you stick to this selection of functions, you get garbled I/O, but not undefined behavior (which is what a language lawyer might associate with "thread safety," irrespective of usefulness).
However, note that while standard formatted I/O functions (such as reading and writing numbers) are thread-safe, the manipulators to change the format (such as std::hex for hexadecimal or std::setw for limiting an input string size) are not. So, one can't generally assume that omitting locks is safe at all.
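One way to keep manipulators away from the shared format state is to format into a private stream and pass only the finished string to cout. This is a sketch under that assumption, with hypothetical helper names format_hex and print_hex. It avoids modifying cout's flags, but it does not by itself promise that output from different threads won't interleave.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: std::hex, std::setw and std::setfill are applied to a
// local stream, so the format state of std::cout is never touched.
std::string format_hex(unsigned value, int width) {
    std::ostringstream local;
    local << "0x" << std::hex << std::setw(width) << std::setfill('0') << value;
    return local.str();
}

// Hypothetical helper: hands std::cout one finished string, no manipulators.
void print_hex(unsigned value, int width) {
    std::cout << format_hex(value, width) + "\n";
}
```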
If you choose to lock them separately, things are more complicated.
Separate locking
For performance, lock contention may be reduced by locking cout and cerr separately. They're separately buffered (or unbuffered), and they may flush to separate files.
By default, cerr flushes cout before each operation, because they are "tied." This would defeat both separation and locking, so remember to call cerr.tie( nullptr ) before doing anything with it. (The same applies to cin, but not to clog.)
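A sketch of separate locking, assuming hypothetical names cout_mutex, cerr_mutex, untie_cerr, write_out and write_err. The untie step has to run before any thread writes to cerr; otherwise a write to cerr would flush cout without holding cout's lock.

```cpp
#include <iostream>
#include <mutex>
#include <string>

// Hypothetical names: one mutex per stream.
std::mutex cout_mutex;
std::mutex cerr_mutex;

// Call once at startup. By default cerr.tie() points at cout, so every
// operation on cerr would first flush cout. cin is tied to cout the same way.
void untie_cerr() {
    std::cerr.tie(nullptr);
}

void write_out(const std::string& text) {
    std::lock_guard<std::mutex> lock(cout_mutex);
    std::cout << text << '\n';
}

void write_err(const std::string& text) {
    std::lock_guard<std::mutex> lock(cerr_mutex);
    std::cerr << text << '\n';
}
```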
Decoupling from stdio
The standard says that operations on cout and cerr do not introduce races, but that can't be exactly what it means. The stream objects aren't special; their underlying streambuf buffers are.
Moreover, std::ios_base::sync_with_stdio(false) is intended to remove the special aspects of the standard streams, allowing them to be buffered as other streams are. Although the standard doesn't mention any impact of sync_with_stdio on data races, a quick look inside the libstdc++ and libc++ (GCC and Clang) std::basic_streambuf classes shows that they do not use atomic variables, so they may introduce data races when used for buffering. (On the other hand, libc++ sync_with_stdio effectively does nothing, so it doesn't matter if you call it.)
If you want extra performance regardless of locking, sync_with_stdio(false) is a good idea. However, after doing so, locking is necessary, along with cerr.tie( nullptr ) if the locks are separate.