I'd like to post my own re-reinvented version, based on @FredOverflow's. I made the following modifications.
fix: 
- The right-hand side of operator<< should be a const reference. In @FredOverflow's code, h.x >>= 4 modifies the object being output, which is surprising and inconsistent with the standard library, and type T is required to be copy-constructible.
- Assume only that CHAR_BIT is a multiple of 4. @FredOverflow's code assumes char is 8 bits wide, which is not always true: on some implementations, DSPs in particular, it is not uncommon for char to be 16, 24, or 32 bits wide.
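The second fix can be illustrated with a small sketch that derives the number of hex digits per byte from CHAR_BIT instead of hard-coding 2. The names digits_per_byte and byte_to_hex are made up for this illustration and are not part of the utility below.

```cpp
#include <climits>
#include <iomanip>
#include <sstream>
#include <string>

static_assert(CHAR_BIT % 4 == 0, "CHAR_BIT must be a multiple of 4");

// Hex digits needed for one byte: 2 when CHAR_BIT is 8, 4 when it is 16, ...
const int digits_per_byte = CHAR_BIT / 4;

// Hypothetical helper: format a single byte, zero-padded to the full width.
std::string byte_to_hex(unsigned char b)
{
    std::ostringstream os;
    os << std::hex << std::setfill('0') << std::setw(digits_per_byte)
       << static_cast<unsigned>(b);   // cast so the byte prints as a number
    return os.str();
}
```

On a typical platform with 8-bit char, byte_to_hex(0x0a) gives "0a"; with a 16-bit char the same call would be padded to four digits.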
improve:
- Support all other standard library manipulators available for integral types, e.g. std::uppercase. Because _print_byte uses formatted output, standard library manipulators still apply.
- Add hex_sep to print the bytes separately (note that in C/C++ a 'byte' is by definition a storage unit with the size of char). This is done by adding a template parameter Sep and instantiating _Hex<T, false> and _Hex<T, true> in hex and hex_sep respectively.
- Avoid binary code bloat. The function _print_byte is extracted out of operator<< and takes the size as a function parameter, which avoids a separate instantiation for each different Size.
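The last point, in isolation, looks roughly like the sketch below: a thin template forwards to a single non-template worker that receives the size at run time. The names dump and dump_bytes are hypothetical, and for brevity the sketch assumes 8-bit bytes and prints in address order.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>

// Non-template worker: compiled once, no matter how many types are printed.
inline void dump_bytes(std::ostream& os, const void* ptr, std::size_t size)
{
    const unsigned char* p = static_cast<const unsigned char*>(ptr);
    const std::ios_base::fmtflags saved_flags = os.flags();
    const char saved_fill = os.fill('0');   // fill() returns the previous fill
    os << std::hex;
    for (std::size_t i = 0; i < size; ++i)  // lowest address first
        os << std::setw(2) << static_cast<unsigned>(p[i]);  // assumes 8-bit bytes
    os.fill(saved_fill);
    os.flags(saved_flags);
}

// Thin wrapper: the only part instantiated per type, and trivially inlined.
template <typename T>
void dump(std::ostream& os, const T& value)
{
    dump_bytes(os, &value, sizeof(T));
}
```

However many different T are passed to dump, the loop itself exists only once in the binary.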
More on binary code bloat:
As mentioned in improvement 3, no matter how extensively hex and hex_sep are used, only two copies of the (nearly) duplicated function will exist in the binary: _print_byte<true> and _print_byte<false>. You might have realized that this duplication can also be eliminated using exactly the same approach: add a function parameter sep. Yes, but then a runtime if (sep) is needed. I want a common library utility that may be used extensively throughout the program, so I chose to accept the duplication rather than the runtime overhead. I achieved this with a compile-time if, C++11 std::conditional; the overhead of the function call can hopefully be optimized away by inlining.
hex_print.h:
#include <ostream>

namespace Hex
{
typedef unsigned char Byte;

// Wrapper that holds a const reference to the value to print.
// Sep selects whether the bytes are separated by spaces.
template <typename T, bool Sep> struct _Hex
{
    _Hex(const T& t) : val(t)
    {}
    const T& val;
};

template <typename T, bool Sep>
std::ostream& operator<<(std::ostream& os, const _Hex<T, Sep>& h);
}

template <typename T>  Hex::_Hex<T, false> hex(const T& x)
{ return Hex::_Hex<T, false>(x); }

template <typename T>  Hex::_Hex<T, true> hex_sep(const T& x)
{ return Hex::_Hex<T, true>(x); }

#include "hex_print.tcc"
hex_print.tcc:
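#include <climits>      // CHAR_BIT
#include <cstddef>      // std::size_t
#include <iomanip>      // std::setw, std::setfill
#include <ostream>
#include <type_traits>  // std::conditional

namespace Hex
{
struct Put_space {
    static inline void run(std::ostream& os) { os << ' '; }
};
struct No_op {
    static inline void run(std::ostream&) {}
};

#if (CHAR_BIT & 3) // can use C++11 static_assert, but no real advantage here
#error "hex print utility needs CHAR_BIT to be a multiple of 4"
#endif
static const std::size_t width = CHAR_BIT >> 2; // hex digits per byte

template <bool Sep>
std::ostream& _print_byte(std::ostream& os, const void* ptr, const std::size_t size)
{
    using namespace std;
    auto pbyte = reinterpret_cast<const Byte*>(ptr);
    os << std::hex << setfill('0'); // qualified: the global ::hex is visible here too
    // Walk from the highest address down, so that the most significant
    // byte comes first on a little-endian machine.
    for (int i = size; --i >= 0; )
    {
        // unsigned rather than short: a byte may be wider than 8 bits
        os << setw(width) << static_cast<unsigned>(pbyte[i]);
        conditional<Sep, Put_space, No_op>::type::run(os);
    }
    return os << setfill(' ') << dec;
}

template <typename T, bool Sep>
inline std::ostream& operator<<(std::ostream& os, const _Hex<T, Sep>& h)
{
    return _print_byte<Sep>(os, &h.val, sizeof(T));
}
}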
test:
struct { unsigned int x; } output = {0xdeadbeef}; // unsigned: 0xdeadbeef would be a narrowing conversion to int
std::cout << hex_sep(output) << std::uppercase << hex(output) << std::endl;
output:
de ad be ef DEADBEEF
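The second half of the output is upper case because std::uppercase is a sticky flag on the stream, and the utility prints through ordinary formatted output. The same effect can be seen with the standard library alone; uppercase_demo is just a name for this sketch.

```cpp
#include <sstream>
#include <string>

// Prints the same value twice, switching on std::uppercase in between.
std::string uppercase_demo()
{
    std::ostringstream os;
    os << std::hex << 0xdeadbeefu << ' '
       << std::uppercase << 0xdeadbeefu;
    return os.str();   // "deadbeef DEADBEEF"
}
```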