I'm a bit late but...
#include <cstdint>
#include <cstdio>
#include <tuple>
template<uint64_t data, int8_t test_bit= sizeof(data)-1>
struct getMinimalByteSize{
    using type= typename std::conditional< (bool)(data & (uint64_t)0xFFL << (test_bit*8)),
        typename std::tuple_element_t<test_bit, std::tuple<uint8_t, uint16_t, uint32_t, uint32_t, uint64_t, uint64_t, uint64_t, uint64_t>>,
        typename getMinimalByteSize<data, test_bit - 1>::type>::type;};
template<uint64_t data>
struct getMinimalByteSize<data, -1>{using type = uint64_t;};
int main()
{
  static_assert(sizeof(getMinimalByteSize<0x0>::type)==8);
  static_assert(sizeof(getMinimalByteSize<0xFF>::type)==1);
  static_assert(sizeof(getMinimalByteSize<0xFFF>::type)==2);
  static_assert(sizeof(getMinimalByteSize<0xFFFFF>::type)==4);
  static_assert(sizeof(getMinimalByteSize<0xFFFFFFFFF>::type)==8);
}
The difference with all the other methods is on the testing. Instead of testing if the value is bigger than the biggest number possible given N bits, it goes byte for byte, testing if it is the last (most significant) non zero byte. If it is, then this is the minimal number of bits needed. Lastly we use a hand made list to fix the fact that there are not 24, 48, 56 bit integers defined in C++.
This is how this template metaprogram would look as a simple C function:
#include <stddef.h>
int tuple_element_t[]={8,16,32,32,64,64,64,64,64};
int getMinimalByteSize(uint64_t data, int8_t first_hi_byte = sizeof(data)-1){
    if (!data) return 0;
    /* Does the N bit of test is set? If so, we are done*/
    if (data &  (uint64_t)0xFFL << (first_hi_byte*8))
        return tuple_element_t[first_hi_byte];
    else/*Else, we tray with the next bit*/
        return getMinimalByteSize(data, first_hi_byte-1);}
Don't worry if you don't see it the first time, give yourself time . I've being working on AVRs for more than 10 years, in a platform where every byte counts. If you understand it in less than those 10 years, you already beat my.