I am working on a cycle-accurate simulator for a research architecture.  I already have a cross-compiler that generates assembly (based on MIPS).  For debugging, we have a printf intrinsic which, when run in the simulator, ultimately calls a builtin method that has access to a list of arguments packed in a contiguous array (such as would be created by this code):
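#include <stddef.h>
#include <string.h>
// Append the raw bytes of `value` to `buffer` at offset *i, then advance *i.
template <typename type> inline static void insert(char buffer[], size_t* i, type value) {
    memcpy(buffer + *i, &value, sizeof(type));
    *i += sizeof(type);
}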
int main(int /*argc*/, char* /*argv*/[]) {
    char buffer[512]; size_t i=0;
    insert<double>(buffer,&i, 3.14);
    insert<int>(buffer,&i, 12345);
    insert<char const*>(buffer,&i, "Hello world!");
    return 0;
}
In MSVC, one can then create a va_list and call vprintf like so:
union { va_list list; char* arguments; } un;
un.arguments = buffer;
vprintf(format_string, un.list);
The simulator's own target architecture is x86-64, which is based on x86, so this produces apparently correct results (the va_list provided by MSVC is just a typedef for char*).
However, on g++ (and presumably Clang; I haven't tried), the code segfaults.  This happens because the underlying type is different: it is compiler-provided, and in gcc 4.9.2 it appears to be typedef'd from __gnuc_va_list, which is in turn typedef'd from __builtin_va_list, presumably a compiler intrinsic.  The compiler error you get if you just write un.list=buffer; forebodes as much.
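To make the difference concrete, here is a rough sketch of what va_list looks like under the System V x86-64 ABI that g++ and Clang follow on Linux: a one-element array of a four-field record, not a bare pointer. The struct name SysVVaListTag and both helper functions are hypothetical names for illustration; the real type is internal to the compiler.

```cpp
#include <cstdarg>
#include <cstddef>
#include <type_traits>

// Hypothetical mirror of the record behind va_list on System V x86-64.
// The real type is compiler-internal; this only illustrates the layout.
struct SysVVaListTag {
    unsigned int gp_offset;   // offset of the next general-purpose register slot
    unsigned int fp_offset;   // offset of the next floating-point register slot
    void* overflow_arg_area;  // next argument that was passed on the stack
    void* reg_save_area;      // where the argument registers were spilled
};

// True when this compiler's va_list is an array type (as on System V x86-64)
// rather than a plain pointer (as with MSVC).
inline bool va_list_is_array() {
    return std::is_array<va_list>::value;
}

// True when va_list is larger than a char*, so writing a single pointer into
// it through a union leaves part of it uninitialised.
inline bool va_list_is_larger_than_pointer() {
    return sizeof(va_list) > sizeof(char*);
}
```

If that is the layout in play, the union trick overwrites only the two offset fields with the bits of the buffer pointer and leaves both area pointers as garbage, which would be consistent with the segfault.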
My question is: what is the cleanest way to convert this array of packed arguments into a va_list that is usable by both g++ and Clang in x86-64 mode?
My current thinking is that it may be better to parse out each format specifier individually, then forward it with the appropriate argument to printf.  This is less robust, in the sense that it won't support every feature of printf (working on only a single architecture is robust enough for our purposes), and it isn't a particularly appealing solution.
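For reference, a minimal sketch of that per-specifier fallback, assuming the packed layout produced by insert above (each value stored with its own sizeof and no padding). The names extract, append_one and format_packed are hypothetical. It handles only %d, %i, %c, %u, %x, %X, %f, %e, %g, %s and %%, with optional flags, width and precision; length modifiers and `*` are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: copy one T out of the packed buffer and advance *i.
template <typename T>
static T extract(const char* buffer, std::size_t* i) {
    T value;
    std::memcpy(&value, buffer + *i, sizeof(T));
    *i += sizeof(T);
    return value;
}

// Format exactly one conversion (e.g. "%8.3f") with one argument and append
// the result to `out`. The first snprintf call only measures the length.
template <typename T>
static void append_one(std::string& out, const std::string& spec, T value) {
    const int n = std::snprintf(nullptr, 0, spec.c_str(), value);
    if (n <= 0) return;  // nothing produced, or an encoding error
    std::string tmp(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&tmp[0], tmp.size(), spec.c_str(), value);
    out.append(tmp.c_str(), static_cast<std::size_t>(n));
}

// Walk `fmt`, pulling one packed argument per conversion specifier.
static std::string format_packed(const char* fmt, const char* buffer) {
    assert(fmt != nullptr && buffer != nullptr);
    std::string out;
    std::size_t i = 0;
    for (const char* p = fmt; *p != '\0';) {
        if (*p != '%') { out += *p++; continue; }              // literal text
        if (p[1] == '%') { out += '%'; p += 2; continue; }     // escaped percent
        const char* start = p++;
        // Skip flags, width and precision (assumption: no '*' and no length modifiers).
        while (*p != '\0' && std::strchr("-+ #0123456789.", *p) != nullptr) ++p;
        const char conv = *p;
        if (conv == '\0') { out.append(start); break; }        // dangling '%...'
        ++p;
        const std::string spec(start, p);
        switch (conv) {
            case 'd': case 'i': case 'c':
                append_one(out, spec, extract<int>(buffer, &i)); break;
            case 'u': case 'x': case 'X':
                append_one(out, spec, extract<unsigned int>(buffer, &i)); break;
            case 'f': case 'e': case 'g':
                append_one(out, spec, extract<double>(buffer, &i)); break;
            case 's':
                append_one(out, spec, extract<const char*>(buffer, &i)); break;
            default:
                out += spec;  // unsupported conversion: copy verbatim, consume nothing
        }
    }
    return out;
}
```

The result can then be written with fputs or similar. Because each argument is read with its packed size rather than through va_arg, this never needs a va_list, at the cost of reimplementing the specifier scan.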