It's not a pure bitwise trick: if any of a, b, or c equals d, the corresponding XOR is zero, which makes the whole product zero, and logical negation of 0 yields 1 (true).  But it does not deal with overflow: the product of three non-zero values can wrap around to zero, giving a false positive, and signed-integer overflow is undefined behaviour in C and C++ anyway.
bool test(int a, int b, int c, int d)
{
    // x^d == 0 iff x == d, so the product is zero iff some input matches d.
    // Beware: the signed multiplies can overflow (UB) and can wrap to zero.
    return !((a^d)*(b^d)*(c^d));
}
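To make the false positive concrete, here's a quick demo (my example; it uses unsigned so the wraparound is well-defined rather than UB):

#include <stdbool.h>
#include <stdio.h>

// Same trick with unsigned math, so wraparound is defined behaviour.
static bool test_u(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return !((a^d)*(b^d)*(c^d));
}

int main(void)
{
    // 65536 * 65536 = 2**32 wraps to 0 in 32 bits, so this reports a
    // match even though no input equals d.
    printf("%d\n", test_u(65536, 65536, 1, 0));   // prints 1
}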
gcc 7.1 -O3 output (x86-64 SysV calling convention: a, b, and c arrive in edi, esi, and edx; d is in ecx):
    xor     edi, ecx   # a^d
    xor     esi, ecx   # b^d
    xor     edx, ecx   # c^d
    imul    edi, esi   # (a^d)*(b^d)
    imul    edx, edi   # (a^d)*(b^d)*(c^d)
    test    edx, edx
    sete    al         # al = (product == 0)
    ret
It might be faster than the original on Core2 or Nehalem, where partial-register stalls are a problem.  imul r32,r32 has 3c latency on Core2/Nehalem (and later Intel CPUs) and one-per-clock throughput, so the critical path is 1c (xor) + 3c + 3c = 7 cycles from the inputs to the second imul's result, plus another 2 cycles of latency for test/sete.  Throughput should be fairly good if this sequence runs on multiple independent inputs.
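A throughput-bound use would be something like this loop (my toy example): each iteration's xor/imul chain is independent of the others, so out-of-order execution can overlap many 7-cycle chains.

#include <stddef.h>

// Count positions where d[i] matches a[i], b[i], or c[i] (ignoring the
// overflow caveat above).  Iterations are independent, so the CPU can
// pipeline them; throughput, not latency, dominates here.
size_t count_matches(const int *a, const int *b, const int *c,
                     const int *d, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += !((a[i]^d[i]) * (b[i]^d[i]) * (c[i]^d[i]));
    return hits;
}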
Using a 64-bit multiply would avoid the overflow problem in the first multiply, but the second could still overflow if the total product is >= 2**64.  Performance would be the same on Intel Nehalem and Sandybridge-family, and on AMD Ryzen, but it would be slower on older CPUs.
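In C, that intermediate version would look something like this (a sketch; the function name is mine):

#include <stdbool.h>
#include <stdint.h>

bool test_u64(unsigned a, unsigned b, unsigned c, unsigned d)
{
    uint64_t mul1 = (uint64_t)(a^d) * (b^d);  // 32x32 => 64b: exact, can't wrap
    // The 64x64 => 64b multiply can still wrap: e.g. a=1u<<31, b=1u<<31,
    // c=4, d=0 gives 2**64, which is 0 mod 2**64 -- a false positive.
    return !(mul1 * (c^d));
}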
In x86 asm, doing the second multiply with a full-multiply one-operand mul instruction (64x64b => 128b) would avoid overflow entirely, and the result can be checked for being all-zero or not with or rax,rdx.  We can write that in GNU C for 64-bit targets (where __int128 is available):
bool test_mulwide(unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned __int128 mul1 = (a^d)*(unsigned long long)(b^d);  // 32x32 => 64b: can't wrap
    return !(mul1*(c^d));   // compiles to a 64x64 => 128b mul: can't wrap either
}
and gcc/clang really do emit the asm we hoped for (each with some useless mov instructions):
   # gcc -O3 for x86-64 SysV ABI
    mov     eax, esi
    xor     edi, ecx
    xor     eax, ecx
    xor     ecx, edx   # zero-extends
    imul    rax, rdi
    mul     rcx        # 64 bit inputs (rax implicit), 128b output in rdx:rax
    mov     rsi, rax   # this is useless
    or      rsi, rdx
    sete    al
    ret
This should be almost as fast as the simple version that can overflow, on modern x86-64.  (On Intel Sandybridge-family, mul r64 is still only 3c latency, but 2 uops instead of the 1 for an imul r64,r64 that doesn't produce the high half.)
It's still probably worse than clang's setcc/or output for the original version, which uses 8-bit or instructions so it never reads a 32-bit register after writing only the low byte (i.e. no partial-register stalls).
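For reference, the "original version" is presumably the straightforward comparison (an assumption; it isn't quoted in this answer):

#include <stdbool.h>

bool test_cmp(int a, int b, int c, int d)
{
    // clang turns this into cmp/sete pairs combined with byte-sized
    // or instructions; no multiplies, so nothing can overflow.
    return a == d || b == d || c == d;
}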
See both sources with both compilers on the Godbolt compiler explorer.  (Also included: @BeeOnRope's ^ / & version that risks false positives, with and without a fallback to a full check.)
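That ^ / & idea is presumably along these lines (my reconstruction, not @BeeOnRope's exact code): AND can't overflow, but bits from different XORs can cancel, hence the false positives that a fallback full check would have to catch.

#include <stdbool.h>

// Reconstructed sketch: a zero AND doesn't prove any single x^d was zero,
// so a true result may be a false positive; a false result is reliable.
bool test_xor_and(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return !((a^d) & (b^d) & (c^d));
}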