Compilers are really smart about such simple arithmetic and bitwise operations. They don't do that simply because they can't: there are no such instructions on those architectures. It isn't worth wasting valuable opcode space on rarely used operations like that. Most operations are done on the whole register anyway, and working on just part of a register is inefficient for the CPU, because the out-of-order execution and register renaming units have to work a lot harder. That's the reason why x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register, and why modifying the low part of a register in x86 (like AL or AX) can be slower than modifying the whole RAX. INC can also be slower than ADD 1 because of the partial flag update.
That said, there are architectures that can do a combined SHIFT and XOR in a single instruction, like ARM (eor rX, rX, rY, lsl #24), because the ARM designers spent a big part of the instruction encoding on predication and shifting, trading that off against a smaller number of registers. But again your premise is wrong, because the fact that something can be executed in a single instruction doesn't mean that it'll be faster. Modern CPUs are very complex: every instruction has a different latency and throughput, and can run on a different number of execution ports. For example, if a CPU can execute 4 pairs of SHIFT-then-XOR in parallel then obviously it'll be faster than another CPU that can only run 4 single SHIFT-XOR instructions sequentially, provided that the clock speed is the same.
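To make the operation concrete, here is a minimal C sketch of "XOR the top byte of i with c" for a single 32-bit value. The function name xor_top_byte and the choice of types are assumptions for illustration, not something from the question. On a target with a shifted-operand XOR, such as the ARM eor ... lsl #24 form mentioned above, a compiler may fold the shift and the XOR into one instruction; on x86 it has to emit them separately.

```c
#include <stdint.h>

/* Hypothetical helper: XOR the top byte of a 32-bit value i with the byte c. */
static inline uint32_t xor_top_byte(uint32_t i, uint8_t c)
{
    /* Widen c to 32 bits before shifting so the shift is unsigned and
       the byte lands in bits 24..31. */
    return i ^ ((uint32_t)c << 24);
}
```

The low 24 bits of i pass through unchanged, so there is no need to operate on a partial register at all.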
This is a very typical XY problem, because what you have in mind is simply the wrong way to do it. Operations that need to be done thousands or millions of times or more are a job for the GPU or the SIMD unit.
For example, this is what the Clang compiler emits for a loop XORing the top byte of i with c on an x86 CPU with AVX-512:
vpslld  zmm0, zmm0, 24                             # shift
vpslld  zmm1, zmm1, 24
vpslld  zmm2, zmm2, 24
vpslld  zmm3, zmm3, 24
vpxord  zmm0, zmm0, zmmword ptr [rdi + 4*rdx]      # xor
vpxord  zmm1, zmm1, zmmword ptr [rdi + 4*rdx + 64]
vpxord  zmm2, zmm2, zmmword ptr [rdi + 4*rdx + 128]
vpxord  zmm3, zmm3, zmmword ptr [rdi + 4*rdx + 192]
By doing that it achieves 16 SHIFT-and-XOR operations with just 2 instructions. Imagine how fast that is. You can unroll further, up to 32 zmm registers, to achieve even higher performance until you saturate the RAM bandwidth. That's why all high-performance architectures have some kind of SIMD, which makes fast parallel operations easy, rather than a useless SHIFT-XOR instruction. Even on ARM, which has a single-instruction SHIFT-XOR, the compiler is smart enough to know that SIMD is faster than a series of eor rX, rX, rY, lsl #24. The output is like this:
shl     v3.4s, v3.4s, 24       # shift
shl     v2.4s, v2.4s, 24
shl     v1.4s, v1.4s, 24
shl     v0.4s, v0.4s, 24
eor     v3.16b, v3.16b, v7.16b # xor
eor     v2.16b, v2.16b, v6.16b
eor     v1.16b, v1.16b, v4.16b
eor     v0.16b, v0.16b, v5.16b
Here's a demo for the above snippets
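The demo's source isn't reproduced here, so the following is only a sketch of the kind of loop that can lead to vectorized output like the listings above. The function name xor_top_bytes, the parameter names, and the assumption that c is an array of bytes are all hypothetical. Built with optimization and the relevant vector extensions enabled (for example -O3 with AVX-512 on x86), a compiler is free to vectorize and unroll it, though the exact instructions depend on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop: XOR the top byte of each i[k] with the byte c[k].
   Each iteration is independent, which is what lets the compiler
   process many elements per SIMD instruction. */
void xor_top_bytes(uint32_t *i, const uint8_t *c, size_t n)
{
    for (size_t k = 0; k < n; k++)
        i[k] ^= (uint32_t)c[k] << 24;   /* shift, then xor */
}
```

The source stays a plain scalar loop; the choice between a fused scalar instruction and separate SIMD shift and XOR instructions is left to the compiler.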
That'll be even faster when running in parallel on multiple cores. The GPU is also capable of a very high level of parallelism, hence modern cryptography and intensive mathematical problems are often run on the GPU. It can break a password or encrypt a file faster than a general-purpose CPU with SIMD.