In C terms:
For a simple type like int, aligned load and store functions could look like this:
int load(int *p) { return *p; }
void store(int *p, int val) { *p = val; }
(You'd actually use memcpy to get unaligned and strict-aliasing-safe loads and stores.)
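As a minimal sketch of that memcpy approach (the function names load_int_unaligned and store_int_unaligned are made up for illustration, not from any library):

```c
#include <string.h>

/* Hypothetical helper: load an int from any address, aligned or not,
   without violating strict aliasing. */
int load_int_unaligned(const void *p) {
    int val;
    memcpy(&val, p, sizeof val);  /* fixed small size: compilers typically inline this as a plain load */
    return val;
}

/* Hypothetical helper: store an int to any address, aligned or not. */
void store_int_unaligned(void *p, int val) {
    memcpy(p, &val, sizeof val);
}
```

With optimization enabled, mainstream compilers usually turn each of these into a single load or store instruction on targets where unaligned access is cheap, so there is normally no actual call to memcpy.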
The __m128i load/store intrinsics mostly exist to tell the compiler whether an access is aligned or unaligned, which dereferencing a __m128i* directly can't express (a plain dereference is treated as an aligned access). For float / double, they also avoid casts, because _mm_loadu_ps takes a const float* arg.
Later Intel intrinsics take void* args, avoiding the need for a cast like _mm_loadu_si128((const __m128i*)&my_struct), but unfortunately that improvement didn't arrive until the AVX-512 intrinsics.
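To illustrate the aligned/unaligned distinction and where casts are needed, here is a small sketch assuming an x86 target with SSE2. The helper names copy16_to_aligned and first_lane_plus_one are hypothetical; the intrinsics themselves are the standard ones from emmintrin.h.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE1 float intrinsics */
#include <stdint.h>

/* Hypothetical helper: copy 16 bytes from src (any alignment)
   to dst_aligned (caller promises 16-byte alignment). */
void copy16_to_aligned(int32_t *dst_aligned, const int32_t *src) {
    /* Unaligned load; the pre-AVX-512 integer intrinsics need a pointer cast. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    /* Aligned store: may fault if dst_aligned is not actually 16-byte aligned. */
    _mm_store_si128((__m128i *)dst_aligned, v);
}

/* Hypothetical helper: _mm_loadu_ps takes const float*, so no cast is needed. */
float first_lane_plus_one(const float *p) {
    __m128 v = _mm_loadu_ps(p);            /* unaligned load of 4 floats */
    v = _mm_add_ps(v, _mm_set1_ps(1.0f));  /* add 1.0f to every lane */
    return _mm_cvtss_f32(v);               /* extract the lowest lane */
}
```

Swapping _mm_store_si128 for _mm_storeu_si128 would drop the alignment promise; that choice is the main information these intrinsics convey to the compiler.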
In asm terms, a load reads data from memory into a register (or as a source operand for an ALU instruction). A store writes data to memory.
C local variables are normally kept in registers, and your compiler is free to optimize intrinsic loads/stores the same way it can optimize dereferences of an int *. For example, it might optimize away a store/reload pair, so the asm wouldn't contain instructions for it.
The fact that there are load and store intrinsics does not mean that __m128i "is a register". It's like int: if/when it can be kept in a register, the compiler will do so, but you can also make an array of __m128i or whatever. Load/store intrinsics can be optimized away, or a load can be folded into a memory source operand for an ALU instruction like vpaddb.
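A short sketch of both points, again assuming x86 with SSE2 (sum_bytes is a made-up name): __m128i objects can live in an ordinary array in memory, and the load intrinsic in the loop doesn't have to become a separate load instruction.

```c
#include <emmintrin.h>
#include <stddef.h>

/* Hypothetical helper: byte-wise (wrapping) sum of an array of vectors.
   arr is a normal array in memory; acc will usually stay in a register. */
__m128i sum_bytes(const __m128i *arr, size_t n) {
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i++) {
        /* The compiler may fold this load into a memory source operand
           of paddb / vpaddb instead of emitting a separate movdqa. */
        __m128i v = _mm_load_si128(&arr[i]);
        acc = _mm_add_epi8(acc, v);
    }
    return acc;
}
```

Whether the load is folded depends on the compiler and its options; looking at the generated asm is the way to check for a particular build.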
Related: