In other words, if 32-bit OS can only use 3.5 GB
A 32-bit OS can address 4GB of address space. However, that address space contains not only RAM but also various shadowed memory regions and memory-mapped devices. On x86, port-mapped I/O is more common, but there are still lots of memory-mapped things out there
For example, some of that space may be reserved for the COM and LPT ports, and some more for the network interfaces and graphics card. If you have a GPU with 1GB of VRAM in your system, the OS has at most 3GB of address space left for RAM. That's not even 3.5GB
Most RISC architectures rely more heavily on memory-mapped I/O, so a 32-bit OS on those may be able to address even less RAM
However, that's not the entire story, because many 32-bit architectures use more than 32 bits for physical addresses, so a 32-bit OS running on them isn't necessarily limited to 4GB of RAM. For example, x86 has PAE, which allows OSes to use at least 36 bits for the address, which means they can address 64GB of RAM or more. The equivalent on ARM is LPAE
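The relation between address width and addressable memory is just a power of two. A minimal sketch, assuming byte-addressable memory; the function name `addressable_gib` is made up for illustration, and the 40-bit row is only there to show how the size keeps growing with wider addresses:

```python
GIB = 1 << 30  # bytes in one GB (binary gigabyte)

def addressable_gib(address_bits):
    """Size in GB of an address space with the given number of address bits."""
    return (1 << address_bits) // GIB

for bits in (32, 36, 40):
    print(f"{bits}-bit addresses -> {addressable_gib(bits)} GB")
```

With 32 bits this gives 4GB, and with the 36 bits of PAE it gives 64GB.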
But only the OS can use such a huge amount of RAM. A 32-bit process running on that OS still uses 32-bit addresses, so it can still address only 2^32 different memory locations. Those are bytes on most modern architectures, so in theory a process can access at most 4GB of RAM
In practice only part of that belongs to the process, because the kernel reserves the rest of the address space for itself. On Windows the default split is 2GB/2GB (before Meltdown). That's why Adobe Premiere CS4 (which is still 32-bit only) spawns a process for each 2GB of RAM in the system in order to use all the RAM available to the OS
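The per-process arithmetic can be sketched as follows. This is only an illustration of the 2GB/2GB split described above: the helper names and the 16GB machine are hypothetical, and it says nothing about how Premiere actually decides how many processes to start.

```python
GIB = 1 << 30  # bytes in one GB (binary gigabyte)

def user_space(address_bits=32, kernel_reserved=2 * GIB):
    """Bytes of virtual address space left to a process after the kernel's share."""
    return (1 << address_bits) - kernel_reserved

def processes_to_cover(total_ram, per_process):
    """How many processes are needed so their user spaces add up to total_ram."""
    return (total_ram + per_process - 1) // per_process  # integer ceiling

print(user_space() // GIB)                          # 2
print(processes_to_cover(16 * GIB, user_space()))   # 8
```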
how does having an extra 32 bits help accessing the remaining 0.5 GB
The extra 32 bits give access to a space 2^32 times larger than the 4GB address space, so it obviously overcomes almost any imaginable limit. 0.5GB is nothing in that space
Would upgrading my OS from 32-bit to 64-bit, while keeping my RAM unchanged at 4GB, help access remaining 0.5 GB that the 32-bit OS cannot access? If yes, how? Doesn't that 0.5 GB still have to be shared with other resources, regardless of whether the OS can access 2^32 or 2^64 address space?
Why not? If you have a larger space then there's obviously more room to map that 0.5GB into. With so much space available, some devices can be mapped above 4GB. In the memory-mapping mechanism each device is mapped into a range of the available address space. The RAM itself doesn't shrink when another device uses part of the address space. The 32-bit OS can't use that 0.5GB not because the address space is smaller, but because some address range is mapped to another device instead of RAM
For example, if we have 4GB of address space then RAM might be mapped into the range 0-2GB and a GPU with 1GB of VRAM will be mapped into the top 1GB (i.e. starting at address 0xC0000000). But if we have a full 64-bit address space of 17179869184GB (2^64 bytes) then that GPU can be mapped at address 12345678GB instead, leaving the low 4GB range for RAM, and now you can use the whole 4GB of memory available (roughly, because there are always small regions of memory that are not available due to shadowing and other things)
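A toy model of that mapping, as a sketch only: RAM is assumed to sit at addresses starting from 0, every device is a `(base, size)` range, devices are assumed not to overlap each other, and any RAM address covered by a device or lying beyond the address limit is treated as lost. The function `visible_ram` is hypothetical, and real firmware and chipsets are more involved than this.

```python
GIB = 1 << 30  # bytes in one GB (binary gigabyte)

def visible_ram(installed, devices, address_bits):
    """Bytes of RAM the OS can still reach in this simplified model."""
    limit = 1 << address_bits
    ram_end = min(installed, limit)       # RAM beyond the address space is unreachable
    hidden = 0
    for base, size in devices:
        start = base
        end = min(base + size, ram_end)   # only the part overlapping RAM matters
        if end > start:
            hidden += end - start
    return ram_end - hidden

gpu_low = [(0xC0000000, 1 * GIB)]         # GPU mapped into the top 1GB below 4GB
gpu_high = [(12345678 * GIB, 1 * GIB)]    # GPU mapped far above 4GB

print(visible_ram(4 * GIB, gpu_low, 32) // GIB)    # 3
print(visible_ram(4 * GIB, gpu_high, 64) // GIB)   # 4
```

The installed RAM is the same 4GB in both calls. Only the placement of the device range changes.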
Don't you see modern GPUs with 8GB of VRAM or more? There can even be multiple GPUs and NICs in a system. Mapping problems can still happen, for example when a PCIe device is mapped at the 2GB position and limits the available RAM. In that case some tuning is necessary to change the mapping and maximize the available RAM. See