Is it possible to perform an RDMA operation between a GPU and a remote host?
Yes, it is possible to move data between a GPU and an InfiniBand card with the "GPUDirect RDMA" feature of NVIDIA compute GPUs (Tesla and Quadro), available since 2012 (Kepler-class GPUs and CUDA 5.0). There is a web page about GPUDirect RDMA in the CUDA Toolkit documentation: http://docs.nvidia.com/cuda/gpudirect-rdma/
GPUDirect RDMA is a technology introduced in Kepler-class GPUs and CUDA 5.0 that enables a direct path for data exchange between the GPU and a third-party peer device using standard features of PCI Express. Examples of third-party devices are: network interfaces, video acquisition devices, storage adapters.
GPUDirect RDMA is available on both Tesla and Quadro GPUs.
A number of limitations can apply, the most important being that the two devices must share the same upstream PCI Express root complex. Some of the limitations depend on the platform used and could be lifted in current/future products.
A few straightforward changes must be made to device drivers to enable this functionality with a wide range of hardware devices. This document introduces the technology and describes the steps necessary to enable a GPUDirect RDMA connection to NVIDIA GPUs on Linux.
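In practice, once the peer-memory kernel module shipped with Mellanox OFED (nv_peer_mem, later renamed nvidia-peermem) is loaded, user-space code can hand a CUDA device pointer straight to the InfiniBand verbs API and the HCA will DMA to and from GPU memory without staging through host RAM. Below is a minimal sketch of that registration step, for illustration only (not the documentation's example); it assumes CUDA 5.0+, a Kepler-or-newer Tesla/Quadro GPU, an RDMA-capable NIC, and condenses error handling:

/* Sketch: register GPU memory with an InfiniBand HCA so the NIC can
 * DMA to/from it directly.  Assumes the nv_peer_mem / nvidia-peermem
 * kernel module is loaded; error handling is condensed for brevity. */
#include <stdio.h>
#include <infiniband/verbs.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t size = 1 << 20;          /* 1 MiB buffer on the GPU */
    void *gpu_buf = NULL;

    if (cudaMalloc(&gpu_buf, size) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Key step: the device pointer from cudaMalloc() is passed to
     * ibv_reg_mr() just like a host pointer.  With GPUDirect RDMA in
     * place, the peer-memory module pins the GPU pages and the HCA gets
     * DMA addresses for them, so no copy through host memory is needed. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, size,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        fprintf(stderr, "ibv_reg_mr on GPU memory failed "
                        "(is nv_peer_mem loaded?)\n");
        return 1;
    }
    printf("GPU buffer registered: lkey=0x%x rkey=0x%x\n",
           (unsigned)mr->lkey, (unsigned)mr->rkey);

    /* mr->lkey / mr->rkey can now be used in ordinary RDMA work requests
     * (RDMA READ/WRITE, send/receive) that target GPU memory directly.  */

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    cudaFree(gpu_buf);
    return 0;
}

MPI stacks with GPUDirect RDMA support (for example MVAPICH2-GDR) perform essentially the same registration internally, so applications often get the direct GPU-to-NIC path simply by passing device pointers to communication calls.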
There are some limitations: http://docs.nvidia.com/cuda/gpudirect-rdma/index.html#supported-systems
2.4. Supported Systems
General remarks
Even though the only theoretical requirement for GPUDirect RDMA to work between a third-party device and an NVIDIA GPU is that they share the same root complex, there exist bugs (mostly in chipsets) causing it to perform badly, or not work at all in certain setups.
We can distinguish between three situations, depending on what is on the path between the GPU and the third-party device:
- PCIe switches only
- single CPU/IOH
- CPU/IOH <-> QPI/HT <-> CPU/IOH
The first situation, where there are only PCIe switches on the path, is optimal and yields the best performance. The second one, where a single CPU/IOH is involved, works, but yields worse performance (especially peer-to-peer read bandwidth has been shown to be severely limited on some processor architectures). Finally, the third situation, where the path traverses a QPI/HT link, may be extremely performance-limited or even not work reliably.
Tip: lspci can be used to check the PCI topology:
$ lspci -t
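On machines with recent NVIDIA drivers, the topology matrix printed by nvidia-smi is another convenient way to see how GPUs and NICs are connected (through a PCIe switch, a single CPU, or a socket-to-socket hop):
$ nvidia-smi topo -m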
Platform support
For the IBM Power 8 platform, GPUDirect RDMA and P2P are not supported, but are not explicitly disabled. They may not work at run-time.
On ARM64, the necessary peer-to-peer functionality depends on both the hardware and the software of the particular platform. So while GPUDirect RDMA is not explicitly disabled in this case, there are no guarantees that it will be fully functional.
IOMMUs
GPUDirect RDMA currently relies upon all physical addresses being the same from the different PCI devices' point of view. This makes it incompatible with IOMMUs performing any form of translation other than 1:1, hence they must be disabled or configured for pass-through translation for GPUDirect RDMA to work.
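How the IOMMU is disabled or put into pass-through mode is platform-specific; on common x86 Linux systems this is typically done in the BIOS/UEFI (VT-d or AMD-Vi settings) or via kernel boot parameters such as intel_iommu=off or iommu=pt, so checking those settings is a reasonable first step if registration of GPU memory with the NIC fails.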