I am doing computations in CUDA using floats. Because we do not have enough memory on the GPU, we store the raw data there as uint16_t and int16_t. Thus, before I use this data I have to convert it to floats.
The number of ints is not that large (approximately 12k uint16_t values and the same number of int16_t values). Profiling showed that converting the numbers takes a considerable share of the run time (approx. 5-10%). The rest of the calculation cannot be optimized further.
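To make the setup concrete, here is a minimal sketch of the kind of conversion step meant above, assuming a plain element-wise kernel. The kernel name `convertToFloat`, the buffer names, the use of managed memory, and the launch configuration are all hypothetical and only for illustration; this is not the actual code that was profiled.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical element-wise conversion kernel (illustration only).
// Each thread widens one uint16_t and one int16_t value to float.
// Every 16-bit integer is exactly representable as a 32-bit float.
__global__ void convertToFloat(const uint16_t* __restrict__ u16,
                               const int16_t* __restrict__ s16,
                               float* outU, float* outS, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        outU[i] = static_cast<float>(u16[i]);
        outS[i] = static_cast<float>(s16[i]);
    }
}

int main()
{
    const int n = 12000;  // roughly the element count mentioned above

    uint16_t* u16 = nullptr;
    int16_t* s16 = nullptr;
    float* outU = nullptr;
    float* outS = nullptr;

    // Managed memory keeps the sketch short; a real setup may use cudaMalloc.
    cudaMallocManaged(&u16, n * sizeof(uint16_t));
    cudaMallocManaged(&s16, n * sizeof(int16_t));
    cudaMallocManaged(&outU, n * sizeof(float));
    cudaMallocManaged(&outS, n * sizeof(float));

    // Fill the inputs with simple test values that fit in 16 bits.
    for (int i = 0; i < n; ++i) {
        u16[i] = static_cast<uint16_t>(i);
        s16[i] = static_cast<int16_t>(-i);
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    convertToFloat<<<blocks, threads>>>(u16, s16, outU, outS, n);
    cudaDeviceSynchronize();

    printf("last unsigned: %f, last signed: %f\n", outU[n - 1], outS[n - 1]);

    cudaFree(u16);
    cudaFree(s16);
    cudaFree(outU);
    cudaFree(outS);
    return 0;
}
```

In a real pipeline the conversion would more likely happen inside the compute kernel, right where the values are loaded, rather than in a separate pass like this one.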
Thus my 3+1 questions are:
- What is the fastest way to convert ints to floats?
- Is there a substantial difference when converting int16_t or uint16_t?
- Is there a substantial difference when converting larger int types, e.g. int32 or int64?
- Why are all questions on SO about converting floats to ints? Is this something one usually does not do?