NVIDIA GPUs specify that a warp has a fixed number of threads (32), so how are the threads in a thread block split into different warps?
For a one-dimensional thread block such as (128, 1), it looks like the threads along the x dimension are split into warps sequentially, 32 at a time. But how does it work for other block shapes, such as (16, 2)? Will those 32 threads map to a single warp in that case?