Shared memory is "striped" into banks, which is what gives rise to bank conflicts.
Question: how can you determine how many banks ("stripes") shared memory has?
Poking around the NVIDIA "devtalk" forums, it seems that per-block shared memory is striped into 16 banks. But how do we know this? The threads suggesting it are a few years old. Has the number changed since then? Is it the same on all CUDA-capable NVIDIA cards? Is there a way to query it through the runtime API (I don't see it there, e.g. in cudaDeviceProp)? Is there a manual way to determine it at runtime?
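To be concrete about what "striping" means here, the following is a minimal host-side sketch in plain C++ (not CUDA, and not a way to query the hardware). It assumes that consecutive 4-byte words are assigned to consecutive banks, and it takes the bank count as a parameter because that value is exactly what is in question. The names `bank_of` and `conflict_degree` are hypothetical helpers made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bank that a byte address falls into, ASSUMING consecutive 4-byte words
// are striped across consecutive banks. num_banks is a parameter because
// the real value is the open question.
int bank_of(std::size_t byte_addr, int num_banks) {
    assert(num_banks > 0);
    return static_cast<int>((byte_addr / 4) % static_cast<std::size_t>(num_banks));
}

// Largest number of threads that land in the same bank when thread t
// accesses the 4-byte word at index t * stride_words.
// Model simplification: every same-bank access is counted as a conflict;
// hardware broadcast of identical addresses is not modeled.
int conflict_degree(int num_threads, int stride_words, int num_banks) {
    assert(num_threads >= 0 && stride_words >= 0 && num_banks > 0);
    std::vector<int> hits(static_cast<std::size_t>(num_banks), 0);
    for (int t = 0; t < num_threads; ++t) {
        std::size_t byte_addr =
            static_cast<std::size_t>(t) * static_cast<std::size_t>(stride_words) * 4;
        ++hits[static_cast<std::size_t>(bank_of(byte_addr, num_banks))];
    }
    return hits.empty() ? 0 : *std::max_element(hits.begin(), hits.end());
}
```

Under this model, with 16 threads and the 16 banks suggested by the forum threads, `conflict_degree(16, 1, 16)` is 1 (no conflict), a stride of 2 words gives 2, and a stride of 16 words puts all 16 threads in bank 0. A manual runtime probe would presumably have to time strided shared-memory accesses in a kernel and look for the stride at which serialization peaks, but that is an untested assumption.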