I am working on an attention model, and before running the final model, I was checking the tensor shapes that flow through the code. I have an operation where I need to reshape a tensor. The tensor has shape `torch.Size([30, 8, 9, 64])`, where 30 is the batch size, 8 is the number of attention heads (this is not relevant to my question), 9 is the number of words in the sentence, and 64 is an intermediate embedding representation of each word. I have to reshape this tensor to `torch.Size([30, 9, 512])` before processing it further. Looking at some references online, I saw that they do the following: `x.transpose(1, 2).contiguous().view(30, -1, 512)`,
whereas I was thinking that this should work just as well: `x.transpose(1, 2).reshape(30, -1, 512)`.
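
Here is a minimal reproduction of what I am seeing (a random tensor stands in for my actual attention output; on recent PyTorch versions the nodes print with a `0` suffix, e.g. `ViewBackward0`):

```python
import torch

# Dummy input standing in for the multi-head attention output:
# (batch_size, num_heads, seq_len, head_dim)
x = torch.randn(30, 8, 9, 64, requires_grad=True)

# Reference approach: make the transposed tensor contiguous, then view
a = x.transpose(1, 2).contiguous().view(30, -1, 512)

# My approach: let reshape handle the (non-contiguous) transposed tensor
b = x.transpose(1, 2).reshape(30, -1, 512)

print(a.shape, b.shape)   # both torch.Size([30, 9, 512])
print(torch.equal(a, b))  # True -- identical values
print(a.grad_fn)          # <ViewBackward ...>
print(b.grad_fn)          # <UnsafeViewBackward ...>
```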
In the first case the `grad_fn` is `<ViewBackward>`, whereas in my case it is `<UnsafeViewBackward>`. Aren't these two the same operation? Will this difference cause an error during training?