Suppose we have a PyTorch distributed group object initialized by torch.distributed.new_group([a, b, c, d]). Is there any way to get the global ranks a, b, c, d back from this group?
1 Answer
PyTorch offers a torch.distributed.distributed_c10d._get_global_rank function that can be used in this case:
import torch.distributed as dist

def get_all_ranks_from_parallel_group(group):
    """Return the global ranks of all members of `group`."""
    rank = 0
    results = []
    try:
        # _get_global_rank maps a group-local rank to its global rank;
        # it raises RuntimeError once `rank` runs past the group size,
        # which ends the loop.
        while True:
            results.append(dist.distributed_c10d._get_global_rank(group, rank))
            rank += 1
    except RuntimeError:
        pass
    return results
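For completeness, here is a minimal usage sketch. It assumes a 4-process job launched with torchrun (e.g. torchrun --nproc_per_node=4 demo.py; the script name, backend, and ranks are illustrative, not from the original answer). Note that newer PyTorch releases (2.0+) expose public helpers for this: torch.distributed.get_process_group_ranks(group) returns the whole list directly, and torch.distributed.get_global_rank(group, group_rank) is the public counterpart of the private helper above.

import torch.distributed as dist

# Assumes torchrun has set the rendezvous env vars for 4 processes.
dist.init_process_group(backend="gloo")  # gloo runs on CPU, no GPUs needed

group = dist.new_group([0, 1, 2, 3])
print(get_all_ranks_from_parallel_group(group))  # -> [0, 1, 2, 3]

# On PyTorch 2.0+ the same information is available directly:
# print(dist.get_process_group_ranks(group))
# print(dist.get_global_rank(group, 0))

dist.destroy_process_group()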