I'm learning how to use fairseq to implement a simple translation model based on the Transformer.
I would like to train on 2 GeForce RTX 3090 GPUs on my lab server. Which option should I select for the --ddp-backend flag of fairseq-train?
Furthermore, could you explain the meaning of each of the following options for --ddp-backend, and when to use each of them?
From the fairseq documentation (Command-line Tools → fairseq-train → distributed_training):

--ddp-backend: Possible choices: c10d, fully_sharded, legacy_ddp, no_c10d, pytorch_ddp, slowmo
DistributedDataParallel backend
Default: "pytorch_ddp"
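For context, this is roughly the command I plan to run, adapted from the fairseq IWSLT'14 translation example (the dataset path and architecture are placeholders from that example, not necessarily the right choices for my setup):

```sh
# Expose only the two 3090s to fairseq (assuming they are devices 0 and 1);
# fairseq-train will launch one worker per visible GPU automatically.
CUDA_VISIBLE_DEVICES=0,1 fairseq-train data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --ddp-backend pytorch_ddp  # <- the option I'm unsure about
```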
I'm new to the Stack Exchange community, so apologies in advance if I've done anything inappropriate here.