Hello
I'm trying to run training with 1 client that uses 2 or more GPUs with the DDP strategy.
The issue is that DDP interferes with the ClientApp: it spawns/forks additional processes that re-enter the ClientApp runtime and conflict with the gRPC ports/TLS flags.
The only solution I have found so far is to move the training logic out of the client into a separate script and launch it from inside the ClientApp with torchrun.
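For context, here is a minimal sketch of that workaround: the client shells out to torchrun so DDP forks its own fresh worker processes instead of re-entering the ClientApp. The script name `train_ddp.py`, the `--output` flag, and the JSON metrics format are my assumptions, not part of any framework API.

```python
# Sketch: run DDP training in separate processes via torchrun, so the
# forked workers never touch the ClientApp's gRPC/TLS runtime.
# train_ddp.py, --output, and the metrics format are hypothetical.
import json
import subprocess
import tempfile
from pathlib import Path


def build_torchrun_cmd(script: str, nproc: int, out_path: str) -> list[str]:
    """Build a torchrun command line for a single-node, multi-GPU run."""
    return [
        "torchrun",
        "--standalone",              # single node, no external rendezvous
        f"--nproc_per_node={nproc}",  # one worker process per GPU
        script,
        f"--output={out_path}",      # hypothetical flag parsed by the script
    ]


def run_ddp_training(script: str = "train_ddp.py", nproc: int = 2) -> dict:
    """Launch the training script in fresh processes, then read back metrics
    the script is assumed to dump as JSON at out_path."""
    out = Path(tempfile.mkdtemp()) / "metrics.json"
    subprocess.run(build_torchrun_cmd(script, nproc, str(out)), check=True)
    return json.loads(out.read_text())
```

The client's `fit` would then call something like `run_ddp_training()` and return the metrics/weights it reads back, keeping all CUDA and process-group setup out of the ClientApp process.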
Has anybody else encountered this issue? What other options do I have?