I would like to report a strange observation. My federated learning code ran fine on CPU. Today I installed CUDA 11.8 and tried to run the same code on the GPU, but it gets stuck and training never starts.
To confirm the issue, I ran it again on CPU only, and it worked fine.
I installed CUDA and cuDNN versions compatible with PyTorch after checking the official PyTorch website, and both installed without any warning or error message. I am using an AWS instance running Windows Server 2022 with an NVIDIA A10G GPU. Other, centralized learning code runs fine on CUDA. But when I run the federated learning code, it fails: training does not start, and the gRPC channel closes after waiting for some time.
Hi @shubham.ecebtech14, could you paste the error you get here? Which version of Flower are you using, and which version of Python? Have you tried running examples/simulation-pytorch?
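If it helps, here is a minimal sketch for collecting those details in one go. It assumes Flower was installed from PyPI under the package name `flwr`; the helper names `collect_versions` and `cuda_status` are made up for this post and are not part of Flower or PyTorch.

```python
import platform
from importlib import metadata


def collect_versions(packages=("flwr", "torch", "grpcio")):
    """Return version strings for the Python runtime, the OS, and the given packages."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            info[name] = "not installed"
    return info


def cuda_status():
    """Describe whether PyTorch can see a CUDA device (hypothetical helper)."""
    try:
        import torch
    except Exception as exc:  # covers a missing package as well as broken DLLs
        return f"torch import failed: {exc}"
    if not torch.cuda.is_available():
        return "CUDA not available to torch"
    return f"CUDA {torch.version.cuda}, device: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
    print(f"cuda: {cuda_status()}")
```

Run it in the same environment as the federated code and paste the output together with the full error message.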