Server still waiting after all clients crash?

huongdm · November 28, 2024, 12:31pm

Hello,

I am trying to use Flower on a real computer cluster. I realised that with a strategy such as FedAvg, the server waits until it gets a response from at least a minimum number of clients, as set by the min_*_clients parameters (min_fit_clients, min_evaluate_clients, min_available_clients, …).

This means the server keeps waiting even when all clients have crashed (no response).

Right now I stop the server by killing it manually. I wrote a script that kills the server/clients on failure by periodically checking the log file output, but it is a bit annoying to use.
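A slightly more robust version of that log-checking script could look like the sketch below. It is a minimal sketch, assuming the server (or a client) is launched from a Python wrapper and writes its output to a log file; run_with_log_watchdog, its parameters and the stall rule (kill the process when the log stops growing) are hypothetical and not part of Flower.

```python
import os
import subprocess
import time


def run_with_log_watchdog(cmd, log_path, stall_seconds, poll_seconds=1.0):
    """Run cmd, appending its output to log_path, and kill it if the log
    stops growing for more than stall_seconds (hypothetical helper).

    Returns the process return code (negative on POSIX when killed by a signal).
    """
    with open(log_path, "ab") as log:
        # The child keeps its own copy of the file descriptor,
        # so closing our handle here does not stop it from logging.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)

    last_size = -1
    last_change = time.monotonic()
    while proc.poll() is None:
        size = os.path.getsize(log_path)
        if size != last_size:
            # New output appeared: the run is still making progress.
            last_size = size
            last_change = time.monotonic()
        elif time.monotonic() - last_change > stall_seconds:
            # No new log output for too long: assume the run is stuck.
            proc.kill()
            break
        time.sleep(poll_seconds)
    return proc.wait()
```

For example, run_with_log_watchdog(["python", "server.py"], "server.log", stall_seconds=600), where server.py is a placeholder for your own launcher. The stall threshold must be longer than the slowest legitimate quiet period (e.g. a long local training round), otherwise a healthy run gets killed.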

I looked at the topic about “drop_out”, but it does not seem to solve my issue. Is there any “auto-detect and kill” mechanism for this case, for example a time limit on how long the server waits?
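The “time limit for server waiting” idea boils down to waiting on a condition with a deadline. The toy ClientRegistry below is hypothetical and is not Flower's client manager; it only shows how a bounded wait returns False instead of blocking forever when too few clients are connected.

```python
import threading


class ClientRegistry:
    """Toy stand-in for a server-side client manager (hypothetical, not Flower's API)."""

    def __init__(self):
        self._clients = set()
        self._cond = threading.Condition()

    def register(self, client_id):
        with self._cond:
            self._clients.add(client_id)
            self._cond.notify_all()  # wake up anyone waiting for clients

    def unregister(self, client_id):
        with self._cond:
            self._clients.discard(client_id)
            self._cond.notify_all()

    def wait_for(self, num_clients, timeout):
        """Block until at least num_clients are registered or timeout seconds pass.

        Returns True if enough clients arrived, False on timeout.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: len(self._clients) >= num_clients, timeout=timeout
            )
```

A server loop could call registry.wait_for(num_clients=2, timeout=300.0) and shut down cleanly when it returns False. In Flower itself, ServerConfig has a round_timeout field that bounds how long a round waits for results; it is worth checking whether that covers your case.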

chongshenng · December 21, 2024, 6:02pm

Hello @huongdm, welcome to Flower Discuss! Apologies for the late reply.

Can I find out a bit more about your setup? Are you running Flower in simulation mode on the cluster (without spinning up SuperNodes)? In our flwr == 1.14.0 release yesterday, we introduced the flwr stop command that you can run to terminate a specific run-id. Presently, users need to explicitly run the flwr stop command.

One possibility is for you to track your experiments using W&B or TensorBoard, and if any experiment is taking longer than expected, you could ssh to the cluster and execute flwr stop for that run.
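The monitoring idea above can be partly automated. This is a minimal sketch, assuming the flwr stop <run-id> form described in this reply (check flwr stop --help in your installed version for the exact arguments); stop_if_over_budget and its time-budget rule are hypothetical, and runner/now are injectable only so the logic can be tested without a live Flower run.

```python
import subprocess
import time


def stop_if_over_budget(run_id, started_at, budget_seconds,
                        runner=subprocess.run, now=time.monotonic):
    """Issue `flwr stop <run_id>` once the run has exceeded budget_seconds.

    started_at must come from the same clock as `now`.
    Returns True if a stop command was issued, False otherwise.
    """
    if now() - started_at <= budget_seconds:
        return False  # still within the allowed wall-clock budget
    # Assumed CLI form: flwr stop <run-id> (see the reply above).
    runner(["flwr", "stop", str(run_id)], check=True)
    return True
```

Calling it every minute or so, from a loop in the same process that recorded started_at = time.monotonic() when the run was submitted, gives a hard wall-clock limit per run without having to ssh in by hand.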

huongdm · March 5, 2025, 2:59pm

Thank you for your reply. I don't use SuperNodes. Actually, it is not a problem with Flower; it comes from how we deploy on our cluster. I found a solution for my case: I use MPI parallel processes instead of subprocess to control our experiments, so I can manage the experiments through MPI.
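The post does not show the MPI code. What makes MPI convenient here is fail-fast behaviour: when one rank dies or calls MPI_Abort, the launcher typically tears down the whole job, so the server cannot be left waiting alone. As an illustration of that property only (a swapped-in stdlib subprocess analogue, not the poster's MPI solution), the hypothetical run_fail_fast below starts a group of server/client commands and kills the rest as soon as any member exits with an error.

```python
import subprocess
import time


def run_fail_fast(cmds, poll_seconds=0.5):
    """Start all commands; if any exits with a non-zero code, kill the rest.

    Returns the list of return codes in the same order as cmds.
    """
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    try:
        while any(p.poll() is None for p in procs):
            if any(p.poll() not in (None, 0) for p in procs):
                break  # one member failed: stop the whole group
            time.sleep(poll_seconds)
    finally:
        # Kill anything still running (also on Ctrl+C in the wrapper).
        for p in procs:
            if p.poll() is None:
                p.kill()
    return [p.wait() for p in procs]
```

This only catches crashes (non-zero exit codes); a client that hangs without exiting still needs a timeout.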

system · March 12, 2025, 2:59pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.


