Stopping serverapp/clientapp on failure

Hi!

In flwr 1.18, I noticed that when the server and clients are launched in process isolation mode, a client process keeps running even if training fails: it does not terminate automatically, and the server behaves the same way. I have to run flwr stop manually, and in practice the cleanest workaround has been to restart the processes.

Is there an existing mechanism - or one currently under development - that supports automatic termination of all components upon failure? For example, if the server receives at least one failure response from any client, it should trigger a coordinated shutdown of all serverapp/clientapp processes and mark the run as stopped.

Is such behavior already supported when using subprocess isolation mode?

P.S.: I would like to implement such a mechanism if it is not already under development. If you have any ideas on how this could be done properly and in the most logical way, I would appreciate it.
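
To make the idea concrete, here is a rough sketch of the fail-fast behaviour I have in mind. It is plain Python and does not use any Flower API: the supervise function and the dummy commands are hypothetical stand-ins for whatever launches the serverapp/clientapp processes. It only illustrates the policy "one failure stops everything".

```python
import subprocess
import sys
import time


def supervise(commands, poll_interval=0.1, grace_period=5.0):
    """Run all commands; if any exits non-zero, stop the rest.

    Returns the exit codes in the same order as `commands`.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            codes = [p.poll() for p in procs]
            failed = any(c not in (None, 0) for c in codes)
            done = all(c is not None for c in codes)
            if failed or done:
                break
            time.sleep(poll_interval)
    finally:
        # Coordinated shutdown: ask politely first, then force.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            try:
                p.wait(timeout=grace_period)
            except subprocess.TimeoutExpired:
                p.kill()
                p.wait()
    return [p.returncode for p in procs]


if __name__ == "__main__":
    # Dummy stand-ins for one serverapp and two clientapps (hypothetical).
    healthy = [sys.executable, "-c", "import time; time.sleep(60)"]
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise([healthy, healthy, failing]))
```

In this sketch the failing process takes the two healthy ones down with it almost immediately instead of leaving them running. What I am really asking is whether Flower can do the equivalent internally and also mark the run as stopped.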

Hi! Flower 1.18 is quite old at this point, and we have made substantial improvements to this part of the system over the last few releases. Would you mind trying again with Flower 1.25?

You can try both isolation modes: subprocess (the default) and process. The subprocess mode is easier to get started with and is suitable for simple setups and prototyping. The process mode takes more work to set up but gives you more flexibility for production deployments.