Running two or more trainings in parallel

Hi! I’m using flwr v1.18 and I want to run 2 trainings in parallel. For that, I start the superlink, 2 serverapps, 2 supernodes and 2 clientapps, all of which stay running the whole time. The trainings are different, so each has its own implementation of the server and the client. The pods where I run the serverapps/clientapps are based on 2 different environments.

My problem is that when I run flwr run, the run associated with one experiment goes to the wrong serverapp. I’ve noticed the same issue on the client side as well: the FAB appears to be installed randomly, causing my training to crash due to an incompatible environment.

My question: is there a way to differentiate between multiple trainings when multiple serverapps and clientapps are already running, or has this behavior been fixed in newer versions?

Hi @buu2clj

Great to have you here in the community!

Based on the Flower v1.18 architecture, multiple ServerApps and ClientApps can run within the same federation, which consists of a single long-running SuperLink and multiple long-running SuperNodes. However, the way you’ve set up your system might be causing routing issues.

In Flower v1.18, when you run flwr run, it creates a new run with a unique run ID and bundles your Flower app into a FAB (Flower Application Bundle) file. The FAB file is then shipped, via the SuperExec, to both the SuperLink and those SuperNodes that need it.
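To make the run ID / FAB relationship concrete, here is a minimal sketch of the kind of bookkeeping involved. It is a toy model, not Flower's code: the `RunRegistry` class, the sequential IDs, and the use of a SHA-256 content hash as the FAB key are all assumptions made for illustration.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    run_id: int    # unique per submission (sequential here; an assumption)
    fab_hash: str  # content hash identifying the bundled app


class RunRegistry:
    """Hypothetical stand-in for the per-run bookkeeping of a coordinator."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._fabs: dict[str, bytes] = {}
        self._runs: dict[int, Run] = {}

    def create_run(self, fab_bytes: bytes) -> Run:
        # Store the bundle under its content hash and bind it to a new run ID.
        fab_hash = hashlib.sha256(fab_bytes).hexdigest()
        self._fabs[fab_hash] = fab_bytes
        run = Run(next(self._ids), fab_hash)
        self._runs[run.run_id] = run
        return run

    def fab_for(self, run_id: int) -> bytes:
        # Anyone who knows the run ID can fetch exactly that run's bundle.
        return self._fabs[self._runs[run_id].fab_hash]


registry = RunRegistry()
run1 = registry.create_run(b"experiment1 bundle")
run2 = registry.create_run(b"experiment2 bundle")
```

The point of the sketch is that the bundle is tied to the run, not to a particular process. Which process ends up executing a given run is a separate decision.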

The problem is that pre-started ServerApps running in process isolation mode are essentially waiting to be matched with runs. When several such ServerApps with different environments are already running, Flower’s routing mechanism in v1.18 may not associate each flwr run command with the appropriate one.
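The failure mode described above can be pictured with a simplified model in which every waiting process pulls from one shared list of pending runs. This is only a sketch under that assumption, not a description of Flower's internals; `PendingRun`, `Worker`, `claim_next`, and the `env-a`/`env-b` labels are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingRun:
    run_id: int
    required_env: str  # environment the run's app was written for


@dataclass
class Worker:
    name: str
    env: str  # environment installed in this worker's pod


def claim_next(pending: deque, worker: Worker):
    """Give the oldest pending run to whichever worker asks first.

    Nothing here checks the worker's environment, which is the
    weakness this sketch is meant to show.
    """
    run = pending.popleft()
    return run, run.required_env == worker.env


pending = deque([PendingRun(1, "env-a"), PendingRun(2, "env-b")])
worker_a = Worker("serverapp-a", "env-a")
worker_b = Worker("serverapp-b", "env-b")

# worker_b happens to poll first, so it receives run 1 (built for env-a).
run, compatible = claim_next(pending, worker_b)
print(run.run_id, compatible)  # prints: 1 False
```

In this model, which run lands on which process depends only on polling order, which would match the "installed randomly" symptom from the question.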

Instead of pre-starting ServerApps, you should:

  1. Use subprocess isolation mode (default) - Let the SuperLink manage ServerApp processes automatically
  2. Run everything through flwr run - Don’t pre-start ServerApps

Here’s the workflow:

# Start SuperLink once (keeps running)

# Start SuperNodes once (keep running)

# Run your different experiments (ServerApps are launched automatically)
# Experiment 1
cd /path/to/experiment1
flwr run . local-deployment

# Experiment 2
cd /path/to/experiment2
flwr run . local-deployment

If you must use process isolation mode with pre-started ServerApps/ClientApps, be aware that this misrouting is a known limitation in v1.18. The architecture supports multiple runs, but the routing mechanism may not reliably handle multiple pre-running ServerApps with different dependencies.

Has this been fixed in newer versions?

Yes! Flower v1.20+ introduced significant improvements:

  • SuperExec was introduced as a component responsible for scheduling, launching, and managing app processes within the Flower deployment runtime. It uses a token-based mechanism that improves security by assigning a unique token to each app execution
  • Better run isolation and routing mechanisms
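As a rough illustration of why a per-execution token helps with isolation, the sketch below binds each token to exactly one run, so a process that presents a token can only ever be handed that run. It is a hedged toy model: `TokenStore`, its methods, and the single-use behavior are assumptions for illustration and are not taken from Flower's code.

```python
import secrets


class TokenStore:
    """Hypothetical token bookkeeping: one unique token per app execution."""

    def __init__(self) -> None:
        self._token_to_run: dict[str, int] = {}

    def issue(self, run_id: int) -> str:
        # Create an unguessable token and bind it to exactly one run.
        token = secrets.token_hex(16)
        self._token_to_run[token] = run_id
        return token

    def resolve(self, token: str) -> int:
        # Single use (an assumption): the token is consumed on first lookup.
        try:
            return self._token_to_run.pop(token)
        except KeyError:
            raise PermissionError("unknown or already-used token") from None


store = TokenStore()
token_exp1 = store.issue(run_id=101)
token_exp2 = store.issue(run_id=202)

# A process launched with token_exp2 can only be matched to run 202.
resolved = store.resolve(token_exp2)
```

Compared with a shared pool of waiting processes, the launcher decides up front which execution belongs to which run, so the match does not depend on polling order.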

I’d strongly recommend upgrading to the latest version (v1.24 as of now) where these multi-run scenarios are better supported.

Thank you for the quick reply!

Could you share more details about how this works in the latest version? Specifically, how is the entire multi-run flow handled when using SuperExec? How does the FAB corresponding to each run determine which serverapp/clientapp process it should install into, so that the correct execution order and logical separation between runs (each potentially using a different environment) is maintained?

For example, if I have a serverapp–clientapp pair running in process mode and I execute flwr run, will the run instance and both components share the same unique token?

I want to avoid a scenario where multiple components are already running and two runs are active simultaneously, and the FAB associated with one run is mistakenly installed on the wrong serverapp/clientapp instance.