Running two or more trainings in parallel

buu2clj · December 8, 2025, 2:17pm

Hi! I’m using flwr v1.18 and want to run 2 trainings in parallel. To do that, I start the superlink, 2 serverapps, 2 supernodes and 2 clients, all of which stay running the whole time. The trainings are different, so their server and client implementations differ as well. The pods where I run the serverapps/clientapps are based on 2 different environments.

My problem is that when I run flwr run, the run associated with one experiment goes to the wrong serverapp. I’ve noticed the same issue on the client side as well: the FAB appears to be installed randomly, causing my training to crash due to an incompatible environment.

My question is whether there is a way to differentiate between multiple trainings when multiple serverapps and clientapps are already running, and whether this behavior has been fixed in newer versions.

williamlm · December 8, 2025, 11:03pm

Hi @buu2clj

Great to have you here in the community!

Based on the Flower v1.18 architecture, multiple ServerApps and ClientApps can run within the same federation, which consists of a single long-running SuperLink and multiple long-running SuperNodes. However, the way you’ve set up your system might be causing routing issues.

In Flower v1.18, when you run flwr run, it creates a new run with a unique run ID and bundles your Flower app into a FAB (Flower Application Bundle) file. The FAB file is then shipped, via the SuperExec, to both the SuperLink and those SuperNodes that need it.

The problem is that if you have pre-started ServerApps running in process isolation mode, they’re essentially waiting to be matched with runs, but Flower’s routing mechanism in v1.18 may not correctly associate each flwr run command with the appropriate ServerApp when you have multiple ServerApps with different environments already running.

Instead of pre-starting ServerApps, you should:

- Use subprocess isolation mode (the default): let the SuperLink manage ServerApp processes automatically.
- Run everything through flwr run: don’t pre-start ServerApps.

Here’s the workflow:

# Start SuperLink once (keeps running)

# Start SuperNodes once (keep running)

# Run your different experiments (ServerApps are launched automatically)
# Experiment 1
cd /path/to/experiment1
flwr run . local-deployment

# Experiment 2  
cd /path/to/experiment2
flwr run . local-deployment
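
To submit both experiments at roughly the same time rather than typing the commands one after the other, the two flwr run calls above could be started from a small script. This is only a minimal sketch: the helper names build_run_command and launch_runs are hypothetical, the paths are the placeholders from the workflow above, and the only Flower-specific part is the flwr run . local-deployment command itself, executed from inside each experiment directory.

```python
import subprocess


def build_run_command(federation="local-deployment"):
    """Return the same command used in the workflow above: flwr run . <federation>."""
    return ["flwr", "run", ".", federation]


def launch_runs(app_dirs, federation="local-deployment", popen=subprocess.Popen):
    """Start one `flwr run` per app directory without waiting in between.

    `popen` is injectable so the function can be exercised without Flower installed.
    Each command is started with the app directory as its working directory,
    which mirrors the `cd /path/to/experimentN` step.
    """
    processes = []
    for app_dir in app_dirs:
        processes.append(popen(build_run_command(federation), cwd=str(app_dir)))
    return processes


# Example usage (placeholder paths, assumes `flwr` is on PATH):
#   procs = launch_runs(["/path/to/experiment1", "/path/to/experiment2"])
#   exit_codes = [p.wait() for p in procs]
```

Each flwr run invocation still creates its own run with its own run ID; the script only removes the manual step of starting them one by one.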

If you must use process isolation mode with pre-started ServerApps/ClientApps, be aware that this is a known limitation in v1.18. The architecture supports multi-run, but the routing mechanism may not reliably handle multiple pre-running ServerApps with different dependencies.

Has this been fixed in newer versions?

Yes! Flower v1.20+ introduced significant improvements:

- SuperExec was introduced as a component responsible for scheduling, launching, and managing app processes within the Flower deployment runtime, with a token-based mechanism that improves security by assigning a unique token to each app execution.
- Better run isolation and routing mechanisms.
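
To make the idea of "a unique token per app execution" more concrete, here is a purely illustrative toy model. It is not Flower's implementation or API: the TokenRegistry class and its issue and resolve methods are hypothetical names, and the sketch only shows the general principle that a launched process presenting its token can be tied to exactly one run.

```python
import secrets


class TokenRegistry:
    """Toy model (hypothetical, not Flower code): one unique token per app execution."""

    def __init__(self):
        self._run_by_token = {}

    def issue(self, run_id):
        """Create a fresh random token and remember which run it belongs to."""
        token = secrets.token_hex(16)
        self._run_by_token[token] = run_id
        return token

    def resolve(self, token):
        """Return the run ID for a token, rejecting tokens that were never issued."""
        if token not in self._run_by_token:
            raise PermissionError("unknown token")
        return self._run_by_token[token]


registry = TokenRegistry()
token_a = registry.issue(run_id=1)  # handed to the process launched for run 1
token_b = registry.issue(run_id=2)  # handed to the process launched for run 2

# Each process can only ever be associated with the run its token was issued for.
assert registry.resolve(token_a) == 1
assert registry.resolve(token_b) == 2
```

In a model like this, a process cannot pick up another run's work by accident, because the association is fixed when the token is issued rather than when an idle process happens to poll for work.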

I’d strongly recommend upgrading to the latest version (v1.24 as of now) where these multi-run scenarios are better supported.

buu2clj · December 15, 2025, 12:42pm

Thank you for the quick reply!

Could you share more details about how this works in the latest version? Specifically, how is the entire multi-run flow handled when using SuperExec? How does the FAB corresponding to each run determine which serverapp/clientapp process it should install into, so that the correct execution order and logical separation between runs (each potentially using a different environment) is maintained?

For example, if I have a serverapp–clientapp pair running in process mode and I execute flwr run, will the run instance and both components share the same unique token?

I want to avoid a scenario where multiple components are already running and two runs are active simultaneously, and the FAB associated with one run is mistakenly installed on the wrong serverapp/clientapp instance.
