Hierarchical Federated Learning

Hi everyone,

We are exploring the idea of introducing edge servers into a federated learning system to sit between clients and the central server. This aligns with hierarchical federated learning (HFL), where edge servers aggregate client models locally before sending updates to the global server. Potential benefits include:

• Improved scalability and reduced communication overhead.

• Better handling of client heterogeneity through localized aggregation.
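As a rough illustration of "edge servers aggregate client models locally before sending updates to the global server", here is a minimal two-level FedAvg sketch in plain Python. It is not Flower code. It assumes that each client update is a flat list of floats weighted by its number of training examples. The names fedavg and hierarchical_round, and the toy numbers, are hypothetical. With example-count weighting, aggregating at the edge first and then globally gives the same result as flat FedAvg over all clients.

```python
from typing import List, Tuple

# Assumption for illustration: an update is (flat parameter list, number of training examples).
Update = Tuple[List[float], int]


def fedavg(updates: List[Update]) -> Update:
    """Average parameter vectors, weighting each one by its example count."""
    total = sum(n for _, n in updates)
    avg = [0.0] * len(updates[0][0])
    for params, n in updates:
        for i, p in enumerate(params):
            avg[i] += p * n / total
    # Return the summed example count so the result can be aggregated again one level up.
    return avg, total


def hierarchical_round(edges: List[List[Update]]) -> Update:
    """Each edge server averages its own clients, then the global server averages the edges."""
    edge_models = [fedavg(clients) for clients in edges]
    return fedavg(edge_models)


if __name__ == "__main__":
    edge_a = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]  # two clients behind edge server A
    edge_b = [([5.0, 6.0], 20)]                    # one client behind edge server B
    two_level, n = hierarchical_round([edge_a, edge_b])
    flat, _ = fedavg(edge_a + edge_b)
    print("two-level:", two_level, "flat:", flat, "examples:", n)
```

The two results agree up to floating-point rounding. Only one aggregated update per edge server travels to the global server, which is where the reduction in communication overhead comes from.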

Opportunities to Contribute

We’d love to hear your input! Here’s how you can contribute:

  1. Ideas and Design: Share thoughts on implementing edge servers or suggest design improvements.

  2. Code and Prototypes: Help experiment with hierarchical aggregation using Flower.

  3. Research and Resources: Share papers, tools, or examples relevant to hierarchical federated learning.

Key Questions

• How can hierarchical federated learning be implemented using Flower?

• What modifications or extensions would be needed?

• Are there best practices for multi-level aggregation strategies (client-edge-server-global)?

Let’s collaborate to explore this concept!


Hi, is this in the works? I’d appreciate an update. This would be very useful!

Hi @marykor, there was a presentation on this topic at our Flower AI Summit 2025.

I’ll try to find the paper and link it here if it is public. From what I know, the author might implement the baseline for Hierarchical FL. I’ll keep you updated.


Here’s the paper I mentioned: “Hierarchical Federated Learning for Natural Disaster Management” (IEEE conference publication, IEEE Xplore).

Hi, is there runnable code or an illustration of the HFL implementation?

I’m currently looking for the authors’ open-source code and will post here when they reply.

I have been working on an HFL implementation, and I was wondering whether there is any way to change the SuperLink that a SuperNode is connected to. Does Flower support dynamically changing federations? For example, suppose I have two federations running two separate training jobs, each with one SuperLink and two SuperNodes. Can I have one SuperNode from each federation switch places, so that it disconnects from its own SuperLink, connects to the other SuperLink instead, and runs the associated ClientApp (or handles whatever else is necessary)?


As a quick suggestion, there are two ways to do this:

  • Start/stop the SuperNode: You could run the SuperNode, connect it to one SuperLink, and have it do some work. Then save the intermediate results, stop the SuperNode, have it connect to the other SuperLink, and do some other task for that other federation (and repeat).
  • Run two SuperNodes on the same machine/data: Essentially, the common mental model is to have one SuperNode represent one data partition (e.g. one hospital in a federation of hospitals). You could start two SuperNodes with the same credentials (public/private key) and access to the same data, one connected to SuperLink #1 and the other connected to SuperLink #2. That way, both would run in parallel, and they could share state through any kind of local storage (files, DB, Redis, …).
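For the "save some intermediate results" step, or for two SuperNodes on one machine sharing state through files, a minimal sketch could look like the following. It is plain Python and uses no Flower APIs. The file location, the state layout, and the helper names load_state and save_state are all hypothetical. Writing to a temporary file and then renaming it means a SuperNode stopped mid-write leaves the previous state intact. This sketch does not lock the file, so if two SuperNodes write at the same time, a DB, Redis, or a file lock would be the safer choice.

```python
import json
import os
import tempfile
from typing import Any, Dict

# Hypothetical location shared by the ClientApps of all SuperNodes on this machine.
STATE_PATH = os.path.join(tempfile.gettempdir(), "hfl_node_state.json")


def load_state(path: str = STATE_PATH) -> Dict[str, Any]:
    """Return previously saved state, or an empty dict if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_state(state: Dict[str, Any], path: str = STATE_PATH) -> None:
    """Write to a temp file in the same directory, then rename it over the old file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # replaces the destination in a single rename


if __name__ == "__main__":
    state = load_state()
    # Example bookkeeping: which federation this node last worked for, and a local counter.
    state["last_federation"] = "superlink-1"
    state["local_rounds"] = state.get("local_rounds", 0) + 1
    save_state(state)
    print(load_state())
```

A ClientApp would call load_state() when it starts handling a Message and save_state() after it finishes, so that work continues after the SuperNode reconnects to a different SuperLink.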

As a side note, we’ll soon have support for running multiple federations on top of a single SuperLink. If you’re interested in becoming an early tester, you can apply for SuperGrid Early Access here: Flower Early Access


Thanks! These are the two natural approaches, given what is currently possible with Flower.

My current attempt at what I described is pretty much your first point: stopping a SuperNode that belongs to one SuperLink’s already running federation and restarting it with the address of a different SuperLink that is also already running a federation. I had a couple of questions about the resulting behavior, though:

  • What happens to the ServerApps and ClientApps that are running? Presumably the ServerApps simply keep running on both SuperLinks, since nothing about them changes, but does the newly connected SuperNode automatically get sent another ClientApp by the new SuperLink?
  • Is the only effect the change in the number of clients that the ClientManager samples from in each fit round (as SuperNodes join or leave a federation)?

I don’t mean to be redundant or ask about things that may be obvious, so please feel free to point me to the relevant documentation if any of this is already explained there.

Valid questions! The (somewhat simplified) full process for executing a Flower run is as follows:

  • When you use flwr run, the flwr CLI packages up your local project code (which includes the ServerApp and ClientApp) into a “Flower App Bundle” (FAB). This gets sent to the SuperLink. The SuperLink then starts the Run, which means it starts the ServerApp (in an isolated process). The ServerApp sends Messages to the SuperNodes connected to that SuperLink. When a SuperNode receives the first Message from that Run, it pulls the corresponding FAB and loads the ClientApp to process that Message. When the next Message originating from the same ServerApp/Run arrives, the SuperNode already has the FAB and it can directly load the ClientApp to pass/execute the Message.
  • This means that if your SuperNode shuts down, the ServerApp on SuperLink #1 continues to run. If the SuperNode then connects to SuperLink #2, you need to start a Run on that second SuperLink with flwr run. The SuperNode that is now connected to SuperLink #2 will then receive the FAB and load the other ClientApp, the one belonging to the Run on SuperLink #2.
  • The two ServerApps running on SuperLink #1 and #2 should ideally be aware that one of the SuperNodes isn’t always available, so they should either continue working with just one SuperNode or wait for the second SuperNode to reconnect. For a running ServerApp, the only thing that changes is the number of SuperNode node_ids it receives when calling Grid.get_node_ids() (when you use the Message API) or the number of clients the ClientManager can sample from (ClientManager is a higher-level abstraction; Grid is the lower-level API you can use to send and receive Messages directly).
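One way a ServerApp could handle "either continue with just one SuperNode or wait for the second to reconnect" is to poll the connected node IDs before each round. The sketch below is plain Python and runs without Flower. The only assumption it makes about Flower is an object with a get_node_ids() method, which Grid provides. The names wait_for_nodes, min_nodes, timeout_s, poll_s, and the FakeGrid stub are hypothetical and exist only for illustration.

```python
import time
from typing import Iterable, List, Protocol


class HasNodeIds(Protocol):
    # Minimal interface assumed here; Flower's Grid exposes get_node_ids().
    def get_node_ids(self) -> Iterable[int]: ...


def wait_for_nodes(
    grid: HasNodeIds, min_nodes: int, timeout_s: float, poll_s: float = 1.0
) -> List[int]:
    """Poll until at least min_nodes are connected or timeout_s has elapsed.

    Returns the node IDs connected at that point. This can be fewer than
    min_nodes if the deadline passed, and the caller decides whether to proceed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        node_ids = list(grid.get_node_ids())
        if len(node_ids) >= min_nodes or time.monotonic() >= deadline:
            return node_ids
        time.sleep(poll_s)


class FakeGrid:
    """Hypothetical stand-in: a second SuperNode (re)connects on the third poll."""

    def __init__(self) -> None:
        self.calls = 0

    def get_node_ids(self) -> List[int]:
        self.calls += 1
        return [101] if self.calls < 3 else [101, 202]


if __name__ == "__main__":
    nodes = wait_for_nodes(FakeGrid(), min_nodes=2, timeout_s=5.0, poll_s=0.01)
    print(f"Starting round with {len(nodes)} node(s): {nodes}")
```

With a real Grid, this would be called at the start of each round, before Messages are sent to the returned node IDs. Picking a timeout is a trade-off between waiting for a SuperNode that is temporarily serving the other federation and stalling training.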