Client selection in a single Flower federation with persistent client identities and metadata-based sampling

Hi,

First of all, thanks for the framework.

I’m relatively new to Flower and currently using the Docker deployment on different devices. My current goal is to have a single federation and then decide, per training run, which clients should participate. I haven’t been able to find a clear or up-to-date answer on whether this is possible, or how best to achieve it.

Why not simply create multiple federations? I’d like to avoid running multiple federation networks with many containers on each client just to sample different subgroups. Instead, I’m aiming for one federation with the ability to select only the necessary clients (for example, clients that have access to specific data).

I’ve searched both GitHub issues and this forum and found several related discussions. Based on what I’ve read, I still have the following questions:

  1. Why is there both FedAvg in flwr.serverapp.strategy and in flwr.server.strategy, and what is the difference between them?

  2. In the client selection logic for flwr.serverapp.strategy
    (https://github.com/adap/flower/blob/main/framework/py/flwr/serverapp/strategy/fedavg.py), I only seem to have access to the cid of each client. In theory, this would allow client selection if I knew the cids of the clients I want to sample.
    This leads to a follow-up question: is it possible to persist the client ID across container restarts? Every time the flower-exec container (if I remember correctly) restarts, it receives a new cid. In that case, this approach would not work, since maintaining a stable mapping from ID → client would be impossible.

  3. In flwr.server.strategy
    (https://github.com/adap/flower/blob/main/framework/py/flwr/server/strategy/fedavg.py), the client manager seems to be available. Is it possible there to access node configuration or other client metadata to perform sampling? I couldn’t find anything indicating this. Or does this API also expose only the client ID?
    (https://github.com/adap/flower/blob/main/framework/py/flwr/server/client_manager.py). Alternatively, would this be addressed once the following PR is merged, where client metadata is added to the gRPC connection?
    https://github.com/adap/flower/pull/4349

Thanks in advance!

Resources I considered:

  1. https://discuss.flower.ai/t/deterministic-client-sampling/431

  2. https://discuss.flower.ai/t/custom-client-selection-strategy/63

  3. https://discuss.flower.ai/t/how-do-i-write-a-custom-client-selection-protocol/74/3


Hi @flippchen,

Thanks for using Flower. To control which clients (SuperNodes) participate in each round, one straightforward approach is to override the .configure_train method of the strategy you’re using.

For example, below is the .configure_train method from FedAvg in flwr.serverapp.strategy (the fedavg.py file linked in your question):

def configure_train(
    self, server_round: int, arrays: ArrayRecord, config: ConfigRecord, grid: Grid
) -> Iterable[Message]:
    """Configure the next round of federated training."""
    # Do not configure federated train if fraction_train is 0.
    if self.fraction_train == 0.0:
        return []

    # Sample nodes
    num_nodes = int(len(list(grid.get_node_ids())) * self.fraction_train)
    sample_size = max(num_nodes, self.min_train_nodes)
    node_ids, num_total = sample_nodes(
        grid, self.min_available_nodes, sample_size
    )
    log(
        INFO,
        "configure_train: Sampled %s nodes (out of %s)",
        len(node_ids),
        len(num_total),
    )

    # Always inject current server round
    config["server-round"] = server_round

    # Construct messages
    record = RecordDict(
        {self.arrayrecord_key: arrays, self.configrecord_key: config}
    )
    return self._construct_messages(record, node_ids, MessageType.TRAIN)

As you can see, the strategy first calls sample_nodes, which internally relies on grid.get_node_ids() to obtain the node IDs (i.e., the identifiers of SuperNodes / clients). It then constructs one message per selected node ID. All messages returned by configure_train are sent to the corresponding SuperNodes.

In other words, this method gives you full control over client selection: you can implement your own logic to choose specific nodes and return messages only for those SuperNodes.
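
To make that concrete, here is a minimal sketch of the selection step on its own, kept framework-agnostic so it runs without Flower installed. The helper name select_node_ids, the metadata dictionary, and the "dataset" attribute are all hypothetical and only illustrate the idea. It assumes you already have some per-node metadata available on the server side.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical metadata shape: one dict of attributes per node ID,
# e.g. collected by querying each SuperNode when the ServerApp starts.
NodeMeta = Mapping[str, object]


def select_node_ids(
    available_ids: Iterable[int],
    metadata: Mapping[int, NodeMeta],
    predicate: Callable[[NodeMeta], bool],
) -> list[int]:
    """Return the available node IDs whose metadata satisfies the predicate.

    Nodes without a metadata entry are skipped rather than guessed at.
    """
    return sorted(
        node_id
        for node_id in available_ids
        if node_id in metadata and predicate(metadata[node_id])
    )


# Example with made-up node IDs and a made-up "dataset" attribute.
available = [101, 202, 303]
meta = {
    101: {"dataset": "mri"},
    202: {"dataset": "ct"},
    303: {"dataset": "mri"},
}
mri_nodes = select_node_ids(available, meta, lambda m: m.get("dataset") == "mri")
print(mri_nodes)
```

Inside an overridden configure_train, a list like mri_nodes could take the place of the node_ids returned by sample_nodes before the messages are constructed, with grid.get_node_ids() supplying the available IDs.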

Alternatively, you can use Flower’s low-level Message API, where client selection is entirely explicit. A good example of this approach is shown in the Quickstart Pandas tutorial.


Hi @pan-h,

Thanks for the answer.

While overriding configure_train does give full control over which node IDs are selected, I still don’t have a reliable way to know which node ID corresponds to which actual client. Since node IDs are not persistent across container restarts, I can’t maintain a stable mapping (e.g., “this client id has dataset/resources xy”) over time.

This seems to imply that I’d need to re-identify all node IDs each time the federation starts before meaningful client selection is possible, unless I’m overlooking something.

Hi @flippchen, I see your point.

There are two possible solutions here:

  1. Implement a simple query function, similar to the one in the Quickstart Pandas tutorial. At the start of your ServerApp, you can query all SuperNodes once and collect the required information upfront.

  2. Fix the SuperNode ID. While I personally prefer option 1 because it makes the code more deterministic, you can also ensure a stable node ID by enabling node authentication. When authentication is enabled, the node ID is fixed. You can follow this guide to enable SuperNode authentication.
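
For option 1, the bookkeeping after the query could look roughly like the sketch below. It is only an illustration under a few assumptions: each SuperNode answers the query with a self-chosen, stable name under a hypothetical "client-name" key, and the replies have already been collected as (node ID, payload) pairs. How the query messages are sent and received is left out here. The helper build_registry is not part of Flower.

```python
from typing import Iterable, Mapping


def build_registry(
    replies: Iterable[tuple[int, Mapping[str, str]]],
) -> dict[str, int]:
    """Map each self-reported client name to its current node ID.

    Replies without a name are ignored; a duplicate name raises an error,
    because the mapping would otherwise be ambiguous.
    """
    registry: dict[str, int] = {}
    for node_id, payload in replies:
        name = payload.get("client-name")  # hypothetical key
        if name is None:
            continue
        if name in registry:
            raise ValueError(f"duplicate client name: {name!r}")
        registry[name] = node_id
    return registry


# Made-up replies, as they might look after querying all nodes once.
replies = [
    (4711, {"client-name": "hospital-a"}),
    (4712, {"client-name": "hospital-b"}),
    (4713, {}),  # node that did not report a name
]
registry = build_registry(replies)

# Resolve the clients wanted for this run to their current node IDs.
wanted = ["hospital-b"]
node_ids = [registry[name] for name in wanted if name in registry]
print(node_ids)
```

Because the registry is rebuilt on every run, it stays valid even when node IDs change after a restart. Only the self-reported names need to remain stable.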

Just FYI:

In auto-authentication mode (the default), a new random node ID is generated whenever a SuperNode is restarted. SuperLink and SuperNode are designed to be lightweight, long-running agents that manage the actual code execution, and Flower does not persist any machine-specific identifiers, so a SuperNode’s node ID may change across restarts.

Hi @pan-h,

Thanks for your answer!
I’ll try the query approach first, and once the deployment is stable, I can move on to fixed authentication for persistent node IDs.

Much appreciated!

Hi @flippchen,

Did @pan-h’s suggestion resolve the issue you were facing?

Best regards,
William

Hi @williamlm,

Yes, @pan-h’s suggestion resolved the issue. Thanks for following up.

The query approach works well and unblocks us for now. That said, it still feels a bit like a workaround, so native support or a more first-class solution for this use case would be great to see in the future.

Thanks again for the help!
