Clarification on the "results" variable passed between aggregate_fit (server side) and fit (client side)

*This question was migrated from GitHub Discussions.*

Original questions:
"I’m a bit confused regarding the results parameter returned in the aggreagate_fit class:

from typing import List, Optional, Tuple

import flwr as fl


class SaveModelAndMetricsStrategy(fl.server.strategy.FedAvg):
    def aggregate_fit(
        self,
        rnd: int,
        results: List[Tuple[fl.server.client_proxy.ClientProxy, fl.common.FitRes]],  # FitRes is like EvaluateRes and has a metrics key
        failures: List[BaseException],
    ) -> Optional[fl.common.Weights]:
        # Read the per-client training loss out of each FitRes' metrics dict
        losses = [r.metrics["cumulative_loss"] for _, r in results]
        # Defer the actual parameter aggregation to FedAvg
        return super().aggregate_fit(rnd, results, failures)

(I picked up the logic for this snippet from the Flower Framework docs.)

In the corresponding fit method on the client side, the “results” variable is defined as a dictionary (per the Flower Framework docs). However, on the server side (in aggregate_fit as we defined it above) it becomes a list of tuples. I’m not sure how to interpret it, or how to separate what comes from which client.

I tried to pickle this “results” variable to see if I could understand it better; however, I was unable to do so, as it seems to be incompatible with pickling and gave me back this error:
TypeError: can't pickle _thread.RLock objects"

Answer:
" Great question! Let me try to explain.

On the server-side strategy, we have two “pairs” of methods that (conceptually) belong together:

  • configure_fit and aggregate_fit
  • configure_evaluate and aggregate_evaluate
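
In code, the pairing looks roughly like this (a minimal sketch, assuming the pre-1.0 Flower API used in the question's snippet; every override here simply defers to FedAvg):

import flwr as fl


class PairedStrategy(fl.server.strategy.FedAvg):
    # Pair 1: configure a round of training, then aggregate its results
    def configure_fit(self, rnd, parameters, client_manager):
        return super().configure_fit(rnd, parameters, client_manager)

    def aggregate_fit(self, rnd, results, failures):
        return super().aggregate_fit(rnd, results, failures)

    # Pair 2: the same idea for federated evaluation
    def configure_evaluate(self, rnd, parameters, client_manager):
        return super().configure_evaluate(rnd, parameters, client_manager)

    def aggregate_evaluate(self, rnd, results, failures):
        return super().aggregate_evaluate(rnd, results, failures)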

Let’s look at how Flower performs a single federated round by looking at fit (it works the same way for evaluate).

First, it’s important to mention that the Flower server represents each “real” client by a ClientProxy object. So if there’s a process that lives somewhere, potentially on a different device/machine, and that process calls start_client (which causes it to connect to the server), then the server (the gRPC stack, to be precise) generates a ClientProxy object representing that client on the server side.

When the server starts a round, it asks the strategy to configure that round, so it calls configure_fit on the strategy. configure_fit returns a list of pairs, List[Tuple[ClientProxy, FitIns]]. Each pair represents the instructions (FitIns) sent to one particular client (ClientProxy). So if the strategy selects five clients, then there are five pairs in the list. The FitIns object within those pairs can either be the same across all pairs (which is usually the case), or it can differ between clients. This allows the strategy to send different instructions to different clients.
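
To make that concrete, here is a minimal sketch of a configure_fit override that sends a different config dict to each selected client. It assumes the same pre-1.0 Flower API as the snippet above, and the "local_epochs" config key is made up for illustration:

import flwr as fl
from flwr.common import FitIns


class PerClientConfigStrategy(fl.server.strategy.FedAvg):
    def configure_fit(self, rnd, parameters, client_manager):
        # Sample clients the same way FedAvg would
        sample_size, min_num = self.num_fit_clients(client_manager.num_available())
        clients = client_manager.sample(num_clients=sample_size, min_num_clients=min_num)

        # One (ClientProxy, FitIns) pair per selected client; the config
        # dict may differ from client to client
        return [
            (client, FitIns(parameters, {"round": rnd, "local_epochs": i + 1}))
            for i, client in enumerate(clients)
        ]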

The server then takes care of sending those instructions to the clients. In the case of our five selected clients, each one receives the model parameters and config dictionary in its fit method. Each client computes its update and returns the updated model parameters, along with the number of local examples and the (unfortunately named) metrics dict, back to the server.
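
On the client side, that could look like the following toy NumPyClient (a minimal sketch, not from the original thread; the fake "training" just perturbs the weights, and the "cumulative_loss" key matches the metric read in the snippet above):

import flwr as fl
import numpy as np


class MyClient(fl.client.NumPyClient):
    def __init__(self):
        self.weights = [np.zeros((3, 3)), np.zeros(3)]

    def get_parameters(self):
        return self.weights

    def fit(self, parameters, config):
        # Adopt the global parameters, then "train" (a stand-in for a real loop)
        self.weights = [w + 0.1 for w in parameters]
        cumulative_loss = float(np.random.rand())
        # Three return values: the updated parameters, the number of local
        # examples, and the metrics dict that reaches the server as FitRes.metrics
        return self.weights, 100, {"cumulative_loss": cumulative_loss}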

The server collects those results from all clients, and when this collection phase is over, it passes them to the strategy for aggregation. To do so, it calls aggregate_fit with a list of pairs (List[Tuple[ClientProxy, FitRes]]), each pair representing the results coming from a single client. The strategy then uses the updated model parameters in each of those FitRes objects to aggregate the new global model. If anything goes wrong (for example, a single client drops out), then we’d only have four results and one object in the list of failures.
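
This also answers how to tell the results apart, and why pickling the raw results list fails: each ClientProxy carries a cid identifying its client, but it also wraps live gRPC connection state (which is presumably where the unpicklable _thread.RLock comes from), so extract plain data before persisting anything. A minimal sketch, again assuming the pre-1.0 API:

import pickle

import flwr as fl


class SaveMetricsStrategy(fl.server.strategy.FedAvg):
    def aggregate_fit(self, rnd, results, failures):
        # Each result is a (ClientProxy, FitRes) pair; client.cid tells
        # you which client the FitRes came from
        losses = {
            client.cid: fit_res.metrics["cumulative_loss"]
            for client, fit_res in results
        }

        # Persist only plain Python data, not the ClientProxy objects
        with open(f"losses_round_{rnd}.pkl", "wb") as f:
            pickle.dump(losses, f)

        return super().aggregate_fit(rnd, results, failures)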

By now, the answer to the original question is probably obvious: the results returned from fit are passed to aggregate_fit as FitRes objects. And because we select multiple clients per round, there’s a list of FitRes objects passed to aggregate_fit, each one associated with the ClientProxy representing the client that returned it.

And, for the sake of completeness, the configure_evaluate/aggregate_evaluate pair follows the same logic."