Metrics_distributed always stays empty {}

Hi, thank you for the detailed report!

The behaviour described above is expected.

Let me explain why: the History object only records aggregated metrics, not the individual metrics dicts coming from single clients. There are currently four types of metrics recorded:

  • loss centralized: no need to aggregate, just a single value
  • metrics centralized: no need to aggregate, just a single value for each key
  • loss distributed: can be aggregated automatically, the strategy knows how
  • metrics distributed: must be aggregated, but this cannot be done automatically

Distributed metrics are the “odd one out”: they cannot be aggregated automatically because the strategy cannot know which keys (and value types) to expect. This is why metrics_distributed is empty by default.
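To make this concrete, here is a minimal sketch (with hypothetical numbers) of what a client-side evaluate can return in Flower: the third element is a free-form metrics dict, so the server has no way of knowing its keys or value types in advance.

```python
# Sketch of a client-side evaluate result, as in Flower's NumPyClient:
# it returns (loss, num_examples, metrics). The metrics dict is free-form,
# which is exactly why the server cannot aggregate it automatically.
def evaluate(parameters, config):
    loss, num_examples = 0.35, 128  # hypothetical local evaluation results
    # Keys and value types here are entirely up to the client:
    return loss, num_examples, {"accuracy": 0.91, "f1": 0.88}
```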

How can custom metric dicts be aggregated on the server-side?

The built-in strategies all support passing both a fit_metrics_aggregation_fn and an evaluate_metrics_aggregation_fn. The concept is simple: Flower calls these functions and hands them the metrics dicts it received from the clients, the functions aggregate those dicts, and Flower records the aggregated result in the History. Here’s an example: flower/examples/quickstart-pytorch/server.py at main · adap/flower · GitHub
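As a sketch, such an aggregation function might compute an example-weighted average of each client's accuracy (the type alias below stands in for flwr.common.Metrics so the snippet runs stand-alone; the key "accuracy" is an assumption about what the clients report):

```python
from typing import Dict, List, Tuple, Union

# Simplified stand-in for flwr.common.Metrics
Metrics = Dict[str, Union[int, float]]


def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    """Aggregate client 'accuracy' values, weighted by number of examples.

    Each entry is (num_examples, metrics_dict) as received from one client.
    """
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]
    return {"accuracy": sum(accuracies) / sum(examples)}


# On the server, pass it to a built-in strategy, e.g.:
# strategy = fl.server.strategy.FedAvg(
#     evaluate_metrics_aggregation_fn=weighted_average,
# )
```

With this in place, the aggregated value shows up in History.metrics_distributed instead of an empty dict.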

PS

Regarding the code in question: evaluate_round returns the aggregated loss and the aggregated metrics dict, which means that at this point in the code the aggregation has already happened. The type signature reflects that:

    def evaluate_round(
        self,
        server_round: int,
        timeout: Optional[float],
    ) -> Optional[
        Tuple[Optional[float], Dict[str, Scalar], EvaluateResultsAndFailures]
    ]: