How to return aggregated confusion matrix and classification report

matt08 · May 23, 2024, 6:48am

Hello,

I wanted to ask how to return an aggregated confusion matrix and an aggregated classification report. I see that the evaluate function in my defined Flower Client class expects three return values: (loss: float, num_examples: int, and metrics: Dict[str, Scalar]). This is from the typing.py EvaluateRes definition.

This makes it easy to return scalar values from the evaluate method, such as {'accuracy': acc, 'f1-score': f1, ...}, but it doesn’t work for sklearn's classification report or confusion matrix. For example, with output_dict=True the classification report has the structure {'0': {'precision': 0....., 'recall': 0...., 'f1-score': 0...., 'support': xxxx}, '1': {'precision': xxxxx, ...}}, which is a nested dict and therefore not compatible with Dict[str, Scalar].

My idea was to extend the allowed return values from the evaluate function in my Flower Client class to four. To do this, many changes in the Python files are required, such as in typing.py, client.py, numpy_client.py, etc. For example, in typing.py, my idea was to create a new definition for metrics_custom. As mentioned before, this requires many changes in many files in Flower. Of course, you could save each value as a key-value pair, resulting in numerous values, but I wanted to ask if there is a different way to achieve this.

@dataclass
class EvaluateRes:
    """Evaluate response from a client."""

    status: Status
    loss: float
    num_examples: int
    metrics: Dict[str, Scalar]
    metrics_custom: Dict[str, Dict[str, float]]  # proposed new field

adam-narozniak · May 23, 2024, 11:33am

Hi,
We plan to keep the metrics (here and in the newer versions of it) simple so that we can provide some out-of-the-box aggregation in the future.

I see a few solutions to your current problem:

1. Convert the nested dict to a flat dict.
   The conversion could concatenate the keys, e.g. "0_precision": value, ...
2. Serialize the dict and return it in metrics.
   Serialize it with pickle.dumps(my_dict), return it as e.g. "confusion_matrix": serialized_dict, and deserialize it on the receiving side with pickle.loads(serialized_dict).
3. Serialize the dict and save it in a ConfigsRecord (when using the low-level API).
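The first two options could look roughly like the sketch below. It is only an illustration: flatten_report, the sample report values, and the sample confusion matrix are hypothetical, and the Scalar alias is redefined locally on the assumption that bytes is an accepted scalar type (which the pickle suggestion relies on).

```python
import pickle
from typing import Dict, Union

# Local stand-in for Flower's Scalar alias (assumption: bytes is allowed).
Scalar = Union[bool, bytes, float, int, str]


def flatten_report(report: dict, sep: str = "_") -> Dict[str, Scalar]:
    """Flatten a nested classification report into a single-level dict."""
    flat: Dict[str, Scalar] = {}
    for label, value in report.items():
        if isinstance(value, dict):
            # Per-class entry: join class label and metric name, e.g. "0_precision"
            for name, score in value.items():
                flat[f"{label}{sep}{name}"] = float(score)
        else:
            # Top-level entries such as "accuracy" are already scalars
            flat[label] = float(value)
    return flat


# Made-up report in the shape sklearn returns with output_dict=True
report = {
    "0": {"precision": 0.5, "recall": 1.0, "f1-score": 0.75, "support": 4},
    "1": {"precision": 1.0, "recall": 0.5, "f1-score": 0.75, "support": 8},
    "accuracy": 0.75,
}

# Option 1: flat keys, directly usable as the metrics dict
metrics_flat = flatten_report(report)

# Option 2: pickle a nested object (here a made-up 2x2 confusion matrix
# as a list of lists) into bytes and ship it under a single key
confusion_matrix = [[3, 1], [4, 4]]
metrics_serialized = {"confusion_matrix": pickle.dumps(confusion_matrix)}

# Receiving side: restore the original object from the bytes value
restored = pickle.loads(metrics_serialized["confusion_matrix"])
```

Flat keys stay human-readable and work with any scalar aggregation, while the pickled variant keeps the original structure at the cost of an opaque bytes value that the receiving side has to decode itself.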

I hope it helps

adam-narozniak · May 24, 2024, 2:28pm

Also, note that averaging F1 across clients in the standard way (simple or weighted average) will give incorrect results (the same issue arises in cross-validation). I’d recommend recalculating the F1 score from the aggregated TP, FP, FN, and TN values.
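A minimal sketch of that recalculation, assuming each client returns its confusion matrix with rows as true labels and columns as predicted labels (the sklearn convention). The helper names and the two client matrices are made up for illustration.

```python
from typing import List

Matrix = List[List[int]]


def sum_matrices(matrices: List[Matrix]) -> Matrix:
    """Element-wise sum of equally shaped square confusion matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


def per_class_f1(cm: Matrix) -> List[float]:
    """Per-class F1 from a confusion matrix (rows = true, cols = predicted)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true label differs
        fn = sum(cm[k]) - tp                       # true k, predicted label differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Made-up confusion matrices from two clients
client_a = [[8, 2], [1, 9]]
client_b = [[1, 0], [3, 0]]

# Sum the raw counts first, then compute F1 once on the global matrix
total = sum_matrices([client_a, client_b])
f1 = per_class_f1(total)
```

Summing the counts first means every prediction contributes equally to the global score, regardless of which client it came from.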

