Hello,
I wanted to ask how to return an aggregated confusion matrix and an aggregated classification report. I see that the evaluate function in my defined Flower Client class expects three return values: (loss: float, num_examples: int, metrics: Dict[str, Scalar]). This is from the EvaluateRes definition in typing.py.
This means that it is easy to return scalar values from the evaluate function, such as {'accuracy': acc, 'f1-score': f1, ...}, but it doesn’t work for the classification report from sklearn or for the confusion matrix. For example, with output_dict=True the classification report has the structure {'0': {'precision': 0....., 'recall': 0...., 'f1-score': 0...., 'support': xxxx}, '1': {'precision': xxxxx, ...}, ...}, which isn’t compatible with Dict[str, Scalar] because its values are nested dicts rather than scalars.
My idea was to extend the return values of the evaluate function in my Flower Client class from three to four. This requires changes in many of Flower's Python files, such as typing.py, client.py, numpy_client.py, etc. For example, in typing.py, my idea was to add a new field, metrics_custom, to EvaluateRes. Of course, you could instead save each value as its own key-value pair in metrics, which results in a large number of entries, but I wanted to ask if there is a different way to achieve this.
@dataclass
class EvaluateRes:
    """Evaluate response from a client."""

    status: Status
    loss: float
    num_examples: int
    metrics: Dict[str, Scalar]
    metrics_custom: Dict[str, Dict[str, float]]  # proposed new field
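
For comparison, the key-value approach mentioned above can be sketched without touching Flower's source. This is only a minimal sketch: flatten_report and flatten_confusion_matrix are hypothetical helper names (not part of Flower or sklearn), the Scalar alias is redefined locally on the assumption that it matches Flower's definition, and the report values are made-up placeholders.

```python
from typing import Dict, List, Union

# Assumption: local stand-in for Flower's Scalar alias.
Scalar = Union[bool, bytes, float, int, str]


def flatten_report(report: Dict[str, object], sep: str = "/") -> Dict[str, Scalar]:
    """Flatten a nested classification report into flat 'label/metric' keys."""
    flat: Dict[str, Scalar] = {}
    for label, value in report.items():
        if isinstance(value, dict):
            # Per-class (or per-average) block, e.g. {'precision': ..., 'recall': ...}
            for name, score in value.items():
                flat[f"{label}{sep}{name}"] = score
        else:
            # Top-level scalar entry, e.g. 'accuracy'
            flat[label] = value
    return flat


def flatten_confusion_matrix(matrix: List[List[int]], prefix: str = "cm") -> Dict[str, Scalar]:
    """Store each cell as '<prefix>_<true>_<pred>' so counts can be summed later."""
    return {
        f"{prefix}_{i}_{j}": int(count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    }


# Placeholder data shaped like classification_report(..., output_dict=True)
report = {
    "0": {"precision": 0.9, "recall": 0.8, "f1-score": 0.85, "support": 50},
    "1": {"precision": 0.7, "recall": 0.75, "f1-score": 0.72, "support": 40},
    "accuracy": 0.78,
}
matrix = [[40, 10], [10, 30]]

# Flat dict that fits Dict[str, Scalar] and could be returned as `metrics`
metrics: Dict[str, Scalar] = {**flatten_report(report), **flatten_confusion_matrix(matrix)}
```

With this layout, the confusion-matrix counts from different clients can simply be summed per key on the server side, and per-class precision and recall could then be recomputed from the summed matrix instead of averaging the per-client values. Whether that is preferable to a fourth return value depends on how many classes there are.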