Hello there,
I ran the simulation-tensorflow example from flower/examples/simulation-tensorflow at main · adap/flower · GitHub
I have adapted the script slightly as I am also interested in the loss per client and per epoch.
I added the following lines to the fit method of the client:
# (requires "import json" at the top of the script)
def fit(self, parameters, config):
    self.model.set_weights(parameters)
    history = self.model.fit(self.trainset, epochs=2, verbose=VERBOSE)
    loss = history.history['loss']
    loss_per_epoch = json.dumps(loss)
    return (
        self.model.get_weights(),
        len(self.trainset),
        {'loss_per_client_and_epoch': f'{self.cid}:{loss_per_epoch}'},
    )
So the TensorFlow training history is serialized, because Flower metrics only accept values of type Scalar, which includes strings but not lists.
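For clarity, the serialization round-trip looks like this (the loss values are made-up stand-ins):

```python
import json

# Keras History stores one loss value per epoch; hypothetical example:
loss = [6.12, 0.92]

# Flower's Metrics values must be Scalar (bool/bytes/float/int/str),
# so the list is serialized to a JSON string before returning it from fit()
serialized = json.dumps(loss)

# json.loads() restores the original list on the receiving side
assert json.loads(serialized) == loss
```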
Then I defined a fit_metrics_aggregation_fn and named it loss_per_client_epoch_fn:
def loss_per_client_epoch_fn(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    loss_epochs = [m["loss_per_client_and_epoch"] for num_examples, m in metrics]
    loss_per_client_and_epoch = {}
    for item in loss_epochs:
        key, value = item.split(':')
        value_list = [float(x) for x in value.strip('[]').split(',')]
        new_key = f"Client_{key}"
        loss_per_client_and_epoch[new_key] = value_list
    return {"loss_per_client_and_epoch": loss_per_client_and_epoch}
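For reference, the parsing logic can be checked standalone; the plain-dict Metrics alias and the sample fit results below are made-up stand-ins, not real Flower objects:

```python
from typing import List, Tuple

# In Flower, Metrics maps str -> Scalar; a plain dict suffices for this check
Metrics = dict

def loss_per_client_epoch_fn(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    loss_epochs = [m["loss_per_client_and_epoch"] for num_examples, m in metrics]
    loss_per_client_and_epoch = {}
    for item in loss_epochs:
        # each item has the form "<cid>:<json list of per-epoch losses>"
        key, value = item.split(':')
        value_list = [float(x) for x in value.strip('[]').split(',')]
        loss_per_client_and_epoch[f"Client_{key}"] = value_list
    return {"loss_per_client_and_epoch": loss_per_client_and_epoch}

# Hypothetical fit results from two clients with 100 examples each
sample = [
    (100, {"loss_per_client_and_epoch": "0:[6.12, 0.92]"}),
    (100, {"loss_per_client_and_epoch": "1:[6.52, 0.86]"}),
]
result = loss_per_client_epoch_fn(sample)
assert result["loss_per_client_and_epoch"]["Client_0"] == [6.12, 0.92]
```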
Running the code with 3 rounds and 3 clients, the Flower History output is this:
[SUMMARY]
INFO : Run finished 3 rounds in 26.77s
INFO : History (loss, distributed):
INFO : ('\tround 1: 0.7618376016616821\n'
INFO : '\tround 2: 0.46412525574366253\n'
INFO : '\tround 3: 0.3493043581644694\n')
INFO : History (loss, centralized):
INFO : ('\tround 0: 121.7120361328125\n'
INFO : '\tround 1: 0.7046558260917664\n'
INFO : '\tround 2: 0.4510457515716553\n'
INFO : '\tround 3: 0.3422752916812897\n')
INFO : History (metrics, distributed, fit):
INFO : {'loss_per_client_and_epoch': [
INFO : (1,
INFO : {'Client_0': [6.12161111831665,
INFO : 0.9191960692405701],
INFO : 'Client_1': [6.517727375030518,
INFO : 0.8578547835350037],
INFO : 'Client_2': [6.666688919067383,
INFO : 0.9658933281898499]}),
INFO : (2,
INFO : {'Client_0': [0.979604184627533,
INFO : 0.6871640682220459],
INFO : 'Client_1': [0.9863572120666504,
INFO : 0.7082828879356384],
INFO : 'Client_2': [0.9567165970802307,
INFO : 0.6330591440200806]}),
INFO : (3,
INFO : {'Client_0': [0.6875190138816833,
INFO : 0.5470158457756042],
INFO : 'Client_1': [0.6803959608078003,
INFO : 0.5474984049797058],
INFO : 'Client_2': [0.6760547757148743,
INFO : 0.5368837714195251]})]}
INFO : History (metrics, distributed, evaluate):
INFO : {'accuracy': [(1, 0.8008333245913187),
INFO : (2, 0.874999980131785),
INFO : (3, 0.9045000076293945)]}
INFO : History (metrics, centralized):
INFO : {'accuracy': [(0, 0.11739999800920486),
INFO : (1, 0.807699978351593),
INFO : (2, 0.881600022315979),
INFO : (3, 0.9101999998092651)]}
My question concerns History (loss, distributed). In my understanding, this is the fit loss aggregated across all clients per round. For round 1 it is 0.7618…, yet looking at the per-client, per-epoch losses for round 1, no value is even smaller than 0.9…. So my question is: how exactly is the distributed loss calculated? Since num_examples is equal across clients, the aggregated distributed loss should be the average of the losses in the loss_per_client_and_epoch dictionary, or am I understanding it entirely wrong?
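To make that expectation concrete, here is the weighted average I would expect for round 1, computed from the final-epoch fit losses shown in the log above (equal example counts, so the weights cancel):

```python
# Final-epoch (epoch 2) fit losses from round 1, taken from the log above
client_losses = [0.9191960692405701, 0.8578547835350037, 0.9658933281898499]
num_examples = [1, 1, 1]  # equal dataset sizes, so any equal value works

# FedAvg-style weighted average: sum(n_i * loss_i) / sum(n_i)
weighted_avg = sum(n * l for n, l in zip(num_examples, client_losses)) / sum(num_examples)
print(weighted_avg)  # ~0.914, not the reported 0.7618
```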
Thanks for your help.