Not clear how loss distributed is calculated

Hello there,

I ran the simulation-tensorflow example from flower/examples/simulation-tensorflow at main · adap/flower · GitHub

I have adapted the script slightly, as I am also interested in the loss per client and per epoch.
I added the following lines to the fit method of the client:

import json  # needed to serialize the per-epoch loss list

def fit(self, parameters, config):
    self.model.set_weights(parameters)
    # Train locally for two epochs and keep the per-epoch training loss
    history = self.model.fit(self.trainset, epochs=2, verbose=VERBOSE)
    loss = history.history["loss"]
    loss_per_epoch = json.dumps(loss)
    return (
        self.model.get_weights(),
        len(self.trainset),
        {"loss_per_client_and_epoch": f"{self.cid}:{loss_per_epoch}"},
    )

The TensorFlow training history is serialized because Flower metrics only accept values of type Scalar, which includes strings but not lists.
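For illustration, here is a minimal round trip showing how the loss list survives the Scalar restriction as a JSON string (the numbers are borrowed from round 1 of the output further down):

import json

loss = [6.12161111831665, 0.9191960692405701]  # per-epoch losses from Keras history
payload = f"0:{json.dumps(loss)}"   # a plain str, hence a valid Scalar
cid, blob = payload.split(":", 1)   # split only on the first colon
assert json.loads(blob) == loss     # the list is recovered exactly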

Then I wrote a fit_metrics_aggregation_fn and named it loss_per_client_epoch_fn:

import json
from typing import List, Tuple

from flwr.common import Metrics

def loss_per_client_epoch_fn(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    loss_epochs = [m["loss_per_client_and_epoch"] for num_examples, m in metrics]

    loss_per_client_and_epoch = {}
    for item in loss_epochs:
        # Split only on the first colon so the JSON list stays intact
        key, value = item.split(":", 1)
        # Undo the json.dumps from the client side
        value_list = json.loads(value)
        loss_per_client_and_epoch[f"Client_{key}"] = value_list

    return {"loss_per_client_and_epoch": loss_per_client_and_epoch}

Running the code for 3 rounds with 3 clients, the output of the Flower history is this:

[SUMMARY]
INFO :      Run finished 3 rounds in 26.77s
INFO :      History (loss, distributed):
INFO :          ('\tround 1: 0.7618376016616821\n'
INFO :           '\tround 2: 0.46412525574366253\n'
INFO :           '\tround 3: 0.3493043581644694\n')
INFO :      History (loss, centralized):
INFO :          ('\tround 0: 121.7120361328125\n'
INFO :           '\tround 1: 0.7046558260917664\n'
INFO :           '\tround 2: 0.4510457515716553\n'
INFO :           '\tround 3: 0.3422752916812897\n')
INFO :      History (metrics, distributed, fit):
INFO :          {'loss_per_client_and_epoch': [
INFO :                (1,
INFO :                 {'Client_0': [6.12161111831665,
INFO :                               0.9191960692405701],
INFO :                  'Client_1': [6.517727375030518,
INFO :                               0.8578547835350037],
INFO :                  'Client_2': [6.666688919067383,
INFO :                               0.9658933281898499]}),
INFO :                (2,
INFO :                 {'Client_0': [0.979604184627533,
INFO :                               0.6871640682220459],
INFO :                  'Client_1': [0.9863572120666504,
INFO :                               0.7082828879356384],
INFO :                  'Client_2': [0.9567165970802307,
INFO :                               0.6330591440200806]}),
INFO :                (3,
INFO :                 {'Client_0': [0.6875190138816833,
INFO :                               0.5470158457756042],
INFO :                  'Client_1': [0.6803959608078003,
INFO :                               0.5474984049797058],
INFO :                  'Client_2': [0.6760547757148743,
INFO :                                0.5368837714195251]})]}
INFO :      History (metrics, distributed, evaluate):
INFO :          {'accuracy': [(1, 0.8008333245913187),
INFO :                        (2, 0.874999980131785),
INFO :                        (3, 0.9045000076293945)]}
INFO :      History (metrics, centralized):
INFO :          {'accuracy': [(0, 0.11739999800920486),
INFO :                        (1, 0.807699978351593),
INFO :                        (2, 0.881600022315979),
INFO :                        (3, 0.9101999998092651)]}

My question concerns the History (loss, distributed). In my understanding this is the fit loss aggregated across all clients per round; for round 1 it is 0.7618…. But looking at the loss per client and epoch in round 1, the lowest value is about 0.86, so no average of these values can come out at 0.76. My question is: how exactly is loss, distributed calculated? Since num_examples is equal across the clients, the aggregated distributed loss should simply be the average of the losses in the dictionary loss_per_client_and_epoch, or am I understanding this entirely wrong?
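To make the mismatch concrete, this is the average I would expect for round 1 (using the final-epoch losses from the log above):

# Final-epoch training losses of the three clients in round 1
losses = [0.9191960692405701, 0.8578547835350037, 0.9658933281898499]

expected = sum(losses) / len(losses)
print(expected)  # ~0.9143, not the 0.7618 reported under History (loss, distributed)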

Thanks for your help.


Hi @vikwal, this is a great first question!

In fit() you can return anything you want as metrics (i.e., the last return element): a loss, an accuracy, or something else. It is therefore up to you and your particular setting how to define the distributed loss. What I typically do, based on previous experience, is:

  • A client returns the average train loss (so just a single scalar).
  • In the aggregate-fit-metrics function, I take the average of all training losses.

To do something similar, you could edit the code in your client’s fit() and return only the average loss, along the lines of the sketch below.
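A minimal sketch of that approach, assuming the simulation-tensorflow setup from above (the metric name train_loss and the helper weighted_train_loss_fn are just illustrative; the example-weighted average reduces to a plain mean when all clients hold the same number of examples):

from typing import List, Tuple

import numpy as np
from flwr.common import Metrics

# Client side: return a single scalar instead of a serialized list
def fit(self, parameters, config):
    self.model.set_weights(parameters)
    history = self.model.fit(self.trainset, epochs=2, verbose=VERBOSE)
    avg_train_loss = float(np.mean(history.history["loss"]))
    return self.model.get_weights(), len(self.trainset), {"train_loss": avg_train_loss}

# Server side: pass this as fit_metrics_aggregation_fn to the strategy
def weighted_train_loss_fn(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    total = sum(num_examples for num_examples, _ in metrics)
    loss = sum(num_examples * m["train_loss"] for num_examples, m in metrics) / total
    return {"train_loss": loss}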

