Synchronization Before Aggregation

This question was migrated from GitHub Discussions.

Question 1:
Could anybody point me to where (which module) the synchronization happens on the server side? In other words: the clients run local training on the model and update the weights, then send them back to the server, where a FedAvg strategy aggregates the results. I need to know where in the code this process takes place. By "process" I mean the server waiting for the updated models and, once it has received them, aggregating the results.

Answer 1:
"Look into src/py/flwr/server/, specifically the methods fit(.), fit_round(.), and fit_clients(.). The synchronization happens there."
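To illustrate the kind of synchronization the answer refers to, here is a minimal standard-library sketch of a "wait for all clients, then aggregate" barrier. It is not Flower's code: collect_fit_results and run_round are hypothetical names standing in for the roles of fit_clients(.) and fit_round(.), and each client is assumed to be a plain callable that returns (weights, num_examples).

```python
import concurrent.futures
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a client: a callable returning (weights, num_examples).
ClientFn = Callable[[], Tuple[List[float], int]]


def collect_fit_results(clients: List[ClientFn]):
    """Run every client concurrently and block until all of them have finished."""
    results, failures = [], []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(client) for client in clients]
        # This is the synchronization point: nothing below runs until
        # every submitted client task has completed or raised.
        concurrent.futures.wait(futures)
    for future in futures:
        exc = future.exception()
        if exc is not None:
            failures.append(exc)
        else:
            results.append(future.result())
    return results, failures


def weighted_average(results: List[Tuple[List[float], int]]) -> List[float]:
    """FedAvg-style average of the weights, weighted by number of examples."""
    total = sum(num_examples for _, num_examples in results)
    dim = len(results[0][0])
    return [sum(w[i] * n for w, n in results) / total for i in range(dim)]


def run_round(clients: List[ClientFn]) -> Optional[List[float]]:
    """One round: wait for all clients, then aggregate whatever succeeded."""
    results, _failures = collect_fit_results(clients)
    if not results:
        return None
    return weighted_average(results)
```

Aggregation only starts after the wait call returns, which is why the server never sees a partially completed round in this sketch.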

Question 2:
“Thank you. I have one further question, if you are able to help. Say that I would like to send a metric per epoch during local model training. Do you have any idea how I could synchronize such data? Specifically, I would like each client participating in the training to send a norm, and to use this metric for a kind of sync: the server computes the sum of all the norms it receives, and that sum must stay within a thresholded range for the round not to be stopped.”

Answer 2:
"I’m not sure I got the question: so each of your clients sends a norm to the server after its local training, and the server uses those values to stop the round if any of them is outside a certain threshold? If that’s the case, you can achieve this by implementing a custom strategy. Assuming you use FedAvg as a starting point, you just modify the method aggregate_fit(.) to retrieve the norm values from the FitRes objects associated with each client, so you can inspect them and return (None, {}) if the values are outside the threshold."
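The idea in this answer can be sketched without depending on Flower itself. Everything below is a stand-in under stated assumptions: FakeFitRes mimics only the fields used here, BaseAveraging plays the role of FedAvg, the metric key "norm" and the bounds low/high are hypothetical, and the check is applied to the sum of the norms as described in Question 2. In a real project the subclass would derive from the library's FedAvg and override its aggregate_fit(.) in the same way.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeFitRes:
    """Hypothetical stand-in for a client's fit result."""
    weights: List[float]
    num_examples: int
    metrics: Dict[str, float] = field(default_factory=dict)


# Each result pairs a client handle (here just a string id) with its fit result.
Results = List[Tuple[str, FakeFitRes]]


class BaseAveraging:
    """Stand-in for a FedAvg-like base strategy."""

    def aggregate_fit(self, server_round: int, results: Results, failures: list):
        total = sum(res.num_examples for _, res in results)
        dim = len(results[0][1].weights)
        averaged = [
            sum(res.weights[i] * res.num_examples for _, res in results) / total
            for i in range(dim)
        ]
        return averaged, {}


class NormGatedAveraging(BaseAveraging):
    """Aggregate only if the sum of the client-reported norms lies in [low, high]."""

    def __init__(self, low: float, high: float, metric_key: str = "norm"):
        self.low = low
        self.high = high
        self.metric_key = metric_key

    def aggregate_fit(self, server_round: int, results: Results, failures: list):
        norms = [res.metrics.get(self.metric_key) for _, res in results]
        # Without results, or with a client that did not report a norm,
        # there is nothing safe to aggregate.
        if not norms or any(norm is None for norm in norms):
            return None, {}
        if not (self.low <= sum(norms) <= self.high):
            # Outside the allowed range: skip aggregation for this round.
            return None, {}
        return super().aggregate_fit(server_round, results, failures)
```

Returning (None, {}) signals that no new global parameters were produced for the round; what the server does next (keep the old model, stop training) depends on the surrounding loop.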

Question 3:
“Exactly. That’s what I thought too. My problem, though, is that I want to send the norms after each local epoch on the clients, whereas aggregate_fit is only triggered once the round is finished. So I was thinking that maybe this was not the way to go.”

Answer 3:
"That’s exactly how FL works. Just enforce one local epoch per round, so that aggregate_fit(.) runs after every epoch. I don’t see any problem in implementing that solution :)"
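A toy simulation of that suggestion, with one local epoch per round so that the norm check happens after every epoch. All names here are hypothetical, the "epoch" is a single gradient step on fixed per-client gradients, and the reported norm is assumed to be the L2 norm of the weight update; the thread does not say which norm the asker had in mind.

```python
import math
from typing import Dict, List, Tuple


def local_epoch(weights: List[float], grads: List[float],
                lr: float = 0.1) -> Tuple[List[float], Dict[str, float]]:
    """One hypothetical local epoch: a single gradient step plus the update norm."""
    update = [-lr * g for g in grads]
    new_weights = [w + u for w, u in zip(weights, update)]
    norm = math.sqrt(sum(u * u for u in update))
    return new_weights, {"norm": norm}


def run_rounds(client_grads: List[List[float]], num_rounds: int,
               low: float, high: float) -> Tuple[int, List[float]]:
    """Run up to num_rounds rounds; stop once the summed norm leaves [low, high]."""
    dim = len(client_grads[0])
    global_weights = [0.0] * dim
    completed = 0
    for _ in range(num_rounds):
        # Each round is exactly one local epoch per client.
        outputs = [local_epoch(global_weights, grads) for grads in client_grads]
        total_norm = sum(metrics["norm"] for _, metrics in outputs)
        if not (low <= total_norm <= high):
            break  # the server-side check fails, so training stops here
        # Plain (unweighted) average of the client weights.
        global_weights = [
            sum(weights[i] for weights, _ in outputs) / len(outputs)
            for i in range(dim)
        ]
        completed += 1
    return completed, global_weights
```

The trade-off is more communication: the server is contacted once per epoch instead of once per several epochs.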