Hi everyone, I wanted to ask a question about determinism in Flower. While trying to ensure it, I worked with only 2 clients and 5 rounds to make debugging faster. After some effort, I managed to get 100% identical results when running my experiment twice, so I assumed I had achieved determinism. But when I switched back to 5 clients and 20 rounds (my usual setting), I noticed that the results were only 100% identical in the first round. In the second round, the last decimals of the results started to change, and with each subsequent round the results diverged further. Does anyone have an idea of what could be causing this non-determinism only when the number of clients is increased? I'd very much appreciate any help. Thanks a lot in advance!
Hi @pgcampana, welcome to the Flower community!
Thanks for your question. It's good to see that you're able to get reproducible results with 2 clients. For the 5-client setup, is it reasonable to assume you have 100% client participation?
It is indeed a bit strange that you can’t reproduce the results with more clients. Can you share some details of your implementation so that we can identify a possible cause for this?
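In the meantime, just as a reference: full client participation is typically pinned down in the strategy constructor rather than left to the defaults. A minimal sketch, assuming 5 clients in total (parameter names as in recent Flower releases; adjust the numbers to your setup):

from flwr.server.strategy import FedAvg

# Assumption: 5 clients in total, all of them sampled in every round
strategy = FedAvg(
    fraction_fit=1.0,         # sample 100% of available clients for training
    fraction_evaluate=1.0,    # sample 100% of available clients for evaluation
    min_fit_clients=5,        # never train with fewer than 5 clients
    min_evaluate_clients=5,   # never evaluate with fewer than 5 clients
    min_available_clients=5,  # wait until all 5 clients are connected
)

With fraction_fit=1.0 and min_available_clients equal to the total number of clients, every client participates in every round, which rules out client sampling as a source of non-determinism.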
Hi @pgcampana, I encountered a similar issue a few months ago.
In my case, the problem was caused by the order in which the client updates were aggregated. My explanation is that since model weights are stored in finite floating-point precision, addition is not associative: summing the same values in different orders can produce tiny discrepancies due to intermediate rounding errors. Although small, these differences accumulate over multiple rounds of federated learning, causing two seemingly identical runs to diverge over time.
In Flower, clients operate as separate threads, meaning their arrival order is not guaranteed. By enforcing a fixed order when processing clients in the aggregator, I was able to resolve the issue.
To illustrate the problem, consider this simple example in NumPy:

import numpy as np

np.random.seed(42)
num_clients = 5
numbers = np.random.rand(num_clients)
# The same values, just in a different order
sorted_numbers = np.array(sorted(numbers))
print(numbers)
print(sorted_numbers)

for rnd in range(10):
    # Both arrays hold the same values in different orders, yet their sums
    # can drift apart over rounds as rounding errors compound
    print(numbers.sum() - sorted_numbers.sum())
    # Re-seed so both branches draw identical standard normals; the only
    # difference is the mean, which already carries the accumulated error
    np.random.seed(42)
    noise = np.random.normal(np.mean(numbers), 0.00001, num_clients)
    np.random.seed(42)
    sorted_noise = np.random.normal(np.mean(sorted_numbers), 0.00001, num_clients)
    numbers = numbers + noise
    sorted_numbers = sorted_numbers + sorted_noise
Hi @chongshenng and @mgarofalo, thank you very much for your help. I do indeed have 100% client participation. So, since I am using FedAvg as my strategy, I suppose I should override its aggregate_fit method so that clients are processed in a fixed order, right?
I am already passing the client_ids as a metric to the aggregation, so I guess I could do something like this?
from typing import Optional, Union

from flwr.common import FitRes, Parameters, Scalar
from flwr.server.client_proxy import ClientProxy
from flwr.server.strategy import FedAvg


class NewFedAvg(FedAvg):
    def aggregate_fit(
        self,
        server_round: int,
        results: list[tuple[ClientProxy, FitRes]],
        failures: list[Union[tuple[ClientProxy, FitRes], BaseException]],
    ) -> tuple[Optional[Parameters], dict[str, Scalar]]:
        """Aggregate fit results using weighted average, in a fixed client order."""
        # Sort results by client_id to make the aggregation order deterministic
        results.sort(key=lambda x: x[1].metrics["client_id"])
        # Rest of aggregate_fit: delegate to the standard FedAvg aggregation
        return super().aggregate_fit(server_round, results, failures)
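For context, here is a simplified sketch of the client side this relies on, assuming a NumPyClient with a placeholder model (the actual training code is omitted). One detail worth noting: if the client_id is a string, the sort above is lexicographic ("10" before "2"), so casting it to an int keeps the order numeric.

import numpy as np
from flwr.client import NumPyClient


class IdentifiableClient(NumPyClient):
    """Toy client that reports its own ID back to the strategy."""

    def __init__(self, cid: int):
        self.cid = cid
        self.weights = [np.zeros(10)]  # placeholder "model" weights

    def fit(self, parameters, config):
        # ... local training on `parameters` would go here ...
        num_examples = 100  # placeholder dataset size
        # The "client_id" metric is exactly what aggregate_fit sorts on
        return self.weights, num_examples, {"client_id": int(self.cid)}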