Hi everyone, I wanted to ask a question about determinism in Flower. While trying to ensure it, I worked with only 2 clients and 5 rounds to make debugging faster. After some effort, I managed to get 100% identical results when running my experiment twice, so I assumed I had achieved determinism. But when I switched back to 5 clients and 20 rounds (my usual setting), I noticed that the results were only 100% identical in the first round. In the second round, the last decimals of the results started to change, and with each subsequent round the results diverged further. Does anyone have an idea of what could be causing this non-determinism only when the number of clients is increased? I'd very much appreciate any help. Thanks a lot in advance!
Hi @pgcampana, welcome to the Flower community!
Thanks for your question. It's good to see that you're able to get reproducible results with 2 clients. For the 5-client setup, is it reasonable to assume you have 100% client participation?
It is indeed a bit strange that you can’t reproduce the results with more clients. Can you share some details of your implementation so that we can identify a possible cause for this?
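In the meantime, just as a reference: full client participation is typically pinned down in the strategy constructor rather than left to the defaults. A minimal sketch, assuming 5 clients in total (parameter names as in recent Flower releases; adjust the numbers to your setup):

from flwr.server.strategy import FedAvg

# Assumption: 5 clients in total, all of them sampled in every round
strategy = FedAvg(
    fraction_fit=1.0,         # sample 100% of available clients for training
    fraction_evaluate=1.0,    # sample 100% of available clients for evaluation
    min_fit_clients=5,        # never train with fewer than 5 clients
    min_evaluate_clients=5,   # never evaluate with fewer than 5 clients
    min_available_clients=5,  # wait until all 5 clients are connected
)

With fraction_fit=1.0 and min_available_clients equal to the total number of clients, every client participates in every round, which rules out client sampling as a source of non-determinism.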
Hi @pgcampana, I encountered a similar issue a few months ago.
In my case, the problem was caused by the order in which the client updates were aggregated. My explanation is that since model weights are stored in finite floating-point precision, addition is not associative: summing the same values in different orders can produce tiny discrepancies due to intermediate rounding errors. Although small, these differences accumulate over multiple rounds of federated learning, causing two seemingly identical runs to diverge over time.
In Flower, clients operate as separate threads, meaning their arrival order is not guaranteed. By enforcing a fixed order when processing clients in the aggregator, I was able to resolve the issue.
To illustrate the problem, consider this simple example in NumPy:

import numpy as np

np.random.seed(42)
num_clients = 5
numbers = np.random.rand(num_clients)
# The same values, just in a different order
sorted_numbers = np.array(sorted(numbers))
print(numbers)
print(sorted_numbers)

for rnd in range(10):
    # Both arrays hold the same values in different orders, yet their sums
    # can drift apart over rounds as rounding errors compound
    print(numbers.sum() - sorted_numbers.sum())
    # Re-seed so both branches draw identical standard normals; the only
    # difference is the mean, which already carries the accumulated error
    np.random.seed(42)
    noise = np.random.normal(np.mean(numbers), 0.00001, num_clients)
    np.random.seed(42)
    sorted_noise = np.random.normal(np.mean(sorted_numbers), 0.00001, num_clients)
    numbers = numbers + noise
    sorted_numbers = sorted_numbers + sorted_noise
Hi @chongshenng and @mgarofalo, thank you very much for your help. I do indeed have 100% client participation. So, since I am using FedAvg as my strategy, I suppose I should override its aggregate_fit method so that clients are processed in a fixed order, right?
I am already passing the client_ids as a metric to the aggregation, so I guess I could do something like this?
from typing import Optional, Union

from flwr.common import FitRes, Parameters, Scalar
from flwr.server.client_proxy import ClientProxy
from flwr.server.strategy import FedAvg


class NewFedAvg(FedAvg):
    def aggregate_fit(
        self,
        server_round: int,
        results: list[tuple[ClientProxy, FitRes]],
        failures: list[Union[tuple[ClientProxy, FitRes], BaseException]],
    ) -> tuple[Optional[Parameters], dict[str, Scalar]]:
        """Aggregate fit results using weighted average, in a fixed client order."""
        # Sort results by client_id to make the aggregation order deterministic
        results.sort(key=lambda x: x[1].metrics["client_id"])
        # Rest of aggregate_fit: delegate to the standard FedAvg aggregation
        return super().aggregate_fit(server_round, results, failures)
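For context, here is a simplified sketch of the client side this relies on, assuming a NumPyClient with a placeholder model (the actual training code is omitted). One detail worth noting: if the client_id is a string, the sort above is lexicographic ("10" before "2"), so casting it to an int keeps the order numeric.

import numpy as np
from flwr.client import NumPyClient


class IdentifiableClient(NumPyClient):
    """Toy client that reports its own ID back to the strategy."""

    def __init__(self, cid: int):
        self.cid = cid
        self.weights = [np.zeros(10)]  # placeholder "model" weights

    def fit(self, parameters, config):
        # ... local training on `parameters` would go here ...
        num_examples = 100  # placeholder dataset size
        # The "client_id" metric is exactly what aggregate_fit sorts on
        return self.weights, num_examples, {"client_id": int(self.cid)}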