Increasing or generally suspiciously high loss after the first round of training

Hi
I have been simulating my federated setup a number of times now, trying to investigate how different factors affect performance.

My question, though, is more related to a suspicious tendency I have noticed. Before training, the model is initialized with random weights and evaluated on the test data, which gives a poor initial performance; this is round 0 in the pictures below. Then the federated training begins, and the loss of the aggregated model on the test data decreases. But what I have noticed is an increase in loss from the initial model to the aggregated model after round 1, which seems a bit odd to me. I have reviewed the code many times and can’t really see anything that should lead to incorrect aggregation after the first round…

But maybe someone has experienced the same thing or is able to figure out what is going on?

Hope to hear from someone!

// Johan

Seems weird. May I ask how you calculate the federated loss? Is it aggregated from the local test losses? Could you also pick the data of one client and train on its own data alone (ignoring FL for now)? Does its loss go up and then down too?
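
Something along these lines would do as a sanity check; the toy model and synthetic tensors below are just placeholders for your actual client data and architecture (a sketch in PyTorch, assuming a classification setup):

```python
# Sanity check: train on a single client's data only (no FL) and log the
# test loss after every epoch. The model and data here are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data standing in for one client's train split and the global test set
train_ds = TensorDataset(torch.randn(256, 20), torch.randint(0, 2, (256,)))
test_ds = TensorDataset(torch.randn(128, 20), torch.randint(0, 2, (128,)))
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=64)

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def test_loss(model, loader):
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            total += criterion(model(x), y).item() * len(y)
            n += len(y)
    return total / n

print(f"epoch 0 (random init): test loss {test_loss(model, test_loader):.4f}")
for epoch in range(1, 11):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: test loss {test_loss(model, test_loader):.4f}")
```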

The federated loss is calculated by evaluating the aggregated model on a test set after each round. When I train a normal centralized model, the loss decreases and converges in the usual way. I am also tracking the progress during training on each client, on local train and validation datasets, and each client shows a normal decrease in both train and validation loss.
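
(For reference, this kind of per-round centralized evaluation can be wired up roughly as in the sketch below, assuming a Flower 1.x-style FedAvg with a server-side evaluate_fn; `model` and `test_loader` are placeholders for the actual network and held-out test set.)

```python
# Sketch: centralized evaluation of the aggregated model after every round,
# assuming a Flower 1.x-style FedAvg strategy. `model` and `test_loader` are
# placeholders for the actual network and test set.
from typing import Dict, Optional, Tuple

import flwr as fl
import torch
import torch.nn as nn


def get_evaluate_fn(model: nn.Module, test_loader):
    criterion = nn.CrossEntropyLoss()

    def evaluate(
        server_round: int,
        parameters: fl.common.NDArrays,
        config: Dict[str, fl.common.Scalar],
    ) -> Optional[Tuple[float, Dict[str, fl.common.Scalar]]]:
        # Load the aggregated weights into the model
        # (assumes the same key order the client uses when sending parameters)
        state_dict = {
            k: torch.tensor(v) for k, v in zip(model.state_dict().keys(), parameters)
        }
        model.load_state_dict(state_dict, strict=True)

        # server_round == 0 is the randomly initialized model before any training
        model.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in test_loader:
                total += criterion(model(x), y).item() * len(y)
                n += len(y)
        return total / n, {}

    return evaluate


# strategy = fl.server.strategy.FedAvg(evaluate_fn=get_evaluate_fn(model, test_loader))
```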

I believe this is likely due to divergence among the local models. In the first round, since the model is initialized randomly, training on different local datasets can cause the local models to diverge significantly. As a result, the test error increases after the first round. One potential solution is to reduce the number of local epochs, particularly for the first round. However, it’s not that surprising to see the loss increase in the first round, especially with non-IID data, so no need to worry too much about it, I think.
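
If you are using Flower, one way to do that would be along these lines (just a sketch, assuming a 1.x-style FedAvg; the "local_epochs" key is only a convention that your client's fit() would need to read from its config):

```python
# Sketch: shorter local training in round 1 via the fit config, assuming a
# Flower 1.x-style FedAvg strategy. The "local_epochs" key is a convention;
# the client's fit() has to read it and train for that many epochs.
from typing import Dict

import flwr as fl


def fit_config(server_round: int) -> Dict[str, fl.common.Scalar]:
    # Only 1 local epoch in the first round to limit divergence from the
    # random initialization, then the usual number afterwards.
    return {"local_epochs": 1 if server_round == 1 else 5}


strategy = fl.server.strategy.FedAvg(on_fit_config_fn=fit_config)
```

On the client side you would then read it with something like `epochs = int(config["local_epochs"])` inside `fit()`.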

Thanks for the response! That makes sense! Another thing I have noticed, which I can’t really explain, is related to an experiment with one bad client.

So I am simulating 5 clients, and one of them is considered bad: it has drastically worse labels to train on than the others. The bad properties learned from the bad client are aggregated, which makes the aggregated model (federated loss) worse on the test set, as expected. But what is quite weird is that when I track the training progress on each client, the bad performance shifts between clients from round to round, as can be seen in the image.
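
To give an idea of what "worse labels" means here: think of something along the lines of randomly reassigning a large share of that client's labels, as in the simplified sketch below (the flip ratio, dataset shapes, and helper are illustrative, not my exact code):

```python
# Illustrative only: one common way to simulate a "bad" client is to randomly
# reassign a large fraction of its labels. The 60% ratio and the synthetic
# TensorDataset below are made up for the sketch.
import torch
from torch.utils.data import TensorDataset

torch.manual_seed(0)

def corrupt_labels(dataset: TensorDataset, flip_ratio: float, num_classes: int) -> TensorDataset:
    x, y = dataset.tensors
    y = y.clone()
    n_flip = int(flip_ratio * len(y))
    idx = torch.randperm(len(y))[:n_flip]
    y[idx] = torch.randint(0, num_classes, (n_flip,))  # random replacement labels
    return TensorDataset(x, y)

# Clients 0-3 keep clean labels; client 4 gets the corrupted copy
clean = TensorDataset(torch.randn(256, 20), torch.randint(0, 2, (256,)))
client_datasets = [clean] * 4 + [corrupt_labels(clean, flip_ratio=0.6, num_classes=2)]
```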

But the way I have set it up, all clients receive the same, newest aggregated model before training in the next round, so it doesn’t really make sense why the bad performance shifts between clients instead of being roughly the same (worse) for all of them.

That’s indeed an interesting case. I’m not entirely sure, but it might be that the model was trying to learn from the hard cases, which caused the test loss on another client to increase. I’m not sure, though, why it seems to show a periodic pattern. Perhaps you could try setting the number of local epochs to a very small number, say 1, and see whether the pattern persists. I can’t really tell too much from a single plot, I’m afraid.

Btw, have you registered for the Flower Summit 2025? You are welcome to join us online or in person! Flower AI Summit 2025

Sounds nice, will definitely consider it, thanks!
