Accuracy difference in local evaluation and central evaluation

tomyfrancis25 · October 24, 2025, 8:20pm

I trained a DistilBERT model using 3 Raspberry Pi devices as clients, training only the final classifier while keeping the base model frozen. After aggregation and testing, there is a small difference of 4% between the accuracy of local evaluation and centralized server evaluation. I used the same batch size for testing in both evaluations. Does anyone have an idea why there is a shift in accuracy?

daniel · October 30, 2025, 12:27pm

There are several possibilities:

Are you using the same evaluation dataset on both the server and the client? Using different datasets would naturally lead to different evaluation results.
Are you evaluating the locally trained model on the client and the aggregated model on the server? Even if you use the same dataset on both the client and the server, evaluating the locally trained model on the client versus the aggregated model on the server should also be expected to give different results.
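To make the two checks above concrete, here is a minimal framework-free sketch in plain Python. Everything in it is a toy assumption for illustration: the threshold "models" (`global_model`, `local_model`), the tiny datasets, and the helper names `accuracy` and `weighted_client_accuracy` are hypothetical and are not part of Flower or of the original setup. It shows that the example-weighted client accuracy matches the centralized accuracy only when the same model is evaluated on the same examples.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]  # (feature, label); toy stand-in for real text data


def accuracy(predict: Callable[[float], int], data: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for x, y in data if predict(x) == y)
    return correct / len(data)


def weighted_client_accuracy(
    predict: Callable[[float], int], client_sets: List[Sequence[Example]]
) -> float:
    """Average of per-client accuracies, weighted by each client's example count."""
    total = sum(len(d) for d in client_sets)
    return sum(accuracy(predict, d) * len(d) for d in client_sets) / total


def global_model(x: float) -> int:
    """Hypothetical aggregated model: a fixed threshold classifier."""
    return int(x > 0.5)


def local_model(x: float) -> int:
    """Hypothetical locally trained model with a different decision threshold."""
    return int(x > 0.75)


# Hypothetical local evaluation sets held by three clients.
client_sets = [
    [(0.9, 1), (0.1, 0), (0.6, 0), (0.7, 1)],
    [(0.2, 0), (0.8, 1), (0.4, 1), (0.3, 0)],
    [(0.55, 1), (0.45, 0), (0.95, 1), (0.05, 0)],
]
# Hypothetical server-side evaluation set that differs from the client data.
server_set = [(0.9, 1), (0.1, 0), (0.6, 1), (0.2, 0), (0.7, 1)]

# Pooling the client sets gives the "same data" baseline for the server.
pooled = [ex for d in client_sets for ex in d]

federated_acc = weighted_client_accuracy(global_model, client_sets)
central_same_data = accuracy(global_model, pooled)       # same model, same data
central_other_data = accuracy(global_model, server_set)  # same model, other data
local_model_acc = accuracy(local_model, pooled)          # other model, same data

print(f"federated (weighted):      {federated_acc:.3f}")
print(f"central, same data:        {central_same_data:.3f}")
print(f"central, different data:   {central_other_data:.3f}")
print(f"local model, same data:    {local_model_acc:.3f}")
```

If the first two numbers agree but the last two do not, the gap comes from the evaluation data or from which model is being evaluated, not from the batch size.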

tomyfrancis25 · October 30, 2025, 10:02pm

Dear Daniel, thank you for the response. I was using different datasets for local and server evaluation, which was causing this shift. The issue is now solved.
