Hi,
In my federated learning setup, I need all clients to apply the same feature scaling (same mean/std per feature) so that the global model trains in a consistent feature space.
Right now I’m thinking of doing a “Round 0” before training:
- Each client computes local summary stats for each feature (e.g. count, sum, sum of squares, or min/max).
- The server aggregates these stats across clients to get the global mean/std (or global min/max).
- The server sends those global scaling parameters back to every client.
- All later FL rounds use those shared scaling parameters to preprocess the data locally.
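For reference, the server-side aggregation in the steps above can be sketched in plain Python (function names are illustrative; in a real Flower app the per-client stats would travel as fit/metrics payloads rather than function arguments):

```python
# Sketch of the "Round 0" pattern: clients report per-feature
# (count, sum, sum of squares); the server pools them into global mean/std.
import math

def local_stats(samples):
    """Per-feature summary a client would send to the server."""
    n_features = len(samples[0])
    count = len(samples)
    sums = [sum(row[j] for row in samples) for j in range(n_features)]
    sumsq = [sum(row[j] ** 2 for row in samples) for j in range(n_features)]
    return count, sums, sumsq

def aggregate_stats(client_stats):
    """Server-side pooling into global per-feature mean/std."""
    total = sum(count for count, _, _ in client_stats)
    n_features = len(client_stats[0][1])
    means, stds = [], []
    for j in range(n_features):
        s = sum(sums[j] for _, sums, _ in client_stats)
        q = sum(sumsq[j] for _, _, sumsq in client_stats)
        mean = s / total
        var = max(q / total - mean ** 2, 0.0)  # clamp tiny negatives from rounding
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds
```

Because only counts, sums, and sums of squares cross the network, this yields exactly the mean/std you would get on the pooled data, without sharing raw samples.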
Is this the right pattern in Flower? Is there already a recommended / built-in way to do this (e.g. using custom Fit / Evaluate instructions, or a pre-training phase), or should I just implement this as a custom initial round myself?
Hi @johannes,
I suppose it can be done the way you’re proposing. Could you calculate this as a pre-training step and then pass the results as parameters to the system itself, e.g. via the pyproject.toml, into the Context object that is used to set parameters on both the server and client side?
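To illustrate the idea, a minimal sketch of what that could look like, assuming the scaler values are stored as run-config entries (the key names here are hypothetical; in recent Flower versions, values set under `[tool.flwr.app.config]` are exposed at run time via `context.run_config`):

```toml
# pyproject.toml of the Flower app (hypothetical keys)
[tool.flwr.app.config]
feature-means = "0.12,4.7,-1.3"   # serialized per-feature means
feature-stds = "1.0,2.3,0.5"      # serialized per-feature stds
```

Note that run-config values are scalars (strings, numbers, booleans), so a per-feature list would need to be serialized, e.g. as a comma-separated string, and parsed on the client after reading it from `context.run_config`.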
Hi @williamlm ,
Thank you for your reply!
It sounds like there are two valid approaches:

- In-run “Round 0” for feature statistics: use the first FL round to request per-feature statistics from each client, aggregate them on the server into the global scaler, and then use that scaler in all subsequent training rounds.
- Pre-compute & pass via Context: run a separate one-round FL step to collect/aggregate stats, then store the resulting scaler parameters in the pyproject.toml of the “real” project so they’re available via the Flower Context.
Does that match what you had in mind? What would be your preferred way of doing this?
Hi @johannes,
In the shorter term I’d opt for a slight modification of option 2 (more or less what you say), but it wouldn’t really be an FL step per se; it would be more of a data-statistics collection step, after which you store the relevant metrics to use as input context when starting FL.
That said, in the longer term it might be interesting to look at a more dynamic approach, where feature statistics are computed on demand. Say you have not just a static data pool, but incoming streaming data or datasets that evolve over time? That would probably require a more sophisticated baseline strategy.
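On the streaming point: for evolving data, each client could maintain running per-feature statistics with an online update (Welford’s algorithm) and report its current state on demand, while the server combines client states with the parallel-variance formula. A generic sketch, not tied to any Flower API:

```python
# Running mean/std a client could keep per feature and report on demand.
class RunningStats:
    def __init__(self):
        self.n = 0          # number of samples seen
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics (Welford)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine with another client's stats (parallel-variance formula)."""
        n = self.n + other.n
        if n == 0:
            return
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.mean += delta * other.n / n
        self.n = n

    def std(self):
        """Population standard deviation of everything seen so far."""
        return (self.m2 / self.n) ** 0.5 if self.n else 0.0
```

With this, a “refresh the global scaler” round only needs each client to ship three numbers per feature (`n`, `mean`, `m2`), and the server merges them in any order.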
I know I went on a bit of a tangent there but hope it helps somewhat.