Hi,
In my federated learning setup, I need all clients to apply the same feature scaling (same mean/std per feature) so that the global model trains in a consistent feature space.
Right now I’m thinking of doing a “Round 0” before training:
- Each client computes local summary stats for each feature (e.g. count, sum, sum of squares, or min/max).
- The server aggregates these stats across clients to get the global mean/std (or global min/max).
- The server sends those global scaling parameters back to every client.
- All later FL rounds use those shared scaling parameters to preprocess the data locally.
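For reference, the server-side aggregation in the steps above can be sketched in plain Python (function names are illustrative; in a real Flower app the per-client stats would travel as fit/metrics payloads rather than function arguments):

```python
# Sketch of the "Round 0" pattern: clients report per-feature
# (count, sum, sum of squares); the server pools them into global mean/std.
import math

def local_stats(samples):
    """Per-feature summary a client would send to the server."""
    n_features = len(samples[0])
    count = len(samples)
    sums = [sum(row[j] for row in samples) for j in range(n_features)]
    sumsq = [sum(row[j] ** 2 for row in samples) for j in range(n_features)]
    return count, sums, sumsq

def aggregate_stats(client_stats):
    """Server-side pooling into global per-feature mean/std."""
    total = sum(count for count, _, _ in client_stats)
    n_features = len(client_stats[0][1])
    means, stds = [], []
    for j in range(n_features):
        s = sum(sums[j] for _, sums, _ in client_stats)
        q = sum(sumsq[j] for _, _, sumsq in client_stats)
        mean = s / total
        var = max(q / total - mean ** 2, 0.0)  # clamp tiny negatives from rounding
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds
```

Because only counts, sums, and sums of squares cross the network, this yields exactly the mean/std you would get on the pooled data, without sharing raw samples.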
Is this the right pattern in Flower? Is there already a recommended / built-in way to do this (e.g. using custom Fit / Evaluate instructions, or a pre-training phase), or should I just implement this as a custom initial round myself?
Hi @johannes,
I suppose it can be done the way you’re proposing. Could you calculate this as a pre-training step and then pass the results as parameters to the system itself, e.g. via the pyproject.toml, into the Context object that is used to set parameters on both the server and client side?
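To illustrate the idea, a minimal sketch of what that could look like, assuming the scaler values are stored as run-config entries (the key names here are hypothetical; in recent Flower versions, values set under `[tool.flwr.app.config]` are exposed at run time via `context.run_config`):

```toml
# pyproject.toml of the Flower app (hypothetical keys)
[tool.flwr.app.config]
feature-means = "0.12,4.7,-1.3"   # serialized per-feature means
feature-stds = "1.0,2.3,0.5"      # serialized per-feature stds
```

Note that run-config values are scalars (strings, numbers, booleans), so a per-feature list would need to be serialized, e.g. as a comma-separated string, and parsed on the client after reading it from `context.run_config`.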
Hi @williamlm ,
Thank you for your reply!
It sounds like there are two valid approaches:

- In-run “Round 0” for feature statistics: use the first FL round to request per-feature statistics from each client, aggregate them on the server into the global scaler, and then use that scaler in all subsequent training rounds.
- Pre-compute & pass via Context: run a separate one-round FL step to collect/aggregate stats, then store the resulting scaler parameters in the pyproject.toml of the “real” project so they’re available via the Flower Context.
Does that match what you had in mind? What would be your preferred way of doing this?
Hi @johannes,
In the shorter term I’d opt for a slight modification of option 2 (more or less what you say), but it wouldn’t really be an FL step per se; it would be more of a data-statistics collection step, after which you store the relevant metrics to use as input context when starting FL.
That said, in the longer term it might be interesting to look at a more dynamic approach, where feature statistics are computed on demand. Say you have not just a static data pool, but incoming streaming data or datasets that evolve over time? That would probably require a more sophisticated baseline strategy.
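On the streaming point: for evolving data, each client could maintain running per-feature statistics with an online update (Welford’s algorithm) and report its current state on demand, while the server combines client states with the parallel-variance formula. A generic sketch, not tied to any Flower API:

```python
# Running mean/std a client could keep per feature and report on demand.
class RunningStats:
    def __init__(self):
        self.n = 0          # number of samples seen
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics (Welford)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine with another client's stats (parallel-variance formula)."""
        n = self.n + other.n
        if n == 0:
            return
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.mean += delta * other.n / n
        self.n = n

    def std(self):
        """Population standard deviation of everything seen so far."""
        return (self.m2 / self.n) ** 0.5 if self.n else 0.0
```

With this, a “refresh the global scaler” round only needs each client to ship three numbers per feature (`n`, `mean`, `m2`), and the server merges them in any order.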
I know I went on a bit of a tangent there but hope it helps somewhat.