FL basics question: labelling for local model training in FL

marcoesposito890 · March 21, 2024, 3:59pm

Hello, I am just starting to study FL, and something about labels is very unclear to me.

From what I understand of FL, the idea is to train models on devices so that only model weights are shared with a central server. What is unclear to me is how devices would label their data to then train their models?

For example, let’s say I have a federated vision system with multiple cameras for object detection. Cameras have their initial model, then acquire new data for training, but how would they generate the labels? With the global model? Wouldn’t that make them learn possible errors?

ruairi.j.grant · March 21, 2024, 7:38pm

FL doesn’t do anything unique with data labelling. The paradigm shift of FL is that the training is done on multiple different devices (cameras) and the resulting model updates are then aggregated on a central server. That process is then repeated iteratively. The actual training, be it supervised or unsupervised, is unchanged.

So in your example on the vision dataset, you would need to provide labeled training data for each of the cameras if you wanted to do supervised learning.
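To make the train-locally, aggregate, repeat loop concrete, here is a minimal sketch in plain Python. It assumes a toy one-dimensional linear model and weighted averaging in the style of FedAvg; the function names (`local_train`, `fed_avg`, `run_federated`) and the synthetic data are made up for illustration and are not any framework's API. Note that each client already holds labeled `(x, y)` pairs, which is exactly the part FL does not solve for you.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """One client's supervised update: gradient descent on y = w*x + b,
    using only that client's own labeled (x, y) pairs."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def fed_avg(client_weights, client_sizes):
    """Server step: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_federated(clients, rounds=50):
    """Repeat: send global weights out, train locally, aggregate on the server."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_train(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = fed_avg(updates, sizes)
    return global_weights

# Hypothetical clients: each has its own labeled data drawn from y = 2x + 1.
rng = random.Random(0)

def make_client(n):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1) for x in xs]

clients = [make_client(20), make_client(30), make_client(50)]
w, b = run_federated(clients)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Only the `(w, b)` tuples ever reach `fed_avg`; the raw `(x, y)` pairs stay inside `local_train`, which is the property FL is after.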

Hopefully that helps.

marcoesposito890 · March 22, 2024, 7:18am

Thank you a lot for your explanation. To delve into the decentralized nature of Federated Learning a bit more, could you elaborate on how the devices in an FL setup, such as cameras in a vision system, generate or receive labels for their data in a decentralized manner? Considering the absence of a central authority for labeling, how do these devices effectively annotate their data for training without relying on a centralized labeling process? Sending data to a server for labelling would go against the whole premise of keeping data decentralized.

I’ve seen many examples online in which they simply split a dataset (labels+images) among different clients, and while this was helpful to understand the nature of FL from a theoretical point of view, it still leaves me puzzled about a production-ready architecture and how clients would produce data for training by themselves.

ruairi.j.grant · March 22, 2024, 10:26am

Hi Marco, the specifics of labelling images in real time are a little outside what I know about. Perhaps someone else who knows more can jump in here.

Could that central labelling authority exist on each client and label the data there?

daniel · March 22, 2024, 2:16pm

There’s no fundamental difference between federated learning and centralized learning when it comes to data labelling.

The biggest real-world difference lies in deciding which labelling approach is practical in which setting. In federated learning, you can have very different scenarios, and depending on the scenario, some approaches are either practical or not. One dimension here is cross-device vs cross-silo FL.

The example you gave is a typical cross-device setting. In that case, manually labelling the data on the device might not be practical. One option that would not require data labelling would be to use self-supervised learning to pre-train a useful backbone on real-world video data.
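As a rough illustration of how self-supervised learning sidesteps manual labelling, here is a small plain-Python sketch of one well-known pretext task, rotation prediction. This is just an example picked for simplicity, not necessarily the method meant above, and video data would typically call for a different pretext task. The helper names (`rotate90`, `make_rotation_pretext`) are hypothetical, and images are represented as nested lists to keep the sketch dependency-free.

```python
def rotate90(image):
    """Rotate a 2D list-of-lists image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def make_rotation_pretext(images):
    """Turn unlabeled images into (rotated_image, label) pairs.

    The label is the number of 90-degree clockwise turns applied (0 to 3),
    so it is derived from the data itself and needs no human annotator.
    """
    samples = []
    for img in images:
        rotated = [list(row) for row in img]  # copy so the input is untouched
        for k in range(4):
            samples.append((rotated, k))
            rotated = rotate90(rotated)
    return samples

# A single tiny "image" captured on-device, with no label attached.
unlabeled = [[[1, 2],
              [3, 4]]]

samples = make_rotation_pretext(unlabeled)
for image, label in samples:
    print(label, image)
```

A backbone trained on-device to predict `label` from `image` could then be federated like any other model, with a much smaller labeled set used later for the actual detection task.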

system · March 29, 2024, 2:17pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
