Contributing to the Flower GitHub repository

Hi,

I’ve been working on a small implementation of the SCAFFOLD algorithm on the Flower simulation framework. It’s a personal project, and I’ve been wondering if it’s possible to contribute it to the official Flower repository, since I haven’t seen a SCAFFOLD implementation in the baselines or examples folders.

My question is quite general: how do you contribute to the official repo, and is this possible? I’m also worried about maintenance if any APIs break in future versions of the framework.

Best regards,
Mr. Sunshine.


Hi @mistersunshine,

It’s great to hear you are interested in contributing to the repo. To contribute a new baseline (as would be the case for SCAFFOLD), you can see the instructions here: Contribute Baselines - Flower Baselines 1.13.0

I note that recently @wittenator also brought up the idea of adding SCAFFOLD (see this post, which you also commented on: Sending truly arbritrary data from client to server)


I’ve also implemented SCAFFOLD already, here: GitHub - wittenator/flower at rework_fedprox_baseline. However, due to paper deadlines I currently don’t have time to reproduce the paper results (they use custom partitioning and models). Maybe that’s a good starting point?


Hi @javier and @wittenator,

Thank you both for your answers. It’s finally the weekend, so I have time to respond.

First of all, congratulations @wittenator on your implementation: it’s not easy to implement this algorithm in the Flower simulation environment, and I find what you did with the marshal_numpy and unmarshal_numpy functions great.
I still have a few questions about the equations you wrote, since I also implemented them, but in a different way.

  • For the c_global equation (strategy.py)

Your computation of c_global is as follows:

aggregated_control = [
    (float(num_clients) / self.total_num_clients) * reduce(np.add, c_delta) / num_clients
    for c_delta in zip(*c_delta_results)
]
self.global_control = aggregated_control

And it’s defined in the original article as follows:
[image: the server-side update of the global control variate, from the SCAFFOLD paper]
I think you are missing the previous global control in aggregated_control, and you erase it each round. I did it like this on my side:

for i in range(len(self.c_global)):
  c_delta_avg = c_delta_sum[i] / self.num_clients
  self.c_global[i] += torch.tensor(c_delta_avg)

Do you have a source on why you did it that way? Maybe I missed a mathematical proof for what you did.

  • For the weight regularization (model.py)

I’ve seen some implementations doing what you did as well, so I’m not a hundred percent sure.

Your computation of the weight regularization is as follows:

for _ in range(epochs):
    for batch in trainloader:
        images = batch["img"]
        labels = batch["label"]
        if images.shape[0] == 1:
            # Skip batches with a single image
            continue
        optimizer.zero_grad()
        loss = criterion(net(images.to(device)), labels.to(device))
        loss.backward()
        for (name,param), c, c_i in zip(
            net.state_dict().items(), global_control, local_control
        ):
            if param.requires_grad:
                # The global control does not have batchnorm dimensions at the beginning, but is zero at this point in time
                if c.shape == c_i.shape:
                    param.grad.data += torch.tensor(c - c_i).to(device)
                else:
                    param.grad.data += torch.tensor(-c_i).to(device)
        optimizer.step()

And it’s defined in the original article as follows:
[image: the local model update (line 10) from the SCAFFOLD paper]
I feel like it is expressed as if the weight regularization should be outside the batch loop, and I find the learning rate and y_i missing from your loop.

I implemented it like this:

# Train the global model with local data
for _ in range(epochs):
    prebatch_params = [param.detach().clone() for param in self.net.parameters()]
    
    for batch in trainloader:
        images = batch["image"]
        labels = batch["label"]
        self.optimizer.zero_grad()
        self.criterion(self.net(images.to(device)), labels.to(device)).backward()
        self.optimizer.step()

    # Adds Scaffold computation of c_diff in parameters
    for param, y_i, c_l, c_g in zip(self.net.parameters(), prebatch_params, c_local, c_global):
        param.grad.data = y_i - (learning_rate * (param.grad.data - c_l + c_g))

I did it this way so my weights would be updated after I learned from my local gradient g_i(y_i). I’m still unsure about my implementation, since the few implementations available on the Internet did it your way.

That’s all for my remarks.

Just so you are aware, be careful if you use the Flower simulation environment: c_local stored in the client constructor is erased each round. On my side I had to serialize every c_local to a file.

See this post for more information.
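In case it helps anyone reading along, a minimal sketch of that kind of per-client persistence (the directory name and helper functions here are illustrative, not the ones from my repo) could look like this:

```python
import os
import numpy as np

# Hypothetical sketch: persist each client's c_local between rounds, since
# Flower simulation clients are recreated every round and lose their state.
STATE_DIR = "client_state"

def save_c_local(client_id, c_local):
    """Write the list of control-variate arrays for one client to disk."""
    os.makedirs(STATE_DIR, exist_ok=True)
    np.savez(os.path.join(STATE_DIR, f"c_local_{client_id}.npz"), *c_local)

def load_c_local(client_id, template):
    """Read a client's control variate back; zero-initialize on first round."""
    path = os.path.join(STATE_DIR, f"c_local_{client_id}.npz")
    if not os.path.exists(path):
        # First round: no file yet, so start from zeros shaped like the model
        return [np.zeros_like(t) for t in template]
    with np.load(path) as data:
        return [data[k] for k in data.files]
```

On the first round there is no file yet, so the control variate is initialized to zeros shaped like the model parameters.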

You can find my full implementation here on GitHub. I will soon force-push a whole new version, so if you see any errors in my code, be aware that they may already have been corrected.

Would be a pleasure to read more from you.

Best regards,
Mr. Sunshine.

Hi @mistersunshine,

thanks for looking deeply into my implementation! It hasn’t received much love yet and I didn’t replicate the results from the paper, but I’ll try to give my understanding of it. During my implementation I mainly followed the algorithmic description of SCAFFOLD:

For c_global equations:
I actually missed adding the control deltas to the global control; thanks for catching this! The algorithm says that only the control delta is communicated from the client to the server: (c^+ - c_i) is computed on the client, communicated to the server, and then averaged there. This is equivalent to sending both the old and new controls to the server and performing the subtraction there, but it is more communication efficient.
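For what it’s worth, a minimal sketch of that server-side update with the accumulation included (function and variable names are illustrative) might look like this:

```python
import numpy as np

# Sketch of the server-side control update: average the per-client control
# deltas, scale by |S| / N, and accumulate into the previous global control
# instead of overwriting it.
def update_global_control(global_control, c_delta_results, total_num_clients):
    num_sampled = len(c_delta_results)
    updated = []
    for c_g, layer_deltas in zip(global_control, zip(*c_delta_results)):
        avg_delta = sum(layer_deltas) / num_sampled  # mean over sampled clients
        updated.append(c_g + (num_sampled / total_num_clients) * avg_delta)
    return updated
```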

For the weight regularization:
As far as I know, methods like weight decay (L2 weight regularization) add the weights to the loss function to influence the gradient. That’s why I would say that we do need to add the global and local control during each step, i.e. each batch. The paper has a bit of unusual notation at this point, imo. If we replace the y with w and remove the local and global control from line 10 of the algorithm, we get the usual formulation of an SGD iteration step; this is where the learning takes place. In my experience, most of these algorithms are initially proposed with SGD since the analysis is much easier, but most of them benefit from the usual training tricks (momentum, weight decay, or other optimizers like Adam), so I decided to use the PyTorch optimizers to implement line 10. If we choose torch.optim.SGD with no momentum and no weight decay, we get exactly line 10 from the SCAFFOLD algorithm, but we also keep the option to try additional features this way. Disclaimer at this point: in my experiments on very heterogeneous CIFAR-10 data, SCAFFOLD still performs pretty badly in my implementation, so it is very possible that there is still a bug or an error in my understanding.
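To make that equivalence concrete, here is a tiny numeric check with a toy loss and made-up values, assuming plain SGD with no momentum and no weight decay:

```python
import numpy as np

# Since the correction (c - c_i) is constant within a step, adding it to the
# gradient before the SGD step is algebraically the same as line 10 of the
# SCAFFOLD algorithm: y <- y - eta * (g - c_i + c).
eta = 0.1
y = np.array([1.0, 2.0])
g = 2 * y                            # gradient of a toy loss ||y||^2
c, c_i = np.array([0.5, 0.5]), np.array([0.2, 0.2])

corrected_grad = g + (c - c_i)       # what the training loop adds per batch
y_sgd = y - eta * corrected_grad     # what a plain SGD step would compute
y_line10 = y - eta * (g - c_i + c)   # line 10 of the SCAFFOLD algorithm

assert np.allclose(y_sgd, y_line10)
```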

Thanks for the remark regarding the destruction of the clients. I did in fact forget about this, but I have implemented it now by sending the whole client state back and forth between client and server. This was actually more painful in Flower than I expected it to be.


Good to read more from you @wittenator.

I hadn’t noticed the equivalence of the weight regularization with an SGD step, so it makes sense to do it on every batch.

From previous experience, I’ve stopped trying to use CIFAR-10 data; it’s harder to train with a heterogeneous client distribution. The original article uses EMNIST (Extended MNIST) instead; you can find one of the datasets on Hugging Face (EMNIST by letters). I have no idea which split they used, they didn’t mention it in the paper.

Also, if you want an easy way to persist the clients’ c_local, you can find code here on my GitHub: it only serializes the bytes of each client’s vector to disk. It also cleans up at every restart, so no whoops! Your Python process only needs read and write permissions on the disk.

I’m finishing an experiment for a personal project; could I cite you in my work? It would be great to exchange personal info in a DM if you are okay with that.

Best regards,
Mr. Sunshine.