Hi @javier and @wittenator,
Thank you both for your answers. It’s finally the weekend, so I have time to respond.
First of all, congratulations @wittenator on your implementation; it’s not easy to implement this algorithm in the Flower simulation environment, and I really like what you did with the `marshal_numpy` and `unmarshal_numpy` functions.
I still have a few questions about the equations you wrote, since I also implemented them, but in a different way.
- For the `c_global` equation (`strategy.py`)
Your computation of `c_global` is as follows:

```python
aggregated_control = [
    (float(num_clients) / self.total_num_clients) * reduce(np.add, c_delta) / num_clients
    for c_delta in zip(*c_delta_results)
]
self.global_control = aggregated_control
```
And it’s defined in the original article as follows:
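(Transcribing the server-side update from memory of the SCAFFOLD paper, Algorithm 1, so please double-check against the article:)

$$
\Delta c = \frac{1}{|S|} \sum_{i \in S} \Delta c_i, \qquad c \leftarrow c + \frac{|S|}{N}\,\Delta c
$$

where $S$ is the set of sampled clients and $N$ the total number of clients, so the previous value of $c$ is kept across rounds.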
I think you’re missing the previous value of the control variate in `aggregated_control`, and you erase it each round. I did it like this on my side:
```python
for i in range(len(self.c_global)):
    c_delta_avg = c_delta_sum[i] / self.num_clients
    self.c_global[i] += torch.tensor(c_delta_avg)
```
Do you have any sources on why you did it that way? Maybe I missed some mathematical proof behind your version.
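To make concrete what I mean by “erasing each round”, here is a toy sketch with made-up numbers (plain Python, two sampled clients out of four; none of these values come from either of our repos):

```python
# Made-up numbers: N = 4 clients in total, 2 sampled this round.
total_num_clients = 4
c_global = [1.0, 1.0]                        # previous global control variate
c_delta_results = [[0.2, -0.4], [0.6, 0.0]]  # Delta c_i from the sampled clients

# Sum the deltas coordinate-wise across clients.
c_delta_sum = [sum(deltas) for deltas in zip(*c_delta_results)]

# Overwriting each round (the previous c_global is lost):
overwrite = [s / total_num_clients for s in c_delta_sum]

# Accumulating, i.e. keeping the previous value, as in my snippet above:
accumulate = [c + s / total_num_clients for c, s in zip(c_global, c_delta_sum)]

print(overwrite, accumulate)
```

After a few rounds the two diverge: the overwriting version only ever reflects the latest round’s deltas, while the accumulating one carries the history of the control variate.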
- For the weight regularization (`model.py`)
I’ve seen some implementations that also do what you did, so I’m not a hundred percent sure.
Your computation of the weight regularization is as follows:
```python
for _ in range(epochs):
    for batch in trainloader:
        images = batch["img"]
        labels = batch["label"]
        if images.shape[0] == 1:
            # Skip batches with a single image
            continue
        optimizer.zero_grad()
        loss = criterion(net(images.to(device)), labels.to(device))
        loss.backward()
        for (name, param), c, c_i in zip(
            net.state_dict().items(), global_control, local_control
        ):
            if param.requires_grad:
                # The global control does not have batchnorm dimensions
                # at the beginning, but is zero at this point in time
                if c.shape == c_i.shape:
                    param.grad.data += torch.tensor(c - c_i).to(device)
                else:
                    param.grad.data += torch.tensor(-c_i).to(device)
        optimizer.step()
```
And it’s defined in the original article as follows:
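(Again transcribing from memory of the SCAFFOLD paper, so please double-check: the local update reads)

$$
y_i \leftarrow y_i - \eta_l \left( g_i(y_i) - c_i + c \right)
$$

where $\eta_l$ is the local learning rate and $g_i(y_i)$ the local mini-batch gradient.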
I feel like, the way it is expressed, the weight regularization should be outside of the batch loop. And I find the learning rate and `y_i` missing from your loop.
I implemented it like this:
```python
# Train the global model with local data
for _ in range(epochs):
    prebatch_params = [param.detach().clone() for param in self.net.parameters()]
    for batch in trainloader:
        images = batch["image"]
        labels = batch["label"]
        self.optimizer.zero_grad()
        self.criterion(net(images.to(device)), labels.to(device)).backward()
        self.optimizer.step()
    # Add the SCAFFOLD computation of c_diff to the parameters
    for param, y_i, c_l, c_g in zip(
        self.net.parameters(), prebatch_params, c_local, c_global
    ):
        param.grad.data = y_i - (learning_rate * (param.grad.data - c_l + c_g))
```
I did it this way so my weights would be updated after learning from my local gradient $g_i(y_i)$. I’m still unsure about my implementation, though, since the few implementations available on the Internet did it your way.
That’s all for my remarks.
One thing to be aware of: be careful if you use the Flower simulation environment, because you store `c_local` in the client constructor, but it is erased each round. On my side, I had to serialize every `c_local` to a file.
See this post for more information.
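Here is a minimal sketch of what I mean by serializing `c_local` (the helper names, the pickle format and the `c_local_store` directory are just my own choices for illustration, not from either of our repos):

```python
import os
import pickle

# Minimal sketch: persist each client's c_local on disk, since the
# simulation may recreate the client object (and lose its attributes)
# every round. Helper names and storage layout are hypothetical.

def save_c_local(cid, c_local, directory="c_local_store"):
    """Save a client's control variate after local training."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, f"client_{cid}.pkl"), "wb") as f:
        pickle.dump(c_local, f)

def load_c_local(cid, directory="c_local_store"):
    """Reload the control variate at the start of fit(); None on round 1."""
    path = os.path.join(directory, f"client_{cid}.pkl")
    if not os.path.exists(path):
        return None  # first round: no control variate stored yet
    with open(path, "rb") as f:
        return pickle.load(f)
```

In the client’s `fit`, I call the load helper first (initializing to zeros when it returns `None`) and the save helper just before returning, so the control variate survives the client being rebuilt between rounds.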
You can find my full implementation here on GitHub. I will soon force-push a whole new version, so if you see any errors in my code, they may already have been corrected.
It would be a pleasure to read more from you.
Best regards,
Mr. Sunshine.