The tutorial has a section that describes how arbitrary data can be sent from client to server, and the other way around, via the dictionary that is returned by e.g. the fit method of the client. I tried sending numpy arrays like this: return (get_weights(self.net), len(self.trainloader.dataset), {"train_loss": train_loss, "c_delta": c_delta, "y_delta": np.array(y_delta)}), but the client dies with an exception saying that only rudimentary types such as lists of floats are supported.
TypeError("Not all values are of valid type. Expected `typing.Union[int, float, str, bytes, bool, list[int], list[float], list[str], list[bytes], list[bool]]` but `<class 'numpy.ndarray'>` was passed.")
Is there a reason and a workaround for this limitation? I am currently implementing SCAFFOLD in Flower and this makes it very hard to do that otherwise.
In short: you can send your arbitrary data structures as bytes. Then you deserialize them on the server side (e.g. in the strategy) and use them as you normally would.
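A minimal sketch of that round trip, using only NumPy and the standard library. The helper names ndarray_to_bytes and bytes_to_ndarray are made up for this illustration and are not claimed to be Flower's own API; the metrics dictionary simply mirrors the one from the question with placeholder values.

```python
import io

import numpy as np


def ndarray_to_bytes(arr: np.ndarray) -> bytes:
    """Serialize one array to bytes in the .npy format (hypothetical helper)."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return buf.getvalue()


def bytes_to_ndarray(blob: bytes) -> np.ndarray:
    """Inverse of ndarray_to_bytes; restores dtype and shape (hypothetical helper)."""
    return np.load(io.BytesIO(blob), allow_pickle=False)


# Client side: bytes is one of the value types listed in the TypeError above.
y_delta = np.arange(6, dtype=np.float32).reshape(2, 3)  # placeholder data
metrics = {"train_loss": 0.42, "y_delta": ndarray_to_bytes(y_delta)}

# Server side (e.g. inside the strategy): turn the bytes back into an array.
restored = bytes_to_ndarray(metrics["y_delta"])
```

With allow_pickle=False the .npy format only handles plain numeric arrays, which avoids unpickling data received from clients.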
I also tried packing the data into bytes manually. This approach is fine for small, one-dimensional lists/arrays, but once you need to send more and bigger payloads, e.g. the control variates from SCAFFOLD, you have to pickle to a BytesIO object, which gets immensely slow. What I don’t understand yet is why the logic that is already in place to marshal NDArrays is not simply applied to the return values of the fit method. I also think it would be really nice to be more upfront about this limitation in the documentation and the tutorial, since arriving at the root cause takes quite a bit of time and debugging, and the debugging experience with Ray and Flower is not awesome.
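One possible way to avoid pickle altogether is to ship the raw array buffer and carry shape and dtype in separate dictionary entries, since bytes, str and list[int] all appear in the allowed types from the error message. This is only a sketch: pack_array and unpack_array are hypothetical names, and whether it is actually faster in a given setup has not been measured here.

```python
import numpy as np


def pack_array(name: str, arr: np.ndarray) -> dict:
    """Split an array into entries of types bytes, list[int] and str."""
    arr = np.ascontiguousarray(arr)  # tobytes() below then yields C-order data
    return {
        name: arr.tobytes(),
        f"{name}_shape": [int(d) for d in arr.shape],
        f"{name}_dtype": arr.dtype.str,  # e.g. "<f8", includes byte order
    }


def unpack_array(name: str, metrics: dict) -> np.ndarray:
    """Rebuild the array from the three entries written by pack_array."""
    flat = np.frombuffer(metrics[name], dtype=np.dtype(metrics[f"{name}_dtype"]))
    return flat.reshape(metrics[f"{name}_shape"])


# Client side: merge the packed entries into the usual metrics dictionary.
c_delta = np.linspace(0.0, 1.0, 12).reshape(3, 4)  # placeholder data
metrics = {"train_loss": 0.1, **pack_array("c_delta", c_delta)}

# Server side: reconstruct the array.
c_delta_restored = unpack_array("c_delta", metrics)
```

Note that np.frombuffer returns a read-only view on the received bytes; call .copy() on the result if the server needs to modify it in place.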
What do you mean by this? The first return value of the fit() method is expected to be a list of NDArrays, which gets serialized/deserialized internally. The third (last) return value lets you communicate pretty much anything you want (you could also use Python’s pickle to send your own object types).
How large are the control variates in SCAFFOLD in your setting?
I am wondering why the logic that is in place to serialize NDArrays in the first argument is not used to serialize NDArrays in the third argument. From the outside this seems to me like an easy addition that makes a developer’s life much easier. The control variates are essentially another model. In this case I could probably hack it a bit by sending it instead of the model weights, but there are other methods such as federated distillation or FedAUX, where bigger auxiliary data is returned from the client.
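For model-sized auxiliary data such as a full set of control variates, one workaround that stays within the current type restrictions is to bundle a whole list of per-layer arrays into a single bytes value. The sketch below uses NumPy's .npz container; ndarrays_to_bytes and bytes_to_ndarrays are hypothetical helper names, and the layer shapes are placeholders.

```python
import io

import numpy as np


def ndarrays_to_bytes(arrays: list) -> bytes:
    """Bundle several arrays into one .npz blob (stored as arr_0, arr_1, ...)."""
    buf = io.BytesIO()
    np.savez(buf, *arrays)
    return buf.getvalue()


def bytes_to_ndarrays(blob: bytes) -> list:
    """Restore the arrays in their original order."""
    with np.load(io.BytesIO(blob), allow_pickle=False) as npz:
        return [npz[f"arr_{i}"] for i in range(len(npz.files))]


# Client side: one array per layer, like the model weights themselves.
control_variates = [
    np.zeros((4, 3), dtype=np.float32),  # placeholder "weight" layer
    np.ones(3, dtype=np.float32),        # placeholder "bias" layer
]
metrics = {"train_loss": 0.25, "c_delta": ndarrays_to_bytes(control_variates)}

# Server side: unpack back into a list of arrays.
restored_layers = bytes_to_ndarrays(metrics["c_delta"])
```

np.savez stores the arrays uncompressed; np.savez_compressed is a drop-in alternative if payload size matters more than CPU time.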