Hey everyone. I’m trying to complete the PyTorch tutorial, and I’m stuck on the very last portion of part 4, Communicating Arbitrary Objects (Communicate custom Messages - Flower Framework).
I had to fix quite a few undocumented errors just to get the code to execute. However, I never did get the metadata to print to the screen, and I’m also stuck on a final roadblock: no model aggregation or training was actually happening. As you can see in the screenshot below, the global accuracy and loss values are completely frozen and identical across all 5 rounds.
I’ve compiled a list of all the gaps, bugs, and breaking issues I encountered while stepping through this last section. I have given up at this point and was not able to complete the tutorial. However, the attempts to fix it are documented below.
On the line "train_metadata = train_fn(…)", the parameters have a … in the parentheses instead of the actual required parameters
- Issue:
- There are no visual cues or notes indicating that the … should be filled in, so someone following the tutorial quickly, or a beginner Python programmer, can easily miss it
- What happens when I run it as written:
- Runs to completion with errors printed to the screen about the missing parameters
- No metadata printed to the terminal
- Fix: Add the parameters to the function call in place of the ellipsis, or add a note that the ellipsis must be replaced with the parameters so the user knows to add them in their local code before running
The line of code "config_record = ConfigRecord({"meta": train_meta_bytes})" causes an error
- Issue:
- ConfigRecord is never added to the client app’s import statement in the snippet instructions.
- My IDE, Visual Studio Code, suggests changing this to config_record, and a beginner, or someone revisiting the tutorial after a break, may not realize that this is also wrong
- What happens when I run it as is (or with the config_record suggested fix):
- With “ConfigRecord” as written, the code does not run: it fails before anything executes in the terminal, with an error that ConfigRecord is not defined
- (with the incorrect IDE fix) Runs to completion with errors about accessing config_record with no associated value
- No metadata printed to the screen
- Fix:
- Add ConfigRecord to the import statement (from flwr.app import ArrayRecord, Context, Message, MetricRecord, RecordDict, ConfigRecord)
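
For anyone following along, here is a minimal sketch of the packing step as I understand it: turn a dataclass instance into bytes and store those bytes under the "meta" key. It uses only the standard library so it runs without Flower installed. The assumptions are mine: `TrainProcessMetadata` and its fields are hypothetical stand-ins for the tutorial's dataclass, a plain dict stands in for `ConfigRecord`, and JSON is a serializer I picked for the sketch, not necessarily what the tutorial uses.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TrainProcessMetadata:  # hypothetical stand-in for the tutorial's dataclass
    training_time: float
    converged: bool


# Hypothetical values; in the tutorial these would come from the training function.
train_metadata = TrainProcessMetadata(training_time=12.5, converged=True)

# asdict() turns the dataclass into a plain dict, which JSON can serialize.
train_meta_bytes = json.dumps(asdict(train_metadata)).encode("utf-8")

# Assumption: a plain dict stands in for ConfigRecord so this runs without flwr.
config_record = {"meta": train_meta_bytes}

print(type(config_record["meta"]).__name__, len(config_record["meta"]))
```

In the real client app, the dict would be replaced by the `ConfigRecord(...)` call from the tutorial, which is exactly where the missing import bites.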
The new dataclass is not imported into the strategy file
- Issue:
- The new dataclass is not imported on the server side, and the tutorial does not show this step, so errors occur when the strategy tries to unpack the metadata
- What happens when I run the code as-is:
- The code throws serialization/type errors
- no metadata printed to screen
- Fix:
- Import the newly created dataclass at the top of the custom strategy file
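
To illustrate why the server needs the class, here is a rough sketch of the unpacking side using only the standard library. The assumptions are mine: `TrainProcessMetadata`, its fields, and the helper `unpack_metadata` are hypothetical, and the payload is JSON-encoded, which may differ from the serializer the tutorial uses. The point is only that rebuilding the object requires the class to be defined or imported in the file doing the unpacking.

```python
import json
from dataclasses import dataclass


# In a real project this class would be imported from the module where the
# client defines it. It is redefined here only so the sketch runs on its own.
@dataclass
class TrainProcessMetadata:  # hypothetical stand-in for the tutorial's dataclass
    training_time: float
    converged: bool


def unpack_metadata(meta_bytes: bytes) -> TrainProcessMetadata:
    """Rebuild the dataclass from the bytes stored under the "meta" key."""
    fields = json.loads(meta_bytes.decode("utf-8"))
    return TrainProcessMetadata(**fields)


# Hypothetical payload, shaped like what a client might have sent.
payload = json.dumps({"training_time": 12.5, "converged": True}).encode("utf-8")
train_meta = unpack_metadata(payload)
print(train_meta)
```

Without the class available in the strategy file, the `TrainProcessMetadata(**fields)` line fails with a NameError, which is the same kind of failure as the missing import described above.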
Then, I experienced issues with asdict, continued frozen metrics, and other errors that I just stopped documenting because I was never able to get to the bottom of them and fix them.
So, after way too much time, I abandoned this tutorial and reverted to the working code from the first half of part 4.
I’d welcome any tips on sending custom config records and processing them on the server side, as I will most likely need this for my research in the future.
