Unable to run the last part of the PyTorch tutorial, Communicating Arbitrary Objects

Hey everyone. I’m trying to complete the PyTorch tutorial, and I’m stuck on the very last portion of part 4, Communicating Arbitrary Objects (Communicate custom Messages - Flower Framework).

I had to fix quite a few undocumented errors just to get the code to execute. However, I never did get the metadata to print to the screen, and I’m also stuck on a final roadblock: no model aggregation or training was actually happening. As you can see in the screenshot below, the global accuracy and loss values are completely frozen and identical across all 5 rounds.

I’ve compiled a list of all the gaps, bugs, and breaking issues I encountered while stepping through this last section. I have given up at this point and was not able to complete the tutorial. However, the attempts to fix it are documented below.

On the line "train_metadata = train_fn(…)", the parameters have a … in the parentheses instead of the actual required parameters

  • Issue:
    • There are no visual cues or notes indicating that this should be filled in, so someone following the tutorial quickly, or a beginner Python programmer, can easily miss it
  • What happens when I run it as written:
    • Runs to completion with errors printed to the screen about the missing parameters
    • no metadata printed to terminal
  • Fix: Add the parameters in the function call instead of the ellipses, or add a note that the parameters should replace the ellipses so the user knows to add them in their local code prior to running
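For anyone wondering why that line runs at all instead of failing immediately: assuming the ellipsis is typed as three ASCII dots, `...` is a valid Python expression (the `Ellipsis` object), so the call is legal syntax and only fails at call time. A minimal sketch, where this `train_fn` and its parameter names are hypothetical stand-ins and not the tutorial's real signature:

```python
def train_fn(model, data, epochs):
    """Hypothetical stand-in; the tutorial's real train_fn takes other arguments."""
    return {"epochs": epochs}


def call_with_ellipsis():
    """Call train_fn(...) literally and return the resulting error message."""
    try:
        train_fn(...)  # valid syntax: passes the Ellipsis object as `model`
    except TypeError as err:
        return str(err)
    return None


error_message = call_with_ellipsis()
print(error_message)
```

The error only shows up when the call executes, which would match the "missing parameters" messages I saw at run time rather than a syntax error up front.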

The line of code "config_record = ConfigRecord({“meta”: train_meta_bytes})" causes an error

  • Issue:
    • ConfigRecord is never added to the client app’s import statement in the snippet instructions.
    • My IDE, Visual Studio Code, suggests changing this to config_record, and a beginner, or someone revisiting the tutorial after a break, may not realize that this is also wrong
  • What happens when I run it as is (or with the config_record suggested fix):
    • Does not run with “ConfigRecord”, throws an error prior to terminal execution advising me that ConfigRecord is not defined
    • (with the incorrect IDE fix) Runs to completion with errors about accessing config_record with no associated value
    • no metadata printed to screen
  • Fix:
    • Add ConfigRecord to the import statement (from flwr.app import ArrayRecord, Context, Message, MetricRecord, RecordDict, ConfigRecord)

The new data class is not imported into the strategy file

  • Issue:
    • The new dataclass is not imported on the server side, and the tutorial does not show this step, so errors occur when the strategy tries to unpack the metadata
  • What happens when I run the code as-is:
    • The code throws serialization/type errors
    • no metadata printed to screen
  • Fix:
    • Import the newly created data class at the top of the custom strategy file
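A rough sketch of why the server-side import matters. Everything here is an assumption for illustration: `TrainMeta` and its fields are made-up names, not the tutorial's dataclass, and JSON is my choice for turning it into bytes, not necessarily what the tutorial does. The point is that the server can only rebuild the object if it has the same class available, and that `asdict` works on a dataclass instance, not on the raw bytes:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TrainMeta:
    """Hypothetical metadata container; field names are illustrative only."""
    training_time: float
    converged: bool


# Client side: dataclass -> dict -> JSON text -> bytes for the config record.
original = TrainMeta(training_time=1.5, converged=True)
train_meta_bytes = json.dumps(asdict(original)).encode("utf-8")

# Server side: TrainMeta must be imported here to rebuild the object.
restored = TrainMeta(**json.loads(train_meta_bytes.decode("utf-8")))
meta_dict = asdict(restored)  # works: restored is a dataclass instance

# Calling asdict on the raw bytes is an easy slip and raises TypeError.
try:
    asdict(train_meta_bytes)
    bytes_error = None
except TypeError as err:
    bytes_error = str(err)

print(meta_dict)
print(bytes_error)
```

If the class lives in one shared module that both the client app and the strategy file import, the two sides cannot drift apart.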

After that, I ran into issues with asdict, the metrics stayed frozen, and there were other errors that I stopped documenting because I was never able to get to the bottom of them.

So, after way too much time, I abandoned this tutorial and reverted to the working code from the first half of part 4.

I’d welcome any tips on sending custom config records and processing them on the server side, as I will most likely need this for my research in the future.