Custom data splitting/partitioning

Hi,

I am working on contributing a baseline, and to recreate an experiment from the original paper I have to implement a custom dataset partitioning scheme. Specifically, for MNIST the partitioning has to adhere to the following:

“To simulate a heterogeneous setting, we distribute the data among 1000 devices such that each device has samples of only 2 digits and the number of samples per device follows a power law.”

And for an experiment on FEMNIST:

“To generate heterogeneous data partitions, we subsample 10 lower case characters (‘a’-‘j’) from FEMNIST and distribute only 5 classes to each device.”

Is there a way of doing this using Flower Datasets? I read the documentation but didn’t find a way of implementing such schemes.

Thanks


Hi @gxenos,

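Great question! You could use the NaturalIdPartitioner, which builds one partition per unique value of a column you choose, to control which labels end up on each client. Try the property partition_id_to_natural_id to see which value each partition ID maps to.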

I think that should work.
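
If the built-in partitioner does not fit exactly, the MNIST scheme quoted in the question can also be sketched in plain Python. This is only an illustration under assumptions, not the paper's procedure and not a Flower Datasets API: the function name `partition_by_label_power_law`, the cyclic class assignment, and the Pareto shape `alpha` are all made up for this example, and the resulting partition sizes are only approximately power-law distributed.

```python
import random
from collections import defaultdict


def partition_by_label_power_law(labels, num_devices, classes_per_device=2,
                                 alpha=1.5, seed=0):
    """Hypothetical helper: map device id -> list of sample indices.

    Each device only receives samples from `classes_per_device` classes,
    and device sizes are driven by heavy-tailed (Pareto) weights.
    """
    rng = random.Random(seed)

    # Group sample indices by label and shuffle within each label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    classes = sorted(by_label)
    if classes_per_device > len(classes):
        raise ValueError("classes_per_device exceeds the number of classes")
    for c in classes:
        rng.shuffle(by_label[c])

    # Assumption: device i gets consecutive classes, wrapping around.
    device_classes = [
        [classes[(i + k) % len(classes)] for k in range(classes_per_device)]
        for i in range(num_devices)
    ]

    # One heavy-tailed weight per device (assumed shape parameter alpha).
    weights = [rng.paretovariate(alpha) for _ in range(num_devices)]

    partitions = {i: [] for i in range(num_devices)}
    for c in classes:
        # Devices holding class c share its samples in proportion to weight.
        holders = [i for i in range(num_devices) if c in device_classes[i]]
        total = sum(weights[i] for i in holders)
        pool, start = by_label[c], 0
        for i in holders:
            take = int(len(pool) * weights[i] / total)  # floor, so no overlap
            partitions[i].extend(pool[start:start + take])
            start += take
    return partitions


# Toy usage: 2000 synthetic labels over 10 classes, split across 20 devices.
labels = [i % 10 for i in range(2000)]
parts = partition_by_label_power_law(labels, num_devices=20)
```

For the FEMNIST case, the same sketch could be called with `classes_per_device=5` after filtering the labels down to 'a'-'j'. Because the shares are rounded down, a few samples per class stay unassigned and a device with a very small weight can end up with fewer classes than requested, so the result is worth checking before use. The index lists can then be applied to a Hugging Face dataset with `Dataset.select`.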

Best regards,

William
