Custom data splitting/partitioning

gxenos · August 13, 2025, 3:20pm

Hi,

I am working on contributing a baseline. To recreate an experiment from the original paper, I have to implement a custom dataset partitioning scheme. Specifically, for MNIST the partitioning has to adhere to the following:

“To simulate a heterogeneous setting, we distribute the data among 1000 devices such that each device has samples of only 2 digits and the number of samples per device follows a power law.”
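One possible reading of this scheme, sketched in plain Python with no Flower Datasets calls. The function name, the Pareto exponent alpha, and the round-robin choice of digit pairs are assumptions made for illustration; the quoted sentence does not specify them. Devices that come late may receive fewer samples than their target if a digit's pool runs out.

```python
import random
from collections import defaultdict


def partition_label_skew(labels, num_devices=1000, classes_per_device=2,
                         alpha=1.5, seed=0):
    """Return {device_id: [sample indices]} with few classes per device.

    Hypothetical helper: device sizes are drawn from a Pareto (power-law)
    distribution, and each device only draws from `classes_per_device` classes.
    """
    rng = random.Random(seed)

    # Build one shuffled pool of sample indices per class.
    pools = defaultdict(list)
    for idx, y in enumerate(labels):
        pools[y].append(idx)
    for pool in pools.values():
        rng.shuffle(pool)
    classes = sorted(pools)

    # Power-law weights decide each device's share of the data.
    weights = [rng.paretovariate(alpha) for _ in range(num_devices)]
    total = sum(weights)
    n = len(labels)

    partition = {}
    for dev in range(num_devices):
        # Assumption: consecutive classes, cycling (0,1), (1,2), ..., (9,0).
        dev_classes = [classes[(dev + k) % len(classes)]
                       for k in range(classes_per_device)]
        target = int(n * weights[dev] / total)
        per_class = max(1, target // classes_per_device)
        picked = []
        for c in dev_classes:
            # Never take more than what is left in this class's pool.
            take = min(per_class, len(pools[c]))
            picked.extend(pools[c].pop() for _ in range(take))
        partition[dev] = picked
    return partition
```

Each sample index is handed out at most once, so the per-device index lists can be used directly to select rows from the training set.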

For a FEMNIST experiment, the partitioning is described as:

“To generate heterogeneous data partitions, we subsample 10 lower case characters (‘a’-‘j’) from FEMNIST and distribute only 5 classes to each device.”
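A minimal sketch of one way to read this second scheme, again in plain Python. It assumes labels are available as character strings, that each device gets a random subset of 5 of the 10 kept classes, and that samples of a class are dealt round-robin to the devices holding it. None of these details come from the quoted sentence, and the function name is hypothetical.

```python
import random
import string
from collections import defaultdict


def partition_subsampled_classes(labels, num_devices,
                                 keep=string.ascii_lowercase[:10],
                                 classes_per_device=5, seed=0):
    """Return {device_id: [sample indices]} using only the classes in `keep`."""
    rng = random.Random(seed)
    keep = list(keep)

    # Assumption: every device gets a random subset of the kept classes.
    device_classes = {d: rng.sample(keep, classes_per_device)
                      for d in range(num_devices)}

    # Invert the mapping: which devices hold each class?
    holders = defaultdict(list)
    for d, cs in device_classes.items():
        for c in cs:
            holders[c].append(d)

    partition = {d: [] for d in range(num_devices)}
    dealt = defaultdict(int)  # samples dealt so far, per class
    for idx, y in enumerate(labels):
        if y not in holders:
            # Class dropped by the subsampling, or held by no device.
            continue
        devs = holders[y]
        d = devs[dealt[y] % len(devs)]  # round-robin among the holders
        dealt[y] += 1
        partition[d].append(idx)
    return partition
```

If the dataset stores integer class ids instead of characters, the keep argument would need to list the ids corresponding to 'a' through 'j'.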

Is there a way of doing this using Flower Datasets? I read the documentation but didn’t find a way of implementing such schemes.

Thanks

williamlm · August 14, 2025, 4:45pm

Hi @gxenos,

Great question! You could use the NaturalIdPartitioner to control which labels are assigned to each client. Try its partition_id_to_natural_id property.

I think that should work.

Best regards,

William
