Announcing Flower Datasets 0.2.0!

adam-narozniak · July 9, 2024, 9:45am

The Flower Team is excited to announce the release of Flower Datasets 0.2.0!

Flower Datasets (flwr-datasets) is a library to quickly and easily create datasets for federated learning, federated evaluation, and federated analytics. It was created by the Flower Labs team that also created Flower: A Friendly Federated Learning Framework.

What’s new?

Add label distribution visualization (#3451)Introduce visualize.plot_label_distributions and visualize.plot_comparison_label_distribution. To learn how to use the new functions visit link.

Add label count utils (#3551)Introduce the public methods for compute_counts and compute_frequencies.
Improve speed of NaturalIdPartitioner (#3276)The datasets with a bigger number of unique IDs and a bigger number of samples especially benefit from this improvement.
Documentation improvements
- Flower Datasets README and index page update Include the visualization plots and update the information about the available partitioners.
- Flower Datasets README update Add a tip section at the beginning that points to complete examples, simplify, and move the installation section to the top.
- Add how to visualization guide Provide a resource to learn how to visualize the label distribution given a Partitioner instance and how to compare the distributions.
Fix utils.divide_dataset for division for more than 2 divisions (#3192) Support the division into more than 2 divisions (exclusive parts of a dataset) in utils.divide_dataset. They incorrectly return a dataset of size 0 for the divisions with an index starting from 2.
Add fixed seed in train_test_split in examples (#3211)Add the random_state parameter to the train_test_split function in the examples to ensure reproducibility. The examples now have constant train-test splits along multiple rounds and runs.
Add telemetry (#3479)Start collecting data about the used datasets, partitioners, the type of load_ function in FederatedDataset, and the visualization utils. To learn more visit link.
Limit the datasets versions (#3607)Avoid passing the obligatory trust_remote_code=True to the datasets.load_dataset function by limiting the datasets library version to datasets = “>=2.14.6 <2.20.0”. This is a temporary change that can be relaxed once we support passing the kwargs via FederatedDataset to datasets.load_dataset

Incompatible changes

Rename resplitter parameter and type to preprocessor (#3476)Also, simplify the naming of the DivideResplitter to Divider and MergeResplitter to Merger. The *Resplitter are now of type Preprocessor not Resplitter.
Rename resplitter in examples (#3485)Update the examples to use the new preprocessor parameter in FederatedDataset instead of the resplitter.

Topic		Replies	Views
Announcing Flower Datasets 0.4.0 Flower Datasets datasets	0	57	October 22, 2024
Announcing Flower Datasets 0.3.0 General	0	36	July 26, 2024
Announcing Flower Datasets 0.5.0! General	0	49	December 20, 2024
The figure is not fully displayed Flower Help - Beginners flower , datasets	3	41	March 11, 2025
LEAF datasets pre-processing General datasets	1	14	July 16, 2025

Announcing Flower Datasets 0.2.0!

What’s new?

Incompatible changes

Related topics