LEAF datasets pre-processing

oabuhamdan · July 15, 2025, 7:44am

Hello guys,
A flwr-datasets question,
I am looking at the flwrlabs/shakespeare dataset on Hugging Face.
It mentions that it is part of the LEAF benchmark.
The only code mentioned to get and use this data is:

from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import NaturalIdPartitioner

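fds = FederatedDataset(
    dataset="flwrlabs/shakespeare",
    # One partition per speaking character, keyed by the character_id column
    partitioners={"train": NaturalIdPartitioner(partition_by="character_id")}
)
# Downloads the dataset on first use and returns the rows of partition 0
partition = fds.load_partition(partition_id=0)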
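To make the partitioning step concrete, here is a minimal stdlib sketch of the idea behind natural-ID partitioning: one partition per unique value of the `partition_by` column. It is illustrative only and is not NaturalIdPartitioner's actual implementation. The function name `natural_id_partition` and the toy rows are hypothetical, and in this sketch partition IDs follow the order in which each ID first appears, which the library may not do.

```python
from collections import defaultdict


def natural_id_partition(rows, partition_by):
    """Group rows into one partition per unique value of `partition_by`.

    Hypothetical helper that mimics the idea behind NaturalIdPartitioner;
    it is not the library's code.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_by]].append(row)
    # Partition IDs 0..N-1 follow first appearance of each natural ID (dicts keep insertion order).
    return {pid: group for pid, group in enumerate(groups.values())}


# Toy rows standing in for the real dataset (hypothetical data).
rows = [
    {"character_id": "HAMLET", "text": "To be, or not to be"},
    {"character_id": "OPHELIA", "text": "Good my lord"},
    {"character_id": "HAMLET", "text": "Ay, truly"},
]
partitions = natural_id_partition(rows, partition_by="character_id")
print(len(partitions))     # 2 partitions: one per character
print(len(partitions[0]))  # 2 rows for HAMLET
```

On the real dataset, the number of partitions would simply be the number of distinct `character_id` values in the train split.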

In the LEAF benchmark, the data has to be pre-processed with LEAF's scripts before use.
Is the flwrlabs version of this data already pre-processed and ready to use?
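For context on what "pre-processed" means here: as we understand LEAF's Shakespeare scripts, each character's lines are turned into next-character prediction samples with a sliding window (an 80-character input and the following character as the target, by default). The sketch below reproduces that idea under those assumptions; it is not a verbatim copy of LEAF's code, and `text_to_samples` is a hypothetical name. A small `seq_length` is used in the example so the output is easy to inspect.

```python
import re


def text_to_samples(raw_text, seq_length=80):
    """Turn one character's raw text into (x, y) next-character samples.

    Sketch of the sliding-window step in LEAF's Shakespeare pre-processing
    as we understand it; seq_length=80 is assumed to be LEAF's default.
    """
    # Replace newlines with spaces, then collapse runs of 2+ spaces into one.
    text = raw_text.replace("\n", " ")
    text = re.sub(r" {2,}", " ", text)
    xs, ys = [], []
    # Each sample: seq_length input characters, then the next character as the target.
    for i in range(len(text) - seq_length):
        xs.append(text[i : i + seq_length])
        ys.append(text[i + seq_length])
    return xs, ys


xs, ys = text_to_samples("To be, or not to be:\nthat is the question.", seq_length=10)
print(len(xs))            # 32
print(xs[0], repr(ys[0]))  # To be, or  'n'
```

If a partition's rows already look like the `xs`/`ys` pairs above, the heavy lifting has been done; if they are whole lines of dialogue, a step like this would still be needed.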

javier · July 16, 2025, 3:31pm

Hi @oabuhamdan, I believe this is the raw data. I can take a closer look in a couple of days to verify it. Could you help us check this in the meantime? This dataset is ready to be used as shown above. There are 1129 partitions, just like in the original LEAF benchmark.
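One quick way to help check this is a heuristic over a loaded partition: if every row already has a fixed-length input string and a single-character target, the data is likely already in LEAF's next-character form. The column names `x` and `y` and the 80-character length are assumptions for illustration, not the dataset's documented schema; inspect the partition's actual columns first. `looks_preprocessed` is a hypothetical helper.

```python
def looks_preprocessed(rows, x_key="x", y_key="y", seq_length=80):
    """Heuristic: True if every row looks like a LEAF-style next-character sample.

    Assumed (hypothetical) schema: x_key holds a seq_length-character input,
    y_key holds a single-character target.
    """
    if not rows:
        return False
    for row in rows:
        if x_key not in row or y_key not in row:
            return False
        if len(row[x_key]) != seq_length or len(row[y_key]) != 1:
            return False
    return True


# Toy rows (hypothetical data): one already-windowed sample vs. one raw line.
print(looks_preprocessed([{"x": "a" * 80, "y": "b"}]))     # True
print(looks_preprocessed([{"text": "To be, or not to be"}]))  # False
```

Iterating a Hugging Face `Dataset` yields one dict per row, so something like `looks_preprocessed(list(partition))` could be tried on the partition loaded in the first post, after adjusting the column names to whatever it actually contains.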
