Are there Flower examples with CSV files?

Both this scikit-learn regression example and this Torch classifier example use tabular data, but they rely on Flower Datasets for partitioning instead of manual CSV loading. You can check out this guide to see how to use a local CSV file with Flower Datasets.
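For intuition, here is a rough, framework-free sketch of what IID partitioning of a CSV-backed DataFrame amounts to: shuffle the rows once, then split them into equally sized chunks, one per client. This is only a plain pandas/NumPy stand-in for Flower Datasets' `IidPartitioner` (whether the real class shuffles internally is a detail glossed over here), and the column names are invented for illustration:

```python
import numpy as np
import pandas as pd


def iid_partitions(df: pd.DataFrame, num_partitions: int, seed: int = 42):
    """Shuffle the rows once, then split them into near-equal chunks.

    A rough stand-in for what an IID partitioner does with a DataFrame
    loaded from a local CSV file.
    """
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    index_chunks = np.array_split(np.arange(len(shuffled)), num_partitions)
    return [shuffled.iloc[idx] for idx in index_chunks]


# Stand-in for pd.read_csv("my_data.csv"): 12 rows, 2 made-up columns.
df = pd.DataFrame({"x": range(12), "y": range(12, 24)})
parts = iid_partitions(df, num_partitions=3)
print([len(p) for p in parts])  # → [4, 4, 4]
```

Every row ends up in exactly one partition, and the shuffle before splitting is what makes the partitions (approximately) identically distributed.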

Hello,
I have modified the code from the first tutorial on the official Flower website.

I got the following:

import numpy as np
import pandas as pd
import torch
from datasets import Dataset
from flwr_datasets.partitioner import IidPartitioner

# IRIS_DATASET_PATH, NUM_CLIENTS and BATCH_SIZE are defined elsewhere in the script.


def get_dataloaders(partition_id: int):
    # Load the local CSV files and merge them into a single DataFrame,
    # preserving the original column names.
    train_data = pd.read_csv(IRIS_DATASET_PATH / 'Train' / 'Train.csv')
    test_data = pd.read_csv(IRIS_DATASET_PATH / 'Test' / 'Test.csv')
    df_data = pd.concat([train_data, test_data], ignore_index=True)

    # Partition the merged data IID across the clients.
    partitioner = IidPartitioner(num_partitions=NUM_CLIENTS)
    partitioner.dataset = Dataset.from_pandas(df_data)
    partition = partitioner.load_partition(partition_id=partition_id)

    #******************************************************
    # DIVIDE DATA ON EACH NODE: 80% TRAIN, 20% VALID + TEST
    #******************************************************
    partition_train_test = partition.train_test_split(test_size=0.2, seed=42)

    # Split the held-out 20% in half: 10% validation, 10% test.
    partition_valid_test = partition_train_test['test'].train_test_split(test_size=0.5, seed=42)

    def apply_transforms(batch):
        # Convert every column of the batch to a tensor. Applied lazily
        # via dataset.with_transform(apply_transforms) instead of the
        # transform= argument used in the image-based tutorial.
        for key in batch.keys():
            batch[key] = torch.tensor(batch[key])
        return batch

    partition_train_test = partition_train_test.with_transform(apply_transforms)
    # Apply the transform to the valid/test split as well (not to
    # partition_train_test a second time).
    partition_valid_test = partition_valid_test.with_transform(apply_transforms)

    train_dataloader = torch.utils.data.DataLoader(
        dataset=partition_train_test["train"],
        batch_size=BATCH_SIZE,
        shuffle=True,
    )
    valid_dataloader = torch.utils.data.DataLoader(
        dataset=partition_valid_test["train"],
        batch_size=BATCH_SIZE,
    )
    test_dataloader = torch.utils.data.DataLoader(
        dataset=partition_valid_test["test"],
        batch_size=BATCH_SIZE,
    )

    return train_dataloader, valid_dataloader, test_dataloader
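As a quick sanity check on the chained splits in the function above: `test_size=0.2` followed by `test_size=0.5` on the held-out part gives roughly an 80/10/10 train/valid/test ratio. A plain-Python sketch of the arithmetic (the partition size of 100 rows is hypothetical):

```python
def chained_split_sizes(n: int, first_test: float = 0.2, second_test: float = 0.5):
    """Mirror a train_test_split(test_size=0.2) followed by a
    train_test_split(test_size=0.5) on the held-out rows."""
    held_out = round(n * first_test)      # rows removed from the train split
    train = n - held_out
    test = round(held_out * second_test)  # half of the held-out rows
    valid = held_out - test
    return train, valid, test


print(chained_split_sizes(100))  # → (80, 10, 10)
```

The exact per-split counts may differ by a row or two from what `train_test_split` produces, since libraries round fractional sizes in their own way.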

Well, when I run Flower, it freezes.

I have not been able to figure out what I am doing wrong.

Regards

I’d advise checking the change in a centralized setup first.
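One way to do that is to exercise the data-loading step in isolation, before starting any Flower clients. A minimal sketch of such a smoke test, using only pandas (the CSV files and their columns are faked with a temporary directory; a real check would point at the actual `Train.csv`/`Test.csv`):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Fake stand-ins for the Train.csv / Test.csv files used in the thread.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pd.DataFrame({"sepal_len": [5.1, 4.9], "label": [0, 1]}).to_csv(
        root / "Train.csv", index=False
    )
    pd.DataFrame({"sepal_len": [6.0], "label": [2]}).to_csv(
        root / "Test.csv", index=False
    )

    # Centralized check: does loading + merging behave as expected?
    train = pd.read_csv(root / "Train.csv")
    test = pd.read_csv(root / "Test.csv")
    merged = pd.concat([train, test], ignore_index=True)

    assert list(merged.columns) == ["sepal_len", "label"]  # names survive
    assert len(merged) == len(train) + len(test)
    print(merged.shape)  # → (3, 2)
```

If a check like this passes but the federated run still hangs, the problem is more likely in the partitioning or client code than in the CSV loading itself.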

Dear @adam-narozniak ,
many thanks for your feedback, but I only changed the function “load_datasets”, which I renamed “get_dataloaders”.

Best regards,

Well, if it worked before and does not work now, the problem is likely in that change. Debugging an FL setup is hard, so if you want a clearer picture of what’s going on (and possibly want to share the error or problem here), that would be the way to go. Feel free to keep debugging the whole FL pipeline if that suits you, but I can’t help further without any new information.

Dear @adam-narozniak ,

I think you are right. As a first debugging step, I will follow your suggestion, and I will let you know how it goes.

Thanks again!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.