Classification FL model with tabular dataset

damjimenezgu · June 5, 2024, 7:27pm

Where can I find an end-to-end simulation example of running Federated Learning with Flower on a tabular dataset to train a Multi-layer Perceptron (MLP) or Deep Neural Network (DNN) classifier in TensorFlow?

PS: If this does not exist, I’d be happy to create some blog posts and Google Colab examples to contribute.

javier · June 5, 2024, 7:32pm

Hi @damjimenezgu, the Flower repo currently doesn’t have an example like the one you propose. Would you like to contribute one? The closest thing to what you mention is an example using scikit-learn on the iris dataset.

What dataset do you have in mind? It would be cool if the example were also built on Flower Datasets. @adam-narozniak, do you have some cool tabular datasets in mind?

damjimenezgu · June 6, 2024, 6:56am

Hi! Yeah, I’d be happy to contribute with this.

The datasets I have in mind are:

Titanic dataset (included in TensorFlow Datasets). It is a simple dataset for a binary classification task that runs fast, since it has only 1,309 examples and 14 variables.
Optical recognition of handwritten digits dataset (included in the scikit-learn toy datasets). It is a simple dataset for a multi-class classification task that runs fast, since it has only 1,797 examples and 64 variables.
Adult dataset (included in the UCI ML Repository). It is a more complex dataset for a binary classification task. It has 48,842 examples and 13 variables.

If there are no tabular datasets already partitioned in Flower Datasets (let’s wait for @adam-narozniak’s reply), we could use the functions there to partition any of the previously mentioned datasets and run the end-to-end example in a notebook (like Google Colab).
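For readers following along, here is a minimal sketch of what an IID partition of the Titanic-sized dataset could look like. It uses plain NumPy as a stand-in and is not the Flower Datasets API; the helper name `iid_partition`, the number of clients, and the seed are assumptions made only for illustration.

```python
import numpy as np


def iid_partition(num_examples, num_clients, seed=0):
    """Hypothetical helper: shuffle row indices and split them evenly."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_examples)
    # array_split tolerates sizes that do not divide evenly
    return np.array_split(shuffled, num_clients)


# 1,309 rows (the Titanic size quoted above) spread over 10 simulated clients
parts = iid_partition(num_examples=1309, num_clients=10)
sizes = [len(p) for p in parts]
print(sizes)
```

Each client would then train on the rows selected by its own index array, so no row is shared between clients.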

adam-narozniak · June 6, 2024, 8:02am

Hi,
If you’re going for a multilayer perceptron, then image-based classification is probably out of the question.
Besides that, all the datasets you mentioned are fine. And if you pick a dataset that already has an alternative example in the repo, you could mention that example as a reference (as a form of benchmark).

damjimenezgu · June 6, 2024, 8:54am

Hi! Ok, perfect, thanks for your reply. I’ll go for the Titanic dataset. I will develop the notebook using the same format employed in the following (from the Flower repo):

flower/examples/simulation-tensorflow

I’m planning to create a new folder in “flower/examples/” called simulation-tensorflow-tabular to place all the implementation I’ll develop.

Please let me know if you have any advice for this contribution.

adam-narozniak · June 6, 2024, 9:36am

The title looks good. The only tip I have in mind is to make sure that the categorical data is encoded in the same way on every partition (either encode the whole dataset up front, or define a fixed set of categories and apply it when encoding the partitioned data). Otherwise, both the values and the shape of the data can differ in the case of OneHotEncoding, or just the values in the case of OrdinalEncoding. The second one in particular can be hard to notice and tricky to debug later on (while the metrics can easily go down).
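The pitfall described above can be sketched with scikit-learn's `OrdinalEncoder` and `OneHotEncoder`. The two client arrays and the category list are made-up illustrations (a single "embarked"-style column), not the actual layout of the Titanic data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical partitions of one categorical column; client B never sees "Q"
client_a = np.array([["C"], ["Q"], ["S"]])
client_b = np.array([["S"], ["C"], ["S"]])

# Problem: each client fits its own encoder and infers its own categories
code_a = OrdinalEncoder().fit(client_a).transform([["S"]])[0, 0]
code_b = OrdinalEncoder().fit(client_b).transform([["S"]])[0, 0]
print(code_a, code_b)  # the same value "S" gets two different codes

local_onehot_b = OneHotEncoder().fit_transform(client_b).toarray()
print(local_onehot_b.shape)  # one column fewer than client A would produce

# Fix: every client uses the same fixed category list (an assumption here)
FIXED_CATEGORIES = [["C", "Q", "S"]]
onehot_a = OneHotEncoder(categories=FIXED_CATEGORIES).fit_transform(client_a).toarray()
onehot_b = OneHotEncoder(categories=FIXED_CATEGORIES).fit_transform(client_b).toarray()
shared_code_b = (
    OrdinalEncoder(categories=FIXED_CATEGORIES).fit(client_b).transform([["S"]])[0, 0]
)
print(onehot_a.shape, onehot_b.shape, shared_code_b)
```

With the fixed list, every client produces the same columns and the same codes, so the model's input layer sees a consistent feature layout across partitions.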

I found the Titanic dataset on HF: julien-c/titanic-survival · Datasets at Hugging Face, so feel free to use Flower Datasets.

That’s it. Good luck!
