Feedback on Federated Learning Uni-seminar Project

Hi Flower community!

We are currently working on a project where we forecast hourly energy consumption using the "SmartMeter Energy Consumption Data in London Households" dataset provided by the London Datastore. Our goal is to showcase the advantages of federated learning over centralized alternatives. For this purpose, we trained our models in both federated and centralized settings and used Flower to federate our ML process; more specifically, we built on the Flower Quickstart (Simulation with TensorFlow/Keras) tutorial available on Google Colab.

Since we are new to federated learning and the Flower framework, we would like to get some feedback on our use of Flower and on any improvements we could make. Some of the questions we have are as follows:

Training cost: our training process currently takes a considerable amount of time and computational resources to run. To address this, we reduced the size of our dataset by filtering it to the year 2013 and to a single customer type (standard customers on flat-rate tariffs only). For the experimentation stage we also limited the number of meters (and thus clients).

Data Partitioning: In our research we consider two scenarios: one where the number of meters equals the number of clients, and a second where there are more meters than clients, so that every client holds a disjoint set of unique meter IDs.
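To illustrate what we mean by the second scenario, here is a minimal sketch of splitting the readings by meter ID so that each client gets a disjoint group of meters. The DataFrame and the "LCLid" column name are assumptions based on the London dataset, not our exact preprocessing code:

```python
# Minimal sketch of the two partitioning scenarios (placeholder column names).
import numpy as np
import pandas as pd

def partition_by_meter(df: pd.DataFrame, num_clients: int, seed: int = 42):
    """Assign each meter ID to exactly one client.

    If num_clients equals the number of meters, every client gets a single
    meter (scenario 1); otherwise each client receives a disjoint set of
    meter IDs (scenario 2).
    """
    rng = np.random.default_rng(seed)
    meter_ids = df["LCLid"].unique()   # "LCLid" assumed to be the meter ID column
    rng.shuffle(meter_ids)
    # Split the shuffled meter IDs into num_clients disjoint groups.
    groups = np.array_split(meter_ids, num_clients)
    return [df[df["LCLid"].isin(group)].copy() for group in groups]
```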

Since federated learning is new for us, we need a way to tell whether it performs well or poorly. We benchmark it against several centralized scenarios (a rough sketch of these baselines follows the list):
Feeding all the data to the LSTM model,
Randomly limiting the amount of data fed to the LSTM,
Removing some of the columns from the training dataset.
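To make the comparison concrete, this is roughly how the three baseline training sets could be prepared with pandas. The DataFrame `df`, the sampling fraction, and the dropped column names are placeholders, not our actual preprocessing code:

```python
# Hedged sketch of the three centralized baseline datasets.
import pandas as pd

def make_centralized_baselines(df: pd.DataFrame, frac: float = 0.3, seed: int = 42):
    """Return the three training sets used for the centralized baselines."""
    full = df                                              # 1) all data
    subsampled = df.sample(frac=frac, random_state=seed)   # 2) random subset of rows
    reduced = df.drop(columns=["temperature", "humidity"],  # 3) fewer features
                      errors="ignore")                      #    (column names are placeholders)
    return full, subsampled, reduced
```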

Model Architecture: Are our LSTM models suitable for federated learning? For the stacked LSTM, we set the number of neurons per layer to 20, 15, 10 and 5, reducing the width gradually in each subsequent layer.
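For reference, this is roughly what we mean by the stacked LSTM: a minimal Keras sketch with 20-15-10-5 units. The input shape, output head, and training configuration here are placeholders rather than our exact setup:

```python
# Minimal sketch of the stacked LSTM (20-15-10-5 units), assuming Keras/TensorFlow.
import tensorflow as tf

def build_stacked_lstm(timesteps: int, n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(timesteps, n_features)),
        tf.keras.layers.LSTM(20, return_sequences=True),
        tf.keras.layers.LSTM(15, return_sequences=True),
        tf.keras.layers.LSTM(10, return_sequences=True),
        tf.keras.layers.LSTM(5),
        tf.keras.layers.Dense(1),  # one-step-ahead hourly forecast (placeholder head)
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```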

Right now we still have some open questions in our project since Flower is a new tool for us.
Which parameters should we consider when optimizing our federated model to get better results, given our limited computational resources?
Did we construct our simulation correctly or did we miss something important?
Since we have limited computational resources, is reducing fraction_fit and fraction_evaluate bad practice in the context of federated learning (since it is not representative of a "real-world scenario"), or is it acceptable? (A sketch of the strategy configuration we mean is below.)
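For context, here is a minimal sketch of the kind of FedAvg/simulation configuration this question refers to, following the Quickstart pattern. NUM_CLIENTS, client_fn, the fraction values, and client_resources are placeholders, not our exact settings:

```python
# Sketch of a Flower simulation with reduced client sampling (placeholder values).
import flwr as fl

NUM_CLIENTS = 10  # placeholder for the number of simulated meters/clients

def run_simulation(client_fn, num_rounds: int = 10):
    """client_fn is the factory from the Quickstart tutorial that returns a
    NumPyClient for a given client id (not shown here)."""
    strategy = fl.server.strategy.FedAvg(
        fraction_fit=0.3,        # train on 30% of available clients per round
        fraction_evaluate=0.3,   # evaluate on 30% of available clients per round
        min_available_clients=NUM_CLIENTS,
    )
    return fl.simulation.start_simulation(
        client_fn=client_fn,
        num_clients=NUM_CLIENTS,
        config=fl.server.ServerConfig(num_rounds=num_rounds),
        strategy=strategy,
        client_resources={"num_cpus": 1},  # cap per-client resources in the simulation
    )
```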

Our full code for data preprocessing as well as the centralized and federated simulations is available in our GitHub repository (LiaOlya/FL_smartcities). We would greatly appreciate your feedback on our implementation and any suggestions you could give us for our project.

Thank you in advance for your help and insights!
