RandomForestRegression

Hi dear community! I’m trying to train sklearn.ensemble.RandomForestRegressor with Flower. I have 1 server machine and 1 client machine, but I am getting this error:

File "/home/elena/project/venv3/lib/python3.9/site-packages/flwr/server/strategy/aggregate.py", line 59, in _try_inplace
np_binary_op(x, y, out=x)
numpy._core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U400'), dtype('float64')) -> None

So I started wondering whether it is possible to train and aggregate forest models with Flower. Is there a proper aggregation method implemented for them? Forests seem to be more complex than, say, linear regression. Thank you in advance for any replies.
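For readers hitting the same traceback: the message says NumPy was asked to multiply a string array (dtype `<U400`, a 400-character Unicode string) by a float. A likely cause, though this is an assumption since the client code is not shown, is that the client returned text rather than numeric arrays, and a weighted-average step then tried to scale it. A minimal sketch that reproduces the same class of error with NumPy alone (the `payload` and `num_examples` names are hypothetical):

```python
import numpy as np

# Hypothetical client payload: a 400-character string wrapped in a NumPy array.
payload = np.array(["x" * 400])

# Hypothetical weight, such as a client's number of training examples.
num_examples = np.float64(100)

try:
    # A weighted average multiplies each returned array by its weight.
    np.multiply(payload, num_examples)
    error_name = None
except TypeError as exc:
    # NumPy's ufunc type errors are subclasses of TypeError.
    error_name = type(exc).__name__

print(payload.dtype, error_name)
```

If this matches your situation, the fix is on the client side: whatever is sent to the server has to be something the chosen aggregation can actually combine.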

Hi - Yes, it is possible to train random forest models with Flower. See this example of XGBoost (another tree-based model) implemented in Flower, which also has a YouTube tutorial: Quickstart XGBoost - Flower Framework

Having said that, can you please share more details about your implementation setup? Having just 1 client isn’t really FL per se, because you have only one source of data. I assume you have the data split on the server machine as well? Also, which aggregation method were you using?

Hi, thank you for answering and for the link! I am trying to use sklearn RFs. I only have 1 client because I want to learn the Flower framework not only in simulation mode but also in deployment, and I only have 2 machines at home :)
My implementation mostly follows this example: GitHub - Hongwei-Z/Federated-Random-Forest: Using Flower federated learning with scikit-learn random forest. The differences are that I only have 1 client, and that I tried various ways of implementing the set_params() and get_params() functions. But I believe this is not completely correct anyway, because in that example I can’t see a valid way of passing and aggregating Random Forest estimators… Or perhaps I just misunderstand the whole concept. Could you please comment on the example? If it is correct, how does the aggregation work in it? And how are the estimators passed between client and server? Shouldn’t we pass them using set_params() and get_params()?
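One point that may help with the set_params() / get_params() question: on a scikit-learn estimator, `get_params()` returns the constructor hyperparameters, while the fitted trees live in the `estimators_` attribute. The sketch below illustrates that difference and shows one possible way to turn a fitted forest into a NumPy array for transport. The helper names `forest_to_ndarray` and `ndarray_to_forest` are hypothetical, and pickling is an assumption made for illustration (only unpickle data from machines you trust); it is not taken from the linked example or from Flower itself.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic regression problem, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] + 0.1 * rng.normal(size=50)

model = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)

# get_params() exposes hyperparameters such as n_estimators, not fitted trees.
hyperparams = model.get_params()
print(sorted(hyperparams))
print(len(model.estimators_))  # the fitted trees are stored here


def forest_to_ndarray(forest):
    """Hypothetical helper: serialize a fitted forest into a uint8 array."""
    return np.frombuffer(pickle.dumps(forest), dtype=np.uint8)


def ndarray_to_forest(arr):
    """Hypothetical helper: rebuild the forest from the uint8 array."""
    return pickle.loads(arr.tobytes())


restored = ndarray_to_forest(forest_to_ndarray(model))
same_predictions = np.allclose(model.predict(X), restored.predict(X))
print(same_predictions)
```

Note that a byte array like this cannot be meaningfully averaged element by element, so it would need an aggregation step that deserializes the forests and combines them as models.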

Hi @helenklim, thank you for posting here.

To run a federated example, you will need a minimum of 2 clients. Conventionally there are more clients than servers; with a single client there is nothing to aggregate, so you don’t need the server.

The example you are referring to is good and I generally like tree-based models; however, it is built on legacy/outdated code. The newer versions of Flower use a different way of running ClientApps and ServerApps. What you could do is check out our sklearn example. Together, we can work to integrate the sklearn random forest there.
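As a starting point for that integration, here is a minimal sketch of one possible way to combine forests trained on different clients: pool their trees into a single ensemble. This is an assumption about what aggregation could look like for random forest regressors, not an aggregation method shipped with Flower, and `merge_forests` is a hypothetical helper. It assumes all forests were trained on the same feature set.

```python
import copy

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def merge_forests(forests):
    """Hypothetical helper: pool the trees of several fitted forests."""
    merged = copy.deepcopy(forests[0])
    for forest in forests[1:]:
        merged.estimators_ += copy.deepcopy(forest.estimators_)
    # Keep the hyperparameter consistent with the pooled tree list.
    merged.n_estimators = len(merged.estimators_)
    return merged


# Two synthetic "clients" holding different slices of the same problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

forest_a = RandomForestRegressor(n_estimators=5, random_state=0).fit(X[:100], y[:100])
forest_b = RandomForestRegressor(n_estimators=5, random_state=1).fit(X[100:], y[100:])

merged = merge_forests([forest_a, forest_b])
print(merged.n_estimators)

# With equal tree counts, the pooled prediction is the mean of both forests.
expected = (forest_a.predict(X) + forest_b.predict(X)) / 2
print(np.allclose(merged.predict(X), expected))
```

Pooling trees keeps every client's model intact, but the ensemble grows with each round and each client, so a real setup would probably need a cap or a selection rule for the number of trees.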

Best regards
William