Hello,
I followed the "Quickstart with Docker" guide in the Flower Framework documentation, and everything is running smoothly.
I would like to know if it’s possible to expose the metrics for Prometheus.
Hi @fbcale, great to hear you got it working smoothly! What do you mean by metrics? Are you referring to, for example, the training loss? Or are you asking about system-level metrics such as the RAM/CPU/IO each container consumes?
Hi @Javier,
I'll try to be more specific, and I'll write down what I've tried in the meantime.
I deployed a federated learning infrastructure with one superlink and three supernodes.
I need to display the following information in Grafana:
For the first one, I used cAdvisor, and for the second one, I implemented prometheus_client on the client app side using the Gauge class. I then used the start_http_server method to expose the /metrics endpoint.
Finally, I used Prometheus to scrape both.
However, I noticed that once I run the remote simulation, the client app remains idle (I assume due to the daemon thread created with start_http_server).
If I run the simulation again, the client app raises an exception because the exposed port is already in use.
Is there a better way to handle this?
Hey @fbcale, I have created simple setups with Grafana, Prometheus and cAdvisor in the past. I also helped put together this (now outdated) example in our repository.
I'm not sure I quite understand the problem you are describing. Do you have this code in a public repository I could take a look at? If not, could you create a repository with a very simple setup?
Hi @javier,
I published an example on GitHub, based on your sklearn quickstart.
When I run the remote simulation, the serverapp stays idle after the run ends, because of the start_http_server method.
Is there a way to handle Prometheus with the Deployment Engine?
Hi @fbcale, thanks for sharing the example. I think it would be better to also define the supernode, superlink, etc. as services in your compose file, so that everything can be spawned with a single command.
It is expected that all components remain running (but idle) once the run has finished. The “infrastructure” in Flower is technically detached from the “application”.