Export metrics for Prometheus

Hello,

I followed the guide at Quickstart with Docker - Flower Framework and everything is running smoothly.
I would like to know if it’s possible to expose the metrics for Prometheus.


Hi @fbcale, great to hear you got it working smoothly! What do you mean by metrics? Are you referring to, for example, the training loss, or are you asking about system-level metrics like the RAM/CPU/IO each container consumes?


Hi @Javier,

I'll try to be more specific and also write down what I've tried in the meantime.

I deployed a federated learning infrastructure with one SuperLink and three SuperNodes.
I need to display the following information in Grafana:

  • CPU/GPU usage
  • Aggregated metrics

For the first one I used cAdvisor; for the second one I used prometheus_client on the ClientApp side with the Gauge class, and then called start_http_server to expose the /metrics endpoint.
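
In rough terms, the client-side part looks like this (a simplified sketch, not my actual repo code; the gauge name and port are placeholders):

```python
from prometheus_client import Gauge, start_http_server

# Placeholder gauge; in practice I track the aggregated evaluation metrics
accuracy_gauge = Gauge("flwr_eval_accuracy", "Latest evaluation accuracy")

# Expose the /metrics endpoint (this spawns a daemon thread)
start_http_server(8001)

def report_round_metrics(metrics: dict) -> None:
    # Called from the app code after each round
    accuracy_gauge.set(metrics["accuracy"])
```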

Finally, I used Prometheus to scrape both.

However, I noticed that once the remote simulation ends, the ClientApp remains idle (I assume because of the daemon thread created by start_http_server).
If I run the simulation again, the ClientApp raises an exception because the exposed port is already in use.
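
For now I work around the second issue with something like this (just a sketch, not my actual code; I simply swallow the bind error if a previous run already started the exporter):

```python
from prometheus_client import start_http_server

METRICS_PORT = 8001  # placeholder port

def start_exporter_once() -> None:
    try:
        start_http_server(METRICS_PORT)
    except OSError:
        # The port is already bound by an exporter from a previous run,
        # so reuse it instead of crashing the ClientApp.
        pass
```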

Is there a better way to handle this?


Hey @fbcale, I have created simple setups with Grafana, Prometheus and cAdvisor in the past. I also helped put together this (now outdated) example in our repository.

I’m not sure I quite understand the problem you are describing. Do you have this code in a public repository I could take a look at? If not, could you create a repository with a very simple setup?


Hi @javier,

I published an example on GitHub, using your sklearn quickstart example.

When I run the remote simulation, once it ends, the ServerApp stays idle due to the start_http_server method.
Is there a way to handle Prometheus with the Deployment Engine?


Hi @fbcale, thanks for sharing the example. I think it would be better to also put the SuperNode/SuperLink etc. as services in your compose file, so everything can be spawned with a single command.

It is expected that all components remain running (but idle) once the run has finished. The “infrastructure” in Flower is technically detached from the “application”.