Quickstart-Pytorch with Docker Compose

I’m trying to run the quickstart-pytorch example with Docker Compose, following the instructions in "Run Flower Quickstart Examples with Docker Compose - Flower Framework".
When I execute flwr run . local-deployment --stream, I see the following in the logs of the SuperExec service:

2024-10-31 16:30:21 ❌ File hashes couldn't be verified.
2024-10-31 16:30:21 ERROR :     Could not start run: 
2024-10-31 16:30:21 ERROR :     Executor failed to start run
2024-10-31 16:30:21 INFO :      ExecServicer.StreamLogs

Moreover, the logs in the terminal where I run the command are the following:

Loading project configuration... 
Success
🎊 Successfully built flwrlabs.pytorchexample.1-0-0.2d1af4fd.fab
🎊 Successfully started run 0
INFO :      Starting logstream for run_id `0`
ERROR :     Invalid run_id `0`, exiting

Do you have any idea what might be wrong? I am only following the steps from the tutorial and have not modified anything. I’m working on a Windows 11 machine with WSL2 for Docker, and I run the flwr command from a terminal with a Python virtual environment activated. Thanks in advance!

Hi @vagzikopis, many thanks for creating this topic! I wasn’t able to reproduce the problem you mention. However, note that the compose.yml file uses flwr-1.11.1 unless you have exported the FLWR_VERSION variable. You can see that logic in several places in the .yml.

Maybe the environment where you ran flwr run . local-deployment --stream was using flwr-1.12.0 while the Docker setup was still on flwr-1.11.1?

I can confirm that it works when I launch the compose setup like this:

FLWR_VERSION="1.12.0" docker compose up --build -d

and then start the run like this:

# from a python environment with `flwr-1.12.0`
flwr run . local-deployment --stream
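A quick way to rule out this kind of mismatch is to compare the two versions before starting the run. The helper below is only a sketch: check_flwr_versions is a made-up name, and the 1.11.1 fallback in the usage comment simply mirrors the default mentioned above, so adjust it if your compose.yml differs.

```shell
# Hypothetical helper (not part of Flower): compare the flwr version of the
# environment that runs `flwr run` with the version the compose stack uses.
check_flwr_versions() {
    local_version="$1"
    compose_version="$2"
    if [ "$local_version" = "$compose_version" ]; then
        echo "OK: both sides use flwr $local_version"
    else
        echo "MISMATCH: local flwr $local_version, compose flwr $compose_version"
        return 1
    fi
}

# Usage, from the shell that will launch `flwr run`:
#   check_flwr_versions \
#       "$(pip show flwr | awk '/^Version:/ {print $2}')" \
#       "${FLWR_VERSION:-1.11.1}"
```

A non-zero exit status means the two sides disagree, which makes the check easy to put in front of the flwr run command in a script.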

I created this PR to explicitly indicate the need for FLWR_VERSION to be exported. Do you think we could add a bit more context in the guide about this? Or maybe something else you wish had been there when you went through it for the first time?

Hey @javier, thanks for the quick response!

It turns out that the issue wasn’t with the FLWR_VERSION variable after all. When I checked the containers that were created, I could see they were using flwr-1.12.0 as expected.

The real issue was with how I was running the command. I was using Docker Engine through WSL2 but running flwr run . local-deployment --stream from Windows PowerShell, where I had a Python virtual environment set up. The solution was to set up a Python virtual environment in the Linux terminal (inside WSL2) and then run the flwr run command from there. That worked perfectly.

Before finding this fix, I’d tried to use the Windows Python virtual environment in my Linux terminal, but that didn’t work since the virtual environment was set up with the Windows version of Python, not the Linux one.
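For anyone hitting the same thing, here is a rough sketch of how to tell the two kinds of environment apart and create a Linux-native one. It assumes the standard venv layout (bin/activate on Linux, Scripts\python.exe on Windows); venv_kind and the .venv-wsl directory name are just illustrative, and the pinned version is the one used in this thread.

```shell
# Hypothetical helper: report which platform a virtual environment was created
# on, based on the directory layout that Python's venv module produces.
venv_kind() {
    if [ -f "$1/bin/activate" ]; then
        echo "posix"        # created by a Linux/macOS Python, usable inside WSL2
    elif [ -f "$1/Scripts/python.exe" ]; then
        echo "windows"      # created by a Windows Python, not usable inside WSL2
    else
        echo "unknown"
    fi
}

# Usage inside the WSL2 terminal:
#   venv_kind .venv                  # "windows" means it came from PowerShell
#   python3 -m venv .venv-wsl        # create a Linux-native environment instead
#   . .venv-wsl/bin/activate
#   pip install "flwr==1.12.0"       # match the version used by the compose stack
#   flwr run . local-deployment --stream
```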

Interesting! I’m unfortunately not very experienced with developing on Windows (outside pure WSL2). Great to hear you identified the issue.

I wonder if there is a way of interfacing between a Python env on Windows and what’s running in WSL2. Maybe it’s just a matter of setting up the SuperExec service’s ports differently? If so, that would be an interesting addition to the documentation, I think.
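One way to start testing that idea would be to check whether the SuperExec port is reachable at all from the shell in question. This is an untested sketch rather than a confirmed fix: port_open is a made-up helper, it relies on bash and the coreutils timeout command, and 127.0.0.1:9093 is only my assumption for the address, so use whatever the local-deployment federation in pyproject.toml points to.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within two seconds. Uses bash's /dev/tcp pseudo-device, so it needs bash.
port_open() {
    host="$1"
    port="$2"
    timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage (the address is an assumption, take the real one from pyproject.toml):
#   if port_open 127.0.0.1 9093; then
#       echo "SuperExec port reachable"
#   else
#       echo "SuperExec port not reachable from this shell"
#   fi
```

This only probes from a bash shell such as the WSL2 terminal. A reachable port would not prove that flwr run works from the Windows side; it would only show whether port mapping is the part that needs attention.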

@robert @danny, are you familiar with running Docker in WSL2 and “interacting” with it from outside WSL2?