Remote deployment of XGBoost quickstart project with Docker

Hi, dear Community

I’ve been struggling for a few days to run this project, but I keep getting different connection-related errors. I finally decided to ask for help, as I don’t have much experience with this kind of problem and must be missing something very basic in my setup…
I’m using two machines on the same local network as my SuperLink and SuperNode.
I took

as a baseline, but changed it so that my “local” machine is the SuperLink and my “remote” machine is the SuperNode, rather than the other way around.
Here is the current state of my project code: GitHub - helen-mark/distributed

As I had trouble establishing a connection, I have commented out the secrets for now and am trying insecure mode, but I still have issues…
My logs (partial, as they seem to repeat):
~/AT/flower/src/docker/distributed $ docker logs server-superlink-1
WARNING : DEBUG logs enabled. Do not use this in production, as it may expose sensitive details.
INFO 2025-04-24 16:18:40,530: Starting Flower SuperLink
WARNING 2025-04-24 16:18:40,531: Option --insecure was set. Starting insecure HTTP server with unencrypted communication (TLS disabled). Proceed only if you understand the risks.
INFO 2025-04-24 16:18:40,542: Flower Deployment Engine: Starting Exec API on 0.0.0.0:9093
INFO 2025-04-24 16:18:40,545: Flower ECE: Starting ServerAppIo API (gRPC-rere) on 0.0.0.0:9091
DEBUG 2025-04-24 16:18:40,546: Automatic node authentication enabled
INFO 2025-04-24 16:18:40,546: Flower ECE: Starting Fleet API (gRPC-rere) on 0.0.0.0:9092
DEBUG 2025-04-24 16:18:41,051: ServerAppIoServicer.PullServerAppInputs
DEBUG 2025-04-24 16:18:41,052: Using SqliteState
DEBUG 2025-04-24 16:18:42,493: Using SqliteState
INFO 2025-04-24 16:18:42,500: [Fleet.CreateNode] Request ping_interval=30.0
DEBUG 2025-04-24 16:18:42,500: [Fleet.CreateNode] Request: {'pingInterval': 30.0}
DEBUG 2025-04-24 16:18:42,502: Using SqliteState
INFO 2025-04-24 16:18:42,635: [Fleet.CreateNode] Created node_id=16166392263872557640
DEBUG 2025-04-24 16:18:42,635: [Fleet.CreateNode] Response: {'node': {'nodeId': '16166392263872557640'}}
DEBUG 2025-04-24 16:18:42,637: Using SqliteState
DEBUG 2025-04-24 16:18:42,845: Using SqliteState
DEBUG 2025-04-24 16:18:42,848: Using SqliteState
INFO 2025-04-24 16:18:42,850: [Fleet.CreateNode] Request ping_interval=30.0
DEBUG 2025-04-24 16:18:42,850: [Fleet.CreateNode] Request: {'pingInterval': 30.0}
DEBUG 2025-04-24 16:18:42,851: Using SqliteState
INFO 2025-04-24 16:18:42,854: [Fleet.PullMessages] node_id=16166392263872557640
DEBUG 2025-04-24 16:18:42,854: [Fleet.Ping] Request: {'node': {'nodeId': '16166392263872557640'}, 'pingInterval': 30.0}
DEBUG 2025-04-24 16:18:42,854: [Fleet.PullMessages] Request: {'node': {'nodeId': '16166392263872557640'}}
DEBUG 2025-04-24 16:18:42,855: Using SqliteState
INFO 2025-04-24 16:18:42,985: [Fleet.CreateNode] Created node_id=14324122303795084016
DEBUG 2025-04-24 16:18:42,985: [Fleet.CreateNode] Response: {'node': {'nodeId': '14324122303795084016'}}

elena@elena-laptop:~/distributed$ docker logs client-clientapp-1-1
INFO : Start flwr-clientapp process
elena@elena-laptop:~/distributed$ docker logs client-supernode-1-1
WARNING : DEBUG logs enabled. Do not use this in production, as it may expose sensitive details.
INFO 2025-04-24 16:18:42,489: Starting Flower SuperNode
WARNING 2025-04-24 16:18:42,489: Option --insecure was set. Starting insecure HTTP channel to 192.168.2.31:9092.
DEBUG 2025-04-24 16:18:42,490: Isolation mode: process
INFO 2025-04-24 16:18:42,498: Starting Flower ClientAppIo gRPC server on 0.0.0.0:9094
DEBUG 2025-04-24 16:18:42,514: Opened insecure gRPC connection (no certificates were passed)
DEBUG 2025-04-24 16:18:42,515: ChannelConnectivity.IDLE
DEBUG 2025-04-24 16:18:42,521: ChannelConnectivity.CONNECTING
DEBUG 2025-04-24 16:18:42,525: ChannelConnectivity.READY
DEBUG 2025-04-24 16:18:42,545: ClientAppIo.GetToken
DEBUG 2025-04-24 16:18:43,546: ClientAppIo.GetToken
DEBUG 2025-04-24 16:18:44,548: ClientAppIo.GetToken

~/AT/flower/src/docker/distributed $ flwr run ../../../examples/xgboost-quickstart remote-deployment --stream
2025-04-24 19:21:04.548329: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1745511664.568528 685892 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1745511664.574796 685892 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-24 19:21:04.595075: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading project configuration…
Success
🎊 Successfully built flwrlabs.xgboost_quickstart.1-0-0.5490c2fd.fab
E0424 19:21:07.315332025 685892 ssl_transport_security.cc:1654] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
<_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.2.31:9093: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:192.168.2.31:9093: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER", grpc_status:14, created_time:"2025-04-24T19:21:07.315798651+03:00"}"

(federated_learning_venv) elena@elena-desktop ~/AT/flower/src/docker/distributed $
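
A WRONG_VERSION_NUMBER handshake failure generally means a TLS client received a non-TLS reply. That would fit the logs above: the SuperLink was started with --insecure, while the `flwr run` call appears to attempt a TLS handshake with port 9093. To check which side is expecting what, a small probe like the one below could help. It is only a sketch using the Python standard library; `probe_port` and its return labels are hypothetical names made up for this illustration and are not part of Flower.

```python
import socket
import ssl


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify host:port as 'unreachable', 'not-tls', or 'tls'.

    Hypothetical helper for debugging only; not part of Flower.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, or no route: a network/firewall/port-mapping issue.
        return "unreachable"

    # Certificate checks are disabled on purpose: the only question here is
    # whether the peer answers a TLS handshake at all.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    try:
        with context.wrap_socket(sock, server_hostname=host):
            return "tls"
    except OSError:
        # ssl.SSLError is a subclass of OSError, so this also covers
        # handshake failures such as WRONG_VERSION_NUMBER, plus resets
        # and timeouts that happen during the handshake.
        return "not-tls"
    finally:
        sock.close()
```

Calling `probe_port("192.168.2.31", 9093)` from the machine that runs `flwr run` should return "not-tls" if the Exec API is reachable and running without TLS, as the SuperLink log suggests. In that case the client side would also need to be told to connect without TLS. A result of "unreachable" would instead point to a firewall or Docker port-publishing problem.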