I’m working with the FlowerTune LLM example in a Kubeflow environment. It runs in simulation mode, and it also runs in deployment mode when everything is on the same node/pod, but I get the error below when I try to start the second SuperNode on a different machine. I can’t figure out what is causing it.
The SuperLink and the first SuperNode are both on the same machine (10.42.22.105). Here’s what happens when I try to start the second SuperNode on machine 10.42.4.111:
(flowertune_llm2_3.9) jovyan@phsionet-test-07-0:~/data-physionet/flowertune-llm2/flowertune-llm$ flower-supernode \
> --insecure \
> --superlink 10.42.22.105:9092 \
> --clientappio-api-address 127.0.0.1:9095 \
> --node-config "partition-id=1 num-partitions=2"
INFO : Starting Flower SuperNode
WARNING : Option --insecure was set. Starting insecure HTTP channel to 10.42.22.105:9092.
INFO : Starting Flower ClientAppIo gRPC server on 127.0.0.1:9095
Traceback (most recent call last):
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/common/retry_invoker.py", line 276, in invoke
ret = target(*args, **kwargs)
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_interceptor.py", line 277, in __call__
response, ignored_call = self._with_call(
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_interceptor.py", line 332, in _with_call
return call.result(), call
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_channel.py", line 440, in result
raise self
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_interceptor.py", line 315, in continuation
response, call = self._thunk(new_method).with_call(
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_channel.py", line 1198, in with_call
return _end_unary_response_blocking(state, call, True, None)
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "RBAC: access denied"
debug_error_string = "UNKNOWN:Error received from peer {grpc_status:7, grpc_message:"RBAC: access denied"}"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/bin/flower-supernode", line 8, in <module>
sys.exit(run_supernode())
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/client/supernode/app.py", line 83, in run_supernode
start_client_internal(
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/client/app.py", line 424, in start_client_internal
if (node_id := create_node()) is None:
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/client/grpc_rere_client/connection.py", line 199, in create_node
create_node_response = retry_invoker.invoke(
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/common/retry_invoker.py", line 290, in invoke
if giveup_check(err) or max_tries_exceeded or max_time_exceeded:
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/common/retry_invoker.py", line 288, in giveup_check
return self.should_giveup(_exception)
File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/client/grpc_rere_client/connection.py", line 159, in _should_giveup_fn
raise RunNotRunningException
flwr.common.typing.RunNotRunningException
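
To help rule out a plain network problem between the two machines, here is a minimal sketch of a TCP reachability check that could be run from 10.42.4.111. The helper name `can_connect` and the default timeout are my own choices and are not part of Flower. It only tests whether the port accepts a TCP connection, so it cannot tell whether something between the two machines would still reject the gRPC call with "RBAC: access denied".

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only: it checks TCP reachability
    and says nothing about gRPC-level authorization.
    """
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect("10.42.22.105", 9092)` on the second machine uses the same address passed to `--superlink` above. A `True` result would only mean the port is reachable, which is consistent with the traceback showing that a peer answered with PERMISSION_DENIED rather than the connection failing outright.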