"RBAC: access denied" error running FlowerTune LLM example in deployment mode on multiple machines

I’m working with the FlowerTune LLM example in a Kubeflow environment. I can get it to run in simulation mode, and I can get it to run in deployment mode when everything is on the same node/pod, but I get the error below when I try to start the second SuperNode. I can’t figure out what is causing the error.

The SuperLink and the first SuperNode are both on the same machine (10.42.22.105). Here’s what happens when I try to start the second SuperNode on the machine 10.42.4.111:

(flowertune_llm2_3.9) jovyan@phsionet-test-07-0:~/data-physionet/flowertune-llm2/flowertune-llm$ flower-supernode \
>      --insecure \
>      --superlink 10.42.22.105:9092 \
>      --clientappio-api-address 127.0.0.1:9095 \
>      --node-config "partition-id=1 num-partitions=2"
INFO :      Starting Flower SuperNode
WARNING :   Option --insecure was set. Starting insecure HTTP channel to 10.42.22.105:9092.
INFO :      Starting Flower ClientAppIo gRPC server on 127.0.0.1:9095
Traceback (most recent call last):
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/common/retry_invoker.py", line 276, in invoke
    ret = target(*args, **kwargs)
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_channel.py", line 1198, in with_call
    return _end_unary_response_blocking(state, call, True, None)
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.PERMISSION_DENIED
        details = "RBAC: access denied"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_status:7, grpc_message:"RBAC: access denied"}"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/bin/flower-supernode", line 8, in <module>
    sys.exit(run_supernode())
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/client/supernode/app.py", line 83, in run_supernode
    start_client_internal(
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/client/app.py", line 424, in start_client_internal
    if (node_id := create_node()) is None:
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/client/grpc_rere_client/connection.py", line 199, in create_node
    create_node_response = retry_invoker.invoke(
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/common/retry_invoker.py", line 290, in invoke
    if giveup_check(err) or max_tries_exceeded or max_time_exceeded:
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/common/retry_invoker.py", line 288, in giveup_check
    return self.should_giveup(_exception)
  File "/home/jovyan/data-physionet/flowertune-llm2/flowertune-llm/env/flowertune_llm2_3.9/lib/python3.9/site-packages/flwr/client/grpc_rere_client/connection.py", line 159, in _should_giveup_fn
    raise RunNotRunningException
flwr.common.typing.RunNotRunningException

Hello @griffith, welcome to the community!

I see that you also asked the question on Flower Slack. Let me answer it here so that it’s searchable for future reference.

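That error "RBAC: access denied" comes from Kubeflow rather than from Flower (see for example this issue). Kubeflow routes traffic through the Istio service mesh, whose Envoy proxies return this message when an authorization policy rejects a request; Flower then receives it as the gRPC status PERMISSION_DENIED shown in the traceback. It looks like the second SuperNode does not have the appropriate Kubeflow permissions to connect to the SuperLink at 10.42.22.105.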
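The traceback is easier to read once you notice that the second exception is a consequence of the first: the PERMISSION_DENIED response reaches Flower's retry logic, and the give-up callback in connection.py (_should_giveup_fn in the traceback) raises RunNotRunningException instead of retrying. The sketch below is a simplified, self-contained model of that control flow, not Flower's actual code. StatusCode, FakeRpcError, should_give_up, invoke_with_retry and denied are hypothetical stand-ins, and treating only UNAVAILABLE as retryable is an assumption made for illustration.

```python
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class StatusCode(Enum):
    """Hypothetical stand-in for the grpc.StatusCode values used below."""

    PERMISSION_DENIED = 7
    UNAVAILABLE = 14


class FakeRpcError(Exception):
    """Hypothetical stand-in for grpc._channel._InactiveRpcError."""

    def __init__(self, code: StatusCode, details: str) -> None:
        super().__init__(details)
        self._code = code

    def code(self) -> StatusCode:
        return self._code


class RunNotRunningException(Exception):
    """Same name as the last exception in the traceback; defined locally here."""


def should_give_up(err: FakeRpcError) -> bool:
    # A denied call is never retried: a different exception is raised instead,
    # which is why the traceback ends with RunNotRunningException.
    if err.code() is StatusCode.PERMISSION_DENIED:
        raise RunNotRunningException
    # Assumption for illustration: only UNAVAILABLE is treated as transient.
    return err.code() is not StatusCode.UNAVAILABLE


def invoke_with_retry(target: Callable[[], T], max_tries: int = 3) -> T:
    """Call target, retrying transient errors up to max_tries attempts in total."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return target()
        except FakeRpcError as err:
            # Raising inside this except block is what produces Python's
            # "During handling of the above exception..." message.
            if should_give_up(err) or attempt >= max_tries:
                raise


def denied() -> str:
    # Simulates the SuperLink call being rejected by the proxy.
    raise FakeRpcError(StatusCode.PERMISSION_DENIED, "RBAC: access denied")


try:
    invoke_with_retry(denied)
except RunNotRunningException as exc:
    # The original gRPC-style error survives as the exception's __context__.
    print(type(exc.__context__).__name__, exc.__context__)  # FakeRpcError RBAC: access denied
```

The practical takeaway is that RunNotRunningException at the bottom of the traceback is a symptom; the first exception (PERMISSION_DENIED with "RBAC: access denied") is the one to debug.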

To debug the issue, can you set up a Flower deployment using only Docker, without Kubeflow?

On the first machine (10.42.22.105), run:

export FLWR_VERSION="1.18.0"

Then, using the following compose.yaml file, start the SuperLink and the first SuperNode with docker compose up -d --build:

services:
  superlink:
    image: flwr/superlink:${FLWR_VERSION:-1.19.0}
    container_name: superlink
    command:
      - --insecure
    ports:
      - 9092:9092
      - 9093:9093

  supernode:
    image: flwr/supernode:${FLWR_VERSION:-1.19.0}
    command:
      - --insecure
      - --superlink
      - superlink:9092
      - --node-config
      - "partition-id=0 num-partitions=2"
    depends_on:
      - superlink

Now, on your second machine (10.42.4.111), start the second SuperNode with the other partition:

export FLWR_VERSION="1.18.0"
docker run --rm flwr/supernode:${FLWR_VERSION} --insecure --superlink="10.42.22.105:9092" --node-config "partition-id=1 num-partitions=2"
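Each SuperNode should get its own partition-id so that the two nodes load different partitions of the dataset; giving both machines the same value would make them train on the same data. The helper below is a hypothetical sketch (node_config_for, parse_node_config and all_node_configs are not part of Flower) that builds one --node-config string per SuperNode and parses it back, assuming the simple space-separated key=value form used in this thread. It turns integer-looking values into ints and keeps everything else as a string, which may differ from how Flower itself interprets values.

```python
from typing import Dict, List, Union


def node_config_for(partition_id: int, num_partitions: int) -> str:
    """Build the --node-config value for one SuperNode (hypothetical helper)."""
    if not 0 <= partition_id < num_partitions:
        raise ValueError("partition_id must be in [0, num_partitions)")
    return f"partition-id={partition_id} num-partitions={num_partitions}"


def all_node_configs(num_partitions: int) -> List[str]:
    """One --node-config value per SuperNode, each with a distinct partition-id."""
    return [node_config_for(i, num_partitions) for i in range(num_partitions)]


def parse_node_config(value: str) -> Dict[str, Union[int, str]]:
    """Parse 'key=value key=value' into a dict; integer-looking values become ints."""
    config: Dict[str, Union[int, str]] = {}
    for pair in value.split():
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        try:
            config[key] = int(raw)
        except ValueError:
            config[key] = raw
    return config


if __name__ == "__main__":
    # Print the flag for each SuperNode in a two-node deployment.
    for cfg in all_node_configs(2):
        print(f'--node-config "{cfg}"', parse_node_config(cfg))
```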

Hopefully, with the above steps, your second SuperNode will connect, since the traffic no longer passes through Kubeflow. Once you’ve verified that the setup works with plain Docker containers and networking, the remaining problem is most likely on the Kubeflow side: you’ll need to configure your Kubeflow permissions (most likely the Istio authorization policies that Kubeflow manages) to allow the second SuperNode to connect to the SuperLink.
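Whichever setup you test, a quick first check from the second machine is whether a plain TCP connection to the SuperLink's Fleet API port (9092 in this thread) can be opened at all. The standard-library sketch below does only that; can_open_tcp is a hypothetical helper name. A successful connection does not prove the SuperNode will be accepted: an authorization policy like the one behind "RBAC: access denied" rejects the gRPC request after the connection exists, and a service-mesh sidecar may accept the connection on the destination's behalf. A failure, however, points to plain networking (routing, firewall, or the port not being exposed) rather than permissions.

```python
import socket


def can_open_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and connects; the socket is
        # closed immediately, so no gRPC request is ever sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and unreachable hosts.
        return False
```

For example, calling can_open_tcp("10.42.22.105", 9092) on 10.42.4.111 should return True before you start looking at Kubeflow permissions.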

Hope that helps!
