Hi everyone,
What happens, both in practice and in theory, when gradient_based
subsampling is enabled (see parameters for more information)?
From my understanding, gradient-based subsampling is based on Minimal Variance Sampling (MVS). My questions are:
- Is the subsampling based on gradient information local to each client, or on global information?
- Can it be used for adaptation to local client data?
- Can we prove convergence using MVS in a federated setting?
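
To make the questions concrete, here is a minimal sketch of MVS as I understand it for a single machine: each example gets a score sqrt(g^2 + lam * h^2), is kept with probability min(1, score / mu), and is reweighted by the inverse of that probability. This is not the library's implementation; the function names, the `lam` default, and the bisection search for the threshold `mu` are my own assumptions for illustration.

```python
import math
import random


def mvs_probabilities(grads, hessians, sample_rate, lam=1.0):
    """Keep probabilities p_i = min(1, s_i / mu), with sum(p_i) ~= sample_rate * N."""
    # Regularized gradient magnitude per example (lam is an assumed hyperparameter).
    scores = [math.sqrt(g * g + lam * h * h) for g, h in zip(grads, hessians)]
    total = sum(scores)
    if total == 0.0:
        # No gradient signal at all: fall back to uniform sampling.
        return [sample_rate] * len(scores)

    target = sample_rate * len(scores)  # expected number of kept examples

    def expected_kept(mu):
        return sum(min(1.0, s / mu) for s in scores)

    # expected_kept is non-increasing in mu, and expected_kept(total / target) <= target,
    # so bisection on (0, total / target] finds the threshold.
    lo, hi = 0.0, total / target
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected_kept(mid) > target:
            lo = mid
        else:
            hi = mid
    return [min(1.0, s / hi) for s in scores]


def mvs_sample(grads, hessians, sample_rate, lam=1.0, rng=None):
    """Return (index, weight) pairs for the kept examples."""
    rng = rng or random.Random(0)
    probs = mvs_probabilities(grads, hessians, sample_rate, lam)
    # Weighting kept examples by 1 / p_i keeps weighted gradient sums unbiased.
    return [(i, 1.0 / p) for i, p in enumerate(probs) if rng.random() < p]
```

In this sketch the threshold `mu` depends on the scores of every example passed in, so running it per client versus on pooled gradients would give different sampling probabilities, which is exactly what the first question is about.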
Hope to start a great discussion!