Methods of Federated Learning


1. Introduction

・The flow of federated learning is as follows.

①:A central server distributes the global model to each local terminal.
②:Each local terminal trains on the data it holds and updates the model.
③:The models updated by each local terminal are aggregated centrally to create a new global model.
④:Repeat ①〜③.

・When the local models are aggregated into a global model at the center in step ③, an algorithm called "FedAvg" (taking the average of the models) is typically used.
・However, FedAvg is not perfect and may produce an inaccurate global model.

Causes of poor global model accuracy
・A model that is clearly different from the others (an outlier) is mixed into the aggregation.
・When each local terminal updates the model with its own data, bias in that data prevents a meaningful average from being obtained.

・The following algorithms are used to solve these problems.
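
Before looking at them, here is a minimal sketch of one plain FedAvg round for comparison (the function names and the flat-array model representation are illustrative assumptions, not from a specific library).

```python
import numpy as np

# A minimal sketch of one round of FedAvg.
# Each model is represented as a flat NumPy array of parameters.

def fedavg(local_models, weights=None):
    """③: aggregate local models by (weighted) parameter averaging."""
    return np.average(np.stack(local_models), axis=0, weights=weights)

def federated_round(global_model, client_datasets, local_update):
    local_models = []
    for data in client_datasets:
        # ①: distribute the global model, ②: each client trains on its own data
        local_models.append(local_update(global_model.copy(), data))
    # ③: aggregate the updated models into a new global model
    return fedavg(local_models)
```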

2. Krum

Krum is an aggregation rule that can tolerate up to $f$ attackers (clients that intentionally interfere with model aggregation).

・Gradient estimates $(V_1,...,V_n)$ of the cost function are collected from the local terminals.
・For each estimate $V_i$, the squared Euclidean distances to its nearest $n-f-2$ estimates are summed to obtain $scores(i)$.
・The Krum function returns the estimate $V_{i_{*}}$ for the index $i_*$ that minimizes $scores(i)$.

\newcommand{\argmin}{\mathop{\rm arg~min}\limits}
\begin{align}
scores(i)&=\sum_{i \rightarrow j}||V_i-V_j||^2 \\
i_*&=\argmin_{i} scores(i)\\
Krum&=V_{i_{*}}
\end{align}\\

\left\{
\begin{array}{ll}
V_i: Gradient estimates collected from local models\\
i \rightarrow j: the n-f-2 estimates nearest to V_i\\
V_{i_{*}}: estimated value obtained by the Krum function\\
f: Number of attackers (abnormal models)\\
n: Number of clients\\
i_*: i that minimizes scores(i)
\end{array}
\right.
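
Below is a minimal NumPy sketch of the Krum rule described above (the function name `krum` and the array layout are assumptions for illustration).

```python
import numpy as np

def krum(V, f):
    """V: array of shape (n, d), one gradient estimate per client.
    f: assumed number of attackers. Returns V_{i*}."""
    n = len(V)
    # pairwise squared Euclidean distances between all estimates
    dists = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        d = np.sort(np.delete(dists[i], i))   # distances from V_i to the others
        scores.append(d[: n - f - 2].sum())   # sum over the n-f-2 nearest
    return V[int(np.argmin(scores))]          # the estimate minimizing scores(i)
```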

3. Geometric Median

The global model is updated with the "median" of the aggregated local models' parameters.

・In FedAvg, the global model was created by averaging the models; with the geometric median, a median-like point is computed instead to create the global model.
・This is meant to be a robust mean estimator: outlier values from adversarial submissions or ordinary errors have less effect on the final result.
・The median is calculated as follows.

\newcommand{\argmin}{\mathop{\rm arg~min}\limits}
\begin{align}
Geometric Median&=\argmin_{y\in\mathbb{R}^{n}}\sum_{i=1}^{m}||x_i-y||_2 \\
&=\argmin_{y\in\mathbb{R}^{n}}\sum_{i=1}^{m}\sqrt{\sum_{j=1}^{n}(x_{i,j}-y_j)^2}
\end{align}\\


\left\{
\begin{array}{ll}
y: global model\\
x_i: local model\\
m: number of local models\\
n: number of model parameters
\end{array}
\right.
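
The argmin above has no closed form; Weiszfeld's algorithm is one standard way to approximate it (using it here is my assumption, not something stated in the article).

```python
import numpy as np

def geometric_median(X, n_iter=100, eps=1e-8):
    """X: array of shape (m, n), one local model per row."""
    y = X.mean(axis=0)                        # start from the plain average
    for _ in range(n_iter):
        d = np.linalg.norm(X - y, axis=1)     # ||x_i - y||_2 for each model
        w = 1.0 / np.maximum(d, eps)          # inverse-distance weights
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:   # stop once the point settles
            return y_new
        y = y_new
    return y
```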

4. Coordinate-Wise Median

The global model is updated with the coordinate-wise "median" of the aggregated local models' parameters.

・As with the Geometric Median, a median rather than an average is used to create the global model.
・Unlike the Geometric Median, the distance between two points is the Manhattan distance ($||x_i-y||_1$). Since this objective decomposes per coordinate, the minimizer is simply the median of each parameter taken independently.
・This is meant to be a robust mean estimator: outlier values from adversarial submissions or ordinary errors have less effect on the final result.

\newcommand{\argmin}{\mathop{\rm arg~min}\limits}
\begin{align}
CoordinateWise Median&=\argmin_{y\in\mathbb{R}^{n}}\sum_{i=1}^{m}||x_i-y||_1 \\
&=\argmin_{y\in\mathbb{R}^{n}}\sum_{i=1}^{m}\sum_{j=1}^{n}|x_{i,j}-y_j|
\end{align}\\

\left\{
\begin{array}{ll}
y: global model\\
x_i: local model\\
m: number of local models\\
n: number of model parameters
\end{array}
\right.
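
Because the minimizer is the per-coordinate median, the aggregation is a one-liner in NumPy (a sketch under the same flat-array assumption as above).

```python
import numpy as np

def coordinate_wise_median(X):
    """X: array of shape (m, n), one local model per row."""
    return np.median(X, axis=0)               # median over clients, per coordinate
```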

5. FedProx

A proximal term is added to the local training loss function to limit how much the local model can change.

・The proximal term is the term $(||w−w_g||^2)$ that represents the difference between the locally obtained weights and the global model's weights.
・This restriction prevents the performance degradation caused by non-IID data, i.e., data that is distributed non-uniformly across the local terminals.
・Data heterogeneity: if there is a gap between clients' data (one client has only images of dogs, another only images of cats), the aggregated global model performs poorly. Using FedProx produces an agnostic global model (predicting both dogs and cats) that maintains its performance.

\tilde{L}(w)=L(w)+α||w-w_g||^2 \\

\left\{
\begin{array}{ll}
\tilde{L}(w): loss function constrained by FedProx\\
L(w): local loss function\\
w: locally obtained weights\\
w_g: global model weights\\
α: hyperparameter controlling the proximal term
\end{array}
\right.
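
A minimal PyTorch sketch of this loss (the function name and parameter handling are illustrative assumptions; the FedProx paper writes the coefficient as $\mu/2$ rather than $α$).

```python
import torch

def fedprox_loss(base_loss, local_params, global_params, alpha):
    """base_loss: L(w). Returns L~(w) = L(w) + alpha * ||w - w_g||^2."""
    prox = sum(((w - w_g.detach()) ** 2).sum()
               for w, w_g in zip(local_params, global_params))
    return base_loss + alpha * prox

# usage during a client's local step (model and global_model assumed defined):
# loss = fedprox_loss(criterion(model(x), t),
#                     model.parameters(), global_model.parameters(), alpha=0.01)
```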

6. FedCurv

The Fisher information matrix is used to evaluate the importance of each parameter in the model, and larger penalties are imposed on changes to more important parameters.

・The penalty is imposed to reduce the performance degradation caused by uneven computational processing speeds across local terminals.
・Computational processing speed: on local terminals, differences in machine performance cause differences in processing speed, which degrades the quality of the aggregation.

\tilde{L}_s(w)=L_s(w)+α\sum_{j∈S\setminus\{s\}}(w-\hat{w}_j)^T diag(\hat{I}_j)(w-\hat{w}_j) \\

\left\{
\begin{array}{ll}
\tilde{L}_s(w): loss function constrained by FedCurv\\
L_s(w): local loss function\\
S: nodes participating in training\\
s: current node\\
\hat{w}_j: weights obtained by training node j\\
\hat{I}_j: Fisher information matrix corresponding to \hat{w}_j\\
w: locally obtained weights\\
α: hyperparameter controlling the penalty term
\end{array}
\right.
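
A minimal PyTorch sketch of the penalty above (the function name and the `(w_hat_j, fisher_diag_j)` pairing are illustrative assumptions; only the diagonal of each Fisher matrix is used, as in the formula).

```python
import torch

def fedcurv_loss(base_loss, local_params, others, alpha):
    """others: list of (w_hat_j, fisher_diag_j) pairs for nodes j in S \\ {s},
    each a list of tensors shaped like the local parameters."""
    penalty = 0.0
    for w_hat_j, fisher_j in others:
        for w, wj, ij in zip(local_params, w_hat_j, fisher_j):
            # (w - w_hat_j)^T diag(I_hat_j) (w - w_hat_j), written elementwise
            penalty = penalty + (ij * (w - wj) ** 2).sum()
    return base_loss + alpha * penalty
```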

7. Trimmed Mean

A global model is created by averaging the models with a trimmed mean.

・The trimmed mean sorts the data in ascending order, excludes the top and bottom XX%, and averages the remaining data.
・The models collected from each local terminal are sorted and the trimmed mean is taken to create the global model.
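
A minimal NumPy sketch of coordinate-wise trimmed-mean aggregation (the 10% default and the per-coordinate sorting are assumptions for illustration).

```python
import numpy as np

def trimmed_mean(X, trim=0.1):
    """X: array of shape (m, n), one local model per row.
    trim: fraction removed from each end of every coordinate."""
    m = X.shape[0]
    k = int(m * trim)                         # models cut from each end
    X_sorted = np.sort(X, axis=0)             # sort every coordinate ascending
    return X_sorted[k : m - k].mean(axis=0)   # average the remaining values
```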
