Kann mir bitte jemand den Backpropagation-Algorithmus erklären? [Duplikat]

13

Was ist der Backpropagation-Algorithmus und wie funktioniert er?

algorithms optimization neural-networks Ami
quelle

1

Ich habe hier eine Antwort auf diese Frage zusammengestellt , falls sich jemand dafür interessiert (ich wollte nicht umbuchen).

Phylliida,

14

Der Back-Propagation-Algorithmus ist ein Gradienten-Descent- Algorithmus zum Anpassen eines neuronalen Netzwerkmodells.(wie von @Dikran erwähnt) Lassen Sie mich erklären, wie.

Formal: Wenn Sie die Berechnung des Gradienten am Ende dieses Beitrags in der folgenden Gleichung [1] verwenden (das ist eine Definition des Gradientenabfalls), wird der Algorithmus für die Rückwärtsausbreitung als besonderer Fall für die Verwendung eines Gradientenabfalls angegeben.

Ein neuronales Netzwerkmodell Formal fixieren wir Ideen mit einem einfachen Einschichtmodell:

wobei und für alle , und

f (x) = g (A^{1} (s (A^{2} (x))))

$f(x)=g(A^1(s(A^2(x))))$

g : R \to R

$g:\mathbb{R} \rightarrow \mathbb{R}$

s : R^{M} \to R^{M}

$s:\mathbb{R}^M\rightarrow \mathbb{R}^M$

m = 1 \dots, M

$m=1\dots,M$

s (x) [m] = σ (x [m])

$s(x)[m]=\sigma(x[m])$

A^{1} : R^{M} \to R

$A^1:\mathbb{R}^M\rightarrow \mathbb{R}$ ,

sind unbekannte affine Funktionen. Die Funktion

A^{2} R^{p} \to R^{M}

$A^2\mathbb{R}^p\rightarrow \mathbb{R}^M$

σ : R \to R

$\sigma:\mathbb{R}\rightarrow \mathbb{R}$ wird im Rahmen der Klassifikation Aktivierungsfunktion genannt.

Eine quadratische Verlustfunktion wird verwendet, um Ideen zu fixieren. Daher können die Eingangsvektoren von an die reale Ausgabe von (könnten Vektoren sein) angepasst werden, indem der empirische Verlust minimiert wird: $(x_1,\dots,x_n)$ $\mathbb{R}^p$ $(y_1,\dots,y_n)$ $\mathbb{R}$ bezüglich der Wahl von und .

R_{n} (A^{1}, A^{2}) = \sum_{i = 1}^{n} (y_{i} - f (x_{i}))^{2} [1]

$\mathcal{R}_n(A^1,A^2)=\sum_{i=1}^n (y_i-f(x_i))^2\;\;\;\;\;\;\; [1]$

A^{1}

$A^1$

A^{2}

$A^2$

$\mathcal{R}$

a_{l + 1} = a_{l} - γ_{l} \nabla R (a_{l}), l \geq 0.

$\mathbf{a}_{l+1}=\mathbf{a}_l-\gamma_l \nabla \mathcal{R}(\mathbf{a}_l),\ l \ge 0.$

(γ_{l})_{l}

$(\gamma_l)_l$

R

$\mathcal{R}$

a_{l} = (A_{l}^{1}, A_{l}^{2})

$\mathbf{a}_l=(A^1_{l},A^2_{l})$

$\mathcal{R}$ (for the simple considered neural net model) Let us denote, by $\nabla_1 \mathcal{R}$ the gradient of $\mathcal{R}$ as a function of $A^1$ , and $\nabla_2\mathcal{R}$ the gradient of $\mathcal{R}$ as a function of $A^2$ . Standard calculation (using the rule for derivation of composition of functions) and the use of the notation $z_i=A^1(s(A^2(x_i)))$ give

\nabla_{1} R [1 : M] = - 2 \times \sum_{i = 1}^{n} z_{i} g^{'} (z_{i}) (y_{i} - f (x_{i}))

$\nabla_1 \mathcal{R}[1:M] =-2\times \sum_{i=1}^n z_i g'(z_i) (y_i-f(x_i))$ for all

m = 1, \dots, M

$m=1,\dots,M$

\nabla_{2} R [1 : p, m] = - 2 \times \sum_{i = 1}^{n} x_{i} g^{'} (z_{i}) z_{i} [m] σ^{'} (A^{2} (x_{i}) [m]) (y_{i} - f (x_{i}))

$\nabla_2 \mathcal{R}[1:p,m] =-2\times \sum_{i=1}^n x_i g'(z_i) z_i[m]\sigma'(A^2(x_i)[m]) (y_i-f(x_i))$

Here I used the R notation: $x[a:b]$ is the vector composed of the coordinates of $x$ from index $a$ to index $b$ .

robin girard
quelle

11

Back-propogation is a way of working out the derivative of the error function with respect to the weights, so that the model can be trained by gradient descent optimisation methods - it is basically just the application of the "chain rule". There isn't really much more to it than that, so if you are comfortable with calculus that is basically the best way to look at it.

If you are not comfortable with calculus, a better way would be to say that we know how badly the output units are doing because we have a desired output with which to compare the actual output. However we don't have a desired output for the hidden units, so what do we do? The back-propagation rule is basically a way of speading out the blame for the error of the output units onto the hidden units. The more influence a hidden unit has on a particular output unit, the more blame it gets for the error. The total blame associated with a hidden unit then give an indication of how much the input-to-hidden layer weights need changing. The two things that govern how much blame is passed back is the weight connecting the hidden and output layer weights (obviously) and the output of the hidden unit (if it is shouting rather than whispering it is likely to have a larger influence). The rest is just the mathematical niceties that turn that intuition into the derivative of the training criterion.

I'd also recommend Bishops book for a proper answer! ;o)

Dikran Marsupial
quelle

2

It's an algorithm for training feedforward multilayer neural networks (multilayer perceptrons). There are several nice java applets around the web that illustrate what's happening, like this one: http://neuron.eng.wayne.edu/bpFunctionApprox/bpFunctionApprox.html. Also, Bishop's book on NNs is the standard desk reference for anything to do with NNs.

Stephen Turner
quelle

In trying to build a permanent repository of high-quality statistical information in the form of questions & answers, we try to avoid link-only answers. If you're able, could you expand this, perhaps by giving a summary of the information at the link?

Glen_b -Reinstate Monica

Kann mir bitte jemand den Backpropagation-Algorithmus erklären? [Duplikat]

Antworten: