How it works
A single neuron is not very powerful. Organizing neurons into a network with several layers yields models that are able to approximate general continuous nonlinear functions. Such a network consists of one or more hidden layers and an output layer. An example can be seen below: ![[MLP.png]]
They learn through an algorithm called Back-propagation.
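A minimal sketch of what a single back-propagation step can look like for such a network, assuming a tanh hidden layer, a linear output layer and a squared-error loss; the layer sizes, learning rate and loss are illustrative choices, not taken from this note:

```python
import numpy as np

# One back-propagation step for y = W * tanh(V x + beta)
# with L = 0.5 * ||y - t||^2 (all sizes below are arbitrary examples).
m, n_h, l = 3, 5, 2
rng = np.random.default_rng(0)
V = rng.standard_normal((n_h, m)) * 0.1
W = rng.standard_normal((l, n_h)) * 0.1
beta = np.zeros(n_h)

x = rng.standard_normal(m)   # input
t = rng.standard_normal(l)   # target output
lr = 0.1                     # learning rate (assumed)

# Forward pass
pre = V @ x + beta           # hidden pre-activations
h = np.tanh(pre)             # hidden activations (sigma)
y = W @ h                    # linear output layer

# Backward pass (chain rule)
dy = y - t                   # dL/dy
dW = np.outer(dy, h)         # dL/dW
dh = W.T @ dy                # dL/dh
dpre = dh * (1 - h**2)       # dL/dpre, since tanh'(z) = 1 - tanh(z)^2
dV = np.outer(dpre, x)       # dL/dV
dbeta = dpre                 # dL/dbeta

# Gradient-descent update
W -= lr * dW
V -= lr * dV
beta -= lr * dbeta
```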
Mathematical description
In matrix notation: \(y=W\sigma (Vx + \beta)\) with input \(x \in \mathbb{R}^m\), output \(y \in \mathbb{R}^l\) and interconnection matrices \(W \in \mathbb{R}^{l\times n_h}\), \(V\in \mathbb{R}^{n_h \times m}\) for the output and hidden layers, respectively. The bias vector \(\beta \in \mathbb{R}^{n_h}\) consists of the threshold values of the \(n_h\) hidden neurons. This notation is more compact than the elementwise notation:
\[y_i = \sum_{r=1}^{n_h} w_{ir}\, \sigma\left(\sum_{j=1}^m v_{rj} x_j + \beta_r\right), \quad i=1,\dots,l\]
In these descriptions a linear [Activation Function](/ai/Neural%20Networks/Activation%20Function.html) is taken for the output layer. Depending on the application one might choose other functions as well.
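As a quick sanity check that both notations describe the same computation, a small NumPy sketch with tanh as \(\sigma\); the dimensions are arbitrary examples:

```python
import numpy as np

# Compare the matrix form y = W sigma(V x + beta) with the elementwise sum.
m, n_h, l = 4, 6, 3
rng = np.random.default_rng(1)
x = rng.standard_normal(m)
V = rng.standard_normal((n_h, m))
W = rng.standard_normal((l, n_h))
beta = rng.standard_normal(n_h)

sigma = np.tanh

# Matrix notation
y_matrix = W @ sigma(V @ x + beta)

# Elementwise notation
y_elem = np.array([
    sum(W[i, r] * sigma(sum(V[r, j] * x[j] for j in range(m)) + beta[r])
        for r in range(n_h))
    for i in range(l)
])

assert np.allclose(y_matrix, y_elem)
```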
For a network with two hidden layers we get the following matrix notation: \(y=W\sigma (V_2\sigma(V_1x + \beta_1) + \beta_2)\)
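The corresponding forward pass for two hidden layers, again as a sketch with assumed layer sizes and tanh activations:

```python
import numpy as np

# Forward pass of y = W sigma(V2 sigma(V1 x + beta1) + beta2);
# layer sizes are illustrative choices.
m, n1, n2, l = 4, 8, 6, 2
rng = np.random.default_rng(2)
x = rng.standard_normal(m)
V1, beta1 = rng.standard_normal((n1, m)), rng.standard_normal(n1)
V2, beta2 = rng.standard_normal((n2, n1)), rng.standard_normal(n2)
W = rng.standard_normal((l, n2))

h1 = np.tanh(V1 @ x + beta1)   # first hidden layer
h2 = np.tanh(V2 @ h1 + beta2)  # second hidden layer
y = W @ h2                     # linear output layer
```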