tags:
- article
- fundamentals

What is a neural network?

A neural network, in essence, is a universal function approximator.
In the picture above, the blue function is approximated by a 6-neuron network, each neuron responsible for a segment of the fitted curve, using the Rectified Linear Unit function (ReLU), detailed in the Activation functions section.
In other words, given enough samples of data drawn from an unknown Probability Distribution, we can approximate that distribution. This is useful in many fields of science where, given measurements, we wish to predict some continuous future behavior (regression) or a discrete category (classification).
From Statistics, we assume that any measurement of a real-world phenomenon is a sample from a Probability Distribution, so we can consider neural networks an Anything-to-Anything model, as long as the phenomenon can be numerically quantified and measured.
There are many methods to achieve exactly that, from simple linear and polynomial regressions to decision trees, Random Forests, and most of Machine Learning.
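To make the piecewise idea from the picture concrete, here is a minimal sketch of how a handful of ReLU neurons can approximate a smooth curve. The target function sin(x), the breakpoints, and the hand-picked weights are illustrative assumptions, not learned values:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(0.0, 2.0 * np.pi, 200)
target = np.sin(x)

# Each neuron i contributes w_i * ReLU(x - b_i): a "hinge" that changes the
# slope of the piecewise-linear approximation at breakpoint b_i.
breakpoints = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
slopes = np.cos(breakpoints)               # target slope on each segment
weights = np.diff(slopes, prepend=0.0)     # slope change each neuron introduces

approx = sum(w * relu(x - b) for w, b in zip(weights, breakpoints))
print("max absolute error:", float(np.abs(approx - target).max()))
```

A trained network would learn the breakpoints and weights from data instead of having them chosen by hand, but the shape of the approximation is the same: a sum of shifted ReLUs.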
The # Universal approximation theorem states that any function can be approximated by an arbitrarily large neural network (proven in the general sense):
This paper rigorously establishes that standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. In this sense, multilayer feedforward networks are a class of universal approximators.
In simple terms, a Borel measurable function is like a rule that ensures that when you measure something in the output, you can also measure the corresponding part in the input using the same kind of measurement tools (Borel sets). This is important in fields like probability and statistics to ensure that functions behave nicely with the kinds of sets we can measure.
In practice, we limit ourselves to differentiable functions, so that we can compute their gradients for the famous Gradient Descent optimization algorithm.

Neural networks are like logical lego lasagna: modular like lego bricks by design, and stacked layer by layer like a lasagna. When broken down to the modular level, they are pretty simple!
To understand more complex architectures, you only need to learn the new lego blocks; the rest remains mostly the same!
This also allows for hardware optimization, because most of the computation is parallelizable on specialized hardware such as Graphics Processing Units (GPUs), or even AI-specific hardware such as TPUs.
For example, the activation of each neuron in a layer can be computed independently, and if we train on the data in batches (packets of samples), the activations for every data point in the batch can also be computed in parallel.
For example, given a batch of 64 samples and a 64-neuron layer, we can compute all 64 × 64 activations at once, as sketched below.
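As a rough illustration (the 32-feature input size and the random values are assumptions made for the example), the whole batch-times-layer computation reduces to a single matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(64, 32))        # 64 samples, 32 features each
weights = rng.normal(size=(32, 64))      # one column of weights per neuron
biases = np.zeros(64)

# All 64 x 64 = 4096 pre-activations in one operation, followed by an
# element-wise ReLU; exactly the kind of work GPUs parallelize well.
activations = np.maximum(0.0, batch @ weights + biases)
print(activations.shape)                 # (64, 64)
```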
A perceptron is a fundamental concept in artificial intelligence and machine learning, serving as the basic building block of neural networks. It is a simple model of a neuron, the basic unit of the brain.
In short, a perceptron takes inputs, applies weights, and produces an output through an activation function. It was a pioneering concept that paved the way for more advanced neural networks.

How it works:
The perceptron computes a weighted sum of its inputs and passes it through an activation function, which decides whether the perceptron "fires" (produces an output). Commonly, a step function is used, which outputs 1 if the sum exceeds a threshold and 0 otherwise.
In modern networks, the most commonly used activation is the Rectified Linear Unit, or ReLU:

$$\mathrm{ReLU}(x) = \max(0, x)$$

So for a given neuron with inputs $x$, weights $w$, and bias $b$, the output is $a = \mathrm{ReLU}(w \cdot x + b)$.
Without going into detail, there is a whole family of activation functions used in practice depending on the problem, and choosing one is an important design choice when building a neural network.
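As a small illustrative sketch (the inputs, weights, and bias are arbitrary values chosen for the example), a single neuron with a step or ReLU activation looks like this:

```python
import numpy as np

def step(z):
    return 1.0 if z > 0 else 0.0           # the original perceptron activation

def relu(z):
    return max(0.0, z)                      # ReLU(z) = max(0, z)

x = np.array([0.5, -1.2, 3.0])              # inputs
w = np.array([0.8, 0.1, -0.4])              # weights
b = 0.2                                     # bias

z = np.dot(w, x) + b                        # weighted sum of inputs plus bias
print("step:", step(z))
print("ReLU:", relu(z))
```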

The simplest architecture (and the first to exist) is the following, connecting multiple perceptrons in layers, hence the name:

A Multi-Layer Perceptron (MLP) is a type of artificial neural network that consists of multiple layers of interconnected nodes, or "neurons," inspired by the structure of the human brain.
Structure: an input layer, one or more hidden layers, and an output layer, each made of interconnected neurons.
How It Works: data flows forward layer by layer, each neuron computing a weighted sum of its inputs followed by an activation function.
Learning Process: the weights are adjusted with the backpropagation algorithm and Gradient Descent to minimize a loss function.
Applications: regression, classification, and as a building block of larger deep learning models.
In summary, a Multi-Layer Perceptron is a neural network with multiple layers of interconnected nodes that learns to recognize patterns in data by adjusting its weights using the backpropagation algorithm. It is a foundational model in the field of deep learning.
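For intuition, here is a minimal forward-pass sketch of a tiny MLP, assuming an illustrative 3-input, 4-hidden-neuron, 2-output architecture with random (untrained) weights:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input layer  -> hidden layer
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)   # hidden layer -> output layer

def forward(x):
    hidden = relu(x @ W1 + b1)   # hidden activations
    return hidden @ W2 + b2      # raw outputs (no activation: regression)

print(forward(np.array([1.0, 0.5, -2.0])))
```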

The main algorithm behind training neural networks is called the backpropagation algorithm, and it will be detailed further in its own article.
As a video is worth a thousand words, the best visual intuition for backpropagation comes from 3Blue1Brown: # Backpropagation, intuitively | DL3.
In short, the backpropagation algorithm is a fundamental method used to train artificial neural networks, including Multi-Layer Perceptrons (MLPs).
It works by minimizing the error between the network's predicted outputs and the actual target values through a process called Gradient Descent. Gradient Descent belongs to a family of algorithms dedicated to solving optimization problems, where we seek the minimum of a given function as efficiently as possible.
Two great videos covering this topic to build intuition:
# Watching Neural Networks Learn
# Gradient Descent vs Evolution | How Neural Networks Learn
During training, data is fed forward through the network to generate predictions. The error is then calculated using a loss function, which compares the prediction to the provided ground truth, and this error is propagated backward through the network. It is this error, or loss, that is minimized through Gradient Descent.
The algorithm computes the gradient of the loss function with respect to each weight by applying the chain rule of calculus, determining how much each weight contributed to the error. The weights are then updated in the opposite direction of the gradient to reduce the error, iteratively improving the network's performance.
This process is repeated over many epochs (an epoch is one full pass over the dataset), allowing the network to learn and refine its internal representations of the data.
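To tie the pieces together, here is a minimal sketch of such a training loop on the simplest possible model, a single linear neuron, where the gradients can be written out by hand. The data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)   # noisy ground truth

w, b = 0.0, 0.0
learning_rate = 0.1

for epoch in range(200):                   # one epoch = one pass over the data
    pred = w * x + b                       # forward pass
    error = pred - y
    loss = np.mean(error ** 2)             # mean squared error loss
    # gradients of the loss w.r.t. each parameter (chain rule)
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # update in the opposite direction of the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")     # close to the true 3 and 1
```

A real network repeats exactly this loop, except that backpropagation computes the gradients automatically for every weight in every layer.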
Voilà! You now have a fully trained neural network! 😎
There are many more steps to building a model fit for a complex problem, outside the model itself, such as:
- hyperparameter tuning
- neural network architecture design, such as Multi-task Learning or Multi-output Regression Neural Network
- Hardware optimization with GPU for local ML workflow
- language-specific tasks such as learning Python and its machine learning libraries such as Tensorflow and Keras, Parallelism in python, Running jupyter or IDE on WSL2, or installing Linux for GPU usage using WSL2 Ubuntu 22.04+Windows 10