UMAP
Original source : https://github.com/lmcinnes/umap
UMAP stands for Uniform Manifold Approximation and Projection. It is a dimensionality reduction technique that aims to preserve distance in high dimensional space unto the lower dimension projected space, in general 2D for visualisation.
It is similar to T-SNE, so it is a non linear method for dimension reduction, it aims to represent high-dimensional data in a lower-dimensional space while preserving both local and global structure. However, UMAP utilizes a different mathematical approach than t-SNE, which can lead to different trade-offs and results.
UMAP is based on the concept of constructing a fuzzy topological representation of the high-dimensional data and then optimizing the low-dimensional representation to be as close as possible to this fuzzy topological structure. It leverages ideas from manifold learning, Graph Theory, and Riemannian geometry. In particular it uses a Riemannian manifold as a hypothethical underlying structure.
Math details :
Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:
1. The data is uniformly distributed on a Riemannian manifold;
2. The Riemannian metric is locally constant (or can be approximated as such);
3. The manifold is locally connected.
From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.
UMAP in Python
To use UMAP in python we have two options :
- Using the CPU only with umap-learn , the original library with all the API details and examples.
- The Nvidia RapidsAI GPU implementation using GPU only (faster if we have enough memory).
