tags:
  - dataviz
  - clustering
  - manifold
  - graph
  - mapping
  - models
  - GPU
  - interpretability
  - unsupervised

Original source: https://github.com/lmcinnes/umap

UMAP stands for Uniform Manifold Approximation and Projection. It is a dimensionality reduction technique that aims to preserve the distances between points in the high-dimensional space when projecting them onto a lower-dimensional space, usually 2D for visualisation.

Like t-SNE, it is a non-linear dimension reduction method: it aims to represent high-dimensional data in a lower-dimensional space while preserving both local and global structure. However, UMAP uses a different mathematical approach than t-SNE, which can lead to different trade-offs and results.

UMAP constructs a fuzzy topological representation of the high-dimensional data and then optimizes the low-dimensional representation to be as close as possible to this fuzzy topological structure. It draws on ideas from manifold learning, graph theory, and Riemannian geometry. In particular, it assumes a Riemannian manifold as the hypothetical underlying structure of the data.

Math details:

Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:

1. The data is uniformly distributed on a Riemannian manifold;
2. The Riemannian metric is locally constant (or can be approximated as such);
3. The manifold is locally connected.

From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.
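To make the "fuzzy topological structure" a little more concrete, here is a minimal plain-Python sketch of the weighted k-nearest-neighbour graph described in the UMAP paper. For each point, `rho` is the distance to its nearest neighbour (local connectivity) and `sigma` is a local scale found by binary search so that the membership weights sum to log2(k). The directed weights are then symmetrised with a fuzzy union. The names `smooth_knn` and `fuzzy_graph` and the toy points are hypothetical, chosen for this note. The real library uses approximate nearest-neighbour search and optimised code, so treat this as an illustration and not as the library's implementation.

```python
import math


def smooth_knn(dists, k, n_iter=64, tol=1e-5):
    """Local scale for one point, given sorted distances to its k neighbours.

    Returns (rho, sigma) such that sum(exp(-max(d - rho, 0) / sigma)) ~ log2(k).
    """
    rho = dists[0]  # distance to the nearest neighbour
    target = math.log2(k)
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(d - rho, 0.0) / sigma) for d in dists)
        if abs(total - target) < tol:
            break
        if total > target:  # weights too large: shrink sigma
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:  # weights too small: grow sigma
            lo = sigma
            sigma = sigma * 2.0 if hi == math.inf else (lo + hi) / 2.0
    return rho, sigma


def fuzzy_graph(points, k):
    """Symmetric matrix of fuzzy membership strengths between points."""
    n = len(points)
    directed = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Exact k nearest neighbours by brute force (fine for a toy example).
        neigh = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        rho, sigma = smooth_knn([d for d, _ in neigh], k)
        for d, j in neigh:
            directed[i][j] = math.exp(-max(d - rho, 0.0) / sigma)
    # Fuzzy union (probabilistic t-conorm): a + b - a * b
    return [
        [
            directed[i][j] + directed[j][i] - directed[i][j] * directed[j][i]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data: two small groups of 2D points (illustrative only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0), (5.0, 7.0)]
graph = fuzzy_graph(points, k=3)
```

Each point's nearest neighbour always gets weight 1, which reflects the local-connectivity assumption: no point is left isolated in the graph.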

UMAP in Python

To use UMAP in Python we have two options:

1. CPU only, with umap-learn, the original library, whose documentation covers the API details and examples.
2. GPU only, with the NVIDIA RAPIDS implementation (faster if we have enough GPU memory).