8. Creating a mapper graph using computeMapper

This notebook will show how to construct a mapper graph from a point cloud using the computeMapper function.

[ ]:
# Standard imports

import matplotlib.pyplot as plt

8.1. Generate example data

For this example, we will use the make_circles data set from the sklearn package, which generates two noisy concentric circles.

[44]:
from sklearn.datasets import make_circles

number_of_points = 500
data, labels = make_circles(n_samples=number_of_points, factor=0.4, noise=0.05, random_state=0)

plt.scatter(data[:, 0], data[:, 1], c=labels)
plt.axis('scaled');
../_images/notebooks_compute_mapper_3_1.png

8.2. Computing an example mapper graph

We can compute the mapper graph of this shape using computeMapper. In this example, we use a cover of the interval from \(-1\) to \(1\) with 7 intervals and 50% overlap between consecutive intervals. The lens function is simply the first coordinate of each data point.

[43]:
from cereeberus import MapperGraph, computeMapper, cover
import numpy as np
import sklearn.cluster

graph = MapperGraph()
graph = computeMapper(pointcloud = data,
                      lensfunction=(lambda a : a[0]),
                      cover=cover(min=-1, max=1, numcovers=7, percentoverlap=.5),
                      clusteralgorithm=sklearn.cluster.DBSCAN(min_samples=2,eps=0.3).fit
                      )
graph.draw()
../_images/notebooks_compute_mapper_5_0.png

8.2.1. Breakdown of inputs

The computeMapper function takes four inputs (each sketched briefly after this list):

  • A point cloud

  • A lens function

  • A cover

  • A clustering algorithm
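
A minimal sketch of what each input can look like, using only patterns that already appear in this notebook (the variable names are purely illustrative):

[ ]:
# A point cloud: a list (or array) of points
example_pointcloud = [(0.6, 0.0), (-0.1, 0.5)]

# A lens function: maps a single point to a real number
example_lens = lambda a: a[0]

# A cover of the lens range
example_cover = cover(min=-1, max=1, numcovers=4, percentoverlap=.5)

# A clustering algorithm: either the fit method of an sklearn clusterer ...
example_clusterer = sklearn.cluster.DBSCAN(min_samples=2, eps=0.3).fit
# ... or the string "trivial" for the trivial clustering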

Covers may be created by calling cover:

[40]:
print(cover(min=-1, max=1, numcovers=4, percentoverlap=.5))
[(-1.125, -0.375), (-0.625, 0.125), (-0.125, 0.625), (0.375, 1.125)]

Covering sets may extend beyond the specified range when the percent overlap is nonzero.
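
The first printed set starting at \(-1.125\) can be reproduced by hand. Here is a quick sketch of the arithmetic, assuming (based only on the printed output above, not on documented behaviour of cover) that each of the numcovers equal base intervals is padded by half the overlap on each side:

[ ]:
base_width = (1 - (-1)) / 4      # each of the 4 base intervals has width 0.5
pad = 0.5 * base_width / 2       # 50% overlap adds 0.125 of padding per side
print((-1 - pad, -1 + base_width + pad))   # (-1.125, -0.375), the first covering set above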

Both covers and point clouds may also be supplied manually:

[23]:
graph2 = computeMapper([(0.6, 0), (-0.1, 0.5)], (lambda a : a[0]), [(-1,0),(-0.5,0.5),(0,1)], "trivial")
graph2.draw()
../_images/notebooks_compute_mapper_11_0.png

computeMapper allows any sklearn clustering algorithm to be used as input. It also works with the trivial clustering, passed as the string "trivial".

[24]:
from sklearn.datasets import make_moons

number_of_points = 200

data, labels = make_moons(n_samples=number_of_points, noise=0.05, random_state=0)

# Convert the (n, 2) data array into a list of points
pointcloud = list(data)
[25]:
import matplotlib.pyplot as plt

plt.scatter(data[:, 0], data[:, 1], c=labels)
plt.axis('scaled')
[25]:
(np.float64(-1.2259004307319845),
 np.float64(2.1731624220868624),
 np.float64(-0.6639113257043391),
 np.float64(1.162391196195627))
../_images/notebooks_compute_mapper_14_1.png
[26]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : a[1]), cover(min=-1, max=1, numcovers=7, percentoverlap=.4), sklearn.cluster.DBSCAN(min_samples=2,eps=0.3).fit)
graph.draw()
../_images/notebooks_compute_mapper_15_0.png
[27]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : a[1]), cover(min=-1, max=1, numcovers=7, percentoverlap=.4), sklearn.cluster.KMeans(n_clusters=4).fit)
graph.draw()
../_images/notebooks_compute_mapper_16_0.png
[28]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : a[1]), cover(min=-1, max=1, numcovers=7, percentoverlap=.4), sklearn.cluster.HDBSCAN(min_cluster_size=8).fit)
graph.draw()
/opt/anaconda3/envs/cereeberus/lib/python3.12/site-packages/sklearn/cluster/_hdbscan/hdbscan.py:722: FutureWarning: The default value of `copy` will change from False to True in 1.10. Explicitly set a value for `copy` to silence this warning.
  warn(
../_images/notebooks_compute_mapper_17_1.png
[29]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : a[1]), cover(min=-1, max=1, numcovers=7, percentoverlap=.4), "trivial")
graph.draw()
../_images/notebooks_compute_mapper_18_0.png

computeMapper orders the covering sets it is given and preserves the position of each within the cover, which makes it easier to compute distances between mapper graphs.

Notice where the labels on the drawn graph start.

[30]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : a[1]), cover(min=-10, max=10, numcovers=40, percentoverlap=.4), "trivial")
graph.draw()
../_images/notebooks_compute_mapper_21_0.png

The lens passed to computeMapper can be any function that numpy can evaluate on a data point.
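
Before building the cover, it can help to evaluate the lens on the point cloud and inspect its range, so that the cover's min and max match the lens values. A small sketch (using the moons pointcloud built above; the norm-based lens is chosen purely for illustration):

[ ]:
lens = lambda a: np.linalg.norm(a)               # any numpy-evaluable function of a point works
values = [lens(point) for point in pointcloud]
print(min(values), max(values))                  # suggests min/max bounds for the cover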

[31]:
from sklearn.datasets import make_swiss_roll
number_of_points = 300

data, labels = make_swiss_roll(n_samples=number_of_points, noise=0.1, random_state=0)

# Project the swiss roll onto its first and third coordinates
pointcloud = [(point[0], point[2]) for point in data]
[32]:
import matplotlib.pyplot as plt

plt.scatter(data[:, 0], data[:, 2], c=labels)
plt.axis('scaled');
../_images/notebooks_compute_mapper_24_0.png
[33]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : a[1]), cover(min=-12, max=15, numcovers=10, percentoverlap=.4), sklearn.cluster.DBSCAN(min_samples=2,eps=3).fit)
graph.draw()
../_images/notebooks_compute_mapper_25_0.png
[34]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : a[1] * a[0]), cover(min=-200, max=200, numcovers=7, percentoverlap=.4), sklearn.cluster.DBSCAN(min_samples=2,eps=3).fit)
graph.draw()
../_images/notebooks_compute_mapper_26_0.png
[35]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : np.sqrt(a[0]+15)), cover(min=0, max=6, numcovers=7, percentoverlap=.4), sklearn.cluster.DBSCAN(min_samples=2,eps=3).fit)
graph.draw()
../_images/notebooks_compute_mapper_27_0.png
[36]:
graph = MapperGraph()
graph = computeMapper(pointcloud, (lambda a : np.sqrt(a[0]**2 + a[1]**2)), cover(min=0, max=15, numcovers=10, percentoverlap=.4), sklearn.cluster.DBSCAN(min_samples=2,eps=3).fit)
graph.draw()
../_images/notebooks_compute_mapper_28_0.png