Clustering with Hypertools
==========================

The cluster feature performs clustering analysis on the data (an array,
dataframe, or list) and returns a list of cluster labels.

The default clustering method is K-Means (argument 'KMeans'), with
MiniBatchKMeans, AgglomerativeClustering, Birch, FeatureAgglomeration,
SpectralClustering and HDBSCAN also supported.

Note that, if a list is passed, the arrays will be stacked and clustering
will be performed *across* all lists (not within each list).

Import Packages
---------------

.. code:: ipython3

    import hypertools as hyp
    from collections import Counter

    %matplotlib inline

Load your data
--------------

We will load one of the sample datasets. This dataset consists of 8,124
samples of mushrooms with various text features.

.. code:: ipython3

    geo = hyp.load('mushrooms')
    mushrooms = geo.get_data()

We can peek at the first few rows of the dataframe using the pandas
function ``head()``.

.. code:: ipython3

    mushrooms.head()

.. parsed-literal::

       bruises  cap-color  cap-shape  cap-surface  gill-attachment  gill-color  gill-size  gill-spacing  habitat  odor  ...  ring-type  spore-print-color  stalk-color-above-ring  stalk-color-below-ring  stalk-root  stalk-shape  stalk-surface-above-ring  stalk-surface-below-ring  veil-color  veil-type
    0  t        n          x          s            f                k           n          c             u        p     ...  p          k                  w                       w                       e           e            s                         s                         w           p
    1  t        y          x          s            f                k           b          c             g        a     ...  p          n                  w                       w                       c           e            s                         s                         w           p
    2  t        w          b          s            f                n           b          c             m        l     ...  p          n                  w                       w                       c           e            s                         s                         w           p
    3  t        w          x          y            f                n           n          c             u        p     ...  p          k                  w                       w                       e           e            s                         s                         w           p
    4  f        g          x          s            f                k           b          w             g        n     ...  e          n                  w                       w                       e           t            s                         s                         w           p

    5 rows × 22 columns
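
K-Means operates on numeric vectors, so single-letter text codes like those
in the table need a numeric encoding before distances between samples can be
computed. One common choice is one-hot (dummy) coding, sketched below on
three rows and three columns copied from the output above. The ``one_hot``
helper is hypothetical and only illustrates the idea; it is not a
description of how hypertools processes the dataframe internally. For a full
dataframe, ``pandas.get_dummies`` provides the same kind of encoding.

.. code:: python

    # Three rows and three columns copied from the table above.
    rows = [
        {"bruises": "t", "cap-color": "n", "cap-shape": "x"},
        {"bruises": "t", "cap-color": "y", "cap-shape": "x"},
        {"bruises": "t", "cap-color": "w", "cap-shape": "b"},
    ]

    def one_hot(rows):
        """Return (names, matrix) with one 0/1 column per (feature, value) pair."""
        # Collect every (feature, value) pair that occurs, in sorted order.
        pairs = sorted({(key, value) for row in rows for key, value in row.items()})
        names = [f"{key}_{value}" for key, value in pairs]
        # A cell is 1 when the row holds that value for that feature, else 0.
        matrix = [
            [1 if row.get(key) == value else 0 for key, value in pairs]
            for row in rows
        ]
        return names, matrix

    names, matrix = one_hot(rows)
    print(names)
    for vector in matrix:
        print(vector)

Each distinct value of a feature becomes its own column, so rows that share
many values end up close together in the encoded space.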

Obtain cluster labels
---------------------

To obtain cluster labels, pass the data to ``hyp.cluster``. Since we have
not specified a desired number of clusters, the default of 3 clusters is
used (labels 0, 1, and 2). Additionally, since we have not specified a
desired clustering algorithm, K-Means is used by default.

.. code:: ipython3

    labels = hyp.cluster(mushrooms)
    set(labels)

.. parsed-literal::

    {0, 1, 2}

We can further examine the number of datapoints assigned to each label.

.. code:: ipython3

    Counter(labels)

.. parsed-literal::

    Counter({0: 1296, 1: 5067, 2: 1761})

Specify number of cluster labels
--------------------------------

You can also specify the number of desired clusters by setting the
``n_clusters`` argument to an integer, as below. When we pass the int 10 to
``n_clusters``, 10 cluster labels are assigned. Since we have not specified
a desired clustering algorithm, K-Means is used by default.

.. code:: ipython3

    labels_10 = hyp.cluster(mushrooms, n_clusters=10)
    set(labels_10)

.. parsed-literal::

    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Different clustering models
---------------------------

You may prefer to use a clustering model other than K-Means. To do so, pass
a string to the ``cluster`` argument specifying the desired clustering
algorithm. In this case, we specify only the clustering model (HDBSCAN) and
do not pass ``n_clusters``; HDBSCAN determines the number of clusters from
the data.

.. code:: ipython3

    labels_HDBSCAN = hyp.cluster(mushrooms, cluster='HDBSCAN')

.. code:: ipython3

    geo = hyp.plot(mushrooms, '.', hue=labels_10, title='K-means clustering')
    geo = hyp.plot(mushrooms, '.', hue=labels_HDBSCAN, title='HDBSCAN clustering')

.. image:: cluster_files/cluster_20_0.png

.. image:: cluster_files/cluster_20_1.png
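
The two plots compare the labelings visually. A rough numeric comparison is
to count how often each pair of labels occurs together. The sketch below
does this with ``Counter`` on two short, made-up label lists that stand in
for ``labels_10`` and ``labels_HDBSCAN``; the ``crosstab`` helper name and
the toy values are assumptions for illustration, not hypertools output.

.. code:: python

    from collections import Counter

    def crosstab(labels_a, labels_b):
        """Count how many samples received each (label_a, label_b) pair."""
        if len(labels_a) != len(labels_b):
            raise ValueError("both label lists must describe the same samples")
        return Counter(zip(labels_a, labels_b))

    # Made-up labels for six samples (not real hypertools output).
    toy_first = [0, 0, 1, 1, 2, 2]
    toy_second = [0, 0, 0, 1, 1, 1]

    pairs = crosstab(toy_first, toy_second)
    for (a, b), count in sorted(pairs.items()):
        print(f"first={a} second={b}: {count} samples")

A cluster from the first labeling that spreads over several labels of the
second (label 1 in this toy example) is one the two models disagree on.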
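
As noted in the introduction, a list of arrays is stacked before
clustering, so a single flat list of labels comes back. If per-array labels
are needed, the flat list can be cut back into pieces by array length. The
sketch below assumes that the labels are returned in the same order in which
the arrays were stacked; ``split_labels`` is a hypothetical helper, not part
of hypertools, and the label values are made up.

.. code:: python

    from itertools import accumulate

    def split_labels(labels, lengths):
        """Split a flat label sequence into one list per input array."""
        if sum(lengths) != len(labels):
            raise ValueError("lengths must sum to len(labels)")
        ends = list(accumulate(lengths))   # exclusive end index of each piece
        starts = [0] + ends[:-1]           # start index of each piece
        return [list(labels[s:e]) for s, e in zip(starts, ends)]

    # Made-up labels for two stacked arrays with 3 and 2 rows respectively.
    flat = [0, 2, 2, 1, 0]
    per_array = split_labels(flat, [3, 2])
    print(per_array)

In practice, ``lengths`` would be the number of rows of each array in the
list that was passed in.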
