Clustering with Hypertools
==========================

The ``cluster`` feature performs clustering analysis on the data (an array,
dataframe, or list) and returns a list of cluster labels.

The default clustering method is K-Means (argument 'KMeans').
MiniBatchKMeans, AgglomerativeClustering, Birch, FeatureAgglomeration,
SpectralClustering and HDBSCAN are also supported.

Note that, if a list is passed, the arrays will be stacked and clustering
will be performed *across* all lists (not within each list).

Import Packages
---------------

.. code:: ipython3

    import hypertools as hyp
    from collections import Counter

    %matplotlib inline

Load your data
--------------

We will load one of the sample datasets. This dataset consists of 8,124
samples of mushrooms with various text features.

.. code:: ipython3

    geo = hyp.load('mushrooms')
    mushrooms = geo.get_data()

We can peek at the first few rows of the dataframe using the pandas function
``head()``:

.. code:: ipython3

    mushrooms.head()

.. csv-table::
   :header: "", bruises, cap-color, cap-shape, cap-surface, gill-attachment, gill-color, gill-size, gill-spacing, habitat, odor, ..., ring-type, spore-print-color, stalk-color-above-ring, stalk-color-below-ring, stalk-root, stalk-shape, stalk-surface-above-ring, stalk-surface-below-ring, veil-color, veil-type

   0, t, n, x, s, f, k, n, c, u, p, ..., p, k, w, w, e, e, s, s, w, p
   1, t, y, x, s, f, k, b, c, g, a, ..., p, n, w, w, c, e, s, s, w, p
   2, t, w, b, s, f, n, b, c, m, l, ..., p, n, w, w, c, e, s, s, w, p
   3, t, w, x, y, f, n, n, c, u, p, ..., p, k, w, w, e, e, s, s, w, p
   4, f, g, x, s, f, k, b, w, g, n, ..., e, n, w, w, e, t, s, s, w, p

5 rows × 22 columns
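
The introduction notes that a list of arrays is stacked and clustered *across* all arrays rather than within each one. The sketch below illustrates only that idea, using the standard library. The helpers ``stack`` and ``toy_kmeans`` and the two small arrays are hypothetical names and data made up for this illustration. The K-Means loop is a bare-bones stand-in, not the implementation hypertools uses.

```python
from collections import Counter


def stack(arrays):
    """Concatenate a list of 2-D arrays (lists of rows) into one list of rows."""
    return [list(row) for array in arrays for row in array]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_kmeans(rows, n_clusters, n_iter=20):
    """Plain Lloyd's algorithm (hypothetical helper, not hypertools code)."""
    # Deterministic initialisation: the first n_clusters rows act as centroids.
    centroids = [list(row) for row in rows[:n_clusters]]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest centroid.
        labels = [
            min(range(n_clusters),
                key=lambda k: squared_distance(row, centroids[k]))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(n_clusters):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Two made-up arrays with the same number of columns.
first = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1]]
second = [[5.2, 4.9], [0.1, 0.2]]

# Stack first, then cluster once: a single flat list of labels comes back,
# one per stacked row, and rows from different arrays can share a cluster.
stacked = stack([first, second])
labels = toy_kmeans(stacked, n_clusters=2)

print(labels)           # [0, 0, 1, 1, 0]
print(Counter(labels))  # cluster sizes, counted with Counter
```

The last row of ``first`` and the first row of ``second`` receive the same label, which is what clustering across the list means here. ``Counter`` gives a quick count of how many samples fall in each cluster.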