hypertools.cluster

hypertools.cluster(x, cluster='KMeans', n_clusters=3, ndims=None, format_data=True)

Performs clustering analysis and returns a list of cluster labels.

Parameters
x : Numpy array, Pandas DataFrame, or list of arrays/dfs

The data to be clustered. You can pass a single array/df or a list. If a list is passed, the arrays are stacked and clustering is performed across the combined data (i.e. not within each array separately).

cluster : str or dict

Model to use to discover clusters. Supported algorithms are: KMeans, MiniBatchKMeans, AgglomerativeClustering, Birch, FeatureAgglomeration, SpectralClustering and HDBSCAN (default: KMeans). Can be passed as a string, but for finer control of the model parameters, pass as a dictionary, e.g. cluster={'model': 'KMeans', 'params': {'max_iter': 100}}. See the scikit-learn docs for each model for details on the parameters it supports.

n_clusters : int

Number of clusters to discover. Not required for HDBSCAN.

format_data : bool

Whether or not to first call the format_data function (default: True).

ndims : None

Deprecated argument. Please use the new analyze function to perform combinations of transformations.

Returns
cluster_labels : list

A list of cluster labels.