hypertools.cluster

hypertools.cluster(x, cluster='KMeans', n_clusters=3, ndims=None, format_data=True)

Performs clustering analysis and returns a list of cluster labels.

Parameters

x (Numpy array, Pandas DataFrame, or list of arrays/dfs): The data to be clustered. You can pass a single array/df or a list. If a list is passed, the arrays are stacked and clustering is performed across the whole stack (i.e. not within each array separately).
cluster (str or dict): Model to use to discover clusters. Supported algorithms are: KMeans, MiniBatchKMeans, AgglomerativeClustering, Birch, FeatureAgglomeration, SpectralClustering and HDBSCAN (default: KMeans). Can be passed as a string, but for finer control of the model parameters, pass as a dictionary, e.g. cluster={'model' : 'KMeans', 'params' : {'max_iter' : 100}}. See the scikit-learn docs for the specific model for details on the parameters it supports.
n_clusters (int): Number of clusters to discover. Not required for HDBSCAN.
format_data (bool): Whether or not to first call the format_data function (default: True).
ndims (None): Deprecated argument. Please use the new analyze function to perform combinations of transformations.
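
The parameters above can be illustrated with a minimal sketch. The toy data, the helper blob, and all variable names are illustrative assumptions, not part of hypertools. The import guard is there only so the snippet still runs where hypertools is not installed. Putting n_clusters inside 'params' in the dictionary form is also an assumption, made to keep the requested cluster count explicit rather than relying on how the two arguments interact.

```python
import random

random.seed(0)

def blob(cx, cy, n=20):
    """Hypothetical helper: n noisy 2-D points around (cx, cy)."""
    return [[random.gauss(cx, 0.3), random.gauss(cy, 0.3)] for _ in range(n)]

# Two separate datasets. Passed together as a list, they are stacked
# and clustered as a single dataset, not one dataset at a time.
group_a = blob(0, 0) + blob(5, 5)   # 40 rows
group_b = blob(0, 5)                # 20 rows

# Dictionary form of the `cluster` argument, for finer control over the model.
# Assumption: n_clusters is given inside 'params' to make it explicit.
cluster_spec = {'model': 'KMeans',
                'params': {'n_clusters': 3, 'max_iter': 100}}

try:
    import numpy as np
    import hypertools as hyp
except ImportError:
    # hypertools (or numpy) is unavailable, so skip the actual calls.
    labels = None
    labels_from_dict = None
else:
    data = [np.array(group_a), np.array(group_b)]

    # String form: name the model and give the number of clusters.
    labels = hyp.cluster(data, cluster='KMeans', n_clusters=3)

    # Dictionary form: same model, with extra model parameters.
    labels_from_dict = hyp.cluster(data, cluster=cluster_spec)
```

Because the list is stacked before clustering, the result should be a single flat list with one label per row across both arrays, rather than one list per input array.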

Returns