.. _sphx_glr_auto_examples_plot_corpus.py:

==========================================
Defining a custom corpus for plotting text
==========================================

By default, text samples are transformed into vectors of word counts and then
modeled using Latent Dirichlet Allocation (100 topics) with a model fit to a
large sample of Wikipedia pages. However, you can optionally pass your own text
to fit the semantic model. To do this, define ``corpus`` as a list of documents
(strings). A topic model will be fit on the fly and the text will be plotted.

.. image:: /auto_examples/images/sphx_glr_plot_corpus_001.png
    :align: center

.. code-block:: python

    # Code source: Andrew Heusser
    # License: MIT

    # load hypertools
    import hypertools as hyp

    # load the data
    text_samples = ['i like cats alot', 'cats r pretty cool', 'cats are better than dogs',
                    'dogs rule the haus', 'dogs are my jam', 'dogs are a mans best friend',
                    'i haz a cheezeburger?']

    # plot it
    hyp.plot(text_samples, 'o', corpus=text_samples)

**Total running time of the script:** ( 0 minutes 0.206 seconds)

.. only:: html

 .. container:: sphx-glr-footer

  .. container:: sphx-glr-download

     :download:`Download Python source code: plot_corpus.py `

  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: plot_corpus.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_