.. _sphx_glr_auto_examples_plot_corpus.py:


==========================================
Defining a custom corpus for plotting text
==========================================

By default, text samples are transformed into vectors of word counts and then
modeled with Latent Dirichlet Allocation (100 topics), using a model fit to a
large sample of Wikipedia pages. You can instead supply your own text to fit
the semantic model: pass a list of documents (strings) as the ``corpus``
argument. A topic model is then fit on the fly to that corpus, and the text is
plotted.
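
To make the "vector of word counts" step concrete, here is a minimal
pure-Python sketch of a bag-of-words transform. It is an illustration only, not
hypertools' implementation: the names ``vocabulary``, ``count_vector`` and
``count_matrix`` are hypothetical, the tokenizer simply splits on whitespace,
and the subsequent topic-model fit is not shown.

.. code-block:: python

    from collections import Counter

    # A few short documents (illustrative subset of the samples used below)
    docs = ['i like cats alot', 'cats r pretty cool', 'dogs rule the haus']

    # Vocabulary: every distinct whitespace-separated token, in sorted order
    vocabulary = sorted({word for doc in docs for word in doc.split()})

    def count_vector(doc, vocabulary):
        """Return how often each vocabulary word occurs in ``doc``."""
        counts = Counter(doc.split())
        return [counts[word] for word in vocabulary]

    # One row per document, one column per vocabulary word
    count_matrix = [count_vector(doc, vocabulary) for doc in docs]

Each row of ``count_matrix`` is the kind of input a topic model works from. A
real vectorizer may tokenize differently, for example by dropping very common
words.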


.. image:: /auto_examples/images/sphx_glr_plot_corpus_001.png
    :align: center


.. code-block:: python


    # Code source: Andrew Heusser
    # License: MIT

    # load hypertools
    import hypertools as hyp

    # load the data
    text_samples = ['i like cats alot', 'cats r pretty cool', 'cats are better than dogs',
                    'dogs rule the haus', 'dogs are my jam', 'dogs are a mans best friend',
                    'i haz a cheezeburger?']

    # plot each sample as a point, fitting the topic model to text_samples itself
    hyp.plot(text_samples, 'o', corpus=text_samples)

**Total running time of the script:** ( 0 minutes  0.206 seconds)


.. only:: html

 .. container:: sphx-glr-footer


  .. container:: sphx-glr-download

     :download:`Download Python source code: plot_corpus.py <plot_corpus.py>`


  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: plot_corpus.ipynb <plot_corpus.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.readthedocs.io>`_