{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Normalization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `normalize` is a helper function to z-score your data. This is useful if your features (columns) are scaled differently within or across datasets. By default, hypertools normalizes *across* the columns of all datasets passed, but also affords the option to normalize columns *within* individual lists. Alternatively, you can also normalize each row. The function returns an array or list of arrays where the columns or rows are z-scored (output type same as input type)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import hypertools as hyp\n",
    "import numpy as np\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate synthetic data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we generate two sets of synthetic data. We pull points randomly from a multivariate normal distribution for each set, so the sets will exhibit unique statistical properties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "x1 = np.random.randn(10,10)\n",
    "x2 = np.random.randn(10,10)\n",
    "\n",
    "c1 = np.dot(x1, x1.T)\n",
    "c2 = np.dot(x2, x2.T)\n",
    "\n",
    "m1 = np.zeros([1,10])\n",
    "m2 = 10 + m1\n",
    "\n",
    "data1 = np.random.multivariate_normal(m1[0], c1, 100)\n",
    "data2 = np.random.multivariate_normal(m2[0], c2, 100)\n",
    "\n",
    "data = [data1, data2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "geo = hyp.plot(data, '.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Normalizing (Specified Cols or Rows)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, to specify a different normalization, pass one of the following arguments as a string, as shown in the examples below.\n",
    "\n",
    "+ 'across' - columns z-scored across passed lists (default)\n",
    "+ 'within' - columns z-scored within passed lists\n",
    "+ 'row' - rows z-scored "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Normalizing 'across'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you normalize 'across', all of the data is stacked/combined, and the normalization is done on the columns of the full dataset. Then the data is split back into separate elements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "norm = hyp.normalize(data, normalize = 'across')\n",
    "geo = hyp.plot(norm, '.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Normalizing 'within'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you normalize 'within', normalization is done on the columns of each element of the data, separately. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "norm = hyp.normalize(data, normalize = 'within')\n",
    "geo = hyp.plot(norm, '.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Normalizing by 'row'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "norm = hyp.normalize(data, normalize = 'row')\n",
    "geo = hyp.plot(norm, '.')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}