{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Interpolating missing data with probabilistic PCA\n\nWhen you pass a matrix with missing data, hypertools will attempt to\nfill in the values using probabilistic principal components analysis (PPCA).\nHere is an example where we generate some synthetic data, remove some of the\nvalues, and then use PPCA to interpolate those missing values. Then, we plot\nthe original data and the data with missing values together to see how well the interpolation performed.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Code source: Andrew Heusser\n# License: MIT\n\n# imports\nfrom scipy.linalg import toeplitz\nimport numpy as np\nimport hypertools as hyp\n\n# simulate data: a random walk built from correlated Gaussian samples\n# (Toeplitz covariance, so neighboring features are more strongly correlated)\nK = 10 - toeplitz(np.arange(10))\ndata1 = np.cumsum(np.random.multivariate_normal(np.zeros(10), K, 250), axis=0)\ndata2 = data1.copy()\n\n# simulate missing data: set a random 10% of the entries to NaN\nmissing = .1\ninds = [(i, j) for i in range(data2.shape[0]) for j in range(data2.shape[1])]\n# sample without replacement so exactly 10% of the entries are removed\nmissing_data = [inds[i] for i in np.random.choice(len(inds), int(len(inds) * missing), replace=False)]\nfor i, j in missing_data:\n    data2[i, j] = np.nan\n\n# plot; hypertools fills in the NaNs in data2 with PPCA before plotting\nhyp.plot([data1, data2], linestyle=['-', ':'], legend=['Original', 'PPCA'])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 0 }