svd_mpsic/jupyter.ipynb
2022-04-25 12:48:08 +02:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"[Jupyter](http://jupyter.org/) is an interactive, browser-based notebook environment that combines text, code execution, and visualization. It supports multiple programming languages through language-specific kernels, but it is most widely used with Python in the scientific computing and data science communities. Python is popular in these fields because of its large number of high-quality open source libraries for scientific computing, numerical linear algebra, machine learning, and visualization. Because of their interactivity, Jupyter notebooks are an excellent environment for learning and teaching.\n",
"\n",
"In this short blog post, I will explore a few topics to illustrate the interactivity of the Jupyter environment and the availability of high-quality libraries in the Python ecosystem.\n",
"- Monte Carlo calculation of $\\pi$\n",
"- Image compression using singular value decomposition"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# A note on installing Jupyter"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Installing Jupyter and the other important packages one by one is cumbersome. The [Anaconda](https://www.anaconda.com/products/individual) Python distribution bundles almost all of the useful packages, so installing Anaconda is the simpler route.\n",
"\n",
"It is even easier not to set up a Python environment on your computer at all. Instead, you can use a free online Python environment: [Colab notebooks](https://colab.research.google.com/) from Google or [SageMaker Studio Lab notebooks](https://studiolab.sagemaker.aws/) from AWS. Watch [this video](https://www.youtube.com/watch?v=SP-WBt2b54o) from the Twitter user [1littlecoder](https://twitter.com/1littlecoder) on getting started with SageMaker Studio Lab."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# Monte Carlo calculation of $\\pi$"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"I wrote earlier about calculating the mathematical constant $\\pi$ using the [Monte Carlo method](https://medium.com/@rameshputalapattu/life-of-pi-a-gophers-tale-2e6922b80792). It involves generating random points in the unit square and counting the number of points that fall inside the unit quarter circle. We will write a function ```mc_pi``` to calculate $\\pi$. In that function, we will also visualize the Monte Carlo simulation using the matplotlib library."
]
},
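{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"As a short sketch of why counting points estimates $\\pi$ (the symbols $N$ and $N_{\\text{in}}$ are introduced here only for illustration): a point drawn uniformly from the unit square lands inside the quarter circle with probability equal to the quarter circle's area.\n",
"\n",
"$$P(x^2+y^2<1)=\\frac{\\pi}{4}\\quad\\Rightarrow\\quad\\pi\\approx 4\\,\\frac{N_{\\text{in}}}{N}$$\n",
"\n",
"Here $N$ is the number of trials and $N_{\\text{in}}$ is the number of points that fall inside the quarter circle. The statistical error of such an estimate shrinks roughly like $1/\\sqrt{N}$, so the estimate settles slowly as the number of trials grows."
]
},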
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"#This line is required to display visualizations in the browser\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def mc_pi(ntrials):\n",
"    \"\"\"\n",
"    Calculate the value of pi using the Monte Carlo method and visualize the process.\n",
"    \"\"\"\n",
"    # random points in the unit square\n",
"    x = np.random.random(ntrials)\n",
"    y = np.random.random(ntrials)\n",
"    # boolean mask: True for points inside the unit quarter circle\n",
"    inside_circle = x**2 + y**2 < 1\n",
"    unit_circle_x = np.linspace(0, 1, 100)\n",
"    unit_circle = [unit_circle_x, np.sqrt(1.0 - unit_circle_x**2)]\n",
"    plt.plot(*unit_circle, color='black')\n",
"    plt.scatter(x[inside_circle], y[inside_circle], marker='.', color='blue', s=1)\n",
"    plt.scatter(x[~inside_circle], y[~inside_circle], marker='.', color='red', s=1)\n",
"    # raw string so that the backslash in $\\pi$ is not read as an escape sequence\n",
"    plt.title(r\"value of $\\pi$=\" + str(4.0 * np.sum(inside_circle) / ntrials))"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"We are able to write the function ```mc_pi``` without any explicit ```for``` loops, thanks to the vectorization features of the NumPy library. Because of NumPy's broadcasting, we can subtract a vector from a scalar (```1.0 - unit_circle_x**2```) to compute the y-coordinate of the unit circle.\n",
"\n",
"Now we will use the ipywidgets module to pass the parameter to ```mc_pi``` interactively. The ipywidgets module provides widgets that generate UI controls in the notebook itself. We can drag the slider and observe how the value of $\\pi$ calculated by ```mc_pi``` changes with the number of trials."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "fa2a66bb49664825b1d21a532749b123",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(IntSlider(value=49991, description='ntrials', max=100000, min=1, step=10), Output()), _d…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from ipywidgets import interact,interactive,interact_manual\n",
"mc_widget=interactive(mc_pi,ntrials=(1,100000,10));\n",
"mc_widget"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### SVD and Image compression"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Now we will explore how to apply the singular value decomposition (SVD) of a matrix to the problem of image compression. SVD decomposes a rectangular matrix $M$ into three parts, $M=U\\Sigma V^T$:\n",
"- $U$ - matrix with the left singular vectors in its columns\n",
"- $\\Sigma$ - diagonal matrix of singular values\n",
"- $V$ - matrix with the right singular vectors in its columns\n",
"\n",
"In effect, SVD reconstructs the original matrix as a linear combination of rank one matrices. A rank one matrix can be expressed as an outer product of two column vectors.\n",
"\n",
"$M=\\sigma_1u_1v_1^T+\\sigma_2u_2v_2^T+\\sigma_3u_3v_3^T+\\dots+\\sigma_ru_rv_r^T$\n",
"\n",
"A matrix of rank $r$ has $r$ such terms.\n",
"\n",
"Here $\\sigma_1,\\sigma_2,\\sigma_3 ...$ are the singular values, and $u_1,u_2,u_3 ...$ and $v_1,v_2,v_3 ...$ are the left and right singular vectors respectively.\n",
"\n",
"Image compression using SVD takes advantage of the fact that very few of the singular values are large. Although real-world images are typically of full rank, they have low effective rank, which means that only a few of the singular values of their SVD are large."
]
},
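{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"To make the idea of keeping only the large singular values concrete, here is a short sketch of the rank $k$ truncation (the name $M_k$ is notation introduced here, and the singular values are assumed to be sorted in decreasing order):\n",
"\n",
"$$M_k=\\sum_{i=1}^{k}\\sigma_i u_i v_i^T,\\qquad \\lVert M-M_k\\rVert_F^2=\\sum_{i=k+1}^{r}\\sigma_i^2$$\n",
"\n",
"The squared Frobenius norm of the error is the sum of the squares of the discarded singular values, so when only a few singular values are large, a small $k$ already gives a close approximation."
]
},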
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### skimage image processing library\n",
"\n",
"We will use the skimage image processing library (scikit-image, from the scikit family of packages) to work with images in Python. skimage has a module called ```data``` that provides a set of sample images for exploration. We will load some images and convert the color ones to gray scale. These images are stored in a Python dict object, ```gray_images```."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from skimage import data\n",
"from skimage.color import rgb2gray"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from skimage import img_as_ubyte,img_as_float\n",
"gray_images = {\n",
" \"cat\":rgb2gray(img_as_float(data.chelsea())),\n",
" \"astro\":rgb2gray(img_as_float(data.astronaut())),\n",
" \"camera\":data.camera(),\n",
" \"coin\": data.coins(),\n",
" \"clock\":data.clock(),\n",
" \"blobs\":data.binary_blobs(),\n",
" \"coffee\":rgb2gray(img_as_float(data.coffee()))\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### SVD in Python\n",
"We will use the ```svd``` function from ```numpy.linalg``` to compute the SVD of a matrix in Python. The function returns U, s, V.\n",
" - U has the left singular vectors in its columns\n",
" - s is a one-dimensional NumPy array of singular values\n",
" - V has the right singular vectors in its rows, so it is equivalent to $V^T$ in the traditional linear algebra literature\n",
" \n",
"The approximation of the original matrix is reconstructed from a subset of the singular vectors, as in the ```compress_svd``` function below. We use NumPy array slicing to select $k$ singular vectors and values. Instead of storing $m\\times n$ values for the original image, we can now store $k(m+n)+k$ values.\n",
"\n",
"    reconst_matrix = np.dot(U[:,:k],np.dot(np.diag(s[:k]),V[:k,:]))\n",
" "
]
},
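{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"As a worked example of the storage count $k(m+n)+k$, assume a hypothetical gray scale image with $m=n=512$ and a truncation at $k=50$ (both numbers are chosen here only for illustration):\n",
"\n",
"$$k(m+n)+k=50\\,(512+512)+50=51250,\\qquad m\\times n=512\\times 512=262144$$\n",
"\n",
"$$\\frac{51250}{262144}\\approx 19.55\\%$$\n",
"\n",
"Under these assumptions, the truncated factors take about a fifth of the storage of the original pixel values."
]
},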
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from numpy.linalg import svd\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compress_svd(image,k):\n",
"    \"\"\"\n",
"    Perform SVD and a truncated reconstruction using k singular values/vectors.\n",
"\n",
"    Returns\n",
"    -------\n",
"    reconstructed matrix reconst_matrix, array of singular values s\n",
"    \"\"\"\n",
"    # V holds the right singular vectors in its rows (i.e. V^T)\n",
"    U,s,V = svd(image,full_matrices=False)\n",
"    reconst_matrix = np.dot(U[:,:k],np.dot(np.diag(s[:k]),V[:k,:]))\n",
"\n",
"    return reconst_matrix,s"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### Compress gray scale images\n",
"The function ```compress_show_gray_images``` below takes the image name (```img_name```) and the number of singular values/vectors ($k$) to use in the compressed reconstruction. It plots the singular values alongside the reconstructed image."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compress_show_gray_images(img_name,k):\n",
"    \"\"\"\n",
"    Compress a gray scale image and display the reconstructed image.\n",
"    Also displays a plot of the singular values.\n",
"    \"\"\"\n",
"    image=gray_images[img_name]\n",
"    original_shape = image.shape\n",
"    reconst_img,s = compress_svd(image,k)\n",
"    fig,axes = plt.subplots(1,2,figsize=(8,5))\n",
"    axes[0].plot(s)\n",
"    # stored values, k*(m+n)+k, as a percentage of the original m*n values\n",
"    compression_ratio =100.0* (k*(original_shape[0] + original_shape[1])+k)/(original_shape[0]*original_shape[1])\n",
"    axes[1].set_title(\"compression ratio={:.2f}\".format(compression_ratio)+\"%\")\n",
"    axes[1].imshow(reconst_img,cmap='gray')\n",
"    axes[1].axis('off')\n",
"    fig.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Use the interactive widget below to explore how the quality of the reconstructed image varies with $k$."
]
},
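{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The upper limit of the slider follows from requiring that the truncated factors take less storage than the original image. A sketch of that bound, with a hypothetical $512\\times 512$ image as the numeric example:\n",
"\n",
"$$k(m+n)+k<m\\,n\\quad\\Longleftrightarrow\\quad k<\\frac{m\\,n}{m+n+1}$$\n",
"\n",
"$$\\frac{512\\times 512}{512+512+1}=\\frac{262144}{1025}\\approx 255.75$$\n",
"\n",
"So for this assumed image size, the largest $k$ that still saves storage is $255$."
]
},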
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_k_max(img_name):\n",
"    \"\"\"\n",
"    Utility function for calculating the max value of the slider range.\n",
"    \"\"\"\n",
"    img = gray_images[img_name]\n",
"    m,n = img.shape\n",
"    # integer division keeps the slider limit an int, as in the color image helpers\n",
"    return m*n//(m+n+1)\n",
"\n",
"#set up the widgets\n",
"import ipywidgets as widgets\n",
"\n",
"list_widget = widgets.Dropdown(options=list(gray_images.keys()))\n",
"int_slider_widget = widgets.IntSlider(min=1,max=compute_k_max('cat'))\n",
"def update_k_max(*args):\n",
"    img_name=list_widget.value\n",
"    int_slider_widget.max = compute_k_max(img_name)\n",
"list_widget.observe(update_k_max,'value')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8c4f4db505f14ff6b8176f0e1565bf4b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(Dropdown(description='img_name', options=('cat', 'astro', 'coffee'), value='cat'), IntSl…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"interact(compress_show_gray_images,img_name=list_widget,k=int_slider_widget);"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### Load color images"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"color_images = {\n",
" \"cat\":img_as_float(data.chelsea()),\n",
" \"astro\":img_as_float(data.astronaut()),\n",
" \"coffee\":img_as_float(data.coffee())\n",
" \n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"### Compress color images\n",
"\n",
"Color images are represented in Python as three-dimensional NumPy arrays; the third dimension holds the color values (red, green, blue). However, SVD applies only to two-dimensional matrices. So we have to find a way to convert the three-dimensional array into two-dimensional arrays, apply SVD, and reconstruct the result as a three-dimensional array. There are two ways to do this, and we will show both below.\n",
" - Reshape method\n",
" - Layer method\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Reshape method to compress a color image\n",
"This method flattens the third dimension of the image array into the second dimension using NumPy's ```reshape``` method.\n",
" \n",
" ```image_reshaped = image.reshape((original_shape[0],original_shape[1]*3))```\n",
"\n",
"SVD is applied to the reshaped array, and the array is reconstructed with the desired number of singular values/vectors. The reconstructed array is then reshaped back to three dimensions by another call to ```reshape```.\n",
" \n",
" ```image_reconst = image_reconst.reshape(original_shape)```\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compress_show_color_images_reshape(img_name,k):\n",
"    \"\"\"\n",
"    Compress and display the reconstructed color image using the reshape method.\n",
"    \"\"\"\n",
"    image = color_images[img_name]\n",
"    original_shape = image.shape\n",
"    image_reshaped = image.reshape((original_shape[0],original_shape[1]*3))\n",
"    image_reconst,_ = compress_svd(image_reshaped,k)\n",
"    image_reconst = image_reconst.reshape(original_shape)\n",
"    compression_ratio =100.0* (k*(original_shape[0] + 3*original_shape[1])+k)/(original_shape[0]*original_shape[1]*original_shape[2])\n",
"    plt.title(\"compression ratio={:.2f}\".format(compression_ratio)+\"%\")\n",
"    plt.imshow(image_reconst)"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Here is the interactive widget to explore image compression of color images using the reshape method. By dragging the slider to vary $k$, observe how image quality varies. Also, we can explore different images by selecting through the drop down widget."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_k_max_color_images(img_name):\n",
"    image = color_images[img_name]\n",
"    original_shape = image.shape\n",
"    return (original_shape[0]*original_shape[1]*original_shape[2])//(original_shape[0] + 3*original_shape[1] + 1)\n",
"\n",
"\n",
"list_widget = widgets.Dropdown(options=list(color_images.keys()))\n",
"int_slider_widget = widgets.IntSlider(min=1,max=compute_k_max_color_images('cat'))\n",
"def update_k_max_color(*args):\n",
"    img_name=list_widget.value\n",
"    int_slider_widget.max = compute_k_max_color_images(img_name)\n",
"list_widget.observe(update_k_max_color,'value')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d1463b3eff5d497ba77a77f6d176236e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(Dropdown(description='img_name', options=('cat', 'astro', 'coffee'), value='cat'), IntSl…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"interact(compress_show_color_images_reshape,img_name=list_widget,k=int_slider_widget);"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"#### Layer method to compress color images\n",
"In the function ```compress_show_color_images_layer```, we treat a color image as a stack of three separate two-dimensional images (the red, green, and blue layers). We apply the truncated SVD reconstruction to each two-dimensional layer separately.\n",
"\n",
"```image_reconst_layers = [compress_svd(image[:,:,i],k)[0] for i in range(3)]```\n",
"\n",
"Then we put the reconstructed layers back together.\n",
"\n",
"```\n",
"image_reconst = np.zeros(image.shape)\n",
"for i in range(3):\n",
"    image_reconst[:,:,i] = image_reconst_layers[i]\n",
"```\n",
"\n",
"\n",
" "
]
},
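{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"As a rough comparison of the storage the two methods need, the counts below follow the compression ratio formulas used in ```compress_show_color_images_reshape``` and ```compress_show_color_images_layer``` for an image of shape $m\\times n\\times 3$ (the symbols $m$ and $n$ are shorthand introduced here for ```original_shape[0]``` and ```original_shape[1]```):\n",
"\n",
"$$\\text{reshape: } k(m+3n+1),\\qquad \\text{layer: } 3k(m+n+1)$$\n",
"\n",
"$$3k(m+n+1)-k(m+3n+1)=2k(m+1)$$\n",
"\n",
"So, for the same $k$, the layer method stores $2k(m+1)$ more values than the reshape method. This count says nothing about which reconstruction looks better at a given storage budget."
]
},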
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compress_show_color_images_layer(img_name,k):\n",
"    \"\"\"\n",
"    Compress and display the reconstructed color image using the layer method.\n",
"    \"\"\"\n",
"    image = color_images[img_name]\n",
"    original_shape = image.shape\n",
"    # truncated SVD reconstruction of each color layer\n",
"    image_reconst_layers = [compress_svd(image[:,:,i],k)[0] for i in range(3)]\n",
"    image_reconst = np.zeros(image.shape)\n",
"    for i in range(3):\n",
"        image_reconst[:,:,i] = image_reconst_layers[i]\n",
"\n",
"    compression_ratio =100.0*3* (k*(original_shape[0] + original_shape[1])+k)/(original_shape[0]*original_shape[1]*original_shape[2])\n",
"    plt.title(\"compression ratio={:.2f}\".format(compression_ratio)+\"%\")\n",
"\n",
"    plt.imshow(image_reconst)"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Here is the widget to explore the layer method of compressing color images."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"def compute_k_max_color_images_layers(img_name):\n",
"    image = color_images[img_name]\n",
"    original_shape = image.shape\n",
"    return (original_shape[0]*original_shape[1]*original_shape[2])// (3*(original_shape[0] + original_shape[1] + 1))\n",
"\n",
"\n",
"list_widget = widgets.Dropdown(options=list(color_images.keys()))\n",
"int_slider_widget = widgets.IntSlider(min=1,max=compute_k_max_color_images_layers('cat'))\n",
"def update_k_max_color_layers(*args):\n",
"    img_name=list_widget.value\n",
"    int_slider_widget.max = compute_k_max_color_images_layers(img_name)\n",
"list_widget.observe(update_k_max_color_layers,'value')"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "aed3940b3f2147cda2892b104113dcfe",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(Dropdown(description='img_name', options=('cat', 'astro', 'coffee'), value='cat'), IntSl…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"interact(compress_show_color_images_layer,img_name=list_widget,k=int_slider_widget);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 1
}