Linear regression model evaluation metrics using Python

*The contents of this post (a Jupyter Notebook) are best viewed at this link, which also lets you run the notebook in Colab or Binder.*

In this post, we will cover the common metrics used to evaluate a linear regression model:

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assume we have the following set of data:\n",
    "\n",
    "| X | Y |\n",
    "| --- | --- |\n",
    "| 20 | 23 |\n",
    "| 21 | 21 |\n",
    "| 22 | 26 |\n",
    "| 23 | 22 |\n",
    "| 24 | 25 |\n",
    "| 25 | 24 |\n",
    "\n",
    "Fitting a linear regression to this data with this [calculator](http://www.alcula.com/calculators/statistics/linear-regression/) gives:\n",
    "\n",
    "$\\hat y = 15.14 + 0.37x$\n",
    "\n",
    "Evaluating this equation at each X gives the predicted value $\\hat y$:\n",
    "\n",
    "| X | Y | $\\hat Y$ |\n",
    "| --- | --- | --- |\n",
    "| 20 | 23 | 22.54 |\n",
    "| 21 | 21 | 22.91 |\n",
    "| 22 | 26 | 23.28 |\n",
    "| 23 | 22 | 23.65 |\n",
    "| 24 | 25 | 24.02 |\n",
    "| 25 | 24 | 24.39 |\n",
    "\n",
    "We can evaluate this regression line using several error metrics:\n",
    "\n",
    "**MAE (Mean Absolute Error) :-**\n",
    "\n",
    "$$MAE = \\frac{1}{n}\\sum_{i=1}^{n}\\left | y_{i} - \\hat y_{i} \\right |$$\n",
    "\n",
    "where $y_i$ is the actual value in the data set and $\\hat y_i$ is the value predicted by the regression equation.\n",
    "\n",
    "1. Calculate the difference between Y and $\\hat Y$ for each row\n",
    "2. Take the absolute value of each difference\n",
    "3. Take the mean, i.e. sum the values and divide by the number of elements\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAE using Python: 1.3516666666666666\n",
      "MAE using Sklearn: 1.3516666666666666\n",
      "MAE using Numpy: 1.3516666666666666\n"
     ]
    }
   ],
   "source": [
    "X = [20, 21, 22, 23, 24, 25]\n",
    "Y = [23, 21, 26, 22, 25, 24]\n",
    "Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x\n",
    "\n",
    "# Core Python\n",
    "\n",
    "n = len(Y)  # number of observations\n",
    "s = 0\n",
    "for i in range(n):\n",
    "    s += abs(Y[i] - Y_HAT[i])  # accumulate the absolute errors\n",
    "MAE = s / n\n",
    "print(\"MAE using Python:\", MAE)\n",
    "\n",
    "# Using Scikit-Learn Library\n",
    "\n",
    "from sklearn.metrics import mean_absolute_error\n",
    "\n",
    "MAE_sci = mean_absolute_error(Y, Y_HAT)\n",
    "print(\"MAE using Sklearn:\", MAE_sci)\n",
    "\n",
    "# Using Numpy\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "MAE_numpy = np.mean(np.abs(np.subtract(Y, Y_HAT)))\n",
    "print(\"MAE using Numpy:\", MAE_numpy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**MSE (Mean Square Error) :-**\n",
    "\n",
    "$$MSE = \\frac{1}{n}\\sum_{i=1}^{n}\\left ( y_{i} - \\hat y_{i} \\right )^2$$\n",
    "\n",
    "where $y_i$ is the actual value in the data set and $\\hat y_i$ is the value predicted by the regression equation.\n",
    "\n",
    "1. Calculate the difference between Y and $\\hat Y$ for each row\n",
    "2. Square each difference\n",
    "3. Take the mean, i.e. sum the values and divide by the number of elements\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MSE using Python: 2.5155166666666653\n",
      "MSE using Sklearn: 2.5155166666666653\n",
      "MSE using Numpy: 2.5155166666666653\n"
     ]
    }
   ],
   "source": [
    "X = [20, 21, 22, 23, 24, 25]\n",
    "Y = [23, 21, 26, 22, 25, 24]\n",
    "Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x\n",
    "\n",
    "# Core Python\n",
    "\n",
    "n = len(Y)  # number of observations\n",
    "s = 0\n",
    "for i in range(n):\n",
    "    s += (Y[i] - Y_HAT[i]) ** 2  # accumulate the squared errors\n",
    "MSE = s / n\n",
    "print(\"MSE using Python:\", MSE)\n",
    "\n",
    "# Using Scikit-Learn Library\n",
    "\n",
    "from sklearn.metrics import mean_squared_error\n",
    "\n",
    "MSE_sci = mean_squared_error(Y, Y_HAT)\n",
    "print(\"MSE using Sklearn:\", MSE_sci)\n",
    "\n",
    "# Using Numpy\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "MSE_numpy = np.mean(np.square(np.subtract(Y, Y_HAT)))\n",
    "print(\"MSE using Numpy:\", MSE_numpy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**RMSE (Root Mean Square Error) :-**\n",
    "\n",
    "$$RMSE = \\sqrt{\\frac{1}{n}\\sum_{i=1}^{n}\\left ( y_{i} - \\hat y_{i} \\right )^2}$$\n",
    "\n",
    "where $y_i$ is the actual value in the data set and $\\hat y_i$ is the value predicted by the regression equation.\n",
    "\n",
    "1. Calculate the difference between Y and $\\hat Y$ for each row\n",
    "2. Square each difference\n",
    "3. Take the mean, i.e. sum the values and divide by the number of elements\n",
    "4. Take the square root of the mean\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RMSE using Python: 1.5860380407375685\n",
      "RMSE using Sklearn: 1.5860380407375685\n",
      "RMSE using Numpy: 1.5860380407375685\n"
     ]
    }
   ],
   "source": [
    "X = [20, 21, 22, 23, 24, 25]\n",
    "Y = [23, 21, 26, 22, 25, 24]\n",
    "Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x\n",
    "\n",
    "# Core Python\n",
    "from math import sqrt\n",
    "\n",
    "n = len(Y)  # number of observations\n",
    "s = 0\n",
    "for i in range(n):\n",
    "    s += (Y[i] - Y_HAT[i]) ** 2  # accumulate the squared errors\n",
    "RMSE = sqrt(s / n)\n",
    "print(\"RMSE using Python:\", RMSE)\n",
    "\n",
    "# Using Scikit-Learn Library\n",
    "\n",
    "from sklearn.metrics import mean_squared_error\n",
    "\n",
    "RMSE_sci = sqrt(mean_squared_error(Y, Y_HAT))  # square root of the MSE\n",
    "print(\"RMSE using Sklearn:\", RMSE_sci)\n",
    "\n",
    "# Using Numpy\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "RMSE_numpy = np.sqrt(np.mean(np.square(np.subtract(Y, Y_HAT))))\n",
    "print(\"RMSE using Numpy:\", RMSE_numpy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**RAE (Relative Absolute Error) :-**\n",
    "\n",
    "$$RAE = \\frac{\\sum_{i=1}^{n}\\left | y_{i} - \\hat y_{i} \\right |}{\\sum_{i=1}^{n}\\left | y_{i} - \\bar y \\right |}$$\n",
    "\n",
    "where $y_i$ is the actual value in the data set, $\\hat y_i$ is the value predicted by the regression equation, and $\\bar y$ is the mean of the actual values.\n",
    "\n",
    "1. Calculate the difference between Y and $\\hat Y$ for each row, take the absolute value, and sum the results\n",
    "2. Calculate the mean of Y, denoted by $\\bar Y$\n",
    "3. Calculate the difference between Y and $\\bar Y$ for each row, take the absolute value, and sum the results\n",
    "4. Divide the value from step 1 by the value from step 3\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RAE using Numpy: 0.9011111111111111\n"
     ]
    }
   ],
   "source": [
    "X = [20, 21, 22, 23, 24, 25]\n",
    "Y = [23, 21, 26, 22, 25, 24]\n",
    "Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x\n",
    "\n",
    "# Using Numpy\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "# Sum of absolute errors divided by the sum of absolute deviations from the mean of Y\n",
    "RAE_numpy = np.sum(np.abs(np.subtract(Y, Y_HAT))) / np.sum(np.abs(np.subtract(Y, np.mean(Y))))\n",
    "print(\"RAE using Numpy:\", RAE_numpy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**RSE (Relative Squared Error) :-**\n",
    "\n",
    "$$RSE = \\frac{\\sum_{i=1}^{n}\\left ( y_{i} - \\hat y_{i} \\right )^2}{\\sum_{i=1}^{n}\\left ( y_{i} - \\bar y \\right )^2}$$\n",
    "\n",
    "where $y_i$ is the actual value in the data set, $\\hat y_i$ is the value predicted by the regression equation, and $\\bar y$ is the mean of the actual values.\n",
    "\n",
    "1. Calculate the difference between Y and $\\hat Y$ for each row, square it, and sum the results\n",
    "2. Calculate the mean of Y, denoted by $\\bar Y$\n",
    "3. Calculate the difference between Y and $\\bar Y$ for each row, square it, and sum the results\n",
    "4. Divide the value from step 1 by the value from step 3\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RSE using Numpy: 0.8624628571428568\n"
     ]
    }
   ],
   "source": [
    "X = [20, 21, 22, 23, 24, 25]\n",
    "Y = [23, 21, 26, 22, 25, 24]\n",
    "Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]  # predictions from y = 15.14 + 0.37x\n",
    "\n",
    "# Using Numpy\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "# Sum of squared errors divided by the sum of squared deviations from the mean of Y\n",
    "RSE_numpy = np.sum(np.square(np.subtract(Y, Y_HAT))) / np.sum(np.square(np.subtract(Y, np.mean(Y))))\n",
    "print(\"RSE using Numpy:\", RSE_numpy)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
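
The coefficients quoted in the notebook came from an online calculator. As a cross-check, here is a minimal sketch that refits the same line with NumPy's `np.polyfit` (a degree-1 least-squares fit) and rebuilds the $\hat Y$ column. The names `slope`, `intercept`, `b0`, `b1`, and `y_hat` are introduced here for illustration only, and rounding the coefficients to two decimals is an assumption made to match the figures used in the notebook.

```python
import numpy as np

X = [20, 21, 22, 23, 24, 25]
Y = [23, 21, 26, 22, 25, 24]

# Degree-1 least-squares fit; polyfit returns the highest-order coefficient first.
slope, intercept = np.polyfit(X, Y, 1)
print("y = %.2f + %.2fx" % (intercept, slope))  # y = 15.14 + 0.37x

# Assumption: use the coefficients rounded to two decimals, as the notebook does.
b0 = round(float(intercept), 2)
b1 = round(float(slope), 2)

# Predicted value for each X, rounded to two decimals like the table.
y_hat = [round(b0 + b1 * x, 2) for x in X]
print(y_hat)
```

Because the metrics in the notebook are computed from predictions based on the rounded coefficients, using the unrounded `slope` and `intercept` would give slightly different error values.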

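
The notebook computes RAE and RSE only with NumPy. For parity with the MAE, MSE, and RMSE cells, here is a sketch of the same two formulas in core Python. The helper names `relative_absolute_error` and `relative_squared_error` are hypothetical (they are not scikit-learn functions), `Y_HAT` holds the predicted values, and the code assumes the actual values are not all identical, since the denominators would otherwise be zero.

```python
Y = [23, 21, 26, 22, 25, 24]
Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]


def relative_absolute_error(actual, predicted):
    """Sum of |y - y_hat| divided by sum of |y - mean(y)|."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum(abs(a - p) for a, p in zip(actual, predicted))
    denominator = sum(abs(a - mean_actual) for a in actual)
    return numerator / denominator


def relative_squared_error(actual, predicted):
    """Sum of (y - y_hat)^2 divided by sum of (y - mean(y))^2."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    denominator = sum((a - mean_actual) ** 2 for a in actual)
    return numerator / denominator


RAE = relative_absolute_error(Y, Y_HAT)
RSE = relative_squared_error(Y, Y_HAT)
print("RAE using Python:", round(RAE, 4))  # 0.9011
print("RSE using Python:", round(RSE, 4))  # 0.8625
```

Both metrics compare the model against a baseline that always predicts the mean of Y. A value below 1 means the regression line does better than that baseline, so the results here indicate only a modest improvement.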