*This post is a Jupyter Notebook and is best viewed at this link, where you can also run it in Colab or Binder.*
In this post, we will cover the common metrics used to evaluate a linear regression model:
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Assume we have the following set of data:\n",
"\n",
"| X | Y |\n",
"| --- | --- |\n",
"| 20 | 23 |\n",
"| 21 | 21 |\n",
"| 22 | 26 |\n",
"| 23 | 22 |\n",
"| 24 | 25 |\n",
"| 25 | 24 |\n",
"\n",
"Fitting a least-squares regression line to this data with this [calculator](http://www.alcula.com/calculators/statistics/linear-regression/) gives:\n",
"\n",
"$\\hat y = 15.14 + 0.37x$\n",
"\n",
"Substituting each X into this equation gives the predicted value $\\hat Y$:\n",
"\n",
"| X | Y | $\\hat Y$ |\n",
"| --- | --- | --- |\n",
"| 20 | 23 | 22.54 |\n",
"| 21 | 21 | 22.91 |\n",
"| 22 | 26 | 23.28 |\n",
"| 23 | 22 | 23.65 |\n",
"| 24 | 25 | 24.02 |\n",
"| 25 | 24 | 24.39 |\n",
"\n",
"We can evaluate this regression line using several error metrics:\n",
"\n",
"**MAE (Mean Absolute Error):**\n",
"\n",
"$$MAE = \\frac{1}{n}\\sum_{i=1}^{n}\\left | y_{i} - \\hat y_{i} \\right |$$\n",
"\n",
"where $y$ is the actual value in the data set and $\\hat y$ is the value predicted by the regression equation.\n",
"\n",
"1. Calculate the difference between Y and $\\hat Y$ for each row\n",
"2. Take the absolute value of each difference\n",
"3. Take the mean, i.e. sum the absolute values and divide by the number of elements\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MAE using Python: 1.3516666666666666\n",
"MAE using Sklearn: 1.3516666666666666\n",
"MAE using Numpy: 1.3516666666666666\n"
]
}
],
"source": [
"X = [20, 21, 22, 23, 24, 25]\n",
"Y = [23, 21, 26, 22, 25, 24]\n",
"# Predicted values from y = 15.14 + 0.37x (these are y-hat, not the mean y-bar)\n",
"Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]\n",
"\n",
"# Core Python\n",
"\n",
"n = len(Y)  # number of observations\n",
"s = 0\n",
"for i in range(n):\n",
"    s += abs(Y[i] - Y_HAT[i])  # accumulate the absolute errors\n",
"MAE = s / n\n",
"print(\"MAE using Python:\", MAE)\n",
"\n",
"# Using Scikit-Learn Library\n",
"\n",
"from sklearn.metrics import mean_absolute_error\n",
"\n",
"MAE_sci = mean_absolute_error(Y, Y_HAT)\n",
"print(\"MAE using Sklearn:\", MAE_sci)\n",
"\n",
"# Using Numpy\n",
"\n",
"import numpy as np\n",
"\n",
"MAE_numpy = np.mean(np.abs(np.subtract(Y, Y_HAT)))\n",
"print(\"MAE using Numpy:\", MAE_numpy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**MSE (Mean Squared Error):**\n",
"\n",
"$$MSE = \\frac{1}{n}\\sum_{i=1}^{n}\\left ( y_{i} - \\hat y_{i} \\right )^2$$\n",
"\n",
"where $y$ is the actual value in the data set and $\\hat y$ is the value predicted by the regression equation.\n",
"\n",
"1. Calculate the difference between Y and $\\hat Y$ for each row\n",
"2. Square each difference\n",
"3. Take the mean, i.e. sum the squares and divide by the number of elements\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MSE using Python: 2.5155166666666653\n",
"MSE using Sklearn: 2.5155166666666653\n",
"MSE using Numpy: 2.5155166666666653\n"
]
}
],
"source": [
"X = [20, 21, 22, 23, 24, 25]\n",
"Y = [23, 21, 26, 22, 25, 24]\n",
"# Predicted values from y = 15.14 + 0.37x (these are y-hat, not the mean y-bar)\n",
"Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]\n",
"\n",
"# Core Python\n",
"\n",
"n = len(Y)  # number of observations\n",
"s = 0\n",
"for i in range(n):\n",
"    s += (Y[i] - Y_HAT[i]) ** 2  # accumulate the squared errors\n",
"MSE = s / n\n",
"print(\"MSE using Python:\", MSE)\n",
"\n",
"# Using Scikit-Learn Library\n",
"\n",
"from sklearn.metrics import mean_squared_error\n",
"\n",
"MSE_sci = mean_squared_error(Y, Y_HAT)\n",
"print(\"MSE using Sklearn:\", MSE_sci)\n",
"\n",
"# Using Numpy\n",
"\n",
"import numpy as np\n",
"\n",
"MSE_numpy = np.mean(np.square(np.subtract(Y, Y_HAT)))\n",
"print(\"MSE using Numpy:\", MSE_numpy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**RMSE (Root Mean Squared Error):**\n",
"\n",
"$$RMSE = \\sqrt{\\frac{1}{n}\\sum_{i=1}^{n}\\left ( y_{i} - \\hat y_{i} \\right )^2}$$\n",
"\n",
"where $y$ is the actual value in the data set and $\\hat y$ is the value predicted by the regression equation.\n",
"\n",
"1. Calculate the difference between Y and $\\hat Y$ for each row\n",
"2. Square each difference\n",
"3. Take the mean, i.e. sum the squares and divide by the number of elements\n",
"4. Take the square root, which brings the error back to the units of Y\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RMSE using Python: 1.5860380407375685\n",
"RMSE using Sklearn: 1.5860380407375685\n",
"RMSE using Numpy: 1.5860380407375685\n"
]
}
],
"source": [
"X = [20, 21, 22, 23, 24, 25]\n",
"Y = [23, 21, 26, 22, 25, 24]\n",
"# Predicted values from y = 15.14 + 0.37x (these are y-hat, not the mean y-bar)\n",
"Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]\n",
"\n",
"# Core Python\n",
"from math import sqrt\n",
"\n",
"n = len(Y)  # number of observations\n",
"s = 0\n",
"for i in range(n):\n",
"    s += (Y[i] - Y_HAT[i]) ** 2  # accumulate the squared errors\n",
"RMSE = sqrt(s / n)\n",
"print(\"RMSE using Python:\", RMSE)\n",
"\n",
"# Using Scikit-Learn Library (square root of its MSE)\n",
"\n",
"from sklearn.metrics import mean_squared_error\n",
"\n",
"RMSE_sci = sqrt(mean_squared_error(Y, Y_HAT))\n",
"print(\"RMSE using Sklearn:\", RMSE_sci)\n",
"\n",
"# Using Numpy\n",
"\n",
"import numpy as np\n",
"\n",
"RMSE_numpy = np.sqrt(np.mean(np.square(np.subtract(Y, Y_HAT))))\n",
"print(\"RMSE using Numpy:\", RMSE_numpy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**RAE (Relative Absolute Error):**\n",
"\n",
"$$RAE = \\frac{\\sum_{i=1}^{n}\\left | y_{i} - \\hat y_{i} \\right |}{\\sum_{i=1}^{n}\\left | y_{i} - \\bar y \\right |}$$\n",
"\n",
"where $y$ is the actual value in the data set, $\\hat y$ is the value predicted by the regression equation, and $\\bar y$ is the mean of $y$. A value below 1 means the regression line has a smaller total absolute error than always predicting $\\bar y$.\n",
"\n",
"1. Calculate the difference between Y and $\\hat Y$ for each row, take the absolute value, and sum the results\n",
"2. Calculate the mean of Y, denoted by $\\bar Y$\n",
"3. Calculate the difference between Y and $\\bar Y$ for each row, take the absolute value, and sum the results\n",
"4. Divide the value from step 1 by the value from step 3\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RAE using Numpy: 0.9011111111111111\n"
]
}
],
"source": [
"X = [20, 21, 22, 23, 24, 25]\n",
"Y = [23, 21, 26, 22, 25, 24]\n",
"# Predicted values from y = 15.14 + 0.37x (these are y-hat, not the mean y-bar)\n",
"Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]\n",
"\n",
"# Using Numpy\n",
"\n",
"import numpy as np\n",
"\n",
"# Total absolute error of the model divided by that of always predicting mean(Y)\n",
"RAE_numpy = np.sum(np.abs(np.subtract(Y, Y_HAT))) / np.sum(np.abs(np.subtract(Y, np.mean(Y))))\n",
"print(\"RAE using Numpy:\", RAE_numpy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**RSE (Relative Squared Error):**\n",
"\n",
"$$RSE = \\frac{\\sum_{i=1}^{n}\\left ( y_{i} - \\hat y_{i} \\right )^2}{\\sum_{i=1}^{n}\\left ( y_{i} - \\bar y \\right )^2}$$\n",
"\n",
"where $y$ is the actual value in the data set, $\\hat y$ is the value predicted by the regression equation, and $\\bar y$ is the mean of $y$.\n",
"\n",
"1. Calculate the difference between Y and $\\hat Y$ for each row, square it, and sum the results\n",
"2. Calculate the mean of Y, denoted by $\\bar Y$\n",
"3. Calculate the difference between Y and $\\bar Y$ for each row, square it, and sum the results\n",
"4. Divide the value from step 1 by the value from step 3\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RSE using Numpy: 0.8624628571428568\n"
]
}
],
"source": [
"X = [20, 21, 22, 23, 24, 25]\n",
"Y = [23, 21, 26, 22, 25, 24]\n",
"# Predicted values from y = 15.14 + 0.37x (these are y-hat, not the mean y-bar)\n",
"Y_HAT = [22.54, 22.91, 23.28, 23.65, 24.02, 24.39]\n",
"\n",
"# Using Numpy\n",
"\n",
"import numpy as np\n",
"\n",
"# Total squared error of the model divided by that of always predicting mean(Y)\n",
"RSE_numpy = np.sum(np.square(np.subtract(Y, Y_HAT))) / np.sum(np.square(np.subtract(Y, np.mean(Y))))\n",
"print(\"RSE using Numpy:\", RSE_numpy)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
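
The notebook above takes the regression line from an online calculator and computes each metric in its own cell. As a cross-check, here is a minimal sketch that fits the line with `np.polyfit`, rounds the coefficients to two decimals as the notebook does, and computes all five metrics in one place. The helper name `regression_metrics` and the variable names are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

X = np.array([20, 21, 22, 23, 24, 25], dtype=float)
Y = np.array([23, 21, 26, 22, 25, 24], dtype=float)

# Least-squares fit of a degree-1 polynomial.
# np.polyfit returns the coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(X, Y, 1)

# The notebook uses the coefficients rounded to two decimals (15.14 and 0.37),
# so the same rounding is applied here before predicting.
y_hat = round(intercept, 2) + round(slope, 2) * X


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, RAE and RSE as a dict (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred        # y - y_hat
    baseline = y_true - y_true.mean()  # y - y_bar
    mse = np.mean(residuals ** 2)
    return {
        "MAE": np.mean(np.abs(residuals)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "RAE": np.sum(np.abs(residuals)) / np.sum(np.abs(baseline)),
        "RSE": np.sum(residuals ** 2) / np.sum(baseline ** 2),
    }


metrics = regression_metrics(Y, y_hat)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With the rounded coefficients, the printed values should agree with the notebook's outputs up to floating-point rounding. Using the unrounded `slope` and `intercept` instead would give slightly different (and slightly smaller) errors, since the unrounded line is the exact least-squares fit.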