Python for Finance: Part II: 3 Using Regressions for Financial Analysis
Regression
Mortgage
1. Size: Explanatory variable
2. Price: Dependent variable
Simple regression: Using only one variable
y = mx + b
y = α + βx
Multivariate regression: Using more than one variable
Implementation
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

data = pd.read_excel('[PATH]/Housing.xlsx')
data
data[['House Price', 'House Size (sq.ft.)']]

# Univariate regression
X = data['House Size (sq.ft.)']
Y = data['House Price']
X
Y

plt.scatter(X, Y)
plt.show()

plt.scatter(X, Y)
plt.axis([0, 2500, 0, 1500000])
plt.ylabel('House Price')
plt.xlabel('House Size (sq.ft.)')
plt.show()
Are all regressions created equal? Learning how to distinguish good regressions from bad ones
y = α + βx + ε (ε is the error term)
Residuals
– The best-fitting line minimizes the sum of the squared residuals
=> OLS (ordinary least squares) estimates
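The idea above can be checked numerically. For simple regression, the OLS estimates have a closed form: β = cov(x, y) / var(x), α = ȳ − β·x̄. The five data points below are hypothetical.

```python
import numpy as np

# Hypothetical points lying near y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.9])

# Closed-form OLS estimates for y = alpha + beta*x
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()

# Residuals of the fitted line; any other line gives a
# larger sum of squared residuals
residuals = y - (alpha + beta * x)
print(alpha, beta, (residuals ** 2).sum())
```

Nudging the slope away from the OLS estimate (e.g. beta + 0.1) strictly increases the sum of squared residuals, which is what "best-fitting" means here.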
Good vs. Bad regressions (a comparison of explanatory power)
– Using the R square
S² = Σ(Xᵢ − X̄)² / (N − 1)   (sample variance)
TSS = Σ(Xᵢ − X̄)²
TSS(Total Sum of Squares):
– provides a sense of the variability of data
R² = 1 − SSR / TSS   (SSR: sum of squared residuals)
– R² varies between 0% and 100%.
– The higher it is, the more predictive power the model has.
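The R² formula above can be computed by hand from TSS and SSR. This sketch reuses a small hypothetical data set; the fitted line comes from the closed-form OLS estimates.

```python
import numpy as np

# Hypothetical, nearly linear data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.9])

# OLS fit (closed form for simple regression)
beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha = y.mean() - beta * x.mean()
fitted = alpha + beta * x

ssr = ((y - fitted) ** 2).sum()     # sum of squared residuals
tss = ((y - y.mean()) ** 2).sum()   # total sum of squares
r_squared = 1 - ssr / tss
print(r_squared)  # close to 1 because the data is nearly linear
```

Because these points sit almost exactly on a line, SSR is tiny relative to TSS and R² is close to 100%; scattering the points would push it toward 0%.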
X1 = sm.add_constant(X)
reg = sm.OLS(Y, X1).fit()
reg.summary()

# Expected value of Y for a 1,000 sq.ft. house:
# 260800 + 402 * 1000 = 662800

# Alpha, Beta, R^2 via scipy:
slope, intercept, r_value, p_value, std_err = stats.linregress(X, Y)
slope          # 401.91628631922595
intercept      # 260806.23605609639
r_value        # 0.82357755346969241
r_value ** 2   # 0.67827998657912403
p_value        # 8.1296423772313077e-06
std_err        # 65.242995106364916
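The "expected value" calculation above (intercept + slope × size) can be done directly from the linregress result. Since the course's Housing.xlsx is not reproduced here, this sketch uses hypothetical size/price pairs chosen to give coefficients of roughly the same magnitude.

```python
import numpy as np
from scipy import stats

# Hypothetical house sizes (sq.ft.) and prices
x = np.array([800.0, 1000.0, 1200.0, 1500.0, 2000.0])
y = np.array([580000.0, 660000.0, 750000.0, 870000.0, 1060000.0])

res = stats.linregress(x, y)

# Expected price for a 1,000 sq.ft. house: intercept + slope * 1000
predicted = res.intercept + res.slope * 1000
print(res.slope, res.intercept, res.rvalue ** 2, predicted)
```

`stats.linregress` returns a result object whose `slope`, `intercept`, `rvalue`, `pvalue`, and `stderr` attributes match the tuple unpacking used in the notes above.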