Steps:
Ⅰ. Estimate the model coefficients
Ⅱ. Build a strategy with a stable expected return

Ⅰ. Estimate the model coefficients
The regression uses data from 2015/4/16 to 2016/8/26.
1. The independent variables, CSI300 and SSE50, are named x and y respectively. The dependent variable, A50, is named z. Each index is multiplied by its contract multiplier and converted into Chinese yuan.
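The conversion step can be sketched as follows. The multipliers, the exchange rate, and the index levels below are assumptions chosen for illustration; none of them is stated in the text, and `contract_value_cny` is a hypothetical helper name.

```r
# Assumed contract multipliers (not given in the article)
mult_csi300 <- 300   # CNY per index point, assumed
mult_sse50  <- 300   # CNY per index point, assumed
mult_a50    <- 1     # USD per index point, assumed
usdcny      <- 6.5   # hypothetical USD/CNY exchange rate

# Contract value in CNY: index level times multiplier times FX rate
contract_value_cny <- function(index_level, multiplier, fx = 1) {
  index_level * multiplier * fx
}

# Hypothetical index levels, for illustration only
x <- contract_value_cny(3500, mult_csi300)            # CSI300 contract value
y <- contract_value_cny(2400, mult_sse50)             # SSE50 contract value
z <- contract_value_cny(10000, mult_a50, fx = usdcny) # A50 contract value

print(c(x = x, y = y, z = z))
```

Under these assumptions the A50 contract value is more than an order of magnitude smaller than the other two, which matters later when reading the size of the regression coefficients.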
First, we calculated the pairwise correlations between the indexes:
> cor(x,y)
[1] 0.9850738
> cor(x,z)
[1] 0.9364365
The correlation is high between each pair of the three.
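We then fitted an ordinary least squares regression of z on x and y:
> lm1<-lm(z~x+y)
> lm1
Call:
lm(formula = z ~ x + y)
Coefficients:
(Intercept)            x            y
 17959.0566      -0.0223       0.1027
The result showed some defects: the coefficient of x is negative even though x and z are strongly positively correlated, which is a common symptom of collinearity between x and y.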
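One way to quantify how strongly the correlation between x and y can destabilize the OLS coefficients is the variance inflation factor (VIF). With exactly two regressors, the VIF of each equals 1/(1 - r^2), where r is their correlation. The sketch below applies this to the reported cor(x,y); the variable names are illustrative and the check is not part of the original analysis.

```r
# Correlation between the two regressors, as reported by cor(x,y)
r_xy <- 0.9850738

# With two regressors, each one has VIF = 1 / (1 - r^2)
vif <- 1 / (1 - r_xy^2)
print(vif)

# A VIF above 10 is a commonly used rule-of-thumb sign of serious collinearity
print(vif > 10)
```

The value comes out at roughly 34, well above the usual rule-of-thumb threshold, so unstable or wrongly signed OLS coefficients are not surprising here.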
Next, we ran a ridge regression with lambda from 0 to 150 and looked for the lambda that minimizes GCV:
> library(MASS)
> ridge.sol<-lm.ridge(z~x+y,lambda=seq(0,150,length=151),model=TRUE)
> names(ridge.sol)
[1] "coef" "scales" "Inter" "lambda" "ym" "xm" "GCV" "kHKB" "kLW"
> ridge.sol$lambda[which.min(ridge.sol$GCV)]
[1] 0
> ridge.sol$coef[which.min(ridge.sol$GCV)]
[1] -300.4365
We also plotted the coefficients against lambda, with a vertical line at the lambda that minimizes GCV:
> par(mfrow=c(1,2))
> matplot(ridge.sol$lambda,t(ridge.sol$coef),xlab=expression(lambda),ylab="Coefficients",type="l",lty=1:20)
> abline(v=ridge.sol$lambda[which.min(ridge.sol$GCV)])
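The GCV-based choice of lambda can be sketched on synthetic data as below. The data are simulated stand-ins, not the real index series, and the names `fit`, `best`, `best_lambda` and `best_coef` are illustrative. Note that the `coef` component of an `lm.ridge` fit holds coefficients for the internally scaled regressors, whereas `coef()` returns the intercept and slopes on the original scale.

```r
library(MASS)  # provides lm.ridge

set.seed(42)
n <- 300
# Synthetic, strongly correlated regressors (stand-ins, not the real index data)
x <- 1000 + cumsum(rnorm(n, sd = 10))
y <- 0.65 * x + rnorm(n, sd = 5)
z <- 20 + 0.02 * x + 0.03 * y + rnorm(n, sd = 1)

lambdas <- seq(0, 150, length = 151)
fit <- lm.ridge(z ~ x + y, lambda = lambdas)

# Index of the lambda with the smallest generalized cross-validation score
best <- which.min(fit$GCV)
best_lambda <- fit$lambda[best]

# coef() gives intercept and slopes on the original scale, one row per lambda;
# fit$coef holds coefficients for the internally scaled regressors instead
best_coef <- coef(fit)[best, ]
print(best_lambda)
print(best_coef)
```

Reading the coefficients through `coef()` rather than the raw `coef` component avoids mixing scaled and unscaled values when comparing against an `lm` fit.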
Refitting with lambda=0 gives the following GCV:
> ridge.sol<-lm.ridge(z~x+y,lambda=0,model=TRUE)
> ridge.sol$GCV
       0
4341.567
We ran the regression with lambda ranging from 0 to 150 and estimated the coefficient of x at about 0.018 and that of y at about 0.03 (selected rows shown; the first two columns are lambda and the intercept):
> models = lm.ridge(rt$A50~rt$CSI300+rt$SSE50,lambda=seq(0,150,len=150))
> models
                         rt$CSI300   rt$SSE50
  0.000000 17485.04 -0.001586841 0.07127925
  1.006711 17804.23  0.002810185 0.06422905
  2.013423 18044.22  0.005794676 0.05941096
...
116.778523 25822.98  0.017710798 0.03060350
117.785235 25875.29  0.017693995 0.03055534
118.791946 25927.45  0.017677140 0.03050745
119.798658 25979.47  0.017660236 0.03045983
120.805369 26031.36  0.017643286 0.03041248
...
147.986577 27382.69  0.017174862 0.02921970
148.993289 27430.98  0.017157316 0.02917829
150.000000 27479.14  0.017139766 0.02913706
The coefficients of x and y are still small in absolute terms because the contract value of A50 is much smaller than those of CSI300 and SSE50, as shown below:
> mean(rt$CSI300)
[1] 1070110
> mean(rt$SSE50)
[1] 712673.9
> mean(rt$A50)
[1] 66585.81
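As a rough consistency check on the scale of the coefficients, the sketch below plugs the reported means into one row of the ridge output (lambda = 117.785235, picked here only for illustration) and compares the fitted value with the reported mean of A50. All numbers are copied from the output above; the variable names are illustrative.

```r
# One row of the ridge output (lambda = 117.785235)
intercept <- 25875.29
b_csi300  <- 0.017693995
b_sse50   <- 0.03055534

# Reported sample means of the contract values in CNY
mean_csi300 <- 1070110
mean_sse50  <- 712673.9
mean_a50    <- 66585.81

# Fitted A50 value at the mean of the regressors
fitted_at_mean <- intercept + b_csi300 * mean_csi300 + b_sse50 * mean_sse50
print(fitted_at_mean)

# Gap to the reported A50 mean
gap <- fitted_at_mean - mean_a50
print(gap)

# Relative size of the contracts
print(mean_csi300 / mean_a50)
print(mean_sse50 / mean_a50)
```

The fitted value lands within about one yuan of the reported A50 mean, and the CSI300 and SSE50 contract values are roughly 16 and 11 times the A50 value, which is why coefficients of the order of 0.02 to 0.03 are plausible.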
After comparing the parameters above, we picked two parameter sets for the trading strategy: N=10 with T=60, and N=20 with T=50.
Appendix:
Collinearity is checked with the lars package, which implements lasso (least absolute shrinkage and selection operator) regression and related methods; the call below uses type="lar" (least angle regression):
> library(lars)
> A<-as.matrix(rt[,3:4])
> B<-as.matrix(rt[,2])
> laa=lars(A,B,type="lar")
> laa
Call:
lars(x = A, y = B, type = "lar")
R-squared: 0.979
Sequence of LAR moves:
     SSE50 CSI300
Var      2      1
Step     1      2
The lasso method ranked SSE50 before CSI300 as expected.
> summary(laa)
LARS/LAR
Call: lars(x = A, y = B, type = "lar")
  Df        Rss         Cp
0  1 2.3467e+10 15702.7815
1  2 4.9092e+08     1.5306
2  3 4.9015e+08     3.0000
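The summary table can be read with a short calculation. The sketch below copies the Rss and Cp values from the output above, computes the share of the step-0 residual sum of squares explained after each step, and picks the step with the smallest Cp. The data frame and column names are illustrative.

```r
# Values copied from the summary(laa) output
steps <- data.frame(
  step = 0:2,
  Df   = 1:3,
  Rss  = c(2.3467e10, 4.9092e08, 4.9015e08),
  Cp   = c(15702.7815, 1.5306, 3.0000)
)

# Share of the step-0 (intercept-only) RSS explained after each step
steps$r_squared <- 1 - steps$Rss / steps$Rss[1]
print(steps)

# Step with the smallest Mallows' Cp
best_step <- steps$step[which.min(steps$Cp)]
print(best_step)
```

On this reading, the first variable to enter (SSE50) already explains about 97.9% of the variation, and adding CSI300 barely lowers the RSS, so Cp is smallest at step 1. That is consistent with the two regressors carrying largely the same information.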
copyright by FangQuant.com