
A50 & CSI300 & SSE50 Arbitrage Report

Fang submitted 2017-04-20 20:22:26

We set the data range from 2015/4/16 to 2016/8/26 to run the regression.
The independent variables for CSI300 and SSE50 are named x and y respectively, and the dependent variable A50 is named z. Each index is multiplied by its contract multiplier and converted into Chinese Yuan.
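
For concreteness, the data preparation can be sketched as below. The file name, column names and FX handling are assumptions (they are not given in the report); the multipliers used are CNY 300 per index point for the CSI300 and SSE50 futures and USD 1 per point for the SGX A50 future, with the A50 leg converted into yuan at the USD/CNY rate.

# A minimal sketch of the data preparation; "prices.csv" and its column names
# (date, a50, csi300, sse50, usdcny) are hypothetical.
prices <- read.csv("prices.csv", stringsAsFactors = FALSE)
prices$date <- as.Date(prices$date)

# Convert each index level into the CNY value of one futures contract
prices$a50_cny    <- prices$a50 * 1 * prices$usdcny   # SGX A50: USD 1 per point
prices$csi300_cny <- prices$csi300 * 300              # CSI300: CNY 300 per point
prices$sse50_cny  <- prices$sse50  * 300              # SSE50: CNY 300 per point

# Regression window 2015-04-16 to 2016-08-26; the column order is chosen so
# that rt[,2] and rt[,3:4] match the lasso calls later in the report
rt <- subset(prices,
             date >= as.Date("2015-04-16") & date <= as.Date("2016-08-26"),
             select = c(date, a50_cny, csi300_cny, sse50_cny))
x <- rt$csi300_cny
y <- rt$sse50_cny
z <- rt$a50_cny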

First, we calculated the pairwise correlations between the indexes as follows:

> cor(x,y)
[1] 0.9872779
> cor(x,z)
[1] 0.9759849

Then we ran an ordinary least squares (OLS) linear regression in R:

> lm1<-lm(z~x+y)
> lm1

Call:
lm(formula = z ~ x + y)

Coefficients:
(Intercept) x y 
1.749e+04 -1.587e-03 7.128e-02 

The result shows several defects:
1, the intercept is excessively large.
2, the coefficient for x (CSI300) is negative.
3, there is obvious collinearity between the regressors, which is difficult to eliminate (a quick check is sketched below).
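
One quick way to quantify defect 3, not used in the original report, is the variance inflation factor; a sketch assuming the car package is installed:

# Variance inflation factors for the raw-level regression; values far above 10
# indicate severe collinearity between the CSI300 and SSE50 regressors.
library(car)
vif(lm(z ~ x + y))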

With a logarithmic transformation, we set x1 = log(x), y1 = log(y) and z1 = log(z), and ran the regression again:
> x1<-log(x)
> y1<-log(y)
> z1<-log(z)
> lm2<-lm(z1~x1+y1)
> summary(lm2)

Call:
lm(formula = z1 ~ x1 + y1)

Residuals:
Min 1Q Median 3Q Max 
-0.117396 -0.010662 0.001583 0.012657 0.038213 

Coefficients:
Estimate Std. Error t value Pr(>|t|) 
(Intercept) 1.04649 0.08893 11.768 <2e-16 ***
x1 -0.02634 0.04259 -0.619 0.537 
y1 0.77376 0.04473 17.297 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01874 on 335 degrees of freedom
Multiple R-squared: 0.9745, Adjusted R-squared: 0.9744 
F-statistic: 6407 on 2 and 335 DF, p-value: < 2.2e-16


The negative coefficient on x1 remained, and it is not statistically significant (p = 0.537).

A scatter plot is drawn to show the collinearity between x1 and y1:
> plot(x1~y1,col="red")


We then applied a regularization method to constrain the coefficient estimates and ran a ridge regression to handle the collinearity.
With the lm.ridge() function we obtained fits for 151 values of lambda and selected the lambda at which the Generalized Cross Validation (GCV) score reaches its minimum.
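
For reference, the ridge fit minimizes the usual penalized least-squares objective (standard textbook form, not quoted from the report); lm.ridge standardizes the regressors internally before applying lambda and reports coefficients back on the original scale:

\min_{\beta_0,\beta_1,\beta_2} \sum_{i=1}^{n} \left( z1_i - \beta_0 - \beta_1 x1_i - \beta_2 y1_i \right)^2 + \lambda \left( \beta_1^2 + \beta_2^2 \right)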

> library(MASS)
> ridge.sol<-lm.ridge(z1~x1+y1, lambda = seq(0,150,length =151),model = TRUE)
> ridge.sol$lambda[which.min(ridge.sol$GCV)]
> coef(ridge.sol)[which.min(ridge.sol$GCV),]
> matplot(ridge.sol$lambda,t(ridge.sol$coef),xlab = expression(lambda),ylab="coefficients",type="l",lty=1:20)

The ridge trace (coefficients plotted against lambda) is shown below:

And the graph of GCV against lambda, with a vertical line at the GCV-minimizing lambda:
> plot(ridge.sol$lambda,ridge.sol$GCV,type = "l",xlab = expression(lambda),ylab="GCV")
> abline(v=ridge.sol$lambda[which.min(ridge.sol$GCV)])


The collinearity is checked with a lasso (least absolute shrinkage and selection operator) regression using the lars package. Here rt is the underlying data frame: rt[,2] holds the dependent A50 series and rt[,3:4] the CSI300 and SSE50 series.
> library(lars)
> A<-as.matrix(rt[,3:4])
> B<-as.matrix(rt[,2])
> lass<-lars(A,B,type="lar")

The lasso method ranked SSE50 before CSI300 as expected.

> summary(lass)
LARS/LAR
Call: lars(x = A, y = B, type = "lar")
Df Rss Cp
0 1 2.3467e+10 15702.7815
1 2 4.9092e+08 1.5306
2 3 4.9015e+08 3.0000
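
To see the entry order explicitly, the coefficient path of the fit can be inspected; coef() and plot() are the standard lars methods (a sketch, not part of the original session):

# Coefficients at each LAR step: the SSE50 column becomes non-zero from step 1,
# the CSI300 column only from step 2.
coef(lass)
plot(lass)   # graphical view of the same coefficient path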

Mallows' Cp is calculated to compare the two candidate models; the smaller the Cp, the better the fit. By this criterion it would be more appropriate to hedge A50 with SSE50 only.
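
For reference, the Cp column follows Mallows' usual definition, where RSS_p is the residual sum of squares of the model with p fitted parameters and \hat{\sigma}^2 is estimated from the full model:

C_p = \frac{RSS_p}{\hat{\sigma}^2} - n + 2p
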
On the other hand, the ridge trace stabilizes in a narrow range once lambda rises above 20.
To show this, the GCV values for lambda = 0 and lambda = 20 are computed below:
> ridge.sol<-lm.ridge(z1~x1+y1, lambda = 0,model = TRUE)
> ridge.sol$GCV
0 
1.042328e-06 
> ridge.sol<-lm.ridge(z1~x1+y1, lambda = 20,model = TRUE)
> ridge.sol$GCV
20 
1.247798e-06 

The difference between the two GCV values is small.
The coefficients for x1 and y1 also stay within narrow ranges as lambda increases, as shown below (the first, unlabelled number in each output is the intercept):
> lm.ridge(z1~x1+y1, lambda = 20)
x1 y1 
1.4218741 0.2854380 0.4247126 
> lm.ridge(z1~x1+y1, lambda = 25)
x1 y1 
1.4939782 0.2933251 0.4112328 
> lm.ridge(z1~x1+y1, lambda = 30)
x1 y1 
1.5642558 0.2981725 0.4010200 
> lm.ridge(z1~x1+y1, lambda = 35)
x1 y1 
1.6330535 0.3011715 0.3928210 
> lm.ridge(z1~x1+y1, lambda = 40)
x1 y1 
1.7005682 0.3029678 0.3859563 

Hence we took coefficients of approximately 0.3 and 0.4 for x1 and y1; a sketch of the implied spread follows the comparison below.
We can compare these with the coefficients for lambda at or below 8, which are still changing noticeably:
> lm.ridge(z1~x1+y1, lambda = 2)
x1 y1 
1.1126174 0.1066454 0.6318545 
> lm.ridge(z1~x1+y1, lambda = 4)
x1 y1 
1.1595287 0.1699368 0.5631743 
> lm.ridge(z1~x1+y1, lambda = 6)
x1 y1 
1.1990122 0.2066814 0.5223915 
> lm.ridge(z1~x1+y1, lambda = 8)
x1 y1 
1.2347922 0.2305053 0.4951932 
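
As a preview of how the chosen coefficients could be used (a sketch only; the actual signal construction is left to the next report), the log-price spread implied by the ridge fit at lambda = 20 can be computed directly:

# Residual spread of A50 against the two onshore indexes under the ridge fit
# (intercept about 1.42, slopes about 0.29 and 0.42 at lambda = 20); persistent
# deviations from zero are the raw material for the arbitrage signal.
library(MASS)
cf <- coef(lm.ridge(z1 ~ x1 + y1, lambda = 20))   # intercept, x1, y1 on the original log scale
spread <- z1 - (cf[1] + cf[2] * x1 + cf[3] * y1)
plot(spread, type = "l")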

In the next report we will continue the collinearity analysis and filter the trading signals for the arbitrage trade. The price data after 2016/8/26 will be used for the backtest.
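
A minimal sketch of how that out-of-sample window could be set aside, assuming the same prices data frame from the preparation sketch above:

# Hold out the prices after the regression window for the backtest
backtest <- subset(prices,
                   date > as.Date("2016-08-26"),
                   select = c(date, a50_cny, csi300_cny, sse50_cny))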



Copyright by FangQuant.com
