Given the recent discussion about whether it's realistic to expect backtested Sharpe ratios of 2.2 for blends to be repeated going forward, I thought I would step back and use my daily-cycled GTR1 backtester to determine whether Sharpe ratios of 2.2 ever really existed in the past in the first place. While I believe the Screen Builder's Sharpe ratios of 2.2 for such blends are computationally correct, daily-cycled testing reveals them to be flukes caused by over-tuning the blends to the quirks of the particular slice of monthly history represented by the Screen Builder.

In an earlier post I described the screens appearing in Elan's post on optimal blends for 2006 as "the MI board's Pearl Harbor of screens" because of their vulnerability to daily-cycled torpedo damage, given that they surfaced in the course of an optimization on monthly data. Indeed, that post is where my most dramatic examples of torpedo damage have come from. I will likewise use one of the blends in that post, the blend of five 4-stock screens selected for maximum Sharpe ratio (which Zeelotes calls the "Sharpe 5 Screen Blend"), as my first public blend target.

In order to enable direct comparison with that post, where backtests are from 1989-2005, I will restrict all backtests in this post to 19890103-20051230, even though all screens can be backtested through 2006 and some screens can be backtested from 1986.
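For readers unfamiliar with the idea, here is a minimal Python sketch of what "daily-cycled" means for a 20-day hold. It assumes each cycle is the same screen rebalanced every 20 trading days, with the cycles differing only in which trading day they start on. The function name and the use of plain trading-day indices are my own illustration, not the GTR1 backtester's actual code.

```python
def cycle_rebalance_days(n_trading_days, hold=20):
    """Map each start offset (0..hold-1) to the trading-day indices
    on which that cycle rebalances. Illustrative only."""
    return {
        offset: list(range(offset, n_trading_days, hold))
        for offset in range(hold)
    }

# 100 trading days, 20-day hold: 20 cycles, each trading on different days
schedules = cycle_rebalance_days(n_trading_days=100, hold=20)
print(schedules[0])   # [0, 20, 40, 60, 80]
print(schedules[19])  # [19, 39, 59, 79, 99]
```

A single monthly backtest corresponds to just one of these schedules, which is why it can be tuned to the quirks of one particular set of trade dates.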

YldYear2 1-4

YldYear2 1-5 and YldYear2 1-10 have already survived torpedo attacks in an earlier post for 1986-2006 almost completely unscathed. However, YldYear2 1-4 over 1989-2005, the variant that surfaces in blend optimizations, takes a bit of damage:
YldYear2 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         31.23  27.32  35.29   1.92    36.09
GSD(20):      20.93  19.65  21.92   0.67    20.93
DD(20):       10.42   9.06  11.56   0.64      N/A
UI(20):        6.93   5.40   8.85   0.91      N/A
Sharpe(20):    1.27   1.10   1.48   0.09     1.48
AT:            5.09   4.89   5.24   0.09      N/A

gtr1:,tr1y)tn4:pri:vprc(0,2): ces:ces.v:cpe:ratio(pri,ces):cdv:cdv.v:cdy:ratio(cdv,pri):yld:ratio(cdy,cpe):tr1y_mult:tr(12,253):tr1y:linear(100,tr1y_mult,-100,1)
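The Avg/Min/Max/SD columns in these tables summarize one measurement across the individual cycles. As a rough sketch of that summary step in Python: the per-cycle values below are made-up placeholders rather than real results, and the use of the sample standard deviation is my assumption, since the post does not say which form is used.

```python
import statistics

def summarize_cycles(values):
    """Summarize one measurement (e.g. CAGR) across all cycles."""
    return {
        "Avg": statistics.mean(values),
        "Min": min(values),
        "Max": max(values),
        "SD": statistics.stdev(values),  # sample SD; an assumption
    }

# Hypothetical per-cycle CAGRs, for illustration only
per_cycle_cagr = [30.0, 32.0, 34.0]
summary = summarize_cycles(per_cycle_cagr)
print(summary)
```

The "gritton" column, by contrast, is a single number: the result of the one monthly cycle that the single-cycle backtests have always used.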


LLTD 1-4

I have not yet posted results for any backtests using the field file ltd.v, [VL Long-Term Debt], due to some complicated irregularities in this field over time. Briefly, Value Line does not currently allow the user to directly distinguish between zero long-term debt and null long-term debt, while before 1997, this distinction was made in the raw data. I have regularized the field over time by converting nulls to zeros before 1997. This regularization (rather than cycle variation) may very well account for much of the following torpedo damage:
LLTD 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         33.80  28.94  38.51   2.90    41.80
GSD(20):      27.00  25.66  28.53   0.88    25.25
DD(20):       14.62  13.31  15.83   0.65      N/A
UI(20):        8.35   6.71  11.02   1.29      N/A
Sharpe(20):    1.14   0.96   1.30   0.09     1.46
AT:            5.78   5.55   6.02   0.13      N/A

gtr1:,2)gt0:linear(1,product(vprc(0,2),cso.v),-10000,ltd.v)gt0: ces.v:gt0:ratio(vprc(0,2),ces.v)bn10:tr(12,253)tn4
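The ltd.v regularization described above (treating null long-term debt as zero before 1997) might look something like the following in Python. This is only a sketch: the exact cutoff date of January 1, 1997, the function name, and the use of None for a null value are all assumptions on my part, not the actual data-preparation code.

```python
from datetime import date

# Assumed cutoff; the post only says "before 1997"
CUTOFF = date(1997, 1, 1)

def regularize_ltd(as_of, ltd):
    """Return long-term debt with pre-1997 nulls converted to zero.
    Later records are passed through untouched."""
    if ltd is None and as_of < CUTOFF:
        return 0.0
    return ltd

print(regularize_ltd(date(1995, 6, 30), None))   # 0.0
print(regularize_ltd(date(1995, 6, 30), 12.5))   # 12.5
```

The catch is that a pre-1997 null may have meant "unknown" rather than "no debt", so a low-debt screen like LLTD can pick up stocks in the early years that it would not have picked with cleaner data.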


H52EarnPS 1-4

H52EarnPS 1-5 was torpedoed in an earlier post for 1986-2005. That was done not long after TechCzech completed the GTR1 linearizer, when my backtester itself (which used the linearized data) was still in the dark ages of Excel/Access VBA. Not surprisingly, the torpedo damage when restricting H52EarnPS to positions 1-4 and 1989-2005 (the variant that surfaces in blend optimizations) is even more severe:
H52EarnPS 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         30.48  26.23  36.81   2.27    39.12
GSD(20):      27.12  24.63  29.34   1.33    24.57
DD(20):       15.04  12.65  17.24   1.05      N/A
UI(20):       14.19   9.64  22.06   2.90      N/A
Sharpe(20):    1.04   0.87   1.25   0.09     1.41
AT:            6.79   6.56   7.05   0.11      N/A

gtr1: ratio(product(vprc(0,2),cso.v),sls.v)bn4


Note that this is the first of my posts in which the field file sls.v, [VL Reported Annual Sales], appears. I know of no irregularities in this field over time (and if there were any, there probably wouldn't be anything I could do about them).

PEG-Minimalist 1-4

I torpedoed PEG-Minimalist 1-5 in an earlier post for 1986-2005, around the same time I torpedoed H52EarnPS 1-5. As is to be expected, the torpedo damage is more severe for PEG-Minimalist 1-4 over 1989-2005, the variant that surfaces in blend optimizations:
PEG-Minimalist 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         33.49  28.11  41.91   3.28    45.71
GSD(20):      27.58  25.41  29.14   1.02    26.19
DD(20):       14.97  13.13  16.49   1.16      N/A
UI(20):       10.27   7.63  12.78   1.42      N/A
Sharpe(20):    1.12   0.95   1.36   0.11     1.54
AT:            8.94   8.67   9.15   0.12      N/A

gtr1:,253)gt1.25: ratio(peg.v,ratio(vprc(0,2),ces.v))tn4


PIH_CSO_simple 1-4

This is the first post in which the field files pih.v [VL % Institutional Holdings] and pst.v [VL Price Stability Rank] appear, meaning I have never posted backtest results for PIH_CSO before. The 10-stock variant (not covered in this post) takes very little damage in daily-cycled testing (which isn't surprising in light of its low turnover), but the 4-stock variant appearing in blend optimizations takes a good hit:
PIH_CSO_simple 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         32.06  29.23  34.78   1.50    40.45
GSD(20):      22.77  21.52  23.75   0.62    22.30
DD(20):       11.28  10.44  11.87   0.29      N/A
UI(20):        8.39   7.24  10.22   0.80      N/A
Sharpe(20):    1.22   1.11   1.30   0.05     1.55
AT:            3.25   3.16   3.35   0.05      N/A



Sharpe 5 Screen Blend

The results for the blend of these five screens are as follows:
Sharpe 5 Screen Blend, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         33.84  31.70  35.82   1.21    42.36
GSD(20):      17.39  16.79  17.97   0.32    16.17
DD(20):        9.03   8.40   9.79   0.40      N/A
UI(20):        5.22   4.39   5.82   0.43      N/A
Sharpe(20):    1.61   1.49   1.73   0.07     2.14
AT:            5.96   5.87   6.08   0.05      N/A

gtr1:,tr1y)tn4:pri:vprc(0,2): ces:ces.v:cpe:ratio(pri,ces):cdv:cdv.v:cdy:ratio(cdv,pri):yld:ratio(cdy,cpe):tr1y_mult:tr(12,253): tr1y:linear(100,tr1y_mult,-100,1)::1:vprc(0,2)gt0:linear(1,product(vprc(0,2),cso.v),-10000,ltd.v)gt0:ces.v:gt0: ratio(vprc(0,2),ces.v)bn10:tr(12,253)tn4::1:tim.v:am2:ph253.g:2gt0.9:elqw.v:gt0:sls.v:gt0: ratio(product(vprc(0,2),cso.v),sls.v)bn4::1:tim.v:am6:ph253.g:2al0.95:peg.v:gt0:tr(12,253)gt1.25: ratio(peg.v,ratio(vprc(0,2),ces.v))tn4::1:tim.v:am2:pst.v:al50:pih.v:bp50:cso.v:bn4


Summary of Torpedo Damage

The table below shows the change to each measurement (CAGR, GSD, Sharpe), a.k.a. "torpedo damage", that results from expanding the backtests from a single monthly cycle to 20 cycles of 20-day holds:
                        CAGR     GSD  Sharpe
YldYear2 1-4:          -4.85    0.00   -0.21
LLTD 1-4:              -8.00    1.75   -0.32
H52EarnPS 1-4:         -8.64    2.54   -0.37
PEG-Minimalist 1-4:   -12.22    1.39   -0.43
PIH_CSO 1-4:           -8.39    0.48   -0.34

Average:               -8.42    1.23   -0.33

Blend:                 -8.52    1.22   -0.53
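For anyone who wants to check the summary table, here is a short Python sketch that recomputes torpedo damage as the daily-cycled average minus the single-cycle ("gritton") value, using the figures from the tables in this post. The dictionary layout and function name are just my illustration. Because the inputs are the rounded table values, a few differences come out 0.01 away from the summary table (for example, 31.23 - 36.09 = -4.86 rather than -4.85), presumably because the table was computed from unrounded figures.

```python
# (daily-cycled Avg, single-cycle "gritton") pairs copied from the tables above
RESULTS = {
    "YldYear2 1-4":       {"CAGR": (31.23, 36.09), "GSD": (20.93, 20.93), "Sharpe": (1.27, 1.48)},
    "LLTD 1-4":           {"CAGR": (33.80, 41.80), "GSD": (27.00, 25.25), "Sharpe": (1.14, 1.46)},
    "H52EarnPS 1-4":      {"CAGR": (30.48, 39.12), "GSD": (27.12, 24.57), "Sharpe": (1.04, 1.41)},
    "PEG-Minimalist 1-4": {"CAGR": (33.49, 45.71), "GSD": (27.58, 26.19), "Sharpe": (1.12, 1.54)},
    "PIH_CSO 1-4":        {"CAGR": (32.06, 40.45), "GSD": (22.77, 22.30), "Sharpe": (1.22, 1.55)},
    "Blend":              {"CAGR": (33.84, 42.36), "GSD": (17.39, 16.17), "Sharpe": (1.61, 2.14)},
}

def torpedo_damage(metrics):
    """Daily-cycled average minus single-cycle value, per measurement."""
    return {name: avg - single for name, (avg, single) in metrics.items()}

damage = {screen: torpedo_damage(m) for screen, m in RESULTS.items()}

for screen, d in damage.items():
    label = screen + ":"
    print(f"{label:<20}{d['CAGR']:>8.2f}{d['GSD']:>8.2f}{d['Sharpe']:>8.2f}")
```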


1. In CAGR and GSD terms, the torpedo damage to the 5-screen blend is roughly equal to the average torpedo damage suffered by the five screens individually (-8.52 vs. -8.42 CAGR points, 1.22 vs. 1.23 GSD points). The blend's Sharpe ratio falls further in absolute terms (-0.53 vs. -0.33), though from a higher starting point. This surprises me: I expected the damage to the blend to be greater than the average damage to the screens due to the optimization process drawing out spurious negative correlations among screens for the same reasons that datamining with monthly data draws out spurious CAGRs. Apparently screen correlation is a more robust statistic than I had thought.

2. Conventional wisdom is that it's better to take the top few picks from many screens than to "go deep" into just a few screens. This advice may be good, but I believe its importance has been exaggerated by the single-cycled backtests. For example, in daily-cycled backtests, a blend of YldYear2 1-10 and PIH_CSO 1-10 produces a Sharpe ratio of 1.54, which isn't too far from the Sharpe 5 Screen Blend's daily-cycled Sharpe ratio of 1.61:

gtr1: cdy:gt0:product(yld,tr1y)tn10:pri:vprc(0,2):ces:ces.v:cpe:ratio(pri,ces):cdv:cdv.v:cdy:ratio(cdv,pri):yld:ratio(cdy,cpe): tr1y_mult:tr(12,253):tr1y:linear(100,tr1y_mult,-100,1)

3. Since we have seen some screens that suffer little or no torpedo damage in daily-cycled backtesting, it's quite possible that Elan's blend optimization would have selected different combinations of screens if the GTR1 backtester had been available at the beginning of 2006. Or perhaps the winning blends wouldn't have been so clear-cut that Elan became worried about everyone piling into his blend if he kept posting the one he uses. Hopefully the GTR1 backtester will be usable by enough people that we get to see its impact on screen selection for next year.

4. This blend, of course, still looks very good. However, my expectations for what kind of CAGR I would get using the blend have just been reduced by 8.5 points after daily-cycled backtesting. Many, like myself, will say that they had already expected the future to fall short of the past by perhaps 15 CAGR points for a host of other reasons; if so, then before this post you were expecting a CAGR of 42.36 - 15 = 27.36; now that the past has been clarified, you would only expect a CAGR of 33.84 - 15 = 18.84. That's still a market-beating expectation, of course, but with a lot less room for failure.

Robbie Geary