Author: rgearyiii Three stars, 500 posts Old School Fool Number: of 252157
Subject: A Blend Torpedoed Date: 11/16/2007 5:08 AM
Recommendations: 53
Given the recent discussion about whether it's realistic to expect backtested Sharpe ratios of 2.2 for blends to be repeated going forward, I thought I would step back and use my daily-cycled GTR1 backtester to determine if Sharpe ratios of 2.2 ever really existed in the first place. While I believe the Screen Builder's Sharpe ratios of 2.2 for such blends are computationally correct, daily-cycled testing reveals them to be flukes caused by over-tuning them to the quirks of the particular slice of monthly history represented by the Screen Builder.

In http://boards.fool.com/Message.asp?mid=24477228 I described the screens appearing in Elan's post on optimal blends for 2006, http://boards.fool.com/Message.asp?mid=23622348 , as "the MI board's Pearl Harbor of screens" because of their vulnerability to daily-cycled torpedo damage, given the fact that they surfaced in the course of an optimization on monthly data. Indeed, that post is where my most dramatic examples of torpedo damage have come from. I will likewise use one of the blends in that post, the blend of five 4-stock screens selected for maximum Sharpe ratio (which Zeelotes calls the "Sharpe 5 Screen Blend"), as my first public blend target.

In order to enable direct comparison with that post, where backtests are from 1989-2005, I will restrict all backtests in this post to 19890103-20051230, even though all screens can be backtested through 2006 and some screens can be backtested from 1986.


YldYear2 1-4

YldYear2 1-5 and YldYear2 1-10 have already survived torpedo attacks in http://boards.fool.com/Message.asp?mid=25533519 for 1986-2006 almost completely unscathed. However, YldYear2 1-4 over 1989-2005, the variant that surfaces in blend optimizations, takes a bit of damage:
YldYear2 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         31.23  27.32  35.29   1.92    36.09
GSD(20):      20.93  19.65  21.92   0.67    20.93
DD(20):       10.42   9.06  11.56   0.64      N/A
UI(20):        6.93   5.40   8.85   0.91      N/A
Sharpe(20):    1.27   1.10   1.48   0.09     1.48
AT:            5.09   4.89   5.24   0.09      N/A

gtr1: http://backtest.org/gtr1/blend.cgi?s19890103e20051230::1:tim.v:am6:cpe:gt0:cdy:gt0:product(yld,tr1y)tn4:pri:vprc(0,2): ces:ces.v:cpe:ratio(pri,ces):cdv:cdv.v:cdy:ratio(cdv,pri):yld:ratio(cdy,cpe):tr1y_mult:tr(12,253):tr1y:linear(100,tr1y_mult,-100,1)

gritton: http://backtest.org/YLDEARNYEAR2:8905SBcdyG0XcpeG0Xtr1yMcdyDcpeT4


LLTD 1-4

I have not yet posted results for any backtests using the field file ltd.v, [VL Long-Term Debt], due to some complicated irregularities in this field over time. Briefly, Value Line does not currently allow the user to directly distinguish between zero long-term debt and null long-term debt, while before 1997, this distinction was made in the raw data. I have regularized the field over time by converting nulls to zeros before 1997. This regularization (rather than cycle variation) may very well account for much of the following torpedo damage:
LLTD 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         33.80  28.94  38.51   2.90    41.80
GSD(20):      27.00  25.66  28.53   0.88    25.25
DD(20):       14.62  13.31  15.83   0.65      N/A
UI(20):        8.35   6.71  11.02   1.29      N/A
Sharpe(20):    1.14   0.96   1.30   0.09     1.46
AT:            5.78   5.55   6.02   0.13      N/A

gtr1: http://backtest.org/gtr1/blend.cgi?s19890103e20051230::1:vprc(0,2)gt0:linear(1,product(vprc(0,2),cso.v),-10000,ltd.v)gt0: ces.v:gt0:ratio(vprc(0,2),ces.v)bn10:tr(12,253)tn4

gritton: http://backtest.org/LLTD:8905SBcpeG0XltdM10000LmcpXcpeB10Xtr1yT4


H52EarnPS 1-4

H52EarnPS 1-5 was torpedoed in http://boards.fool.com/Message.asp?mid=24490197 for 1986-2005; this was done not long after the completion of TechCzech's GTR1 linearizer, but my backtester itself (which used the linearized data) was still in the dark ages of Excel/Access VBA. Not surprisingly, the torpedo damage when restricting H52EarnPS to positions 1-4 and 1989-2005 (the variant that surfaces in blend optimizations) is even more severe:
H52EarnPS 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         30.48  26.23  36.81   2.27    39.12
GSD(20):      27.12  24.63  29.34   1.33    24.57
DD(20):       15.04  12.65  17.24   1.05      N/A
UI(20):       14.19   9.64  22.06   2.90      N/A
Sharpe(20):    1.04   0.87   1.25   0.09     1.41
AT:            6.79   6.56   7.05   0.11      N/A

gtr1: http://backtest.org/gtr1/blend.cgi?s19890103e20051230::1:tim.v:am2:ph253.g:2gt0.9:elqw.v:gt0:sls.v:gt0: ratio(product(vprc(0,2),cso.v),sls.v)bn4

gritton: http://backtest.org/H52EarnPS:8905SBT12XpriDh52G.90XelqG0XmcpDslsB4

Note that this is the first of my posts in which the field file sls.v, [VL Reported Annual Sales], appears. I know of no irregularities in this field over time (and if there were any, there probably wouldn't be anything I could do about them).


PEG-Minimalist 1-4

I torpedoed PEG-Minimalist 1-5 in http://boards.fool.com/Message.asp?mid=24482345 for 1986-2005, around the same time I torpedoed H52EarnPS 1-5. As is to be expected, the torpedo damage is more severe for PEG-Minimalist 1-4 over 1989-2005, the variant that surfaces in blend optimizations:
PEG-Minimalist 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         33.49  28.11  41.91   3.28    45.71
GSD(20):      27.58  25.41  29.14   1.02    26.19
DD(20):       14.97  13.13  16.49   1.16      N/A
UI(20):       10.27   7.63  12.78   1.42      N/A
Sharpe(20):    1.12   0.95   1.36   0.11     1.54
AT:            8.94   8.67   9.15   0.12      N/A

gtr1: http://backtest.org/gtr1/blend.cgi?s19890103e20051230::1:tim.v:am6:ph253.g:2al0.95:peg.v:gt0:tr(12,253)gt1.25: ratio(peg.v,ratio(vprc(0,2),ces.v))tn4

gritton: http://backtest.org/PEG-Minimalist:8905SBpriGE.95Mh52XpegG0Xtr1yG25XpegDcpeT4


PIH_CSO_simple 1-4

This is the first post in which the field files pih.v [VL % Institutional Holdings] and pst.v [VL Price Stability Rank] appear, meaning I have never posted backtest results for PIH_CSO before. The 10-stock variant (not covered in this post) takes very little damage in daily-cycled testing (which isn't surprising in light of its low turnover), but the 4-stock variant appearing in blend optimizations takes a good hit:
PIH_CSO_simple 1-4, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         32.06  29.23  34.78   1.50    40.45
GSD(20):      22.77  21.52  23.75   0.62    22.30
DD(20):       11.28  10.44  11.87   0.29      N/A
UI(20):        8.39   7.24  10.22   0.80      N/A
Sharpe(20):    1.22   1.11   1.30   0.05     1.55
AT:            3.25   3.16   3.35   0.05      N/A

gtr1: http://backtest.org/gtr1/blend.cgi?s19890103e20051230::1:tim.v:am2:pst.v:al50:pih.v:bp50:cso.v:bn4

gritton: http://backtest.org/PIH_CSO_simple:8905SBT12XpstGE50XpihB50HXcsoB4


Sharpe 5 Screen Blend

The results for the blend of these five screens are as follows:
Sharpe 5 Screen Blend, 20-day hold, 19890103-20051230
                Avg    Min    Max     SD  gritton
CAGR:         33.84  31.70  35.82   1.21    42.36
GSD(20):      17.39  16.79  17.97   0.32    16.17
DD(20):        9.03   8.40   9.79   0.40      N/A
UI(20):        5.22   4.39   5.82   0.43      N/A
Sharpe(20):    1.61   1.49   1.73   0.07     2.14
AT:            5.96   5.87   6.08   0.05      N/A

gtr1: http://backtest.org/gtr1/blend.cgi?s19890103e20051230::1:tim.v:am6:cpe:gt0:cdy:gt0:product(yld,tr1y)tn4:pri:vprc(0,2): ces:ces.v:cpe:ratio(pri,ces):cdv:cdv.v:cdy:ratio(cdv,pri):yld:ratio(cdy,cpe):tr1y_mult:tr(12,253): tr1y:linear(100,tr1y_mult,-100,1)::1:vprc(0,2)gt0:linear(1,product(vprc(0,2),cso.v),-10000,ltd.v)gt0:ces.v:gt0: ratio(vprc(0,2),ces.v)bn10:tr(12,253)tn4::1:tim.v:am2:ph253.g:2gt0.9:elqw.v:gt0:sls.v:gt0: ratio(product(vprc(0,2),cso.v),sls.v)bn4::1:tim.v:am6:ph253.g:2al0.95:peg.v:gt0:tr(12,253)gt1.25: ratio(peg.v,ratio(vprc(0,2),ces.v))tn4::1:tim.v:am2:pst.v:al50:pih.v:bp50:cso.v:bn4

gritton: http://backtest.org/?8905BL(YLDEARNYEAR2)14p20(LLTD)14p20(H52EarnPS)14p20(PEG-Minimalist)14p20(PIH_CSO_simple)14p20


Summary of Torpedo Damage

The table below shows the change to each measurement (CAGR, GSD, Sharpe), a.k.a. "torpedo damage", that results from expanding the backtests from a single monthly cycle to 20 cycles of 20-day holds:
                       CAGR    GSD  Sharpe
YldYear2 1-4:         -4.85   0.00   -0.21
LLTD 1-4:             -8.00   1.75   -0.32
H52EarnPS 1-4:        -8.64   2.54   -0.37
PEG-Minimalist 1-4:  -12.22   1.39   -0.43
PIH_CSO 1-4:          -8.39   0.48   -0.34

Average:              -8.42   1.23   -0.33

Blend:                -8.52   1.22   -0.53
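
For anyone who wants to re-derive the damage figures, here is a minimal Python sketch. It assumes "torpedo damage" is simply the daily-cycled Avg column minus the single-cycle gritton column, using the rounded values from the tables in this post; the names `daily`, `monthly`, and `torpedo_damage` are made up for illustration. Because the inputs are already rounded to two decimals, a recomputed figure can differ from the posted one by 0.01.

```python
# (CAGR, GSD, Sharpe) copied from the tables in this post.
# "daily" is the Avg column of the 20 staggered GTR1 cycles;
# "monthly" is the single-cycle gritton column.
daily = {
    "YldYear2 1-4":       (31.23, 20.93, 1.27),
    "LLTD 1-4":           (33.80, 27.00, 1.14),
    "H52EarnPS 1-4":      (30.48, 27.12, 1.04),
    "PEG-Minimalist 1-4": (33.49, 27.58, 1.12),
    "PIH_CSO 1-4":        (32.06, 22.77, 1.22),
    "Blend":              (33.84, 17.39, 1.61),
}
monthly = {
    "YldYear2 1-4":       (36.09, 20.93, 1.48),
    "LLTD 1-4":           (41.80, 25.25, 1.46),
    "H52EarnPS 1-4":      (39.12, 24.57, 1.41),
    "PEG-Minimalist 1-4": (45.71, 26.19, 1.54),
    "PIH_CSO 1-4":        (40.45, 22.30, 1.55),
    "Blend":              (42.36, 16.17, 2.14),
}


def torpedo_damage(name):
    """Daily-cycled average minus single-cycle monthly result, per metric."""
    return tuple(round(d - m, 2) for d, m in zip(daily[name], monthly[name]))


damage = {name: torpedo_damage(name) for name in daily}

for name, (cagr, gsd, sharpe) in damage.items():
    print(f"{name:<20}{cagr:>8.2f}{gsd:>7.2f}{sharpe:>8.2f}")
```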


Observations

1. The torpedo damage to the 5-screen blend is roughly equal to the average torpedo damage suffered by the five screens individually. This surprises me: I expected the damage to the blend to be greater than the average damage to the screens due to the optimization process drawing out spurious negative correlations among screens for the same reasons that datamining with monthly data draws out spurious CAGRs. Apparently screen correlation is a more robust statistic than I had thought.

2. Conventional wisdom is that it's better to take the top few picks from many screens than to "go deep" into just a few screens. This advice may be good, but I believe its importance has been exaggerated by the single-cycled backtests. For example, in daily-cycled backtests, a blend of YldYear2 1-10 and PIH_CSO 1-10 produces a Sharpe ratio of 1.54, which isn't too far from the Sharpe 5 Screen Blend's daily-cycled Sharpe ratio of 1.61:

gtr1: http://backtest.org/gtr1/blend.cgi?s19890103e20051230::1:tim.v:am2:pst.v:al50:pih.v:bp50:cso.v:bn10::1:tim.v:am6:cpe:gt0: cdy:gt0:product(yld,tr1y)tn10:pri:vprc(0,2):ces:ces.v:cpe:ratio(pri,ces):cdv:cdv.v:cdy:ratio(cdv,pri):yld:ratio(cdy,cpe): tr1y_mult:tr(12,253):tr1y:linear(100,tr1y_mult,-100,1)

3. Since we have seen some screens that suffer little or no torpedo damage in daily-cycled backtesting, it's quite possible that Elan's blend optimization would have selected different combinations of screens if the GTR1 backtester had been available at the beginning of 2006. Or perhaps the winning blends wouldn't have been so clear-cut that Elan worried about everyone piling into his blend if he continued posting it. Hopefully the GTR1 backtester will be usable by enough people so that we get to see its impact on screen selection for next year.

4. This blend, of course, still looks very good. However, my expectations for what kind of CAGR I would get using the blend have just been reduced by 8.5 points after daily-cycled backtesting. Many, like myself, will say that they had already expected the future to fall short of the past by perhaps 15 CAGR points for a host of other reasons; if so, then before this post you were expecting a CAGR of 42.36 - 15 = 27.36; now that the past has been clarified, you would only expect a CAGR of 33.84 - 15 = 18.84. That's still a market-beating expectation, of course, but with a lot less room for failure.

Robbie Geary
Author: Zeelotes Big red star, 1000 posts Feste Award Nominee! Old School Fool Number: 203954 of 252157
Subject: Re: A Blend Torpedoed Date: 11/16/2007 6:52 AM
Recommendations: 21
Robbie wrote:
While I believe the Screen Builder's Sharpe ratios of 2.2 for such blends are computationally correct, daily-cycled testing reveals them to be flukes caused by over-tuning them to the quirks of the particular slice of monthly history represented by the Screen Builder.

Indeed, the 2.2 is a fluke, but I'd argue that the bigger part of the fluke is not the difference between monthly and daily backtesting, but the difference between running one blend for the full history and doing what you'd actually do in real time: obtaining a new blend at the start of each year using the exact same approach one uses now to get the blend in the first place.

Let me illustrate -- here is a comparison of your results using daily data with Gritton's results using monthly data:
                 S&P        Sharpe 5 Blend              S&P
             Monthly    Daily  Monthly   % Diff      % Diff
CAGR:          11.86    33.84    42.36   -20.1%      185.4%
GSD(20):       14.91    17.39    16.17     7.5%      -14.3%
Sharpe(20):     0.57     1.61     2.14   -24.8%      181.0%

So the CAGR drops by 20%, the GSD increases by 7.5% and the Sharpe drops by nearly 25%.
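
The "% Diff" column between the daily and monthly results appears to be a plain relative change. A minimal sketch, assuming that definition (the helper name `pct_diff` is made up here):

```python
def pct_diff(new, base):
    """Percent change of `new` relative to `base`."""
    return (new / base - 1.0) * 100.0


# Sharpe 5 Blend: daily-cycled average versus single-cycle monthly result.
rows = {
    "CAGR":       (33.84, 42.36),
    "GSD(20)":    (17.39, 16.17),
    "Sharpe(20)": (1.61, 2.14),
}

for label, (daily_value, monthly_value) in rows.items():
    print(f"{label:<12}{pct_diff(daily_value, monthly_value):>7.1f}%")
```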

Now what happens when you compare the original backtest to actually using the strategy over the full period:
                 S&P            Sharpe 5 Blend                   S&P
             Monthly  Strategy Check  Monthly   % Diff      % Diff
CAGR:          11.86           26.13    42.36   -38.3%      120.4%
GSD(20):       14.91           18.58    16.17    14.9%      -19.8%
Sharpe(20):     0.57            1.18     2.14   -44.9%      106.0%

Now we find a drop in CAGR of over 38%, an increase in the GSD of nearly 15% and a 45% drop in the Sharpe.
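
A rough Python sketch of the "strategy check" idea, for readers who want to try it: at the start of each year, pick whichever candidate looked best using only the years already seen, then record what it actually did in the following year. Everything here is an assumption for illustration: the function name `walk_forward`, the use of cumulative growth as the selection criterion (the real selection was presumably Sharpe-based), and the toy return figures, which are invented and not taken from any backtest.

```python
def walk_forward(yearly_returns, first_test_year):
    """Re-select the best candidate each year using only prior years.

    yearly_returns: {candidate_name: {year: return_in_percent}}
    Returns {year: (chosen_candidate, realized_return_in_percent)}.
    """
    years = sorted(next(iter(yearly_returns.values())))
    realized = {}
    for year in years:
        if year < first_test_year:
            continue
        history = [y for y in years if y < year]

        def past_growth(name):
            # Cumulative growth over the years known at selection time.
            growth = 1.0
            for y in history:
                growth *= 1.0 + yearly_returns[name][y] / 100.0
            return growth

        best = max(yearly_returns, key=past_growth)
        realized[year] = (best, yearly_returns[best][year])
    return realized


# Invented toy data, purely to show the mechanics.
toy = {
    "A": {2001: 30, 2002: -10, 2003: 5},
    "B": {2001: 10, 2002: 20, 2003: 15},
}
result = walk_forward(toy, first_test_year=2002)
print(result)
```

In the toy data, candidate B is the full-history winner, yet the walk-forward run holds A in 2002 because A was the only leader visible at that point. That gap between hindsight selection and real-time selection is what the strategy check is meant to expose.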

This brings me to the concept of something being torpedoed -- what exactly do you mean when you use this term?

According to Merriam-Webster we get these two definitions:
1 : to hit or sink (a ship) with a naval torpedo : strike or destroy by torpedo
2 : to destroy or nullify altogether : wreck <torpedo a plan>
http://m-w.com/dictionary/torpedoed

It seems to me that the main idea of the word is to destroy or nullify. If what you mean is to strike it such that it takes on damage, then I'd say your use of the word is great IMO. But if your main point is to destroy it, or nullify it altogether, then I, for one, don't agree.

Let me illustrate with the examples from your post:
                 S&P           YldYear2                 S&P
             Monthly    Daily  Monthly   % Diff      % Diff
CAGR:          11.86    31.23    36.09   -13.5%      163.4%
GSD(20):       14.91    20.93    20.93     0.0%      -28.8%
Sharpe(20):     0.57     1.27     1.48   -14.2%      121.7%

                 S&P             LLTD                   S&P
             Monthly    Daily  Monthly   % Diff      % Diff
CAGR:          11.86    33.80    41.80   -19.1%      185.1%
GSD(20):       14.91    27.00    25.25     6.9%      -44.8%
Sharpe(20):     0.57     1.14     1.46   -21.9%       99.0%

                 S&P          H52EarnPS                 S&P
             Monthly    Daily  Monthly   % Diff      % Diff
CAGR:          11.86    30.48    39.12   -22.1%      157.1%
GSD(20):       14.91    27.12    24.57    10.4%      -45.0%
Sharpe(20):     0.57     1.04     1.41   -26.2%       81.5%

                 S&P        PEG-Minimalist              S&P
             Monthly    Daily  Monthly   % Diff      % Diff
CAGR:          11.86    33.49    45.71   -26.7%      182.5%
GSD(20):       14.91    27.58    26.19     5.3%      -45.9%
Sharpe(20):     0.57     1.12     1.54   -27.3%       95.5%

                 S&P        PIH_CSO_simple              S&P
             Monthly    Daily  Monthly   % Diff      % Diff
CAGR:          11.86    32.06    40.45   -20.7%      170.4%
GSD(20):       14.91    22.77    22.30     2.1%      -34.5%
Sharpe(20):     0.57     1.22     1.55   -21.3%      112.9%

                 S&P        Sharpe 5 Blend              S&P
             Monthly    Daily  Monthly   % Diff      % Diff
CAGR:          11.86    33.84    42.36   -20.1%      185.4%
GSD(20):       14.91    17.39    16.17     7.5%      -14.3%
Sharpe(20):     0.57     1.61     2.14   -24.8%      181.0%

What I see from a comparison between daily and monthly data is a hit of about 25% in the overall return and in the risk-adjusted return as measured by Sharpe. In my view, a hit of anywhere from 15 to 25% is to be expected when you increase your granularity from monthly to daily and go with an all-start-days test. Only when the hit exceeds 30% could I see talking about it in terms of being nullified or destroyed.

We need to keep in mind that these five screens still show an average CAGR that is more than 170% above the S&P with a Sharpe that is just over 100% over the S&P. Granted, the GSD exceeds the S&P by an average of 40%, but this is to be expected when you consider that these screens are holding four stocks compared to the S&P holding 500. Even after you have torpedoed the blend I still find it very impressive -- keeping in mind that the strategy itself is where it really gets torpedoed.
                 S&P        Sharpe 5 Blend              S&P
             Monthly    Daily  Monthly   % Diff      % Diff
CAGR:          11.86    33.84    42.36   -20.1%      185.4%
GSD(20):       14.91    17.39    16.17     7.5%      -14.3%
Sharpe(20):     0.57     1.61     2.14   -24.8%      181.0%

It beats out the S&P holding just twenty stocks compared to the benchmark of 500 -- a CAGR 185% higher, a Sharpe not far behind, and a gain in GSD of just 14%.

Is this something that has been torpedoed, or just damaged? I'd say the latter -- it provides us all with a healthy dose of realism, but it is still a very worthwhile investment.

Robbie added:
Many, like myself, will say that they had already expected the future to fall short of the past by perhaps 15 CAGR points for a host of other reasons; if so, then before this post you were expecting a CAGR of 42.36 - 15 = 27.36; now that the past has been clarified, you would only expect a CAGR of 33.84 - 15 = 18.84.

I'd say that if any of us can consistently produce a gain at or near 20 we are doing among the best in the world. Anything above that -- and certainly double that -- is fantasy land at best, and running after it is a recipe for total annihilation at worst.

Author: mungofitch Big gold star, 5000 posts Top Favorite Fools Top Recommended Fools Feste Award Winner! Old School Fool Number: 203956 of 252157
Subject: Re: A Blend Torpedoed Date: 11/16/2007 8:15 AM
Recommendations: 11
Hi Robbie

You mention...
Value Line does not currently allow the user to directly distinguish
between zero long-term debt and null long-term debt, while before 1997,
this distinction was made in the raw data. I have regularized the field
over time by converting nulls to zeros before 1997...


I didn't know this, and it's pretty tragic.
Long term debt is of great interest to me, perhaps more than any other
field, and it's a shame that this is the only solution found to date.
Since pretty much any screen using the field seeks long term debt of zero
or below a certain value or formula, this poses a problem, in that the
nulls will now appear as debt free companies in the early years.

Forgive me for asking: is there any better way to regularize this?
Is there perhaps some other field which would point out which
zeros were nulls in the more recent data, something like that?
For example, if any of the other balance sheet fields or ratio fields
betray the existence of meaningful amounts of debt, one could tag
the zero-debt companies as null-long-term-debt-field records in the
more recent years. (i.e., value not known).

One could make the simplifying assumption that (with the exception of
Annaly) almost any company with significant debt has long term debt.
Some possible ways to corner it might include:
Total assets / common shares outstanding - book value per share,
Total assets - shareholders' equity,
Debt-to-equity ratios,
Total liabilities less total current liabilities,
Total capital less shareholders' equity,
Return on total capital versus return on shareholders' equity, etc.
If any one of these shows clear sign of significant debt, the
LTD field could be tagged as null rather than zero.
One could of course optionally be aggressive and produce an
estimated value, but that's a different question.

Even an imperfect approach might be more useful than erasing the
null-versus-zero distinction in a subset of the data.
Perhaps, because this would inevitably be error prone, it might
be worth exposing as a separate field.

====================================================================
In another more radical approach, the question occurs to me:
how often is the figure actually null? It might be a lot closer
to reality (though more dangerous) to assume that zeroes in more
recent data are all really zero debt companies.
Taking a quick glance at the data, it seems very likely that in recent
editions, in most cases that the Shareholders' Equity value is set, the
Long Term Debt field seems to be meaningful, whether zero or not.
Since these are the most basic figures to pull from a financial
statement, it seems they are the most reliable fields.
For example, even in the "Plus" edition where you would expect the
majority of data holes, I get the following breakdown:
7737 companies total (100%)
2191 companies with LTD=0, but Total Assets and Current Liabilities populated (28% of total)
895 companies with all 3 fields zero (11.5% of total)

Of the 2191 with LTD=0 and the other fields set, only 337 have
an error of over 20% when comparing Current Liabilities to
Total Assets - Shareholders' Equity - Long Term Debt.
Of those, 189 also have a Current Liabilities of zero, which means
that perhaps it is the null field. This leaves only 2% questionable,
many of which may come down to definitions of shareholders' equity
due to other share classes, etc.

So, it seems highly probable to me that if Current Liabilities,
Shareholders' Equity, and Total Assets are all non-zero in the records
from recent years, the zero in the Long Term Debt field is very probably
a real zero. If not, assuming null might be the best course.
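
A minimal Python sketch of this rule, in case it helps anyone experiment with it. The function name `classify_ltd`, the argument names, and the choice to measure the 20% error relative to Current Liabilities are all assumptions made for illustration; they are not Value Line field names or anything built into a backtester.

```python
from typing import Optional


def classify_ltd(ltd, total_assets, equity, current_liabilities,
                 tolerance=0.20) -> Optional[float]:
    """Decide whether a zero in the LTD field is a real zero or a null.

    Returns the debt figure itself when it is non-zero, 0.0 when the zero
    looks genuine, and None when it should be treated as null (unknown).
    """
    if ltd != 0:
        return ltd
    # If any of the supporting balance-sheet fields is unpopulated,
    # there is nothing to cross-check against: assume null.
    if total_assets == 0 or equity == 0 or current_liabilities == 0:
        return None
    # With zero long-term debt, Total Assets - Shareholders' Equity - LTD
    # should land reasonably close to Current Liabilities.
    implied = total_assets - equity - ltd
    error = abs(implied - current_liabilities) / current_liabilities
    return 0.0 if error <= tolerance else None


print(classify_ltd(0, 100, 60, 40))   # balance sheet reconciles
print(classify_ltd(0, 100, 60, 0))    # Current Liabilities unpopulated
print(classify_ltd(0, 100, 20, 40))   # large unexplained gap
```

Dropping the tolerance check and keeping only the three "non-zero" tests gives the simpler version of the rule.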

Jim

Author: JeffLandon One star, 50 posts Number: 203961 of 252157
Subject: Re: A Blend Torpedoed Date: 11/16/2007 10:36 AM
Recommendations: 0
"This brings me to the concept of something being torpedoed -- what exactly do you mean when you use this term?"

From the comments throughout Robbie's posts, I believe he's trying to torpedo expectations more than screens.

There are plenty of people out there who say the market can't be beaten at all. Beating it by 2 points over a long period of time is seen as extremely fortunate.

But he may believe that a screen which was tuned for particular dates that doesn't hold up on other dates cannot be trusted at all. Robbie?

Author: elann Big gold star, 5000 posts Top Favorite Fools Top Recommended Fools Old School Fool Number: 203970 of 252157
Subject: Re: A Blend Torpedoed Date: 11/16/2007 5:29 PM
Recommendations: 4
While I believe the Screen Builder's Sharpe ratios of 2.2 for such blends are computationally correct, daily-cycled testing reveals them to be flukes caused by over-tuning them to the quirks of the particular slice of monthly history represented by the Screen Builder.

Robbie,

There's something I don't understand about these test results. I should have paid attention to it in previous tests but I never did.

For every one of the screens you tested, and for the blend as a whole, the CAGR found by Gritton's monthly backtester is higher, in many cases by a wide margin, than the Max results reported by you over all daily start periods. If we saw that Jamie's results were equal to your Max in every case, I'd say it's highly unlikely but possible. It takes an exceptional skill at data mining to come up consistently with a monthly result that is the maximum of all possible daily results. But for Jamie to always come out above your max seems outright impossible, unless there's an explanation I'm not aware of.

(Anyone other than Robbie who can offer an explanation is welcome to chime in.)

This blend, of course, still looks very good. However, my expectations for what kind of CAGR I would get using the blend have just been reduced by 8.5 points after daily-cycled backtesting. Many, like myself, will say that they had already expected the future to fall short of the past by perhaps 15 CAGR points for a host of other reasons; if so, then before this post you were expecting a CAGR of 42.36 - 15 = 27.36; now that the past has been clarified, you would only expect a CAGR of 33.84 - 15 = 18.84. That's still a market-beating expectation, of course, but with a lot less room for failure.

I disagree. One of the main reasons to discount the monthly test results that we had before is the suspicion of curve fitting that is addressed, at least to some extent, by the daily backtester. It is illogical to claim that the degree of uncertainty of the backtest results is unaffected by the daily backtest, as your statement implies. I'm not one to quantify the reduction of expectations. It's almost as useless to peg it at 15% as to take the raw backtest results literally. But if you chose 15% earlier, for illustrative purposes, then you must correct it now to something smaller, like 10% or what not.

Elan

Author: elann Big gold star, 5000 posts Top Favorite Fools Top Recommended Fools Old School Fool Number: 203971 of 252157
Subject: Re: A Blend Torpedoed Date: 11/16/2007 5:36 PM
Recommendations: 1
Value Line does not currently allow the user to directly distinguish
between zero long-term debt and null long-term debt, while before 1997,
this distinction was made in the raw data. I have regularized the field
over time by converting nulls to zeros before 1997...
==========================================
I didn't know this, and it's pretty tragic.
Long term debt is of great interest to me, perhaps more than any other
field, and it's a shame that this is the only solution found to date.
Since pretty much any screen using the field seeks long term debt of zero
or below a certain value or formula, this poses a problem, in that the
nulls will now appear as debt free companies in the early years.


It's not clear to me what a null value in the long term debt field means. The last thing I would assume is that it reflects a situation where long term debt existed but VL neglected to record it due to laziness or sloppiness. It's got to be something else, such as a closed end fund for which reporting long term debt is meaningless, i.e. it's not an item on its balance sheet. So is it helpful to convert null values to zeroes, or will it cause the test to improperly select companies?

Elan

Author: JeffLandon One star, 50 posts Number: 203973 of 252157
Subject: Re: A Blend Torpedoed Date: 11/16/2007 6:19 PM
Recommendations: 5
Even with Robbie's backtester set to 20 day holds, his buy/sell dates won't line up with the monthly dates Jamie uses. Robbie will be "crawling" through the month as Jamie tries to "stick" to the beginning of it.

It would be nice if we could see the details of one screen using Jamie's dates, seeing if Robbie's backtester picked the same stocks and how the forward returns compared between the two backtesters. Might be a good way to fix bugs in both backtesters.
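
To see how rarely the two schedules line up, here is a small Python sketch. It assumes, purely for illustration, that a "month" is exactly 21 market days and that trading days are numbered 0, 1, 2, ...; the function names are made up. Real calendars are less regular, so this only shows the drift, not the actual trade dates of either backtester.

```python
def trade_days(start, hold, n_days):
    """Market-day indices on which a cycle starting at `start` trades."""
    return list(range(start, n_days, hold))


def shared_dates(start, n_days, hold=20, month=21):
    """Days on which a `hold`-day cycle trades on a monthly trade date."""
    monthly = set(trade_days(0, month, n_days))
    return [day for day in trade_days(start, hold, n_days) if day in monthly]


# Roughly 17 years of market days, as in the 1989-2005 backtests.
n_days = 252 * 17
for start in range(20):
    total = len(trade_days(start, 20, n_days))
    overlap = len(shared_dates(start, n_days))
    print(f"cycle {start:2d}: {overlap} of {total} trades fall on a monthly date")
```

Under these assumptions each 20-day cycle meets the monthly schedule only once every 420 market days and then immediately drifts off it again, which is why no single cycle can reproduce the monthly picks.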

Author: rgearyiii Three stars, 500 posts Old School Fool Number: 204225 of 252157
Subject: Re: A Blend Torpedoed Date: 11/23/2007 10:14 AM
Recommendations: 5
elann:
There's something I don't understand about these test results...
For every one of the screens you tested, and for the blend as a whole, the CAGR found by Gritton's monthly backtester is higher, in many cases by a wide margin, than the Max results reported by you over all daily start periods. If we saw that Jamie's results were equal to your Max in every case, I'd say it's highly unlikely but possible. It takes an exceptional skill at data mining to come up consistently with a monthly result that is the maximum of all possible daily results. But for Jamie to always come out above your max seems outright impossible, unless there's an explanation I'm not aware of.


I assume you do realize that none of my daily-staggered cycles with holding periods of exactly 20 market days will ever match up (and stay matched up) with Jamie's monthly cycle, because a month is closer to 21 market days. There are other differences between my tests and Jamie's as well, differences that should be performance-neutral (such as measuring "RS26" over 126 market days instead of exactly 26 weeks, etc), but which guarantee that his and my backtests will never agree perfectly even on their common trading dates.

I also assume that you realize that my Max and Min statistics do not represent absolute ceilings and floors on what results are mathematically possible to get by varying trading date: they are only the max and min of a particular sample.

Thus, my explanation is the same as what I gave you in

http://boards.fool.com/Message.asp?mid=25384956

and what I gave Moe in

http://boards.fool.com/Message.asp?mid=25811991

when you two asked what is essentially the same question: why aren't the Screen Builder's results sometimes better than those of the GTR1 backtester instead of consistently worse?

My answer is that we would only expect an average discrepancy of zero and to see the Screen Builder's results fall nicely within my Max and Min cycle results if we were looking at random screens with no regard to performance. But as I said in the two posts above, this is very far from the case: we have a total bias toward high CAGR and high Sharpe when selecting screens as "keepers" from the Screen Builder; such "keepers" are more likely to come from the upper tails of performance distributions. Using the blend optimizer (which is where most of the screens I have tested come from) perfects this bias to the point that using it is the ultimate exercise in cherry-picking the biggest flukes produced by the Screen Builder (even though it may very well produce combinations of screens that blend well, which of course is its purpose).

The fact that all the screens comprising the optimal blends have Screen Builder results that are consistently above the GTR1 backtester's maximum results doesn't represent data-mining "skill" any more than the peppered moth demonstrates adaptive "skill" by always matching the color of the tree bark wherever it is found.

For the very same reasons, I predict that if you devoted the next few years of your life coming up with the worst five screens by Sharpe ratio (subject, let's say, to a minimum of 5 stocks) as measured by the Screen Builder, then each of the five screens would have results that would be worse than those of the GTR1 backtester's worst cycle for the same screen.
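
The selection effect can be reproduced with a toy simulation. The Python sketch below is not a backtest: it assumes every "screen" has the same true CAGR and that each measurement is just that value plus independent Gaussian noise, and all of the parameter values and names are invented for illustration. One noisy measurement stands in for the single monthly cycle, twenty more stand in for the staggered cycles, and the "keepers" are chosen by the single measurement alone.

```python
import random


def simulate(n_screens=2000, n_cycles=20, true_cagr=20.0,
             noise_sd=5.0, keep=5, seed=0):
    """Pick the top `keep` screens by one noisy measurement.

    Returns a list of (single_cycle_result, other_cycle_results) pairs
    for the selected screens.
    """
    rng = random.Random(seed)
    screens = []
    for _ in range(n_screens):
        single = rng.gauss(true_cagr, noise_sd)
        cycles = [rng.gauss(true_cagr, noise_sd) for _ in range(n_cycles)]
        screens.append((single, cycles))
    # Cherry-pick on the single-cycle number only, as an optimizer would.
    return sorted(screens, key=lambda s: s[0], reverse=True)[:keep]


keepers = simulate()
selected_avg = sum(single for single, _ in keepers) / len(keepers)
cycle_avg = sum(sum(c) / len(c) for _, c in keepers) / len(keepers)
above_max = sum(single > max(cycles) for single, cycles in keepers)

print(f"average single-cycle CAGR of keepers: {selected_avg:.2f}")
print(f"average staggered-cycle CAGR of keepers: {cycle_avg:.2f}")
print(f"keepers whose single-cycle result beats their best cycle: {above_max} of {len(keepers)}")
```

Even though no screen in this toy world is better than any other, the keepers' single-cycle numbers sit well above their own staggered-cycle averages, simply because they were selected for being high.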

Robbie Geary

Author: rgearyiii Three stars, 500 posts Old School Fool Number: 204228 of 252157
Subject: Re: A Blend Torpedoed Date: 11/23/2007 11:01 AM
Recommendations: 18
Zeelotes,

Your method of eliminating perfect hindsight from backtests by forcing strategies to continuously (or at least annually, in your backtests) adapt to the "known" past at each point in time in a way that mimics the way we choose screens in the present is quite fascinating; you've inspired me to think about ways of adding general WWL capability to my GTR1 backtester (the main challenge will be coming up with a clean syntax for it; performance shouldn't be an issue). However, I'm not sure what exactly you are claiming about it in relation to daily-cycled backtesting. If your point is simply that you did more damage to the Sharpe 5 Blend's mythical 2.2 Sharpe ratio using your method than I did with daily-cycled testing, I won't argue with that.

Thus I'll reply to the more trivial subject you address:

This brings me to the concept of something being torpedoed -- what exactly do you mean when you use this term? ...But if your main point is to destroy it, or nullify it altogether, then I, for one, don't agree.

Because I take for granted that the audience of my posts consists of people who make investment decisions based on numbers rather than adjectives, I haven't felt the need to provide a rigorous definition of "torpedoed". People won't stop using a screen because they saw it described as "Torpedoed" in one of my subject headers any more than they'll start using it because they saw you describe it as "Gold."

However, my posts usually contain enough contextual clues, such as "This blend, of course, still looks very good", that I don't think anyone takes it to mean "annihilated"; when I mean that, then in keeping with the metaphor, I say "sunk" instead, as I have on a couple of occasions (VL Screamers and SI Pro LowVol, IIRC). Even without a definition, I think those who regularly read my posts will have come to expect a reduction in CAGR of 5-10 points when they see a screen described as "torpedoed" at the top of my posts.

Robbie Geary

Author: JeffLandon One star, 50 posts Number: 204233 of 252157
Subject: Re: A Blend Torpedoed Date: 11/23/2007 12:19 PM
Recommendations: 5
>>For the very same reasons, I predict that if you devoted the next few years of your life to coming up with the worst five screens by Sharpe ratio (subject, let's say, to a minimum of 5 stocks) as measured by the Screen Builder, then each of the five screens would have results that would be worse than those of the GTR1 backtester's worst cycle for the same screen.

There are plenty of screens lying around that were found early and abandoned early, but are still tracked because the bar was very low in the early days. In the beginning (before backtest.org) it was extremely hard to tune screens.

There were some discussions about throwing screens out, but it never really happened. If you go back and read old messages, you'll see that some people had the foresight to see that they could be useful in the future.

There are also plenty of screens proposed by people only to be shot down when Alan or Jamie or Moe tested them.

Might be interesting to see how some of these old lame loser screens do post-discovery in both Jamie's and Robbie's backtesters. You might find that they do at least relatively better in Robbie's backtester. Might even find a few that have cycles that surpass backtest.org's runs.

Author: Zeelotes Big red star, 1000 posts Feste Award Nominee! Old School Fool Number: 204252 of 252157
Subject: Re: A Blend Torpedoed Date: 11/23/2007 8:51 PM
Recommendations: 0
Robbie wrote:
However, I'm not sure what exactly you are claiming about it in relation to daily-cycled backtesting. If your point is simply that you did more damage to the Sharpe 5 Blend's mythical 2.2 Sharpe ratio using your method than I did with daily-cycled testing, I won't argue with that.

Yes, that is exactly my point. The damage is greater with my method than with daily-cycled testing. Furthermore, the damage from daily-cycled testing is typically within the range of what I would have already expected from it, and therefore is not really adding any new information. Having said that, however, I am a firm believer in confirmation and validation of one's expectations, so the daily-cycled backtests made available to this board via your backtesters are very much appreciated.

Author: elann Big gold star, 5000 posts Top Favorite Fools Top Recommended Fools Old School Fool Number: 204296 of 252157
Subject: Re: A Blend Torpedoed Date: 11/24/2007 9:58 PM
Recommendations: 8
I assume you do realize that none of my daily-staggered cycles with holding periods of exactly 20 market days will ever match up (and stay matched up) with Jamie's monthly cycle, because a month is closer to 21 market days. There are other differences between my tests and Jamie's as well, differences that should be performance-neutral (such as measuring "RS26" over 126 market days instead of exactly 26 weeks, etc), but which guarantee that his and my backtests will never agree perfectly even on their common trading dates.

Of course I realize that, and I don't expect the results to be identical. That's not the issue, so let's move on.

I also assume that you realize that my Max and Min statistics do not represent absolute ceilings and floors on what results are mathematically possible to get by varying trading date: they are only the max and min of a particular sample.

Again, you're stating the obvious, which is not the issue I raised. We can regard the 20 start dates you tested, and Jamie's monthly test, as 21 sample paths through the history of a screen and note that Jamie's result always comes out on top - the best of 21 samples.

My answer is that we would only expect an average discrepancy of zero and to see the Screen Builder's results fall nicely within my Max and Min cycle results if we were looking at random screens with no regard to performance. But as I said in the two posts above, this is very far from the case: we have a total bias toward high CAGR and high Sharpe when selecting screens as "keepers" from the Screen Builder; such "keepers" are more likely to come from the upper tails of performance distributions. Using the blend optimizer (which is where most of the screens I have tested come from) perfects this bias to the point that using it is the ultimate exercise in cherry-picking the biggest flukes produced by the Screen Builder (even though it may very well produce combinations of screens that blend well, which of course is its purpose).

I don't think anyone expects a zero bias. We know there is a selection bias in the screens we track. I would, just for discussion's sake, accept that Jamie's backtests would always, or almost always, be in the upper half of the daily backtest results. So let's say we took your best 10 start days and added Jamie's backtest to them. There's a 1 in 11 chance in such a model that Jamie's backtest will come out on top. If it happens without exception for every screen, let's say ten or more tested screens, the chance of that happening is, in my view, one in a billion or so.

The odds I put on it are not the key point. The numerical value is immaterial. My point is that you're providing a qualitative explanation for something that, since we don't know the strength of the bias, could well be extremely unlikely. I think that trying to just explain it away is not prudent. There is a possibility that this outcome cannot be explained by the bias in screen selection, and a measure of skepticism would be healthy.
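
For anyone who wants to check the arithmetic of the 1-in-11 model, here is a minimal sketch. It assumes, as the model does, that each of the 11 paths is equally likely to come out on top and that the screens are independent of one another; the function name prob_always_on_top is made up for the example. It says nothing about how strong the real selection bias is.

```python
from fractions import Fraction


def prob_always_on_top(n_paths, n_screens):
    """Chance that one designated path is the best of n_paths for every
    one of n_screens independent screens (equal odds assumed)."""
    return Fraction(1, n_paths) ** n_screens


# Jamie's backtest plus the best 10 start days, over ten screens.
p = prob_always_on_top(11, 10)
print(f"1 in {float(1 / p):,.0f}")  # prints "1 in 25,937,424,601"
```

So the toy model gives odds of roughly 1 in 26 billion, and the figure moves by orders of magnitude as the assumed number of paths or screens changes, which is why the exact value matters less than the strength of the bias.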

For the very same reasons, I predict that if you devoted the next few years of your life to coming up with the worst five screens by Sharpe ratio (subject, let's say, to a minimum of 5 stocks) as measured by the Screen Builder, then each of the five screens would have results that would be worse than those of the GTR1 backtester's worst cycle for the same screen.

That's one way of addressing my skeptical concern. Another way is to test, for example, some of the short screens that have been developed here. While they are not very successful at inverting market returns, they were the best screens people could find to underperform the market. If you test those, and your theory of extreme screen selection bias holds, we would find that Jamie's backtester would consistently underperform your minimum by a wide margin (a reverse torpedo).

Or someone could devise a screen that is meant to be mediocre, such as Timeliness 3 and the 10 stocks that are closest to median RS. Or build a screen by randomly selecting parameters. Whatever method is chosen, I think it is very important to demonstrate that, at least in some situations, Jamie's backtester doesn't come out above the maximum of your daily starts.

Until we see at least a couple of such examples I will remain skeptical. The risk that you or Jamie have a bug in your backtesters is just too tangible.

It reminds me of a joke about an architect (not a great joke).
An architect takes part in an opening ceremony for a bridge he designed. The ribbon is cut and the first car drives across the bridge. As it gets to mid-span the bridge collapses spectacularly. Everyone is running around in a panic. The architect walks away with a distressed look, mumbling "S**t, I forgot to divide by 2".

Elan

Author: Zeelotes Big red star, 1000 posts Feste Award Nominee! Old School Fool Number: 204298 of 252157
Subject: Re: A Blend Torpedoed Date: 11/24/2007 10:22 PM
Recommendations: 4
Elan wrote:
Until we see at least a couple of such examples I will remain skeptical. The risk that you or Jamie have a bug in your backtesters is just too tangible.

Just for the record, I'd agree with this. It just seems statistically impossible for every single one of Jamie's results to exceed your maximum. The only explanation that I can think of lies in how your GTR1 setup itself works:

1. The way you are dealing with delistings.
2. The overall cautionary way you handle a lack of pricing data.
3. Symbol = Permno designations. Jamie has quite a few cases where he is unable to find the permno. I'm assuming you have no cases. This would explain some differences.
4. Jamie's rebalance dates are always in close proximity to the actual release date of the data itself. I realize your lag experiments question the validity of this concern, but frankly, I'm still not convinced.

It could be that any or all of these produce differences in the returns significant enough to make it possible for your max to never exceed Jamie's.

Author: Zeelotes Big red star, 1000 posts Feste Award Nominee! Old School Fool Number: 204301 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 12:38 AM
Recommendations: 0
Robbie wrote:
Hopefully the GTR1 backtester will be usable by enough people so that we get to see its impact on screen selection for next year.

A poll would be nice to see how many have actually used it at all. What is interesting in my case is that whenever I copy and paste your links into a browser I can get the screen where it says "RUN", but whenever I click it, the result is an error message. I assume this has more to do with where I am than with a problem in the backtester itself?

Author: rgearyiii Three stars, 500 posts Old School Fool Number: 204304 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 3:06 AM
Recommendations: 16
elann:
That's not the issue, so let's move on.

Indeed: if you suspect bugs in my backtester or GTR1 linearization, don't be shy about saying so. You didn't respond to my "qualitative" replies (at least I don't think you did) the last two times, so I had no way of knowing they were inadequate.

I've stated a few times (which is a lot, considering that I hardly ever post) that I think the greatest priority in the area of daily-cycled backtesting should be building another backtester from scratch (preferably one that doesn't use the same GTR1-linearized data, at least not at first) for the purpose of more easily exposing bugs. I realize that sounds like a tall order, but the GTR1 backtester is actually my fourth backtester written from scratch (though the first written in C++; previously, I did all my backtests in Excel/Access VBA). A second backtester wouldn't even need to be public or user-friendly--its only public purpose would be confirmation of my backtester's results by its creator.

Having taken on a task I never volunteered for myself (when starting VLWRP, I envisioned that others would volunteer to write daily-cycled backtesters when the data and GTR1 linearizer were ready), I don't feel too shy about volunteering others for this task when I see the time, resources and talent. Zeelotes (or rather his programmer sons, under his guidance) would be the most obvious candidate, though my past attempts at persuading him haven't been successful. Another candidate would be VTAlumni, who de-lurks about as often as I do, volunteering for programming challenges. If the MI board expressed widespread interest in this goal the next time he appears, perhaps he would bite.

I haven't harped on this goal as much lately because my own confidence in my backtester (and Jamie's, for that matter) has tended to increase the more I use them. But other than this proposal, there isn't much I can do besides provide more qualitative explanations of what you find suspicious. A few more considerations that come to mind are:

1. Your statement, "We can regard the 20 start dates you tested, and Jamie's monthly test, as 21 sample paths through the history of a screen and note that Jamie's result always comes out on top", is actually false. Jamie's results don't always come out on top. For the vast majority of "boring" screens, Jamie's results tend to fall nicely between my Min/Max statistics, with perhaps a slight tilt downward due to the effects of GTR1 linearization. It's only the most popular MI screens--those that surface in the course of blend optimization--that get knocked down by 5-10 CAGR points. IMO, that's the smoking gun of curve-fitting (pardon the mixed metaphor), not evidence of bugs.

2. I'm not ashamed to say that my modus operandi is simply this:

Create buzz (which includes skepticism) about daily-cycled backtesting --> Widespread use of my backtester --> Donations to the MI treasury --> Reimbursement for VLWRP costs.

Thus, I haven't bothered spending the time required to post on all the screens that don't get torpedoed because they wouldn't create buzz. My hope has been that people would discover all the screens that don't get torpedoed on their own. Besides getting reimbursed, I would like to see all the people who put a great deal of time into VLWRP (including yourself) get rewarded with results.

3. As my (very thin) posting history shows, I started torpedoing popular MI screens in August of 2006 using my third Excel/Access VBA backtester, a couple months before I started re-writing my backtester and databases from scratch in C++ (my modus operandi was the same back then, except I was still trying to get others to write the public daily-cycled backtester). Except where I have made changes in field regularization, all of my old torpedo results agree very closely with those obtained with my new GTR1 backtester.

4. Upon TechCzech's completion of his first GTR1 Linearizer in 2005, I did an analysis of the effects of GTR1 linearization on basic RS screens by backtesting them on both my old Sux database and the new GTR1-linearized database. GTR1 linearization (which more realistically handles suspensions, delistings, spin-offs and mergers) appeared to add about 1-2 points to GSD and knock about 1 point off of CAGR for monthly RS screens, with slightly increasing effect as the holding period lengthened. I actually expected more of an impact. A couple bugs in my linearization logic (and one or two in the linearizer itself) have been discovered and fixed since then, but none that had a significant impact on results. Thus I don't think GTR1 linearization accounts for very much of the difference between GTR1 and backtest.org.

5. I have done complete audits of Jamie's and my backtests for a few popular MI screens and concluded that we were both correct. I haven't reported on these findings because there wasn't much to say besides, "Yep, we're right." (Zeelotes has already thoroughly replicated Jamie's backtests anyway.)

6. It seems that those who have become proficient with my backtester (via either Jamie's or Michael's interfaces) have developed some degree of faith in the results. (Ironically, all of the power users are from the WERresearcher Yahoo! group rather than the MI board.) I would therefore suggest that you (and Zeelotes) let your skepticism motivate you to put forth the effort required to use the backtester yourselves. If you do, I think you'll find my curve-fitting explanation for why only popular screens get "impossibly" torpedoed as badly as they do quite satisfying.

Robbie Geary

Author: Zeelotes Big red star, 1000 posts Feste Award Nominee! Old School Fool Number: 204305 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 4:37 AM
Recommendations: 18
Robbie wrote:
I don't feel too shy about volunteering others for this task when I see the time, resources and talent. Zeelotes (or rather his programmer sons, under his guidance) would be the most obvious candidate, though my past attempts at persuading him haven't been successful.

Alright, challenge accepted!

Just send me via email the data I'm lacking to do the validation and I'll put resources onto the task.

NOTE: In the past we only worked together on building an interface for your backtester, which would not provide any validation since the code and methods were yours alone. This, in contrast, will be using my code and methods, mostly my data, and some of your data that has resulted from the various clean up projects.

FURTHER NOTE: If the interface project relied on my skills alone I would have done it long ago, but unfortunately, the time my programmers have available to them now is very limited. But actually, if we had been able to complete what we started in building my data into a form your backtester could accept it would have been completed last Spring. Unfortunately, I put tons of time into that, only to end up failing before I departed on my summer excursions. It most certainly was not due to a lack of effort, time or resources put on the task.

I have skills in the areas of research and fresh ideas, but I do not have the requisite skills to program at the level required to build an interface, so now that my programmers' time is so limited, what was possible then is not so easily possible now.

Author: ilmostro Big red star, 1000 posts 10+ Year Anniversary! Number: 204310 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 7:58 AM
Recommendations: 12
(Ironically, all of the power users are from the WERresearcher Yahoo! group rather than the MI board.)

Yes it is pretty ironic.
Even I have been able to do some simple runs so I don't quite understand why the big brains on this board don't use all the data and tools available to them.

I did a simple VL T1 WER overlap and almost 27% of the T1's were also part of the WER universe. Here is the code if anyone is interested:

field0=ril.w
field1=tim.v
step0=field0lt9999
step1=field1et1
step2=field1tn100
-count

To get the CAGR and other data just change the "-count" to "-shrink" without the quote marks.

To change the default holding period just enter "hold=XX" where XX is the number of days. So for a quarterly I would enter:

field0=ril.w
field1=tim.v
step0=field0lt9999
step1=field1et1
step2=field1tn100
hold=63
-shrink

Say you wanted to look at an odd timeframe; it's easy to modify the code:
start=19990601
end=20000501
field0=ril.w
field1=tim.v
step0=field0lt9999
step1=field1et1
step2=field1tn100
hold=63
-shrink


I would recommend that newbies to the GTR1 backtester start at Michael's site. It is far easier to cut and paste code at that site.
Plus you get to see other people's code and it is really easy to change a few things about the screen and create your own.

I still screw up a lot of the code but it hasn't broken Michael's site yet! Fingers crossed.

Bryan

Author: KBGlenn Big red star, 1000 posts Old School Fool Number: 204315 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 9:09 AM
Recommendations: 0
Michael's site?

Could you post the URL please? (Mea Culpa...I've been spending all my time on mkt timing and have spent very little time following/paying attention to screens for the past year.)

Tks KBG

Author: ilmostro Big red star, 1000 posts 10+ Year Anniversary! Number: 204317 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 9:38 AM
Recommendations: 7
Here is Michael's version of the GTR1 backtester:
http://backtest.selfip.org:8007/backtest/index.html
Here is Jamie's version of the GTR1 backtester:
http://www.backtest.org/gtr1/

Go to this link to get some screen code:
http://members.iinet.net.au/~rgearyiii/example.html
Simply copy the code of any screen and paste into Michael's "raw entry" form.

You can see what members are trying at Michael's site by looking at the "results by user" page; Jamie's site doesn't offer that capability.

If you have a screen that has a variable number of stocks then you must use Michael's site, since Jamie's doesn't support the "-shrink" command. The same goes if you want to see the number of stocks passing each step of a screen via the "-count" command.

You can get daily return data (plus the yearly returns) for a screen via Michael's blender by entering the following code into the "raw blend" page:
BLENDALL:(-portval)
BLEND:(1)
followed by the screen's code.

For example:
BLENDALL:(-portval)
BLEND:(1)
field0=ril.w
field1=tim.v
step0=field0lt9999
step1=field1et1
step2=field1tn100
-shrink

The blender also gives you the Sharpe and UI, which I assume is the Ulcer Index.
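
For readers who want to see what those two statistics usually mean, here is a minimal sketch that computes them from a list of daily portfolio values. It uses the textbook definitions (annualized mean excess return over its standard deviation for Sharpe, and the root-mean-square percentage drawdown from the running peak for the Ulcer Index). It is an assumption that the blender computes them the same way; the function names, the 252-day year, the zero risk-free rate and the sample values are all made up for the example.

```python
import math


def daily_returns(values):
    """Simple day-over-day returns from a series of portfolio values."""
    return [b / a - 1.0 for a, b in zip(values, values[1:])]


def sharpe_ratio(values, risk_free_annual=0.0, periods_per_year=252):
    """Annualized Sharpe ratio; risk-free rate and year length are assumptions."""
    rets = daily_returns(values)
    rf_daily = (1.0 + risk_free_annual) ** (1.0 / periods_per_year) - 1.0
    excess = [r - rf_daily for r in rets]
    mean = sum(excess) / len(excess)
    # Sample variance (n - 1 in the denominator).
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def ulcer_index(values):
    """Root-mean-square percentage drawdown from the running peak."""
    peak = values[0]
    squared = []
    for v in values:
        peak = max(peak, v)
        drawdown_pct = 100.0 * (v - peak) / peak
        squared.append(drawdown_pct ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical daily portfolio values, not real backtest output.
portval = [100.0, 102.0, 101.0, 99.0, 103.0, 104.0]
print(sharpe_ratio(portval), ulcer_index(portval))
```

A series that only ever makes new highs has an Ulcer Index of zero, so the statistic penalizes the depth and duration of drawdowns rather than volatility as such.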

Bryan

Author: JeffLandon One star, 50 posts Number: 204328 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 12:48 PM
Recommendations: 5
>>Ironically, all of the power users are from the WERresearcher Yahoo! group rather than the MI board.

I'd peg it as 5% ironic and 95% pathetic. All you have to do is go back to read old board posts from 10 or 5 years ago to see what's missing here now. Skepticism, innovation, attention, barnbustin'. It only comes from a few sources now. For all the messages that flow through here, it's a mighty sleepy place.

Y'all, in my humble opinion, have become ossified. You have a set of beliefs that seem firmly entrenched. Once a week, you should wake up skeptical of something you believe in deeply. You should always be thinking of new ways to challenge what you believe, even if that's not your primary focus.

It's almost as if, having found so many screens, you don't know what to do next.

There is PLENTY to do. Believe me.

You're practically standing still.

Where are the "lists of things we should be doing" that used to appear regularly here? Where's the heart, the soul, and the guts that used to be here?

[My opinions are not endorsed by the management. I am not a doctor--I only play one on TV.]

Author: emintz Big red star, 1000 posts Old School Fool Number: 204330 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 2:15 PM
Recommendations: 18
Y'all, in my humble opinion, have become ossified. You have a set of beliefs that seem firmly entrenched. Once a week, you should wake up skeptical of something you believe in deeply. You should always be thinking of new ways to challenge what you believe, even if that's not your primary focus.
...
Where are the "lists of things we should be doing" that used to appear regularly here? Where's the heart, the soul, and the guts that used to be here?

Well, you have choices. You can post like this, railing about the fact that everyone else isn't doing what you think they should be doing. That isn't going to have any effect on anyone. Alternatively, you can start an effort that goes in the direction that you think is interesting, and gather other interested people into that effort.

In the past, when I was definitely a more productive contributor to this board than I am now, those contributions were in areas that I was interested in and in which I felt able to add something to the conversation. I guess I am now "ossified" because I am satisfied with my own methods.

On the other hand, skepticism has never been in short supply here. It is that very skepticism that has chased away individuals who couldn't handle having their ideas challenged without actual data in support. Others endured the skepticism until they were able to put together the effort to test their ideas.

Eric

Author: JeffLandon One star, 50 posts Number: 204332 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 2:55 PM
Recommendations: 2
There's nothing wrong with a given person settling down into a plan. That's the whole point of MI, to me. I didn't mean that individuals have become ossified, but the board as a whole. Perhaps it's that long-discussed learning curve that dissuades new blood that's the problem.

Other groups, including institutions, are doing similar work, and taking it in new directions. There's no reason not to do that here. There's no reason not to build up the foundations to a more robust state.

List the beliefs and try to figure out how to test them. Be innovative. Kill the anecdotes. Use Monte Carlo. Use Bayes. Make an attempt to take things from religion to science. Find out what the limits to knowledge are. Let's get rational. I wanna get physical. Let's get into physical.

I'll put forth my list of work that needs to be done. But I want to see other lists first.

Author: elann Big gold star, 5000 posts Top Favorite Fools Top Recommended Fools Old School Fool Number: 204334 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 3:51 PM
Recommendations: 17
Robbie, thank you for your response. As thorough as always. Please see my comments below.

Indeed: if you suspect bugs in my backtester or GTR1 linearization, don't be shy about saying so. You didn't respond to my "qualitative" replies (at least I don't think you did) the last two times, so I had no way of knowing they were inadequate.

I've stated a few times (which is a lot, considering that I hardly ever post) that I think the greatest priority in the area of daily-cycled backtesting should be building another backtester from scratch (preferably one that doesn't use the same GTR1-linearized data, at least not at first) for the purpose of more easily exposing bugs. I realize that sounds like a tall order, but the GTR1 backtester is actually my fourth backtester written from scratch (though the first written in C++; previously, I did all my backtests in Excel/Access VBA). A second backtester wouldn't even need to be public or user-friendly--its only public purpose would be confirmation of my backtester's results by its creator.


Considering the heroic and unique effort you put into the data collection, linearization, and backtester, I don't think it's realistic to expect anyone to do the same from scratch. If anyone ever had the motivation to build such a tool, other than you, the fact that you got there first is a strong demotivator for the next guy.

A few more considerations that come to mind are:

1. Your statement, "We can regard the 20 start dates you tested, and Jamie's monthly test, as 21 sample paths through the history of a screen and note that Jamie's result always comes out on top", is actually false. Jamie's results don't always come out on top. For the vast majority of "boring" screens, Jamie's results tend to fall nicely between my Min/Max statistics, with perhaps a slight tilt downward due to the effects of GTR1 linearization. It's only the most popular MI screens--those that surface in the course of blend optimization--that get knocked down by 5-10 CAGR points. IMO, that's the smoking gun of curve-fitting (pardon the mixed metaphor), not evidence of bugs.


I'd really like to see such results, for specific screen examples and/or in summary. As I suggested, this would go a long way toward addressing my skepticism.

2. I'm not ashamed to say that my modus operandi is simply this:

Create buzz (which includes skepticism) about daily-cycled backtesting --> Widespread use of my backtester --> Donations to the MI treasury --> Reimbursement for VLWRP costs.

Thus, I haven't bothered spending the time required to post on all the screens that don't get torpedoed because they wouldn't create buzz. My hope has been that people would discover all the screens that don't get torpedoed on their own. Besides getting reimbursed, I would like to see all the people who put a great deal of time into VLWRP (including yourself) get rewarded with results.


You may be a backtesting genius, or someone who's willing to put a heroic effort into a mind-numbing job, but a marketing genius you're not. :-)

For a data-oriented and analytical group such as we have here, buzz will not get you very far. Credibility is much more important IMO. Is there any difficulty getting reimbursed for your investment? I'd be surprised if there is. You delayed your request for funding for a very long time. I don't know how much money has come in recently in response to the first call. But I'm sure that a periodic progress report and a friendly nudge for additional funding until the goal is met will get you fully reimbursed without "buzz".

3. As my (very thin) posting history shows, I started torpedoing popular MI screens in August of 2006 using my third Excel/Access VBA backtester, a couple months before I started re-writing my backtester and databases from scratch in C++ (my modus operandi was the same back then, except I was still trying to get others to write the public daily-cycled backtester). Except where I have made changes in field regularization, all of my old torpedo results agree very closely with those obtained with my new GTR1 backtester.

4. Upon TechCzech's completion of his first GTR1 Linearizer in 2005, I did an analysis of the effects of GTR1 linearization on basic RS screens by backtesting them on both my old Sux database and the new GTR1-linearized database. GTR1 linearization (which more realistically handles suspensions, delistings, spin-offs and mergers) appeared to add about 1-2 points to GSD and knock about 1 point off of CAGR for monthly RS screens, with slightly increasing effect as the holding period lengthened. I actually expected more of an impact. A couple bugs in my linearization logic (and one or two in the linearizer itself) have been discovered and fixed since then, but none that had a significant impact on results. Thus I don't think GTR1 linearization accounts for very much of the difference between GTR1 and backtest.org.

5. I have done complete audits of Jamie's and my backtests for a few popular MI screens and concluded that we were both correct. I haven't reported on these findings because there wasn't much to say besides, "Yep, we're right." (Zeelotes has already thoroughly replicated Jamie's backtests anyway.)


I'll repeat: I think it's very important to see a more complete set of results. This is not only to confirm that there is no systematic error lurking in the backtester; I think it's no less interesting for us to know which "mediocre" screens in fact hold up in the daily testing. As high-flying screens get "torpedoed", mediocre screens that are not torpedoed may suddenly become better candidates for investment.

6. It seems that those who have become proficient with my backtester (via either Jamie's or Michael's interfaces) have developed some degree of faith in the results. (Ironically, all of the power users are from the WERresearcher Yahoo! group rather than the MI board.) I would therefore suggest that you (and Zeelotes) let your skepticism motivate you to put forth the effort required to use the backtester yourselves. If you do, I think you'll find my curve-fitting explanation for why only popular screens get "impossibly" torpedoed as badly as they do quite satisfying.

I plead guilty to not even attempting to use your backtester, even though I've been one of the more enthusiastic supporters of the idea. The difficult interface has caused me to put off making the effort, perhaps until I decide to reevaluate my own strategy.

If there is a worthwhile programming effort to be made, I think it would be for someone (not you because you've done enough) to develop a user interface that is as easy to use as Jamie's.

Elan

Author: ilmostro Big red star, 1000 posts 10+ Year Anniversary! Number: 204339 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 5:00 PM
Recommendations: 3
If there is a worthwhile programming effort to be made, I think it would be for someone (not you because you've done enough) to develop a user interface that is as easy to use as Jamie's.

It already exists. Michael's backtester is easier to use than Jamie's.
I have posted about how easy it is to steal some code and paste it into Michael's site to get daily return data on screens.

Joe is currently writing/updating a user guide for Michael's site and there is a lot of discussion of the GTR1 in the WER group and talk about making the guide a Wiki.

If I can start to use and understand the GTR1 backtester there really is no reason a motivated person couldn't use it also as I am not the sharpest tool in the shed and have absolutely no programming skills or specialized knowledge.

Bryan

Author: Smufty2 Big red star, 1000 posts Old School Fool Number: 204349 of 252157
Subject: Re: A Blend Torpedoed Date: 11/25/2007 8:05 PM
Recommendations: 10
I haven't bothered spending the time required to post on all the screens that don't get torpedoed because they wouldn't create buzz.

On behalf of all the computer semi-literate I ask, please post the non-torpedoed list!

Reimbursement for VLWRP costs.

The check is in the mail.
I mean really, the check is in the mail.

Smufty

Author: winker Big red star, 1000 posts Old School Fool Number: 204356 of 252157
Subject: Re: A Blend Torpedoed Date: 11/26/2007 12:08 AM
Recommendations: 8
I'll put forth my list of work that needs to be done. But I want to see other lists first.

Why? Either you're interested in putting forth your list of work or you're not. In my opinion, the second sentence is an extremely lame excuse of some sort.

Larry

Author: Zeelotes Big red star, 1000 posts Feste Award Nominee! Old School Fool Number: 204357 of 252157
Subject: Re: A Blend Torpedoed Date: 11/26/2007 12:18 AM
Recommendations: 0
Jeff wrote:
I'll put forth my list of work that needs to be done. But I want to see other lists first.

Yeah, right!

Author: JeffLandon One star, 50 posts Number: 204358 of 252157
Subject: Re: A Blend Torpedoed Date: 11/26/2007 12:19 AM
Recommendations: 0
>>Why? Either you're interested in putting forth your list of work or you're not.

I'll do my research regardless. I know what I want, but perhaps it's not in alignment with what anyone else wants to do.

Author: Zeelotes Big red star, 1000 posts Feste Award Nominee! Old School Fool Number: 204361 of 252157
Subject: Re: A Blend Torpedoed Date: 11/26/2007 12:36 AM
Recommendations: 3
Jeff wrote:
I'll do my research regardless. I know what I want, but perhaps it's not in alignment with what anyone else wants to do.

The only way to find out is to post and see. This group has been built on the understanding that there is benefit to the combined efforts of everyone.

It is easy to pick holes in what everyone else is doing, or even what the group as a whole is doing, while sitting in your armchair and refusing to put your own work under the scrutiny of the group.

Author: elann Big gold star, 5000 posts Top Favorite Fools Top Recommended Fools Old School Fool Number: 204362 of 252157
Subject: Re: A Blend Torpedoed Date: 11/26/2007 12:43 AM
Recommendations: 39
I'd peg it as 5% ironic and 95% pathetic. All you have to do is go back to read old board posts from 10 or 5 years ago to see what's missing here now. Skepticism, innovation, attention, barnbustin'. It only comes from a few sources now. For all the messages that flow through here, it's a mighty sleepy place.

Y'all, in my humble opinion, have become ossified. You have a set of beliefs that seem firmly entrenched.


I think what you wrote is a huge affront to people like zeelotes, rgearyiii, mungofitch and some others who have worked their fingers to the bone in the last year and made great contributions.

Elan

Author: lsmr409 Big red star, 1000 posts Old School Fool Number: 204363 of 252157
Subject: Re: A Blend Torpedoed Date: 11/26/2007 1:25 AM
Recommendations: 0
JeffLandon,

Just wondering if you are a re-incarnation of a dearly-beloved but recently-absent contributor here.

The comment about getting physical struck a familiar note somehow, as did the suggestion about Bayes.

;-)

Todd
reading a little between the lines

Author: JimZipCode Big funky green star, 20000 posts Feste Award Nominee! Old School Fool Number: 204440 of 252157
Subject: Re: A Blend Torpedoed Date: 11/26/2007 10:33 PM
Recommendations: 4
I'll put forth my list of work that needs to be done. But I want to see other lists first.


You might get better results if you go in the opposite order. Many of us are probably ossified or comfortable enough that we might need to see such a list to get stirred up enough that we start to think of other stuff.

Most of the items on my MI to-do list are, uh, operational rather than theoretical. I'd like to replicate the functions of BarryDTO's most excellent RRS spreadsheet using perl – even if it winds up being text files rather than a spreadsheet – with an eye toward maybe eventually making something like it functional in the Open Office spreadsheet app. I'd also like to replicate some of the functions of the Hoadley options analysis spreadsheet in perl. Those to-do items have as much to do with a professional goal (getting better at perl) and a personal preference (open source software!) as they do with MI.

I do have an interest in researching some simple timing systems, using available (non-proprietary) info. My intuition is that stuff which works perfectly well on a major index will not work as well with MI screens, so that means care is warranted. I've looked at some stuff, found nothing I'm comfortable using, haven't really gone anywhere with it.

I share your impression that we have a tendency to be insular and self-congratulatory: let's say “ossified”. Mostly we're small, much smaller than we were 5 years ago, so we don't have as much diversity of voices in the discussion. (And some of our most vibrant, most interesting voices are vanished, like Sparfarkle and Arezi and BarryDTO and MrToast and LorenCobb. Welcome back Rayvt!) Still I think there are exciting developments, such as Robbie's daily-start backtester.

Post your to-do list. I'd like to see it; and I'm sure it'll spark some discussion and some ideas.

Author: JeffLandon One star, 50 posts Number: 204442 of 252157
Subject: Re: A Blend Torpedoed Date: 11/26/2007 11:00 PM
Recommendations: 0
I decided I'd back into my list. For my first trick, I'll build a robustness index from the ground up. This will be an exercise for my own amusement. If anyone else is interested, that's fine.

I started with "halflife" and moved on to "weebleness." Next up: phasing, which will look at the difference in returns between holding periods in a narrow range (18-23 days).
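
A rough sketch of what a "phasing" measure along these lines might look like, assuming it is taken as the spread between the best and worst CAGR across holding periods of 18 to 23 days. The post does not define the calculation, so the function name, the input format, and the sample numbers are all hypothetical.

```python
def phasing_spread(cagr_by_holding_period, shortest=18, longest=23):
    """Return max minus min CAGR over holding periods in [shortest, longest] days.

    `cagr_by_holding_period` maps a holding period in days to a CAGR in
    percentage points (a hypothetical input format).
    """
    in_range = [
        cagr
        for days, cagr in cagr_by_holding_period.items()
        if shortest <= days <= longest
    ]
    if not in_range:
        raise ValueError("no holding periods in the requested range")
    return max(in_range) - min(in_range)


# Made-up CAGRs keyed by holding period in days; 17 and 24 fall outside the range.
sample = {17: 30.0, 18: 24.0, 19: 26.5, 20: 29.0, 21: 33.0, 22: 27.5, 23: 25.0, 24: 40.0}
print(phasing_spread(sample))
```

Under this assumed definition, a small spread would suggest that a screen's returns do not hinge on one particular holding period.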

Author: spiralingup One star, 50 posts 8 Year Anniversary! Old School Fool Number: 204978 of 252157
Subject: Re: A Blend Torpedoed Date: 12/6/2007 8:43 PM
Recommendations: 0
I heartily agree with Elan. I have been for the most part a lurker with occasional contributions and recently have wandered over to the BMW Method board (which has some great stuff, BTW) and got a month behind on posts. And what a month! Maybe absence makes the heart grow fonder, but some of the posts recently have been the kind that don't just chip around the edges but that really make me sit up and decide to change my portfolio strategy in fundamental ways and feel confident in doing so.

Keep it up, everybody. We're gonna rock 2008, no matter what the market does.

Wes

Author: hirundo Three stars, 500 posts Old School Fool Number: 205515 of 252157
Subject: Re: A Blend Torpedoed Date: 12/24/2007 9:43 PM
Recommendations: 5
ilmostro (Bryan) wrote:
If I can start to use and understand the GTR1 backtester there really is no reason a motivated person couldn't use it also as I am not the sharpest tool in the shed and have absolutely no programming skills or specialized knowledge.

Agreed that a motivated person could/can use the GTR1 backtester at michaeljem's site. Like you, I've learned to use it and I'm no programming ace.

But more people would use a GTR1 site with a snappier, yet full-featured user interface. They would spend less time learning the interface, would make fewer mistakes when entering jobs, and would run more jobs. This kind of change in learning and usage patterns, following an upgrade in user interface, is just a fact of life.

hirundo
