Day 29: Out of sample

The moment of truth has arrived! On Day 28, we iterated through all the metrics we had previously used to identify and analyze the robustness of our strategy. We found the new adjusted strategy performed better than the original and adjusted strategies. That outperformance was also statistically significant for key scenarios. In simulation, however, buy-and-hold beat the new adjusted strategy on average across different sampling methods. Now it’s time to look at how our various strategies would have performed out of sample.

Day 28: Reveal

On Day 27, we had our strategy enhancement reveal. By modifying the arithmetic behind our error correction, we chiseled out another 16 percentage points of outperformance vs. buy-and-hold and the original 12-by-12 strategy. All that remains now is to run the prediction scenario metrics and conduct circular block sampling. Given that we’ve laid the groundwork for these analyses in past posts, we will only spill a little bit of virtual, binary ink in the discussion.

Day 27: Enhancement

On Day 26, we extended the comparative error analysis to the original 12-by-12 strategy and showed how results were similar to the unadjusted strategy relative to the adjusted one. The main observation that emerged was that the adjusted strategy performed better than the others due to identifying most of the big moves when it was correct and not missing the big moves when it was not. This was borne out by statistical tests that showed the mean difference between returns for the true positives and false negatives for the adjusted strategy was indeed significant relative to the others.

Day 26: Adjusted vs. Original

The last five days! On Day 25, we compared the performance of the adjusted vs. unadjusted strategy for different prediction scenarios: true and false positives and negatives. For true positives and false negatives, the adjusted strategy performed better than the unadjusted. For true negatives and false positives, the unadjusted strategy performed better. Today, we run the same comparisons with the original 12-by-12 strategy. We present the confusion matrices below for all three strategies.
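For readers who want to see the four prediction scenarios concretely, here is a minimal sketch of how weekly predictions could be sorted into true/false positives and negatives and summarized by mean realized return. The function names, the zero-return convention, and the sample numbers are our own illustrative assumptions, not the code or data behind the posts.

```python
from collections import defaultdict
from statistics import mean


def label_scenario(predicted: float, actual: float) -> str:
    """Label one prediction by the signs of the predicted and realized return.

    Assumption: a prediction > 0 counts as "positive"; a realized return
    of exactly zero is treated as not positive.
    """
    if predicted > 0:
        return "true_positive" if actual > 0 else "false_positive"
    return "false_negative" if actual > 0 else "true_negative"


def scenario_summary(predicted, actual):
    """Return {scenario: (count, mean realized return)} for paired series."""
    buckets = defaultdict(list)
    for p, a in zip(predicted, actual):
        buckets[label_scenario(p, a)].append(a)
    return {name: (len(rets), mean(rets)) for name, rets in buckets.items()}


# Made-up weekly returns, for illustration only
predicted = [0.01, -0.02, 0.03, -0.01, 0.02]
actual = [0.02, 0.01, -0.01, -0.03, 0.04]
summary = scenario_summary(predicted, actual)
```

The counts in `summary` are the cells of a confusion matrix, and comparing the mean returns per cell across strategies is one way to see where each strategy captures or misses the big moves.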

Day 25: Positives and Negatives

On Day 24, we explained in detail how the error correction term led to somewhat unexpected outperformance relative to the original and unadjusted strategies. The reason? We hypothesized that it was due to the error term adjusting the prediction in a trending direction when the current walk-forward model was mean reverting. We noted that the walk-forward models tended to have negative size effects, so they were likely mean reverting.

Day 24: Lucky Logic

On Day 23 we dove into the deep end to understand why the error correction we used worked as well as it did. We showed how traditional machine learning uses loss functions and then hypothesized how our use helped improve predictions through its effect on the correlation of the signs of the prediction with that of the forward return. We have to admit that our decision to use the error term in the way we did was a bit hacky, so while it did generate improvements via trial and error, one wouldn’t have necessarily thought to use it in the way we did.

Day 23: Logic or Luck

On Day 22 we saw a meaningful improvement in our strategy by waiting an additional week to quantify model error and then using that error term to adjust the prediction on the most recently completed week of data. What was even more dramatic was comparing this improved strategy to one that followed the same waiting logic, but did not include the error correction. It turned an underperforming strategy into an outperforming one!
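The general idea of feeding a lagged forecast error back into the next prediction can be sketched as follows. This is only a generic illustration: the additive form, the one-period lag, the function name, and the sample numbers are all assumptions on our part, and the arithmetic in the actual posts may well differ.

```python
def error_corrected_predictions(raw_predictions, actuals):
    """Add the most recent observable forecast error to each new prediction.

    Assumption (hypothetical, for illustration): the error for period t-1,
    actual minus predicted, is known by the time the prediction for period t
    is made, and it is simply added to that prediction.
    """
    adjusted = []
    for t, pred in enumerate(raw_predictions):
        if t == 0:
            # No completed forecast to score yet, so leave the first one as is
            adjusted.append(pred)
            continue
        last_error = actuals[t - 1] - raw_predictions[t - 1]
        adjusted.append(pred + last_error)
    return adjusted


# Made-up numbers, for illustration only
raw = [0.01, 0.02, -0.01]
realized = [0.03, -0.01, 0.02]
adjusted = error_corrected_predictions(raw, realized)
```

Note that the second prediction is pushed up because the model under-predicted the prior period, while the third is pushed down because it over-predicted. Only the sign of the adjusted value would matter for a long-or-flat trading signal.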

Day 22: Error Correction

On Day 21, we wrung our hands with frustration over how to proceed. The results of our circular block sampling suggested we shouldn’t expect a whole lot of outperformance from our 12-by-12 model out of sample. Our choices were to go back to the drawing board and start over, or off to the waterboard to torture the data until it told us what we wanted. However, we found a third way, in which we could use the information we already had to make a few minor tweaks to improve the model.

Day 21: Drawing Board

On Day 20 we completed our analysis of the 12-by-12 strategy using circular block sampling on the 3 and 7 blocks. We found the strategy did not outperform buy-and-hold on average and its frequency of outperformance was modest – in the 28-31% range – insufficient to warrant actually executing the strategy. What to do? Back to the drawing board to test a new strategy? Or to the waterboard to torture the current one?
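As a rough illustration of the technique, here is a minimal sketch of a circular block bootstrap that estimates how often a strategy beats a benchmark. The function names, the `block_size`, `n_sims`, and `seed` parameters, and the choice to resample both return series with the same blocks are our own assumptions, not necessarily how the posts implement it.

```python
import random


def circular_block_sample(values, block_size, rng):
    """Resample `values` in contiguous blocks that wrap around the end."""
    n = len(values)
    sample = []
    while len(sample) < n:
        start = rng.randrange(n)  # random block start anywhere in the series
        sample.extend(values[(start + i) % n] for i in range(block_size))
    return sample[:n]  # trim the final block to the original length


def cumulative_return(returns):
    """Compound a list of simple periodic returns."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def outperformance_frequency(strategy, benchmark, block_size, n_sims=1000, seed=0):
    """Share of simulations in which the strategy's cumulative return wins."""
    rng = random.Random(seed)
    n = len(strategy)
    wins = 0
    for _ in range(n_sims):
        # Resample indices so both series are drawn from the same periods
        idx = circular_block_sample(list(range(n)), block_size, rng)
        strat = cumulative_return([strategy[i] for i in idx])
        bench = cumulative_return([benchmark[i] for i in idx])
        wins += strat > bench
    return wins / n_sims
```

Assuming the 3 and 7 above refer to block lengths, they would be passed as `block_size`. Sampling in blocks rather than single observations keeps some of the short-run dependence in the return series, and wrapping around the end gives every observation the same chance of being drawn.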

Day 20: Strategy Sample

On Day 19, we introduced circular block sampling and used it to test the likelihood the 200-day SMA strategy would outperform buy-and-hold over a five-year period. We found that the 200-day outperformed buy-and-hold a little over 25% of the time across 1,000 simulations. The frequency of the 200-day’s Sharpe Ratio exceeding buy-and-hold was about 30%. Today, we apply the same analysis to the 12-by-12 strategy. When we run the simulations we find that the strategy has an average underperformance of 11 percentage points vs.