The Cross-Section of Stock Returns before CRSP (2023)
Guido Baltussen, Bart van Vliet, Pim van Vliet
SSRN Working Paper, URL
Before I start with this week’s AGNOSTIC Paper, a brief announcement. Given that I currently have a lot of projects ongoing, the posts of this series will be considerably shorter until the end of June 2023.
This week’s AGNOSTIC Paper is an unprecedented out-of-sample test of the major factors (Momentum, Value, Low-Risk, Size) that I examined over the last weeks. The authors construct a novel dataset of US stocks that reaches from 1866 to 1926. It therefore extends the extensively studied CRSP dataset by 60 years.
Everything that follows is only my summary of the original paper. So unless indicated otherwise, all tables and charts belong to the authors of the paper and I am just quoting them. The authors deserve full credit for creating this material, so please always cite the original source.
Setup and Idea
I mentioned the importance of out-of-sample tests several times throughout the last five weeks. But why are they so important, especially when it comes to investing? Because they are our best weapon against data mining and overfitting. If you discover a winning strategy that historically worked in the US, you want to make sure that a similar strategy also works in Europe, Japan, and everywhere else. If it doesn’t and there are no plausible reasons for it, chances are high that you didn’t discover a true pattern but fell prey to statistical noise.
Needless to say, this is very important for investing because we all have the tendency to produce wonderful backtests. It is much more important, however, that our strategies work going forward because that’s when we want to make money from them. At least I want to bet my money on real patterns, not on statistical noise…
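To make the data-mining problem concrete, here is a minimal sketch, not something taken from the paper. It simulates many strategies that are pure noise by construction, picks the one with the best in-sample average return, and then looks at how that "winner" does on fresh data. All names and parameters (`simulate_data_mining`, `n_strategies`, the 5% monthly volatility) are my own assumptions for illustration.

```python
import random
import statistics


def simulate_data_mining(n_strategies=200, n_months=120, seed=0):
    """Pick the best of many pure-noise strategies in-sample, then check it out-of-sample.

    Hypothetical illustration: every strategy has a true mean return of zero,
    so any in-sample "edge" is statistical noise by construction.
    """
    rng = random.Random(seed)
    in_sample = [
        [rng.gauss(0.0, 0.05) for _ in range(n_months)] for _ in range(n_strategies)
    ]
    out_of_sample = [
        [rng.gauss(0.0, 0.05) for _ in range(n_months)] for _ in range(n_strategies)
    ]
    in_means = [statistics.fmean(returns) for returns in in_sample]
    # The "discovered" strategy is simply the one with the highest in-sample mean.
    best = max(range(n_strategies), key=in_means.__getitem__)
    return in_means[best], statistics.fmean(out_of_sample[best])


best_in, best_out = simulate_data_mining()
print(f"in-sample mean of the 'winner':      {best_in:.4f}")
print(f"out-of-sample mean of same strategy: {best_out:.4f}")
```

The in-sample mean of the selected strategy is positive by selection alone, while its out-of-sample mean is just another draw around zero. That gap is what an out-of-sample test is designed to expose.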
Unfortunately, the scope for out-of-sample tests is naturally limited in investing. You can test strategies around the globe and within other asset classes, but to see if they also work in the future, you need to wait until the future has happened. That’s generally true, but you can also test strategies with “new history”. What matters for out-of-sample tests is a dataset no one has looked at before and this is exactly the contribution of this week’s paper…
Data and Methodology
The authors construct a novel database of US equities that spans from 1866 to 1926 and covers 1,488 stocks. Apart from stock prices, the database also includes market capitalization and dividends. This makes it possible to test the major factors (Momentum, Value, Low-Risk, Size) within a 60-year sample that hasn’t been analyzed before. Truly out-of-sample…
The data comes from Global Financial Data (a data provider) and the Commercial & Financial Chronicle (a well-read financial newspaper), and is mostly hand-collected by the authors. Needless to say, this is a large effort and it’s pretty cool that the authors did this and pulled together a sample that is “[…] believed to be free from survivorship bias”. As data quality is of course an issue in such a project, the authors run several tests and filters to ensure that potential errors do not interfere with the analyses.
The result is (in my opinion) pretty cool and the new database certainly helps to advance our understanding of equity markets. It is also impressive to see how the research community continuously explores new ways to stress-test existing findings on new data. In my opinion, critics of evidence-based investing should take a careful and open-minded look at the effort that goes into this… (Actually not. We need the skeptics to take the other side of the trades…)
Important Results and Takeaways
While the paper also examines a few other issues, I will limit this post to the out-of-sample tests of the four major factors. I will also not dig into the methodology and intuition of each factor as these are covered in the posts of the last weeks.
Momentum, Value, and Low-Risk were there before 1926
The following chart summarizes the key results of the paper. The authors replicate the four major factors and report return spreads of long-short portfolios and CAPM Alphas for the in-sample period (1927-2019) and the out-of-sample period (1866-1926). The axis labels correspond to the following factors:
- Beta: the Betting Against Beta factor, a popular low-risk/defensive strategy.
- Size: the size effect, stocks sorted by market capitalization.
- Dividend: high dividend yields as proxy for the value-factor as accounting variables are not available for the out-of-sample period.
- Momentum: the standard momentum-factor, stocks sorted by 12-month returns.
- ST Reversal: the effect that securities with low returns over the last month tend to generate high returns for a short period thereafter.
With the exception of Size, all of the major factors exhibit significant return spreads and CAPM alphas in the out-of-sample period before 1926. In fact, the results are quite striking. Low-Risk and Value were actually more profitable in the period before 1926. For Momentum, the out-of-sample results are somewhat weaker but still sizable. The Size effect, however, is very weak and not statistically significant in either sample.
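For readers who want to see what a "return spread" and a "CAPM alpha" mean mechanically, here is a minimal sketch. It is a generic textbook version and not the authors' exact construction: the helper names, the equal weighting, and the `fraction` cutoff are my own assumptions, and the numbers are made up.

```python
import statistics


def long_short_spread(signal, next_return, fraction=0.3):
    """One-period return of a long-short portfolio sorted on `signal`.

    Assumption: equal-weighted legs, long the top `fraction` of stocks by
    signal and short the bottom `fraction`. Both arguments map ticker -> value.
    """
    ranked = sorted(signal, key=signal.get)  # tickers from lowest to highest signal
    k = max(1, int(len(ranked) * fraction))
    short_leg = statistics.fmean([next_return[t] for t in ranked[:k]])
    long_leg = statistics.fmean([next_return[t] for t in ranked[-k:]])
    return long_leg - short_leg


def capm_alpha_beta(strategy_returns, market_excess_returns):
    """OLS regression of strategy returns on market excess returns.

    Returns (alpha, beta). A long-short portfolio is self-financing, so its
    return is regressed directly without subtracting the risk-free rate.
    """
    mx = statistics.fmean(market_excess_returns)
    my = statistics.fmean(strategy_returns)
    cov = sum((x - mx) * (y - my) for x, y in zip(market_excess_returns, strategy_returns))
    var = sum((x - mx) ** 2 for x in market_excess_returns)
    beta = cov / var
    return my - beta * mx, beta


# Made-up example: momentum signal = past 12-month return, then next-month return.
past_12m = {"A": 0.30, "B": 0.10, "C": -0.05, "D": -0.20}
next_month = {"A": 0.02, "B": 0.01, "C": 0.00, "D": -0.01}
spread = long_short_spread(past_12m, next_month, fraction=0.25)

# Made-up time series: a strategy with alpha 0.5% per month and a market beta of 0.5.
market = [0.01, -0.02, 0.03, 0.00]
strategy = [0.005 + 0.5 * m for m in market]
alpha, beta = capm_alpha_beta(strategy, market)
print(f"spread={spread:.4f}, alpha={alpha:.4f}, beta={beta:.2f}")
```

The spread is the raw performance of winners minus losers, while the alpha is what remains after accounting for the portfolio's exposure to the overall market.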
Although this paper came out a few years later than each of the last five, the overall results are virtually identical and fully in line with the consensus in the literature: strong evidence for Momentum, Value, and Low-Risk, but no evidence for a stand-alone Size effect. In my opinion, this is pretty amazing and further adds to the evidence that the major factors come from real mechanisms instead of data mining. This is exactly what we would like to have as investors.
Factors weren’t stronger before 1926
Another important issue with factors is decay. Among others, McLean & Pontiff (2016) show that the profitability of factors significantly decayed after popular publications. This is quite reasonable because when more people know about a profitable trade, they probably try to exploit it. Critics of factor investing therefore often argue that factors will not work in the future because they get too crowded or arbitraged away.
The authors address this issue and compare the return spreads and CAPM alphas of the factors between both sample periods. If the “arbitraged away argument” holds true, return spreads and alphas should be smaller in the “more recent” 1927-2019 period. The following table summarizes the results.
As you can see from the Difference lines, there is no clear pattern. Momentum (UMD) was weaker in the pre-1926 period, Value (HML) and Low-Risk (BETA) were somewhat stronger. None of the differences, however, are statistically significant at the common 5% level. This is evidence against factor decay and suggests that the major factors haven’t been arbitraged away since 1866. The fact that the CAPM alphas and return spreads (except for Value) are all highly significant over the entire sample period from 1866 to 2019 also speaks against this.
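The logic behind such a "Difference" test can be sketched with a simple two-sample comparison of mean returns. I use a Welch t-statistic here purely as an illustration. This is my assumption and not necessarily the test the authors run, and the return series below are made up.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch t-statistic for the difference in means of two independent samples."""
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Made-up monthly factor returns for the two periods (not data from the paper).
pre_1926 = [0.012, -0.004, 0.020, 0.007, -0.010, 0.015]
post_1926 = [0.009, 0.001, 0.013, -0.006, 0.011, 0.004]

t_stat = welch_t(pre_1926, post_1926)
print(f"t-statistic of the difference in means: {t_stat:.2f}")
```

As a rule of thumb, an absolute t-statistic well below 2 means the difference between the two periods is not statistically significant at the 5% level. That is the pattern the author describes for the Difference lines.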
Machine learning models find the same factors
In an extension to the main analysis, the authors apply some more sophisticated prediction models to their novel data. Specifically, they use a set of well-known machine learning algorithms to account for non-linear relations between returns and characteristics.
I will not go into the technical details. However, it is quite interesting that these “self-learning” models also select many of the factors from before. According to the authors, important predictive signals are dividend yield (the value proxy), different momentum variables, beta, other risk measures, and market capitalization. (Note that this doesn’t necessarily contradict the evidence against a stand-alone size effect. Most factors work better among smaller companies and machine learning models incorporate such interactions.) In my opinion, it is very satisfying that a purely data-driven approach arrives at the same conclusions as the literature on factor investing. Needless to say, this further adds to the evidence behind the major factors.
Conclusions and Further Ideas
No matter how you tweak or twist it, the overall message remains the same. There is strong and robust evidence for sizable Momentum, Value, and Low-Risk premiums among US equities, both before and after 1926. These results speak against data mining concerns and suggest that there are real economic mechanisms behind the major factors. And apparently, they were already active in the 19th century and haven’t changed much since then.