Can we trust the results of academic and practitioner research in asset management? For a blog focusing on summaries of research papers, this is of course a very important question. But even without such an obvious bias, it is a very interesting issue for anyone who uses some form of research for their investment decisions. The author of this week’s AGNOSTIC Paper presents several concerning facts and strongly recommends not taking research insights at face value…
Everything that follows is only my summary of the original paper. So unless indicated otherwise, all tables and charts belong to the author of the paper and I am just quoting them. The author deserves full credit for creating this material, so please always cite the original source.
Setup and Idea
The concern about false discoveries in research is neither new nor a phenomenon that only affects finance or asset management research. In 2005, for example, John P. A. Ioannidis published an article with the provocative title Why Most Published Research Findings Are False in a medical journal. Unfortunately, false positives are a problem we can never fully solve. If a researcher tests 1000 ideas that in truth have no effect and applies a conservative 1% significance level (99% confidence), she should still expect about 10 false discoveries. Apply this to the global research community and there are probably thousands of false discoveries each year.
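To make this arithmetic concrete, here is a minimal Python sketch (my own illustration, not from the paper) that simulates 1000 hypothetical strategies with a true mean return of exactly zero and tests each one at the 1% level. The function name `count_false_discoveries`, the return volatility, the sample length, and the use of a simple z-test with known volatility are all assumptions chosen to keep the example short.

```python
import random
from statistics import NormalDist, fmean


def count_false_discoveries(n_ideas=1000, n_obs=60, alpha=0.01, sigma=0.04, seed=0):
    """Count how many zero-edge strategies look 'significant' at level alpha.

    Assumptions (illustrative only): i.i.d. normal monthly returns with a true
    mean of zero and a known volatility sigma, tested with a two-sided z-test.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_ideas):
        # Hypothetical monthly returns of an idea that has no real edge.
        returns = [rng.gauss(0.0, sigma) for _ in range(n_obs)]
        # z-statistic of the mean return, using the known volatility.
        z = fmean(returns) / (sigma / n_obs ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            hits += 1  # a "discovery" that is false by construction
    return hits


if __name__ == "__main__":
    runs = [count_false_discoveries(seed=s) for s in range(20)]
    print("false discoveries per run:", runs)
    print("average:", fmean(runs))  # should land close to 1000 * 0.01 = 10
```

Each run is essentially a binomial draw with an expected value of n_ideas × alpha = 10. Individual runs scatter around that number, which is why a single "significant" result says little on its own.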
On top of that, and this brings us to this week’s paper, incentives play a huge role in the production of research. The author explains how most of the academic system incentivizes researchers to produce as many publications as possible. Journals are ranked by impact factors and are more interested in papers with “positive results” that support plausible hypotheses. Junior researchers need publications to get promotions and higher salaries. And analysts at banks or asset managers are expected to come up with interesting insights or strategies for their clients. Given these factors, some healthy skepticism about research results is definitely useful. As Warren Buffett once said, “Don’t ask the barber whether you need a haircut.”
Data and Methodology
The paper is a qualitative overview of some potential problems with asset management research. There is no original analysis, so there is no dedicated Data and Methodology section to present at this point.
Important Results and Takeaways
Some concerning facts about finance research
The author starts with a few interesting facts about finance research that should at least get our attention. As you may recognize from my posts, I am always a bit skeptical and require thorough tests, but the following charts strengthened this view even further.
Figure 1 shows that a staggering 90% of Economics & Business papers (sixth row from the bottom) “support” their hypothesis. Figure 2 shows a specific example of this: by December 2018, the literature had come up with more than 400 factors that apparently predict stock returns.[1] We all know that stock markets are quite efficient and that it is difficult to beat them. So how can all of those results be true?
First, things aren’t as bad as they look here. The “factor zoo” is a bit messy, but many of the 400 factors are just different ways of measuring the same few major patterns. There is a broad consensus about a handful of robust factors (remember all the out-of-sample tests…) that drive asset prices. However, there is not necessarily a consensus about how to measure them (and there probably never will be). Against this background, it is not too surprising that researchers keep coming up with incremental variations of the same thing, as this is an easy way to publish a paper. So the good news is that the >400 factors are most likely not all false positives. The bad news is that most of them are not very relevant either.
Even though the “factor zoo” is probably less of a problem than this chart suggests, the author’s concerns are very real. We don’t have the luxury of fixed natural laws and controlled experiments in empirical finance, so we need to be very careful with the data and avoid bad habits wherever possible. Figure 4 suggests that many researchers don’t take this seriously enough. The chart shows cumulative excess returns of ETFs around their inception date. Excess returns are positive for the 36 months before inception but virtually flat for the 36 months thereafter. In other words, the strong pre-launch track records that presumably justified these products did not carry over into live performance. This is obviously not satisfying and is probably the result of a combination of wrong incentives and other biases that may enter the research process.
Research incentives and multiple testing
As I mentioned in the introduction, most academic and practitioner departments reward publication output. It is difficult to tell your manager that you have tested 127 trading strategies and none of them reliably worked, even though it might be the intellectually honest answer. Similarly, you can hardly apply for a professorship and say you haven’t written a paper yet but you had 127 rejected ideas. Such setups inevitably let some biases enter the research process. The author provides various examples of bad habits, and I summarize the most important ones below.
The mother of all biases is multiple testing. Let’s go through an example to understand it. Suppose you find a profitable trading strategy for US equities. Naturally, you will test the same strategy with slightly different parameters over the course of your research project: quarterly vs. monthly rebalancing, value vs. equal weights, rebalancing at the end or middle of the month, you name it. From a robustness perspective, this is completely reasonable, but only if you report all results. Many brains, however, seem to be wired to cherry-pick the best backtest and report only that one. Unless there are plausible reasons why this particular specification performed best, this is a classic example of multiple testing: with enough variants, one of them will look good by chance alone. If you see a paper that reports just one of many plausible specifications, this is definitely a red flag.
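The cherry-picking effect is easy to simulate. The Python sketch below is a hypothetical illustration, not anything from the paper: the helper `best_of_n_backtests` backtests `n_variants` strategy variants that all have zero true expected return and reports only the best annualized Sharpe ratio. As a simplifying assumption, the variants are treated as independent. Real variants of one strategy (rebalancing dates, weighting schemes) are strongly correlated, so this overstates the size of the effect, but not its direction.

```python
import random
from statistics import fmean, stdev


def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio of monthly excess returns (risk-free rate assumed zero)."""
    return fmean(monthly_returns) / stdev(monthly_returns) * 12 ** 0.5


def best_of_n_backtests(n_variants, n_months=120, sigma=0.04, seed=0):
    """Return the highest Sharpe ratio among n_variants zero-edge backtests.

    Hypothetical setup: each variant's monthly returns are i.i.d. normal with
    mean zero, so every positive Sharpe ratio here is pure luck.
    """
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_variants):
        returns = [rng.gauss(0.0, sigma) for _ in range(n_months)]
        sharpes.append(annualized_sharpe(returns))
    return max(sharpes)  # the cherry-picked "best backtest"


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"best Sharpe out of {n:>4} variants: {best_of_n_backtests(n):.2f}")
```

With ten years of monthly data, the best of 1000 pure-noise variants typically shows an annualized Sharpe ratio somewhere around 1, which would look respectable in a pitch book. This is why the number of variants tried matters as much as the winning number itself.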
Other examples of multiple testing are all kinds of adjustments researchers make after running the first model. Removing outliers, changing the sample period, excluding certain periods where things didn’t work: those are all bad practices that often start when the data doesn’t really fit the desired hypothesis.
How to cope with this problem? Researchers should carefully document how many tests they ran and report the results of all of them. There are also approaches that raise the threshold for statistical significance as a function of the number of tests.[2] For example, a t-statistic of 2.58 is sufficient at the 1% significance level (99% confidence) for a single test, but after 27 tests the hurdle must be much higher to guard against multiple testing.
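One of the simplest such adjustments, though not necessarily the one the author has in mind, is the Bonferroni correction: with n tests, each individual test is run at level α/n instead of α. The Python sketch below uses only the standard library and a normal approximation to the t-distribution (an assumption that is reasonable for long samples) to compute the resulting two-sided critical t-statistic. The function name `bonferroni_critical_t` is my own.

```python
from statistics import NormalDist


def bonferroni_critical_t(alpha=0.01, n_tests=1):
    """Two-sided critical value after a Bonferroni correction for n_tests tests.

    Uses the normal approximation to the t-distribution (large samples).
    """
    per_test_alpha = alpha / n_tests  # Bonferroni: split the error budget evenly
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)  # two-sided hurdle


if __name__ == "__main__":
    for n in (1, 10, 27, 100):
        print(f"{n:>3} tests -> |t| must exceed {bonferroni_critical_t(0.01, n):.2f}")
```

For a single test this reproduces the familiar 2.58; for 27 tests the hurdle rises to roughly 3.56. Bonferroni is deliberately conservative because it ignores correlations between tests, which is one reason more refined tools exist.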
Practitioner research in asset management
In the final section, the author distinguishes three types of practitioner research in asset management and discusses the potential impact of biases on each of them. First, there is “research” for clients, which is often just marketing. Second, there are articles in practitioner journals (also a form of marketing, but more scientific). And finally, there is (usually proprietary) research that fuels investment processes.
According to the author, the first two categories (unsurprisingly) suffer more from biases and bad practices than the final one. The reasons are simple. Clients don’t want to hear that you did 127 research projects without any practical results. Similarly, you have a hard time publishing papers in practitioner journals without significant results.
For the third category, however, there are actually incentives to do proper research. If you build an investment process on flawed analyses, the investment product will most likely disappoint, and the asset manager will suffer outflows and reputational damage. That is the ideal world, in which asset managers optimize for long-term value-adding strategies for investors.
Unfortunately, this is not always the reality (and perhaps even rarely is). The author mentions that over shorter horizons, there are other incentives as well. For example, an asset manager could simply launch a bunch of funds in the hope that some will outperform by chance and stick in investors’ minds. In my experience, this is unfortunately quite common. Also note that the revenues of many asset managers depend only indirectly on performance. Most of them charge a fixed fee on assets under management, so for short-term profits, it is sufficient to raise a lot of money from investors. Even if the strategy sucks over the next few years, the manager could already be rich enough to do something else…
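The fund-launch tactic can also be sketched in a few lines of Python. All numbers below are purely hypothetical assumptions (50 funds, no true alpha, 5% annual tracking error, a three-year track record, and 3% per year of outperformance as the bar for a "marketable" fund), as is the helper name `count_lucky_funds`. The point is only that a sizable share of skill-less funds clears such a bar by chance.

```python
import random
from statistics import fmean


def count_lucky_funds(n_funds=50, years=3, tracking_error=0.05, hurdle=0.03, seed=0):
    """Count zero-alpha funds whose annualized active return beats the hurdle.

    Hypothetical setup: monthly active returns are i.i.d. normal with mean zero
    and an annual tracking error of `tracking_error`.
    """
    rng = random.Random(seed)
    monthly_vol = tracking_error / 12 ** 0.5  # scale annual tracking error to monthly
    lucky = 0
    for _ in range(n_funds):
        active = [rng.gauss(0.0, monthly_vol) for _ in range(12 * years)]
        annualized_active_return = fmean(active) * 12
        if annualized_active_return > hurdle:
            lucky += 1  # looks like a star manager, but the true alpha is zero
    return lucky


if __name__ == "__main__":
    shares = [count_lucky_funds(seed=s) / 50 for s in range(200)]
    print(f"average share of 'outperforming' zero-alpha funds: {fmean(shares):.1%}")
```

Under these assumptions, roughly one in seven funds (about 15%) clears the hurdle by pure luck, because the three-year annualized active return has a standard deviation of only about 5%/√3 ≈ 2.9%.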
Conclusions and Further Ideas
This week’s paper is obviously quite skeptical, and these are not the kinds of comments the research or practitioner community wants to hear. However, it is (in my opinion) important to discuss such issues because they are largely undeniable. Does this mean that all research is bad and we should stop reading it? I don’t think so.[3] If we follow the important best practices and evaluate results conservatively, I believe sound research is still the best tool we have to approach financial markets.
In fact, the author offers some advice to mitigate the problems. First, take all research results with a grain of salt and mentally discount them depending on the quality of the work. Second, check the origin of the research and identify problematic incentives. Third, identify the potential costs of false discoveries in your process. Fourth, be very skeptical about results without a plausible economic mechanism. As I mentioned in my posts on factor investing, we want to know who is on the other side of our trades and why. Fifth, ask for the number of tests and watch out for all types of creative data cleaning. If not all of them are reported, this is multiple testing and most likely a problem. Finally, if you are not sure whether a result is a false positive, it probably is.
This content is for educational and informational purposes only and no substitute for professional or financial advice. The use of any information on this website is solely on your own risk and I do not take responsibility or liability for any damages that may occur. The views expressed on this website are solely my own and do not necessarily reflect the views of any organisation I am associated with. Income- or benefit-generating links are marked with a star (*). All content that is not my intellectual property is marked as such. If you own the intellectual property displayed on this website and do not agree with my use of it, please send me an e-mail and I will remedy the situation immediately. Please also read the Disclaimer.
[1] Always remember that I am not better than the rest. I also produced a master’s thesis that replicates one of those 400 factors. But I am gradually learning and improving…
[2] For example, the Deflated Sharpe Ratio from Bailey & López de Prado (2014).
[3] Remember that incentives matter. I am writing posts about research papers. So don’t trust me at this point…