Machine Learning and Fund Characteristics Help to Select Mutual Funds with Positive Alpha (2023)
Victor DeMiguel, Javier Gil-Bazo, Francisco J. Nogales, Andre A. P. Santos
SSRN Working Paper, URL
The second AGNOSTIC Paper on the application of machine learning in manager selection. This week’s paper follows essentially the same idea as Kaniel et al. (2022) in AgPa #65. The authors also examine a comprehensive sample of US mutual funds and, although they use a slightly different methodology, arrive at generally similar conclusions. This, of course, strengthens the evidence that machine learning is indeed helpful for manager selection…
- Week 1: US Mutual Funds – Alphas
- Week 2: US Mutual Funds – Long Only
- Week 3: US Mutual Funds – Total Returns
- Week 4: Hedge Funds
Everything that follows is only my summary of the original paper. So unless indicated otherwise, all tables and charts belong to the authors of the paper and I am just quoting them. The authors deserve full credit for creating this material, so please always cite the original source.
Setup and Idea
The setup and idea of this week’s paper is basically the same as for the last one. Over the last few years, more and more research has shown that machine learning models add significant value in predicting returns of asset classes like stocks and corporate bonds. As a consequence, it is only logical that investors and researchers now also apply these methods in other fields, and mutual funds are a natural candidate. For more details about the Setup and Idea and applications of machine learning in asset management, I recommend reading AgPa #65 and this post from December 2021.
Although this and last week’s paper are very similar, there are some notable differences. As I already mentioned last week, applying machine learning to mutual fund selection is a very recent and currently emerging research area. This week’s paper is therefore also not yet published, and competition is not something academia is spared from. The authors are well aware that others are also writing papers on the issue and provide a detailed explanation of how theirs differs.[1] Most importantly, they explicitly focus on long-only portfolios that are actually investable in practice. While long-short portfolios are theoretically appealing to isolate differences among securities, it is hardly possible to short mutual funds. Returns and alphas from the short side are therefore most likely not implementable in practice. The authors further argue that they include all fund costs in their analyses, most importantly front and back loads. They also use a variety of established machine learning models and consider different variables for their prediction. In line with the results of Kaniel et al. (2022), they also ignore holding characteristics and focus only on fund characteristics.[2]
Overall, the paper therefore augments last week’s paper by Kaniel et al. (2022) as it arrives at generally similar conclusions with a somewhat different methodology. In addition to that, it is probably somewhat more relevant for real-world investors as the authors put more emphasis on practical issues like fees and implementation.
Data and Methodology
Similar to Kaniel et al. (2022), the authors source their data from the CRSP Survivorship-Bias-Free US Mutual Fund database. The sample ranges from January 1980 to December 2020 and includes 8,767 unique fund share classes from diversified equity and sector funds. The sample reflects a set of filters. Most importantly, the authors exclude funds that charge front- or back-loads to only consider mutual funds without direct trading costs in their backtests. Obviously, they also exclude passive funds as those should not earn any alpha by construction.
To train their machine learning models, the authors use several fund characteristics for which the mutual fund literature has found a significant relation with future returns. They also calculate several past-performance statistics for each fund against the Fama-French-5-factor-plus-momentum model (FF5+MOM). Overall, they use 17 input variables, which the table below summarizes.
Again very similar to Kaniel et al. (2022), the authors construct their machine learning models to predict risk-adjusted abnormal returns, a.k.a. alpha. While Kaniel et al. (2022) use 4-factor-alphas, this week’s authors go one step further and train their models on 6-factor alphas from the earlier mentioned FF5+MOM model. Another notable difference is the prediction setup. This week’s authors predict alpha on a 12-month horizon and thus assume much longer holding periods.
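To make the prediction target more concrete, here is a minimal sketch of how a fund’s 6-factor (FF5+MOM) alpha could be estimated with a time-series regression. It assumes monthly excess returns and factor returns as pandas objects; the function name, column names, and the annualization choice are my own and not taken from the paper.

```python
import pandas as pd
import statsmodels.api as sm

def six_factor_alpha(fund_excess: pd.Series, factors: pd.DataFrame) -> float:
    """Annualized 6-factor (FF5 + momentum) alpha from a time-series regression.

    fund_excess: monthly fund returns in excess of the risk-free rate
    factors:     monthly returns of MKT-RF, SMB, HML, RMW, CMA, MOM
    """
    X = sm.add_constant(factors.loc[fund_excess.index])
    res = sm.OLS(fund_excess, X, missing="drop").fit()
    return 12 * res.params["const"]  # monthly intercept scaled to an annual alpha

# Hypothetical usage: for each fund and year, the prediction target would be the
# alpha realized over the *next* 12 months, while the 17 characteristics enter
# the model lagged by one year.
```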
The most important difference, however, is the machine learning models. While Kaniel et al. (2022) only use neural networks, based on prior results from a popular study on machine learning in asset pricing, this week’s authors actively decide against them. Specifically, they argue that neural networks require much larger datasets to unlock their full potential. Therefore, they use Elastic Net, Random Forests, and Gradient Boosting instead.
I will not go into the details of those models as all of them are nowadays well-established for machine learning applications. Needless to say, I think it is great that the authors approach basically the same target (identifying outperforming funds) with the same data as last week, but with different models and methodology. Since the results are generally similar, this gives us additional evidence and robustness that machine learning is actually valuable for fund selection.
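For illustration only, here is a minimal scikit-learn sketch of the three model classes the authors use. The hyperparameter values are placeholders of my own choosing, not the paper’s specification.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Elastic Net benefits from standardized inputs; the tree-based models do not need it.
models = {
    "elastic_net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.01, l1_ratio=0.5)),
    "random_forest": RandomForestRegressor(n_estimators=500, max_depth=4, n_jobs=-1),
    "gradient_boosting": GradientBoostingRegressor(n_estimators=500, learning_rate=0.01, max_depth=3),
}

# Hypothetical usage: X_train holds the 17 lagged fund characteristics,
# y_train the subsequent 12-month 6-factor alphas.
# for name, model in models.items():
#     model.fit(X_train, y_train)
#     predictions[name] = model.predict(X_next_year)
```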
Important Results and Takeaways
Machine learning helps to identify outperforming funds
The heart of the paper is the performance analysis of mutual funds for which the models predict positive alpha. Before going into the results, the authors carefully explain their backtest and I think this is very useful. They use 6-factor-alphas from the first 10 years of their sample (1981 to 1990) to train their initial models on lagged fund characteristics (1980 to 1989). The models are designed to predict the alpha over the next 12 months. At the start of each year, the authors sort all mutual funds according to the alpha forecast and form equal-weighted portfolios of the respective top decile.
After each year, they add the newly available data to the training sample, re-train the model, and repeat the forecast and portfolio formation. This procedure yields a monthly time series of the top-decile fund portfolio from January 1991 to December 2020. Compared to last week’s approach, this methodology is much simpler and certainly easier to implement in practice as it doesn’t require monthly trading and (impossible) shorting of mutual funds. The authors also focus on net returns after all expenses to make the results as realistic as possible.
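The following is a stylized sketch of such an expanding-window backtest. The data layout, column names, and feature list are hypothetical; the paper’s actual implementation may differ in many details.

```python
import pandas as pd

# Placeholder names for the 17 lagged fund characteristics used as features.
FEATURES = ["char_1", "char_2", "char_3"]

def backtest_top_decile(panel: pd.DataFrame, model, first_year=1991, last_year=2020):
    """Expanding-window backtest: re-train once a year and hold the top decile for 12 months.

    panel: one row per (fund, year) with the lagged characteristics, the realized
           12-month 6-factor alpha ('alpha_fwd'), and 'fund_id' and 'year' columns.
    """
    holdings = {}
    for year in range(first_year, last_year + 1):
        train = panel[panel["year"] < year]                   # all data available before this year
        test = panel[panel["year"] == year]
        model.fit(train[FEATURES], train["alpha_fwd"])        # re-train on the expanded sample
        preds = pd.Series(model.predict(test[FEATURES]), index=test["fund_id"].values)
        top_decile = preds[preds >= preds.quantile(0.9)]      # funds with the highest alpha forecasts
        holdings[year] = list(top_decile.index)               # held equal-weighted for the next 12 months
    return holdings
```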
With respect to performance evaluation, the authors regress the returns of their top-decile portfolio against various factor models that include all of the major factors like the overall market, value, momentum, size, and quality. The following table shows monthly alphas for the different machine learning models, a benchmark where the authors just use linear regression to predict 6-factor-alphas (OLS), and two simple benchmarks with equal- and asset-weighted portfolios of all available funds at the respective point in time. The chart thereafter shows the cumulative 6-factor-alphas over time.
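The alphas in these exhibits are simply the intercepts of such factor regressions. A minimal sketch, applied to the portfolio’s monthly excess returns and with assumed variable names:

```python
import statsmodels.api as sm

def portfolio_alpha(portfolio_excess, factor_returns):
    """Monthly alpha (intercept) and its t-statistic from a factor regression."""
    X = sm.add_constant(factor_returns)
    res = sm.OLS(portfolio_excess, X, missing="drop").fit()
    return res.params["const"], res.tvalues["const"]
```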
Both exhibits show very clear results. First, and fascinating time and again, the average mutual fund produces negative alpha after costs. This becomes even worse for the asset-weighted average, indicating that larger mutual funds tend to underperform even more. Second, simple prediction methods like OLS or Elastic Net produce some alpha over time, but it is not statistically significant. Instead, and this is the third point, investors apparently need more sophisticated machine learning models like Random Forests or Gradient Boosting to create a profitable fund selection strategy.
Looking at the cumulative alpha chart, however, even those delivered anything but steady profits, and much of the alpha actually comes from a relatively short period between 1998 and 2002. So while the strategies look appealing on paper and the authors do their best to consider all kinds of practical details, I don’t know if this selection strategy is really that appealing in practice. For this backtest, however, the statistically significant net 6-factor-alphas are between 2.36% and 2.69% per year, which is definitely sizable. The authors explain that those numbers are also economically significant as the alphas of the top-decile funds are about twice their expense ratios.
Overall, the authors thus conclude that it is possible to identify outperforming funds with the help of machine learning. So far, however, all analyses are again based on alpha. As I explained last week, alpha and relative performance are not necessarily the right metrics for all investors. They come with a lot of implicit assumptions that do not necessarily fit everyone. Most importantly, you can generate positive alpha but still lose money if one or more factors exhibit negative absolute returns. The authors address this issue and provide some information about monthly excess returns and other performance measures in the table below.
The top-decile portfolios from the two machine learning models generate a very decent monthly return of about 0.9% in excess of the risk-free rate. That corresponds to a compounded annual excess return of about 11.3% which is definitely sizable. Those returns, however, also come with significant risk. For example, the monthly volatility of returns is close to 5% and there have been drawdowns of more than 50%.
Unfortunately, the authors do not include a benchmark for the overall US stock market over that period. The Sharpe ratios of the two top-decile portfolios are around 0.2, which seems relatively low compared to the well-known rule of thumb of 0.4 for the overall stock market. But I am really speculating here and haven’t calculated the Sharpe ratio of the US market with the same methodology that the authors use. In addition to that, the top-decile portfolios require quite some turnover. For example, the annual turnover of 1.476 for the Gradient Boosting portfolio indicates that about 73.8% of the portfolio must be replaced at the start of each year.[3]
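To make these two back-of-the-envelope numbers explicit:

```python
monthly_excess = 0.009                     # ~0.9% per month in excess of the risk-free rate
annual_excess = (1 + monthly_excess) ** 12 - 1
print(f"compounded annual excess return: {annual_excess:.1%}")          # ~11.3%

two_sided_turnover = 1.476                 # annual turnover reported for Gradient Boosting
replaced_share = two_sided_turnover / 2    # sells and buys each count toward turnover
print(f"share of portfolio replaced each year: {replaced_share:.1%}")   # ~73.8%
```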
Overall, the results definitely point in the direction that fund investors can actually use machine learning to identify outperforming funds. Despite different methodology, the results are also within the same ballpark as those of Kaniel et al. (2022) who also find annual alphas in the range of 1-2%. While both papers are not yet published, I think this is promising and suggests that at least the core of the results is not data-mined.
Past performance and measures of activeness are the most relevant variables
In the next step, the authors examine the most relevant variables of their prediction with SHAP values, a common technique for machine learning interpretability. To be honest, I would be very careful with specific statements about marginal effects or, heaven forbid, causality. Nonetheless, I think it is useful to see what actually drives the prediction, and the following two charts summarize the most important variables and interaction effects.
For the two machine learning models (Gradient Boosting and Random Forests), the most important variables are some combination of past-performance characteristics (t-statistics and coefficients from the factor analysis, alphas, and value added) and measures of activeness.[4] While those are important on a standalone basis, the authors particularly highlight that the value of the machine learning prediction comes from considering interaction effects among the variables. As the second chart above shows, interactions of past-performance and fund-activeness measures exhibit SHAP values that are multiples higher than those of the standalone variables. For example, the authors explain that strong past performance is generally a positive predictor of future performance, but particularly so among the most active funds.
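For readers who want to reproduce this kind of analysis, SHAP values and interaction values for tree-based models are available via the shap package. The sketch below assumes a fitted tree-based regressor and a feature DataFrame; both names are placeholders.

```python
import shap

# 'model' is a fitted tree-based regressor (e.g., GradientBoostingRegressor),
# 'X' the DataFrame of the 17 fund characteristics (placeholders, not the paper's data).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)        # per-fund, per-feature contributions to the prediction
shap.summary_plot(shap_values, X)             # global variable-importance view

# Pairwise interaction effects, the source of the large interaction SHAP values:
interaction_values = explainer.shap_interaction_values(X)
```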
Once again, those results are pretty similar to those of Kaniel et al. (2022). They also find that measures of past performance are the most important predictors and additionally consider a macroeconomic sentiment variable which is missing here. They also stress the importance of interaction effects in predicting future alpha, which explains the success of machine learning models. A notable difference between the two papers, however, is the role of fund activeness. This week’s authors put a lot of emphasis on measures of activeness as those are well-established predictors of future performance in the literature. For Kaniel et al. (2022), activeness plays only a minor role or is not considered at all.
Given their alpha, machine-selected funds remain too small
In one of the last sections, the authors use their results to examine the relation between fund manager skill and assets under management (size). I have already written about this in AgPa #35 and the idea is pretty simple. Even if you have an insanely profitable investment process, it naturally comes with capacity limits. That means you cannot deploy unlimited amounts of capital because you ultimately drive prices up (or down) until the profit opportunity disappears. As a consequence, Berk & van Binsbergen (2015) argue that successful and skilled fund managers eventually attract so many assets that they can no longer outperform because of diseconomies of scale related to their size.
The authors use their data to estimate an efficient frontier between manager skill and size. In English: if investors are not stupid, which is a reasonable assumption, they should spot good managers and reward them with more assets because they obviously want to participate in their skill.[5] With some theoretical background from the corresponding literature, you can create a model that gives you statements like “A manager with a skill of X should approximately manage assets of Y”. I understand that this sounds abstract, but fortunately, we don’t need to go deeper into it.
The key point of the authors is much simpler. Once they throw their top-decile portfolios into this skill-size relation, there is a very clear result. The outperforming funds from the machine learning selection are too small, given their managers’ level of skill. This indicates that investors apparently do not spot the outperforming funds and that the machine learning approach actually reveals novel patterns. It also explains the persistence of the performance over time. The best thing you can do to keep an outperforming process alive is to limit its capacity and not move prices too much.[6]
Conclusions and Further Ideas
I already mentioned it several times, but I have to repeat it for a proper conclusion. This week’s paper is very similar to the work of Kaniel et al. (2022) that I presented last week. The two author teams use the same data, follow the same overall target, and the methodological differences are really in the details. The most important difference, in my opinion, is that this week’s authors use multiple machine learning models and explicitly decide against the neural networks on which Kaniel et al. (2022) solely rely. Apart from that, this week’s paper is probably somewhat more interesting for practitioners because the authors use more realistic assumptions.
Since neither of the two papers has yet been published in a peer-reviewed journal, I cannot give you a qualified opinion on which one is better. I also don’t know which of the two author teams first came up with the idea of Machine-Learned Manager Selection. But this is also not relevant for me, because I believe that when we have two more or less independent studies on the same topic, we should always look at both of them.
I have two more papers on Machine-Learned Manager Selection on my list, so it is too early for an overall conclusion. But as I also mentioned several times throughout the article, I think it is already a very good sign that two papers which examine the same objective with different methodologies arrive at generally similar results. This is the way science works, and in a chaotic environment like financial markets, we need as much of this robust evidence as possible.
Endnotes
1. As you can see, even the academics need marketing…
2. To avoid any confusion, I will henceforth refer to last week’s post and paper as Kaniel et al. (2022).
3. Selling funds worth 73.8% of the portfolio and replacing them with new ones yields a total turnover of 2 x 73.8% = 147.6%, or 1.476 in decimal notation.
4. The authors use R2, the R-squared of the factor regression, as a proxy for the activeness of a fund. The lower the R2, the less of a fund’s return can be explained by well-known factors. Funds with lower R2 should therefore be more active.
5. If you know someone who reliably makes good returns, you want them to manage your money…
6. The Medallion Fund of Renaissance Technologies is probably the most famous example of that.