AgPa #68: Machine-Learned Manager Selection (4/4)

A Cross-Sectional Machine Learning Approach for Hedge Fund Return Prediction and Selection (2021)
Wenbo Wu, Jiaqi Chen, Zhibin (Ben) Yang, Michael L. Tindall
Management Science 67(7), URL/SSRN

The fourth and, at least for the moment, final AGNOSTIC Paper on Machine Learned Manager Selection. After examining equity mutual funds in the last three papers, this week's authors provide an interesting out-of-sample test and explore machine learning models for selecting hedge funds.[1] Importantly, this week's paper already appeared in one of the leading business journals back in 2021. This increases the likelihood that the results are actually robust and strengthens the evidence.

  • Week 1: US Mutual Funds – Alphas
  • Week 2: US Mutual Funds – Long Only
  • Week 3: US Mutual Funds – Total Returns
  • Week 4: Hedge Funds

Everything that follows is only my summary of the original paper. So unless indicated otherwise, all tables and charts belong to the authors of the paper and I am just quoting them. The authors deserve full credit for creating this material, so please always cite the original source.

Setup and Idea

The setup and idea are pretty much the same as for the last three papers. Given the successful applications of machine learning in many areas of portfolio management, it is only logical to explore it in other asset classes. Hedge funds are a natural candidate for a variety of reasons. First, there is generally more dispersion among hedge fund returns, which makes active fund selection more attractive and necessary. Second, there is no comparably clear passive alternative for hedge fund investors as there is for equities. Again, this makes hedge fund selection more attractive and necessary. Third, hedge funds are usually only available to institutional investors or fund-of-fund managers. Given that these types of investors may be somewhat more sophisticated, we could see faster adoption of machine learning in this space.

Apart from those specific characteristics of hedge funds, the setup of the paper is quite similar to the equity mutual funds. The authors collect a sample of funds and try to predict their future returns with various machine learning models and a set of fund characteristics.

Data and Methodology

The data comes from Hedge Fund Research, a standard data vendor in the hedge fund world. Most importantly, this database accounts for survivorship bias, which is probably even more important for hedge funds than for other asset classes. Over the sample period from January 1994 to December 2015, the database covers a total of 23,762 funds, of which only 7,327 are currently still alive. After applying various filters, the authors get a final sample of 3,193 hedge funds, of which 1,237 are live and 1,956 are graveyarded at the end of 2015.[2] They also use common measures to mitigate other issues with hedge fund data like backfill bias and illiquidity.[3]

Table 1 of Wu et al. (2021).

The table above shows a few summary statistics for the sample across four different categories of hedge funds. I mostly included this table because I don't have any particular insight into the hedge fund industry and it is therefore interesting for me to get a sense of it. I think the single best word to summarize this table is heterogeneity. Like their returns, hedge funds range from very large (AuM up to almost $66B) to very small, from very expensive to quite fair, and from very young to established over more than 20 years.

Comparing the means and medians of the AUM (M$) column along the categories also reveals that the hedge fund industry is a game of extreme concentration. There are apparently many small funds, but a few very large ones dominate the distribution and push the mean way beyond the median.[4] This is pretty consistent with what we read about hedge funds in the press. Journalists typically highlight the few very successful firms that manage most of the capital (and earn most of the fees).

Table 2 of Wu et al. (2021).

The second table lists the features in the authors' machine learning models to predict future performance. They group them into two general categories but mostly focus on the Return-Based Features. The authors argue that those should ultimately reflect the relevant information about funds in a reasonably efficient market, and they ignore other fund characteristics entirely. They also explain that Return-Based Features are easily available for all funds, so the analysis suffers from fewer missing values. With respect to the variables, we find the typical past performance, correlation, risk, and alpha figures. I think most of them should be pretty self-explanatory. The Macro-Derivative Features are simple regression coefficients of hedge fund returns on the respective macro time series.
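To make this a bit more concrete, here is a minimal Python sketch of how such features could be computed from a fund's monthly return series. All function and variable names are my own illustrations, not the authors' code, and the exact feature definitions in the paper may differ.

```python
import pandas as pd
import statsmodels.api as sm

def return_based_features(fund_returns: pd.Series) -> dict:
    """A few illustrative return-based features from a monthly return series."""
    return {
        "mean": fund_returns.mean(),
        "volatility": fund_returns.std(),
        "sharpe": fund_returns.mean() / fund_returns.std(),
        "skew": fund_returns.skew(),
        "kurtosis": fund_returns.kurtosis(),       # excess kurtosis
        "acf1": fund_returns.autocorr(lag=1),      # first-order autocorrelation
    }

def macro_derivative_feature(fund_returns: pd.Series, macro_series: pd.Series) -> float:
    """Regression coefficient of fund returns on a macro time series (e.g. the VIX)."""
    aligned = pd.concat([fund_returns, macro_series], axis=1, join="inner").dropna()
    X = sm.add_constant(aligned.iloc[:, 1])
    return sm.OLS(aligned.iloc[:, 0], X).fit().params.iloc[1]
```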

To convert those features into return forecasts, the authors use least absolute shrinkage and selection operators (LASSO), random forests (RF), gradient boosting (GB), and neural networks (DNN). They train those models on rolling 12-month windows and predict 3-month hedge fund returns out-of-sample. The main reason for the 3-month horizon is that one quarter is the most common lock-up period among hedge funds. At each rebalancing date, the authors rank the funds according to the return forecasts and sort them into equal-weighted decile portfolios. The monthly net-of-fee performance of those portfolios from January 1999 to December 2015 is the basis for the following results.[5]
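The following sketch illustrates the general logic of such a rolling out-of-sample procedure, with a single random forest standing in for the four model types. Everything here (the long-format `panel` layout, column names like `fwd_3m_ret`, and the hyperparameters) is an assumption for illustration, not the authors' actual implementation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def rolling_decile_backtest(panel: pd.DataFrame, feature_cols: list,
                            train_months: int = 12) -> pd.Series:
    """Rolling out-of-sample sketch. `panel` has one row per fund and month with the
    feature columns and the realized forward 3-month return in 'fwd_3m_ret'."""
    dates = sorted(panel["date"].unique())
    top_decile_returns = {}
    for i in range(train_months, len(dates), 3):                # rebalance every quarter
        # Simplified: in practice the forward returns used for training must be
        # fully realized before the rebalancing date to avoid look-ahead.
        train = panel[panel["date"].isin(dates[i - train_months:i])].dropna()
        test = panel[panel["date"] == dates[i]].dropna(subset=feature_cols)

        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(train[feature_cols], train["fwd_3m_ret"])

        preds = pd.Series(model.predict(test[feature_cols]), index=test.index)
        top = preds[preds >= preds.quantile(0.9)].index          # predicted top decile
        top_decile_returns[dates[i]] = test.loc[top, "fwd_3m_ret"].mean()  # equal-weighted
    return pd.Series(top_decile_returns)
```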

Important Results and Takeaways

Machine learning helps to identify outperforming hedge funds

In the first step, the authors report out-of-sample performance statistics of the top decile portfolios across the four categories of hedge funds. Except for the Macro funds, which I will comment on in more detail below, the results are very promising. Throughout the remaining categories, annual returns (AR) are considerably higher for the Machine Learned Manager Selection than for the HFRI hedge fund benchmark. The returns are also economically strong, as all of them are above 10% per year.

Those attractive returns come with higher volatility (Sd) than the benchmark, but the outperformance more than compensates for the additional risk. Sharpe (ShR), Sortino (SortR), and Information ratios (IR) are all consistently higher than for the HFRI benchmark. Once again, the ratios are not only larger, they are also economically meaningful. For example, the Equity Hedge Sharpe ratios are all around 1 for the machine learning models and even >2 for Relative Value funds. The authors also find consistently positive monthly alphas against a 7-factor model that is specifically designed to evaluate hedge fund performance.
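For reference, this is how such ratios are typically computed from monthly returns. It is a sketch with textbook definitions (annualization via the square root of 12, downside deviation for the Sortino ratio); the paper's exact conventions may differ, and all names are my own.

```python
import numpy as np
import pandas as pd

def performance_ratios(monthly_ret: pd.Series, monthly_rf: pd.Series,
                       monthly_bench: pd.Series) -> dict:
    """Annualized risk-adjusted ratios from monthly return series."""
    excess = monthly_ret - monthly_rf          # excess over the risk-free rate
    active = monthly_ret - monthly_bench       # active return vs. the benchmark
    downside_dev = np.sqrt((np.minimum(excess, 0) ** 2).mean())

    return {
        "Sharpe": np.sqrt(12) * excess.mean() / excess.std(),
        "Sortino": np.sqrt(12) * excess.mean() / downside_dev,
        "Information": np.sqrt(12) * active.mean() / active.std(),
    }
```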

By and large, these results are very impressive and definitely suggest that machine learning is useful for hedge fund selection. Nonetheless, I am always a bit skeptical of such outstanding backtests. However, I think the authors do consider the biggest challenge of practical implementation by using net-of-fee returns. In addition to that, the paper appeared in Management Science, a leading business journal, which should ensure that there are no major technical errors. We should probably still expect lower returns going forward, but I think these very strong backtests leave enough room for some real-world decay.[6]

Table 3 of Wu et al. (2021).

As mentioned before, the results are somewhat different for Macro hedge funds. While the machine learning predictions yield better annualized returns, they generate substantially more volatile portfolios. As a result, the top decile portfolios of Macro funds underperform the HFRI benchmark across virtually all risk-adjusted measures (Sharpe, Sortino, and Information ratio). The authors explain this as follows. The machine learning models apparently select very similar macro strategies, such that the overall portfolio suffers from under-diversification and concentration issues. They argue that more sophisticated portfolio construction should mitigate these issues and also improve risk-adjusted returns in the macro hedge fund space. They don't provide such an analysis, however.

Table 4 of Wu et al. (2021).

In good scientific fashion, the authors stress-test these results in various ways. For example, the table above shows the annualized returns of all ten decile portfolios. Needless to say, if the machine learning selection indeed “works”, D10 should be better than D9, D9 better than D8, and so on. Although test statistics are unfortunately missing, the data broadly follows this pattern. There are sizable return spreads between the D10 and D1 portfolios, suggesting that the machine learning predictions indeed separate winners from losers. With some exceptions, the annualized returns also increase along the deciles. The relation is of course not perfect, but there are notable differences in the right direction for most portfolios. The authors further conclude that non-macro funds (and in particular Relative Value) are more predictable than macro funds.

Table 7 of Wu et al. (2021).

Next, the authors examine different holding periods and extend their forecast and rebalancing horizon from 3 months to 6 and 12 months. The table above summarizes the results and the percentage change of each specification relative to the 3-month base case. I will obviously not comment on each number, but looking at the final Avg. column reveals an interesting pattern. The performance mostly persists or declines only mildly for the slightly longer 6-month horizon. Moving to 12 months, however, looks different. Some specifications still generate almost the same performance, but for others, the longer horizon leads to performance decays of up to one third compared to the 3-month base case.

Overall, these results suggest that Machine Learned Manager Selection requires a certain degree of turnover and that a 12-month horizon is probably too long. On the other hand, the mostly modest differences between the 3- and 6-month horizons further strengthen the evidence that the machine learning models actually find some meaningful patterns over shorter horizons.

Risk measures and VIX-correlations are the most important features

Finally, the authors also examine the underlying drivers of their machine learning predictions. For that purpose, they use the interpretability of the random forest algorithm and determine the most important features. The chart below summarizes the results.

Figure 6 of Wu et al. (2021).

Throughout the specifications, there are two clear winners: kurtosis (Kurt) and the funds' regression coefficient with respect to the VIX index (VIX). Those two are followed by skewness (Skew), Sharpe ratios (SR), and the autocorrelations of the funds' own returns (ACF1-3). Interestingly, stand-alone past performance (R1-9) does not seem to be as important as risk statistics or measures related to the operational characteristics of the fund. Apart from that, the chart also suggests that it definitely makes sense to go beyond Macro-Derivative Features and also include Return-Based Features. In my opinion, this is quite obvious, and probably no hedge fund selector in the world would invest in a fund without examining its individual characteristics beyond correlations with macro time series.
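For readers who want to replicate this kind of analysis, impurity-based feature importances from scikit-learn's random forest are one common way to produce such a ranking. This is only a sketch under the assumption that a feature matrix and forward returns like those in the backtest sketch above are available; the authors may use a different importance measure.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def feature_importance(train_X: pd.DataFrame, train_y: pd.Series) -> pd.Series:
    """Fit a random forest and return impurity-based feature importances,
    sorted from most to least important (one of several possible measures)."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(train_X, train_y)
    return pd.Series(model.feature_importances_,
                     index=train_X.columns).sort_values(ascending=False)

# Usage with a hypothetical feature matrix and forward returns:
# feature_importance(train_X, train_y).head()
```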

Given their strong lead over the other features, let me briefly go over kurtosis and skewness again.[7] Skewness tells us something about the shape of a distribution. The skewness of a normal distribution is zero because it is perfectly symmetric. Somewhat more nerdy, a skewness of zero implies that the mean, median, and mode (the most common observation) are all equal. Positive skewness indicates that the mean is larger than the median and mode. Similarly, negative skewness comes with a mean that is smaller than the median and mode.

What does this mean for return distributions? A strategy with positively skewed returns doesn't make much money most of the time but has a few very strong months that make up most of its average return. The cross-section of individual stock returns is a perfect and very extreme example of that: a few big winners account for most of the market's long-term performance. A negatively skewed strategy, on the contrary, makes good money most of the time but occasionally loses heavily. The prime example for this is momentum. You gradually make money as the trend continues but lose heavily when it suddenly reverses.

Kurtosis, in contrast, tells us something about the tails of a distribution. The higher (lower) the kurtosis, the more (fewer) observations fall within the tails.[8] Stock returns, for example, typically come with higher kurtosis than a normal distribution. Why? Because both extreme negative and extreme positive returns are more common than normally distributed returns would suggest. The distribution of stock returns therefore has “fat tails”.
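A quick way to build intuition is to compute both statistics for a simulated well-behaved series and a crash-prone one. This is a toy illustration with made-up numbers, not related to the paper's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
normal_ret = pd.Series(rng.normal(0.01, 0.04, 5000))        # symmetric, thin tails
crash_ret = pd.Series(np.where(rng.random(5000) < 0.05,     # "momentum-like": steady
                               -0.15, 0.02))                 # gains, occasional crash

for name, ret in [("normal", normal_ret), ("crash-prone", crash_ret)]:
    print(name, "skew:", round(ret.skew(), 2),
          "excess kurtosis:", round(ret.kurtosis(), 2))
# The normal series shows skew and excess kurtosis near zero; the crash-prone series
# shows clearly negative skew and strongly positive excess kurtosis (a fat left tail).
```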

Against this background, I think it makes intuitive sense that skewness and kurtosis are important features in the machine learning predictions. Many investment strategies have defining return distributions (skewness and kurtosis) which hardly change over time. For example, if you invest in trend strategies like momentum, you should be prepared to live through occasional crashes when the trend suddenly reverses. That is just part of the deal and it seems reasonable that the machine learning models consider such characteristics in their predictions.

Conclusions and Further Ideas

The authors present a few more analyses which I haven't included here as they are more theoretical. For example, they show in more detail how the machine learning models outperform simpler prediction methods like linear regression. This is undoubtedly a nice academic motivation for their work, but for practitioners it is probably quite obvious.

Overall, I think the results for Machine Learned Manager Selection among hedge funds are pretty much in line with the more recent results for equity mutual funds in AgPa #65, #66, and #67. To close this series, let me therefore again summarize what I believe are the most important common points of the four papers.

  • In general, there is convincing empirical evidence that machine learning algorithms help to select outperforming asset managers. Portfolios of the funds with the highest out-of-sample return forecasts generated meaningful total returns and risk-adjusted performance from 1980 to the late 2010s.
  • These results are robust to various different input variables, machine learning models, forecast horizons, weighting-schemes, and fee structures. The overall patterns also hold within both equity mutual funds and hedge funds.
  • Capitalizing on the predictions doesn’t necessarily require shorting of funds which is hardly possible in practice. Investable long-only portfolios also generate attractive performance in backtests.
  • The most important features to predict future fund performance tend to be fund characteristics related to past performance, risk, and skill of the manager. Correlations of fund returns to macro variables like the VIX or sentiment indicators also seem to be relevant.
  • Despite those promising results, there is alpha decay over time. This isn't too surprising, as backtests for Machine Learned Manager Selection by definition suffer from a methodological look-ahead bias.[9] Expected returns going forward are therefore most likely lower than in the backtests.

I think the general takeaway with respect to Machine Learned Manager Selection remains the same as what I wrote in my general post about machine learning in portfolio management almost two years ago.[10] We definitely need to use those models to remain in the game, and they are certainly better than the existing simpler methods or what many humans are currently doing. But they most likely won't be the holy grail that prints money forever. Manager selection is an active strategy and, as such, many smart and greedy people compete for the same outperformance. Obvious patterns to make money will vanish quickly, and just like for us human fund selectors, this makes it difficult for the machines to spot profitable patterns.

However, I do believe there is one interesting aspect of Machine Learned Manager Selection compared to applications in stocks, derivatives, or bonds. It should be much easier to implement the predictions of your model. Trading and rebalancing a portfolio of funds is probably easier than trading a long-short equity portfolio with a few hundred positions. You still need some capital to get into the most attractive share classes or to invest in hedge funds at all, but this should be easier than building the infrastructure to efficiently trade single securities at scale. On the other hand, this advantage also suggests that profitable patterns should be exploited more quickly. For most investors, however, I do believe that the former argument dominates. Let's see what we make of it and when the first machine learning fund-of-funds hits the market.




Endnotes

1 To be fair, this week's paper is actually older than the last three. So the mutual funds are the out-of-sample test of the hedge funds, not the other way around.
2 Specifically, they exclude funds that don't report monthly net-of-fee returns, funds with a history shorter than 3 years, funds that didn't reach $5M in assets under management, and all fund-of-funds.
3 Backfill bias is the potential problem that only successful hedge funds voluntarily report their past performance to databases. This makes the history look better than it actually is.
4 More about such positive skewness below.
5 This is shorter than the overall sample period, which starts in 1994, because the authors ignore some performance history to avoid backfill bias and require enough data to calculate features and train their models.
6 I would definitely invest in a Relative Value strategy that makes “only” 15% per year at a Sharpe ratio of 1.5 instead of the 18.79% and 2.3 in the neural network (DNN) backtest…
7 I do this primarily for myself because I needed a refresher. My basic statistics class was in 2016…
8 But kurtosis doesn't tell us whether more observations are in the left or right tail. That is skewness…
9 Most of us couldn't train a neural network in the 1990s and trade on its predictions.
10 Saying this seems somewhat stubborn and arrogant. I am ready to change my mind, but I think the three arguments discussed in this post are just too compelling.