AgPa #49: Machine Learning in Quant Asset Management

How Can Machine Learning Advance Quantitative Asset Management? (2023)
David Blitz, Tobias Hoogteijling, Harald Lohre, Philip Messow
The Journal of Portfolio Management, Quantitative Tools 2023

After covering interpretability of machine learning in the last post, this week’s AGNOSTIC Paper is a much broader overview. The authors outline the benefits and pitfalls of machine learning compared to “traditional” econometrics and present several use cases in the world of (quantitative) asset management. They also provide ideas for research governance to keep those powerful methods under control.

Everything that follows is only my summary of the original paper. So unless indicated otherwise, all tables and charts belong to the authors of the paper and I am just quoting them. The authors deserve full credit for creating this material, so please always cite the original source.

Setup and Idea

The combination of widely available data and ever-growing, cheap computing power has led to the rise of machine learning and other advanced analytics that we are all currently experiencing.[1] It is not surprising that the competitive world of investment management has also adopted those methods. Data-intensive strategies like High Frequency Trading have probably been using machine learning for quite some time. Over the last few years, however, the “mainstream” of low-frequency quantitative managers has finally caught up. As I mentioned last week and in my post on Big Data & Machine Learning in Asset Management, using machine learning models in quantitative investing is the logical next step and a natural innovation of factor investing. Having said that, realistic expectations are important as financial markets are a much more difficult playground than image recognition…

This week’s authors take the perspective of a (low frequency) quantitative asset manager and examine how machine learning could (or could not) advance their craft in the future.

Data and Methodology

The paper provides a mostly qualitative overview of machine learning in quantitative asset management. Therefore, there is no specific Data and Methodology to discuss at this point.

Important Results and Takeaways

Benefits and pitfalls of machine learning in finance


According to the authors, machine learning models come with three major advantages over “traditional” econometrics. First, they are mostly data-driven and can identify the most important variables from many inputs by themselves. Second, with the exception of choosing the algorithm and hyperparameters, machine learning approaches are “model-free”. We can therefore test hypotheses without limiting ourselves to, for example, a linear relation between variables.[2] Third and finally, machine learning models tend to be more forward-looking and are explicitly designed for prediction. While most traditional econometric approaches are evaluated based on in-sample fit, machine learning models are judged by the accuracy of their predictions on out-of-sample data. This reduces biases and is exactly what we need in investing.
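
To make the difference between in-sample fit and out-of-sample accuracy concrete, here is a minimal sketch of my own (it is not taken from the paper). It simulates a weak linear signal buried in noise and compares a simple linear fit with a deliberately over-flexible nearest-neighbor model. The coefficient, noise level, sample size, and all function names are assumptions chosen purely for illustration.

```python
import random

def simulate(n, rng, beta=0.05, noise=1.0):
    """Draw n (x, y) pairs where y = beta * x + noise, i.e. a weak signal."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def fit_linear(xs, ys):
    """Ordinary least squares with one predictor and an intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_nearest_neighbor(xs, ys):
    """Over-flexible model: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)  # fixed seed so the run is reproducible
x_train, y_train = simulate(500, rng)
x_test, y_test = simulate(500, rng)

results = {}
for name, fit in [("linear", fit_linear), ("nearest neighbor", fit_nearest_neighbor)]:
    model = fit(x_train, y_train)
    in_sample = mse(model, x_train, y_train)
    out_of_sample = mse(model, x_test, y_test)
    results[name] = (in_sample, out_of_sample)
    print(f"{name:>16}: in-sample MSE {in_sample:.3f}, out-of-sample MSE {out_of_sample:.3f}")
```

The nearest-neighbor model memorizes the training data and therefore looks perfect in-sample, but it does worse than the plain linear fit on fresh data. Judged in-sample, the flexible model wins; judged out-of-sample, it loses, which is why the evaluation standard matters so much.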


The main pitfalls of machine learning in asset management are the three general problems of financial markets.[3] First, by the standards of machine learning, we don’t have much data.[4] Even if you record monthly returns for all of the roughly 9,000 stocks in the world over the last 100 years (which is unrealistic and hardly possible), those are just 10.8M data points. Compared to the samples of other machine learning applications like image recognition, this is simply not much data. Unfortunately, we cannot even solve this problem because the only way to get more returns is to wait.
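
The 10.8M figure is just a product of the three numbers quoted above, and the same arithmetic shows how slowly the sample grows by waiting. A minimal sketch (the function name and the ten-year horizon are my own choices for illustration):

```python
N_STOCKS = 9_000       # rough number of stocks, as quoted in the text
N_YEARS = 100          # length of the hypothetical history
MONTHS_PER_YEAR = 12

def monthly_observations(n_stocks, n_years, periods_per_year=MONTHS_PER_YEAR):
    """Number of stock-month return observations in a full panel."""
    return n_stocks * n_years * periods_per_year

total = monthly_observations(N_STOCKS, N_YEARS)
extra_decade = monthly_observations(N_STOCKS, 10)

print(f"{total:,} stock-month observations")                 # 10,800,000
print(f"{extra_decade:,} more after waiting another decade")  # 1,080,000
print(f"growth from waiting ten years: {extra_decade / total:.0%}")  # 10%
```

In other words, even a full decade of patience enlarges this (already optimistic) sample by only a tenth.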

Second, competition among investors eliminates almost all predictability in financial markets. If there is an evident way to make money, people will use it until it is no longer evident… This is a problem as machine learning models are especially powerful in domains with high signal-to-noise ratios. Security prices don’t offer this. Identifying a cat in a picture is much easier than picking an outperforming stock. And if something is already difficult for humans, it is going to be even more difficult for machines.

Third and finally, there are no static rules in markets. If a machine actually detects a profitable pattern, it could disappear over time because exploiting the strategy corrects prices. Israel et al. (2020) provide the wonderful metaphor that in financial markets, cats morph into dogs as soon as the algorithm has figured out how to detect cats.

Use cases of machine learning in asset management

There are many ways to use machine learning in asset management, but the prediction of stock returns receives the most attention. In general, machine learning models tend to work best with a large set of predictor variables to fully exploit their advantage over traditional econometric methods.

The authors provide an overview of several studies that use machine learning to predict stock returns from many (>50) input variables. The results tend to be better than for traditional econometrics, but the authors caution that research must follow a strict protocol to avoid overfitting and unrealistic results. They also stress that many machine learning models produce trading strategies with such high turnover that they may not be implementable in practice after trading costs.
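
The turnover point is easy to see with a simple cost model. The sketch below is my own illustration and not from the paper: it assumes trading costs are proportional to turnover, and all returns, turnover figures, cost levels, and strategy names are hypothetical numbers.

```python
def net_return(gross_return, annual_turnover, cost_per_unit_traded):
    """Annual return after subtracting a simple proportional trading cost."""
    return gross_return - annual_turnover * cost_per_unit_traded

COST = 0.0025  # assumption: 25 basis points per unit of turnover

# Hypothetical strategies: turnover of 12.0 means the portfolio is traded 12x per year.
strategies = {
    "fast ML signal": {"gross_return": 0.08, "annual_turnover": 12.0},
    "slow factor signal": {"gross_return": 0.06, "annual_turnover": 2.0},
}

net = {
    name: net_return(s["gross_return"], s["annual_turnover"], COST)
    for name, s in strategies.items()
}

for name, s in strategies.items():
    print(f"{name:>18}: gross {s['gross_return']:.2%}, net {net[name]:.2%}")
```

Under these assumed numbers, the strategy that looks better before costs ends up behind the slower one after costs. Real trading costs also depend on liquidity and trade size, so this linear model is only a first approximation.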

Other applications of machine learning aim at characteristics that are subsequently used to predict asset returns instead of the returns themselves. This usually mitigates the problem of low signal-to-noise ratios. The authors cite examples that use machine learning to predict earnings surprises, profitability, and other corporate events. Other examples are studies that use machine learning algorithms as advanced non-linear valuation models. To make money from such applications, the machine-learning-engineered signal must of course still predict future returns…[5]

Another important area of machine learning is the creation of non-traditional or “alternative” data. The most important subfield in this respect is probably Natural Language Processing, a collection of methods to systematically analyze unstructured text data. I think since the launch of ChatGPT, this doesn’t need much introduction. There are wide-ranging applications and many quantitative investors nowadays process most of the text that is out there (company reports, news, press releases, …).

Finally, the authors explain that machine learning is currently also finding its way into other asset classes and disciplines. Return prediction for stocks is still the most active area in academia, but there are more and more studies that apply machine learning to fixed income markets and other tasks like portfolio construction or trade execution.

Keeping it under control: research governance and protocol

Following a research protocol is generally important, but the sophisticated nature of machine learning models turbo-charges the need for discipline. It is very easy to run into issues of data mining and overfitting, so the authors present several important points to consider.

For example, researchers should always follow a plausible motivation when doing research (finding the truth, not selling something) and carefully document what they are doing. The latter is important to mitigate multiple testing, a problem that arises when researchers test many specifications and inevitably find a profitable strategy by chance. There are methods to cope with that, and the 27th fine-tuned specification of a model requires higher standards for statistical significance than the first one. In addition to that, the authors argue that in the world of quantitative asset management nothing but live performance ultimately decides whether a model is good or bad. Unfortunately, we can only test this by waiting…
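
To see why the 27th specification deserves a higher bar than the first, consider one common correction for multiple testing, the Bonferroni adjustment. This is my own choice for illustration; the paper does not prescribe this particular method. The sketch assumes independent tests and uses the normal approximation for t-statistics, and the function names are hypothetical.

```python
from statistics import NormalDist

def prob_false_discovery(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests of a true null looks significant."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_t_threshold(n_tests, alpha=0.05):
    """Two-sided critical value when alpha is split evenly across n tests
    (normal approximation to the t-distribution)."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))

for n in (1, 27):
    print(
        f"{n:>2} specification(s): "
        f"P(at least one false positive) = {prob_false_discovery(n):.1%}, "
        f"required |t| = {bonferroni_t_threshold(n):.2f}"
    )
```

With a single test, the familiar threshold of roughly 1.96 applies. After 27 tries, the chance of at least one spurious "discovery" is about three in four, and the Bonferroni threshold rises above 3. Fine-tuned specifications are usually correlated, so Bonferroni is conservative here, but the direction of the adjustment is the point.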

The final and often neglected point is the culture within a research department. In my opinion, much of what is labeled “research” in the investment industry is actually “reasonably sophisticated marketing”. To avoid the pitfalls of machine learning, it is therefore very important that the culture of the firm actually empowers rigorous research and is not (only) focused on selling its products.

Conclusions and Further Ideas

The authors conclude their overview by underlining the current consensus of the literature. Machine learning is much more an evolution than a revolution. It will certainly improve many existing aspects of quantitative investing and investors who don’t use it will most likely lose their edge over time. On the other hand, we shouldn’t expect wonders from machine learning in the “mainstream” low-frequency space. The key difference between financial markets and other applications like image recognition is the difficulty of the task. Recognizing cats in pictures is not very difficult, so it is reasonable that a sophisticated machine achieves accuracy similar to that of humans. Identifying an outperforming stock, in contrast, is quite difficult even for experienced professionals.

So in summary, machine learning doesn’t work well in asset management when compared with the breakthroughs in other disciplines. However, if we appropriately compare it with “traditional” human fund managers, first evidence suggests that machine learning approaches actually perform better. So we shouldn’t expect wonders from machine learning, but there is also no reason why a smart model cannot be better than a human portfolio manager…



1 I know, we all heard this phrase way too often…
2 Needless to say, those first two benefits come with the risk of overfitting.
3 See Israel et al. (2020) and my post on the issue.
4 Of course, High Frequency Trading is the exception here. Most investors, however, don’t have access to such strategies.
5 Or you just sell the signals to asset managers who use them however they want and make your money as a data provider…