Big Data & Machine Learning in Asset Management

This week I gave a (virtual) talk on “Big Data and Machine Learning in Asset Management” at Goethe-University in Frankfurt. Thanks again to my thesis supervisor Sasan Mansouri for the invitation. In this post, I summarize a few points from the talk and share the slides. The key result is a framework for evaluating investment strategies that claim to use big data and machine learning. I also apply this framework to several real-world funds.

Big data and machine learning

Big data and machine learning are two of the most hyped buzzwords of the last decade, so it was only a matter of time until they found their way into finance and the asset management industry. But what is big data, and how is it different from normal data? Unfortunately, there is no clear answer. However, there is a consensus that big data somehow goes beyond the standard techniques for data processing and analysis. Let’s stick to this definition for the rest of this post. When I refer to big data, I mean data that is not just a big spreadsheet of numbers but requires some new methodology (for example, unstructured text data and natural language processing).[1][2][3]

[1] Sagiroglu, Seref, and Duygu Sinanc, 2013, Big data: A review, in 2013 International Conference on Collaboration Technologies and Systems (CTS).
[2] Martin, Ian, and Stefan Nagel, 2019, Market Efficiency in the Age of Big Data, National Bureau of Economic Research Working Paper 26586.
[3] Goldstein, Itay, Chester Spatt, and Mao Ye, 2021, Big Data in Finance, National Bureau of Economic Research Working Paper 28615.

A proper definition of machine learning is somewhat easier but still messy. Basically, the term refers to a collection of advanced algorithms and models. A defining element of these models is their ability to overcome limitations of traditional statistical modelling by handling many inputs and non-linear relations. Another important aspect is forecasting. Traditional statistics is mostly concerned with statistically significant relationships between variables, whereas machine learning aims for the best possible out-of-sample forecast. This makes it very interesting for investing because, ultimately, our job as investors is to collect information and process it into forecasts and decisions. Machine learning was built for exactly this kind of task, so it may be of help.[4][5][6]

[4] Bartram, Söhnke M., Jürgen Branke, and Mehrshad Motahari, 2020, Artificial Intelligence in Asset Management, CFA Institute Research Foundation Literature Review.
[5] Bartram, Söhnke M., Jürgen Branke, Giuliano De Rossi, and Mehrshad Motahari, 2021, Machine Learning for Active Portfolio Management, The Journal of Financial Data Science 3, 9–30.
[6] Gu, Shihao, Bryan Kelly, and Dacheng Xiu, 2020, Empirical Asset Pricing via Machine Learning, The Review of Financial Studies 33, 2223–2273.
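To make the in-sample versus out-of-sample distinction concrete, here is a minimal Python sketch of an out-of-sample R², similar in spirit to the metric Gu, Kelly, and Xiu (2020) use to compare forecasts. The model is fitted on the first half of a simulated series and judged only on the unseen second half. The simulated data, the 50/50 split, and the zero-forecast benchmark are assumptions for illustration, not the setup from the talk.

```python
import numpy as np


def r2_oos(actual, forecast, benchmark=None):
    """Out-of-sample R^2 = 1 - SSE(model) / SSE(benchmark).

    By default the benchmark forecast is zero (an assumption; the
    historical mean is another common choice).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if benchmark is None:
        benchmark = np.zeros_like(actual)
    benchmark = np.asarray(benchmark, dtype=float)
    sse_model = np.sum((actual - forecast) ** 2)
    sse_bench = np.sum((actual - benchmark) ** 2)
    return 1.0 - sse_model / sse_bench


# Simulated data: a weak linear signal buried in noise (illustrative only).
rng = np.random.default_rng(0)
n = 2_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)

train, test = slice(0, n // 2), slice(n // 2, n)
slope, intercept = np.polyfit(x[train], y[train], deg=1)  # fit on training half only
y_hat = intercept + slope * x[test]                       # forecast the unseen half
print(f"out-of-sample R^2: {r2_oos(y[test], y_hat):.4f}")
```

With a weak signal like this, a value close to zero (or even slightly negative) is normal: a relationship that looks significant in-sample does not guarantee a useful forecast out-of-sample.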

More than just risk and return

Before looking at applications of big data and machine learning in asset management, let’s first examine the underlying process. When I tell normal people (by “normal”, I mean people without a nerdy interest in finance) about my job, the typical answer is something like, “Fine, and what are you doing all day long?”. After I explain it using a well-known equity index, the next reaction is usually, “Got it. Do you have a tip where I should invest my money right now?”. This reaction is absolutely okay, because predicting returns and thinking about investment opportunities is a large and important part of what we do. But it is not everything. In fact, it is only the first of three steps that are (in my opinion) required for a successful investment strategy.[7][8]

[7] Bartram, Söhnke M., Jürgen Branke, and Mehrshad Motahari, 2020, Artificial Intelligence in Asset Management, CFA Institute Research Foundation Literature Review.
[8] Bartram, Söhnke M., Jürgen Branke, Giuliano De Rossi, and Mehrshad Motahari, 2021, Machine Learning for Active Portfolio Management, The Journal of Financial Data Science 3, 9–30.

  • Return Prediction and Portfolio Construction
  • Implementation, Trading, and Execution
  • Risk Management and Portfolio Monitoring

As mentioned above, Return Prediction and Portfolio Construction is all about which securities to buy, when to buy them, and how to combine them into a portfolio. No matter which strategy you use, at the end of this step you should have a list of securities with corresponding portfolio weights. At this step, applications of big data and machine learning are both evolutionary and revolutionary. For example, some traditional investors use natural language processing to analyze business reports more efficiently. On the other hand, a few revolutionary funds are already managed entirely by models: human managers decide on the process but not on the final investments.
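The output of this step, a list of securities with weights, can be illustrated with a deliberately simple rule: equal-weight the securities with the highest predicted returns. The Python sketch below assumes the predictions already exist; the tickers, numbers, and the top-k rule itself are hypothetical and far simpler than what a real fund would use.

```python
from typing import Dict


def top_k_weights(predictions: Dict[str, float], k: int) -> Dict[str, float]:
    """Equal-weight the k securities with the highest predicted return.

    A deliberately simple long-only rule; real portfolio construction would
    also account for risk, constraints, and trading costs.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    selected = ranked[:k]
    return {ticker: 1.0 / len(selected) for ticker in selected}


# Hypothetical model output: predicted next-month returns per ticker.
predictions = {"AAA": 0.012, "BBB": -0.004, "CCC": 0.007, "DDD": 0.001}
weights = top_k_weights(predictions, k=2)
print(weights)  # {'AAA': 0.5, 'CCC': 0.5}
```

Whatever model produces the predictions, the interface stays the same: forecasts go in, a weight per security comes out.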

Once you have figured out your desired portfolio, you need to buy and sell the securities. Although it may sound obvious, portfolio implementation is often ignored in academic papers and backtests. Most of the time, they simply assume that securities can be bought and sold at closing prices.[9] I don’t need to tell you that this is not true in practice. If you are managing enough money, your orders will move prices and trading suddenly becomes quite costly (this is called market impact). To avoid such costs, you need a proper execution strategy. This second step of asset management is all about finding the best one and shouldn’t be neglected. Many strategies look nice in a backtest but don’t survive real-world trading because of market impact or other costs. On the other hand, perfect execution will not turn a bad strategy into a good one. So good execution is necessary but certainly not sufficient.

[9] I am not better than the rest and also did this in my master’s thesis.
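A common stylized way to quantify market impact (not from the talk, and only one of several models) is the square-root rule: the price impact of an order grows with the square root of its size relative to daily volume, scaled by the stock’s volatility. The minimal Python sketch below assumes a calibration constant of 1.0 and made-up stock parameters.

```python
import math


def sqrt_impact_bps(order_shares: float, daily_volume: float,
                    daily_vol: float, y: float = 1.0) -> float:
    """Estimated market impact in basis points via the square-root rule:

        impact ~ y * daily_vol * sqrt(order_shares / daily_volume)

    `y` is a calibration constant of order one; 1.0 is an assumption.
    """
    if daily_volume <= 0:
        raise ValueError("daily_volume must be positive")
    participation = order_shares / daily_volume
    return 1e4 * y * daily_vol * math.sqrt(participation)


# Hypothetical stock: 2% daily volatility, 1 million shares traded per day.
for shares in (10_000, 100_000, 500_000):
    bps = sqrt_impact_bps(shares, 1_000_000, 0.02)
    print(f"{shares:>9,} shares -> ~{bps:.0f} bps impact")
```

Under these assumptions, the three orders cost roughly 20, 63, and 141 basis points. A ten times larger order costs about three times more per share and about 30 times more in total, which is why large funds spread their trades over time.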

After trading the desired securities, we are not finished. Markets are dynamic and you need to constantly monitor your portfolio and adjust it if the situation changes. There are many types of risks and the challenge in this final step of the process is to keep them under control while preserving the original investing strategy.
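Monitoring can take many forms. One simple example (an assumption, not a method from the talk) is checking whether market moves have pushed the portfolio away from its target weights. The Python sketch below computes the drifted weights after one period without trading and flags assets outside a hypothetical ±5 percentage-point tolerance band.

```python
from typing import Dict, List


def drifted_weights(target: Dict[str, float],
                    period_returns: Dict[str, float]) -> Dict[str, float]:
    """Portfolio weights after one period of returns if nothing is traded."""
    grown = {k: w * (1.0 + period_returns.get(k, 0.0)) for k, w in target.items()}
    total = sum(grown.values())
    return {k: v / total for k, v in grown.items()}


def assets_to_rebalance(target: Dict[str, float], current: Dict[str, float],
                        tolerance: float = 0.05) -> List[str]:
    """Assets whose weight deviates from target by more than `tolerance`."""
    return sorted(k for k in target if abs(current[k] - target[k]) > tolerance)


# Hypothetical 60/40 portfolio after a strong equity rally.
target = {"equities": 0.60, "bonds": 0.40}
current = drifted_weights(target, {"equities": 0.25, "bonds": 0.00})
print({k: round(v, 3) for k, v in current.items()})  # {'equities': 0.652, 'bonds': 0.348}
print(assets_to_rebalance(target, current))          # ['bonds', 'equities']
```

Weight drift is only one of many risks; the same pattern of comparing current exposures against limits carries over to sector, factor, or volatility targets.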

For me, those are the three major steps of asset management on a stylized level. Big data and machine learning will affect all of them, but some offer more potential than others. This is because of three problems with machine learning in finance that are highlighted by Israel et al. (2020).[10]

[10] Israel, Ronen, Bryan Kelly, and Tobias Moskowitz, 2020, Can Machines “Learn” Finance?, Journal Of Investment Management 18, 23–36. The following paragraphs are to a large extent inspired by their paper.

Investing is harder than image recognition

Think about the breakthroughs of machine learning. Some years ago, algorithms defeated human champions at chess and Go. Today, advances in image recognition allow us to unlock our phones just by looking at them. And the next big thing, autonomous driving, is currently in the making. All these applications share three characteristics. First, there is a lot of data for machines to learn from. Second, the tasks are quite predictable and hardly influenced by chance. Third, there are reliable rules that don’t change over time. Unfortunately, many areas of asset management, and of finance in general, are very different.

Problem #1: Small Data

Compared to other machine learning applications, finance is mostly a small-data environment. For example, the MSCI ACWI IMI index[11] (one of the broadest equity indices) had 9,189 constituents at the time of writing. Even if you record daily prices for those stocks, you only get 2,297,250 data points per year (assuming 250 trading days). Compared to the billions of photos processed by leading image recognition models, that is not much. And this is still an optimistic estimate, since most asset managers don’t trade daily and many of the 9,189 companies are too small for them anyway.

[11] More information about this index is available from MSCI.

The lack of return data makes it difficult to use machine learning for return prediction or risk management. The breakthrough machine learning models of other fields are often trained on billions of examples, and we simply can’t do this with returns. That said, I don’t think it is hopeless, and I am deeply convinced that machine learning will produce better forecasts of risk and return than traditional methods. But we cannot expect the same extraordinary results as in chess, Go, or image recognition.

Of course, there are exceptions like high-frequency trading. If you record prices of the 9,189 stocks every second, you end up with about 49.6 billion data points per year (assuming six-hour trading days). From a pure data perspective, Implementation, Trading, and Execution is therefore probably the most promising area for machine learning within asset management. However, high-frequency trading is only a small niche. For most other areas, the problem of too little data exists and will persist (unfortunately, the only way to get more price data is to wait).
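The back-of-the-envelope numbers above can be reproduced in a few lines of Python. The only inputs are the constituent count and the trading-day assumptions stated in the text.

```python
stocks = 9_189                  # MSCI ACWI IMI constituents at the time of writing
trading_days = 250              # assumed trading days per year
seconds_per_day = 6 * 60 * 60   # assumed six-hour trading day

daily_points = stocks * trading_days
per_second_points = daily_points * seconds_per_day

print(f"daily prices:      {daily_points:,} data points per year")       # 2,297,250
print(f"per-second prices: {per_second_points:,} data points per year")  # 49,620,600,000
print(f"ratio: {per_second_points // daily_points:,}x")                  # 21,600x
```

The 21,600-fold increase is simply the number of seconds in a six-hour trading day, which is why high-frequency data is the exception to the small-data problem.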

Problem #2: Low Signal-to-Noise Ratios

Let’s go back to the early breakthroughs. Chess is admittedly a very complicated game, but it is also perfectly predictable. There is a fixed set of rules, and at each point you know all available options of both players. Furthermore, if you move a piece one square forward, it will arrive there. There is no randomness at all. The signal-to-noise ratio captures this predictability of a system. For chess, the ratio is virtually infinite because there is no noise at all. But for financial markets and asset management, this is again very different.
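One way to make the concept concrete is to ask how much of the variation even a perfect model could explain. In the Python sketch below, the signal-to-noise ratio is defined as the variance of the predictable signal divided by the variance of the noise (an assumption; other definitions exist), and the three ratios are illustrative values only.

```python
import numpy as np


def max_r2(snr: float) -> float:
    """Best achievable R^2 when y = signal + noise and the forecaster knows the
    signal perfectly; snr = Var(signal) / Var(noise)."""
    return snr / (1.0 + snr)


rng = np.random.default_rng(42)
n = 100_000
for snr in (1e6, 1.0, 0.01):  # illustrative values, from chess-like to very noisy
    signal = rng.normal(scale=np.sqrt(snr), size=n)
    noise = rng.normal(scale=1.0, size=n)
    y = signal + noise
    # A "perfect" model forecasts exactly the signal; what remains is noise.
    r2 = 1.0 - np.var(y - signal) / np.var(y)
    print(f"SNR={snr:g}: theoretical max R^2={max_r2(snr):.4f}, simulated={r2:.4f}")
```

When the signal is small relative to the noise, even a model that knows the true signal explains only a tiny fraction of the variation, so it is hard to tell a good model from a lucky one.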

Think about all the news, social media posts, and analyst comments that pop up every day. Most of it is probably useless, and even we humans have trouble identifying what is really important. For algorithms, this is even more difficult, and you don’t want your models to act on every stupid comment. In addition, competition is a big problem for signal-to-noise ratios. I don’t want to open the discussion about market efficiency, but we have to agree that financial markets are very competitive. Everyone wants to make money, and if there is an easy way of doing it, someone will go for it. While this is nice for whoever is making the money, their trading erodes the exploitable pattern, which lowers the signal-to-noise ratio and makes financial markets very hard to predict. Since this is a direct result of competition among investors, we have to live with this problem, and it makes applications of machine learning far more difficult than in other fields. Again, I am not saying it is hopeless, but identifying an outperforming stock is way harder than identifying pictures of cats.

Admittedly, this problem is again mitigated for high-frequency trading. Over shorter time periods, there may be less noise (simply because not much happens within a few seconds) and somewhat more predictability. In addition, there is a consensus in the literature that risks and volatilities are easier to predict than returns. From the perspective of predictability, the last two steps of the asset management process are therefore better suited for machine learning than Return Prediction and Portfolio Construction.
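The claim that risk is easier to predict than returns rests largely on volatility clustering: calm days tend to follow calm days, and turbulent days tend to follow turbulent ones. A classic, very simple forecaster that exploits this is the exponentially weighted moving average (EWMA) of squared returns. The Python sketch below is a textbook version, not a machine learning model; the decay factor of 0.94 (the value popularized by RiskMetrics for daily data), the starting variance, and the return series are assumptions.

```python
import numpy as np


def ewma_volatility(returns, initial_var: float, lam: float = 0.94) -> np.ndarray:
    """One-step-ahead EWMA volatility forecasts.

    forecast_var[t + 1] = lam * forecast_var[t] + (1 - lam) * returns[t] ** 2
    Element t of the result is the forecast for period t, using data up to t - 1.
    """
    r = np.asarray(returns, dtype=float)
    var = np.empty(len(r) + 1)
    var[0] = initial_var
    for t, rt in enumerate(r):
        var[t + 1] = lam * var[t] + (1.0 - lam) * rt ** 2
    return np.sqrt(var)


# Hypothetical daily returns: a calm stretch followed by a turbulent one.
calm = [0.005, -0.005] * 25
turbulent = [0.03, -0.03] * 5
vol = ewma_volatility(calm + turbulent, initial_var=0.005 ** 2)
print(f"forecast after calm period:      {vol[len(calm)]:.4f}")
print(f"forecast after turbulent period: {vol[-1]:.4f}")
```

The forecast reacts within a few days of the regime change, something that has no close counterpart for the direction of returns.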

Problem #3: Evolving Environment

The third and final problem of machine learning in finance is the dynamic character of markets. Once again, let’s go back to applications in other fields. An algorithm trained to identify images of cats will continue to work in the future because the appearance of cats will not change much (they will continue to have four legs, a tail, and so on). Unfortunately, we hardly have such reliable rules in financial markets. A successful strategy may suddenly stop working without any apparent reason. Even strategies that worked over the long term (whatever that is) did not outperform in every single year.[12] For machine learning, such a changing environment is problematic because models need a lot of data to generalize. Think of it as an office worker who needs a year to catch up with software updates that are released every six months. You don’t want such a person to manage your investments.

[12] A perfect example of this is the current multi-year drawdown of the Value factor.
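A small simulation shows the trade-off behind this analogy. A relation between a predictor and returns flips sign halfway through the sample; a model using all history is compared with one that only looks at recent data. The data-generating process, the sign flip, and the 250-observation window are assumptions for illustration.

```python
import numpy as np


def ols_slope(x, y) -> float:
    """Slope coefficient of a simple linear regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))


rng = np.random.default_rng(7)
n = 2_000
x = rng.normal(size=n)
# Hypothetical regime change: the relation flips sign halfway through the sample.
beta = np.where(np.arange(n) < n // 2, 0.5, -0.5)
y = beta * x + rng.normal(size=n)

expanding = ols_slope(x, y)              # uses all history equally
rolling = ols_slope(x[-250:], y[-250:])  # uses only the 250 most recent points
print(f"expanding window: {expanding:+.2f}   rolling window: {rolling:+.2f}")
```

The expanding estimate averages the two regimes and should end up near zero, while the rolling estimate tracks the new regime but rests on far fewer observations and is therefore noisier. Adapting quickly to a changing environment means learning from less data.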

Another issue is, again, competition among investors. Suppose you created a successful model that recommends profitable transactions. What do you do? Of course, you use it to make as much money as possible.[13] But at some point, it will stop working because other investors realize that you became rich and start to do the same thing. Israel et al. (2020) therefore compare machine learning in finance to image recognition, with the catch that cats turn into dogs once the model becomes successful at detecting cats.

[13] The worst thing you can do is to tell the world how your model makes money. So I am always surprised by the selflessness of those super-talented day traders who offer their strategies in life-changing online courses. I think you see the irony.

Each step of asset management is different

All three problems affect each step of asset management, but as we have seen, not to the same degree. The following chart provides a summary. As mentioned before, Implementation, Trading, and Execution is the most promising area from a purely data- and application-oriented perspective.

Although Return Prediction and Portfolio Construction is the hardest task, it is still the most important and (in my opinion) the most interesting application of machine learning in asset management. With realistic expectations in mind,[14] I will now present my framework for evaluating investment strategies that claim to use big data and machine learning.

[14] For image recognition, advanced models already achieve accuracy levels above 99%. In finance, accuracy levels of 51-55% are already considered very good.
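The 51-55% accuracy levels mentioned above also show why the small-data problem bites: a small edge needs many independent observations before it can be distinguished from luck. The Python sketch below uses a simple binomial test under a normal approximation; the two-standard-error threshold and the independence of bets are assumptions.

```python
import math


def bets_needed(hit_rate: float, z: float = 2.0) -> int:
    """Independent bets needed before an observed hit rate sits `z` standard
    errors above a coin flip (normal approximation, se = sqrt(0.25 / n))."""
    edge = hit_rate - 0.5
    if edge <= 0:
        raise ValueError("hit_rate must exceed 0.5")
    return math.ceil((z * 0.5 / edge) ** 2)


for p in (0.51, 0.52, 0.55):
    print(f"hit rate {p:.0%}: about {bets_needed(p):,} independent bets")
```

Under these assumptions, a 55% hit rate needs about 400 independent bets, 52% about 2,500, and 51% about 10,000, which is a lot for strategies that only rebalance monthly.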

A framework for investment strategies

The overarching goal of portfolio management is to “beat the market”, as often as possible and ideally always. There are many strategies that attempt to achieve this, but conceptually, there are only three ways to do it: you can be lucky, you can earn compensation for providing liquidity, or you can have better information and insight than most others.[15]

[15] The chart is inspired by Pedersen, Lasse Heje, 2015, Efficiently Inefficient: How Smart Money Invests and Market Prices Are Determined, Figure 3.1.

Counting on luck is obviously not a sustainable long-term strategy, so let’s focus on the other two. Earning compensation for liquidity provision is at the heart of most high-frequency strategies. Most of them act as market makers and earn the (usually narrow) bid-ask spread on each trade. Applied to billions of trades, those small profits turn into attractive risk-adjusted returns. But two problems remain. First, high-frequency trading is a technological arms race and requires very expensive infrastructure, so most of us can’t play this game. Second, the capacity of high-frequency funds is usually limited, so they are not really useful for large mainstream asset managers with billions or even trillions under management.
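Why do tiny per-trade profits add up to attractive risk-adjusted returns? If trades are independent, the expected profit grows linearly with the number of trades while the risk grows only with its square root. The Python sketch below spells out this arithmetic; independent and identical trades, a zero risk-free rate, and an edge of 1% of per-trade risk are simplifying assumptions, not properties of any real market maker.

```python
import math


def aggregated_sharpe(mean_per_trade: float, sd_per_trade: float,
                      trades_per_year: int) -> float:
    """Annual Sharpe ratio of the summed P&L of independent, identical trades.

    Over N trades: mean = N * mu and sd = sqrt(N) * sigma,
    so Sharpe = mu * sqrt(N) / sigma.
    """
    return mean_per_trade * math.sqrt(trades_per_year) / sd_per_trade


# Hypothetical edge: each trade's expected profit is only 1% of its risk.
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} trades/year -> Sharpe ~ {aggregated_sharpe(0.01, 1.0, n):.1f}")
```

Independence is the key and optimistic assumption here; correlated trades, costs, and capacity limits pull real-world numbers far lower.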

Therefore, most of us must rely on the other source of outperformance and try to outsmart other investors. Again, there are different ways of doing that: you can either try to create entirely new information, or you can try to gain new insights from information you already have. For both, big data and machine learning offer huge potential.

Let’s start with entirely new information. With the rapid growth of big data in recent years, more data sources are available than ever before, and investors have used them very creatively: satellite images of parking lots to predict store sales,[16] corporate jet movements to identify merger negotiations,[17] Amazon prices to construct real-time inflation measures,[18] and many more. All of this has become known as “Alternative Data”.

[16] Exemplary reference from BerkeleyHaas.
[17] Paragon Intel is selling the data for this.
[18] Exemplary reference from Harvard Business School.

Machine learning is often used to process and analyze alternative data, but usually the added value comes from the new information itself. However, there are also approaches that attempt to exploit machine learning with conventional data like prices or company fundamentals. Again, there are too many approaches to present here, but from my point of view, two interesting examples are an advanced momentum strategy based on random forests[19] and a machine learning framework for fundamental analysis and valuation.[20]

[19] Moritz, Benjamin, and Tom Zimmermann, 2016, Tree-Based Conditional Portfolio Sorts: The Relation between Past and Future Stock Returns, PhD Thesis.
[20] Hanauer, Matthias X., Marina Kononova, and Marc Steffen Rapp, 2021, Boosting Agnostic Fundamental Analysis: Using Machine Learning to Identify Mispricing in European Stock Markets, Working Paper.

I have only been working in the industry for one year, so I am not in a position to give any advice. But if you ask me about important lessons so far, one of them is, “Find a way to distinguish marketing nonsense (and there is a lot of it) from real content.” With that in mind, I would like to introduce my framework for investment strategies that claim to use big data and machine learning in their process.

First, there must be a clear idea of how and why the strategy should beat its benchmark. Does it use new data sources, or is there some novel way to analyze existing data? Or maybe even a combination of the two? These are questions you should ask for any investment strategy, whether or not it uses big data and machine learning. But if it uses machine learning, I also want to know how the fund managers address the three fundamental problems of machine learning in finance. What data are they using? Is it sufficiently big? How do they cope with low predictability and changing environments? Perhaps they have identified situations that are more predictable than others? While we can hardly solve the three problems, innovative approaches can certainly mitigate them.

I applied this framework to seven real-world investment funds in my guest talk, so if you want some examples, please check the slides at the end of this post. There is already a wide range of funds that address these problems in very different ways, which makes the results quite interesting.

Conclusion and Downloads

I have written a lot about problems and perhaps expressed a cynical view in some places. But overall, I am very optimistic about applications of big data and machine learning in the asset management industry. However, I want to de-hype the concepts and advocate for realistic expectations. Yes, big data provides many opportunities for valuable new information sources. Yes, machine learning, with its ability to handle non-linear relations and many inputs, will improve forecasting. Yes, traditional investment managers who don’t catch up will probably be left behind pretty soon. But the three general problems persist in most areas of finance, and there is no way to solve them. Picking an outperforming stock is not as easy as identifying a cat and probably never will be. That said, there is also no reason why a smart model cannot be better at it than a human portfolio manager.

Download Presentation Slides

This content is for educational and informational purposes only and is no substitute for professional or financial advice. The use of any information on this website is solely at your own risk, and I do not take responsibility or liability for any damages that may occur. The views expressed on this website are solely my own and do not necessarily reflect the views of any organisation I am associated with. Income- or benefit-generating links are marked with a star (*). Please also read the Disclaimer.