Report Analytics USA #1

This is the first post of an ongoing series intended to share updates, insights, and background on the Report Analytics USA portfolios. At the time of writing, the portfolios are already implemented on Wikifolio and serve as a live test of my master's thesis. I have a separate post about the strategy which, of course, I encourage you to read. But to make sure everyone is on the same page, I will give you a brief summary below. Those who read the previous post can skip this part and jump directly to the implementation.

In my thesis [1], I replicated a paper titled “Lazy Prices” (2020) [2]. The authors found a striking empirical pattern: large fractions of 10-Ks and 10-Qs (the annual and quarterly reports in the US) are simply copied and pasted from the previous year. For example, if you compare the 2020 and 2019 10-Ks of Apple, you will notice that they are very similar.

With the help of so-called text similarity measures, we can do that not only for Apple but for all companies in the US. Think of these measures like a correlation. You put two texts into a computer program which returns a value between 0 and 1. A value of 1 indicates that the texts are completely identical. A value of 0 means that there are no similarities at all. The Jaccard similarity (one of the measures I use in my thesis) for the two Apple reports above is about 0.92. Since this is close to 1, it correctly indicates that the 2020 report is mostly copied from the previous year.
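To make the idea concrete, here is a minimal sketch of a word-level Jaccard similarity: the number of distinct words two texts share, divided by the number of distinct words that appear in either of them. The tokenization (lowercase, letters and digits only) and the two example sentences are assumptions for illustration, not the exact preprocessing used in the thesis.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Shared distinct words divided by all distinct words in either text."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    union = words_a | words_b
    if not union:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(words_a & words_b) / len(union)


# Made-up sentences standing in for two versions of a risk section
old = "We face risks from competition and supply chain disruptions."
new = "We face risks from competition, supply chain disruptions and litigation."

print(jaccard_similarity(old, new))  # 0.9 (9 shared words out of 10 distinct words)
```

The only new word in the second sentence is "litigation", so the score drops slightly below 1. A real report has thousands of words, but the logic is the same.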

Historically, those copy-paste measures predicted future returns and helped to achieve outperformance vs. common US indices. At least for greedy capitalists like me, it is interesting to see if this is really investable and will continue to work in the future. There are good reasons to think so: changes in reports are typically negative and indicate previously undisclosed risks. Why? Because firms are legally required to warn about as many risks as possible. If they become aware of something, they need to include it in the next report. This is different for good news. Firms must of course disclose it properly, but they don’t need to speculate about every potential opportunity.

It turns out that firms which change a lot in their reports underperformed the benchmark, whereas those with a lot of copy-paste outperformed. Since I don’t see any practical way to short a portfolio of 300 stocks (for retail investors like me, this is really difficult), I will begin with the long side.

Copy-paste portfolios

As in my thesis, I start with all firms in the S&P Composite 1500 Index. Since I cannot afford to pay for constituent data, I use the holdings of several ETFs [3] to get all relevant tickers. This is not perfect but works sufficiently well. Next, I use the service from Financial Modeling Prep to get some basic data like company names, ISINs, and current market capitalization (I don’t earn anything for referencing them, although I would be happy to; I am just a satisfied customer). Once this data is structured, I run a program that downloads all relevant 10-Ks and 10-Qs, transforms them into plain text, and calculates the copy-paste measures between each report and its previous year’s version. I will repeat this procedure at least four times a year (to capture all published reports), but currently I look at it even more frequently.
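The step "each report and its previous year’s version" can be sketched as a simple lookup. The `Report` record, its fields, and the convention of using quarter 0 for a 10-K are hypothetical choices for this illustration; downloading and text conversion are left out, and the report texts are placeholders.

```python
from typing import NamedTuple


class Report(NamedTuple):
    ticker: str
    form: str     # "10-K" or "10-Q"
    year: int     # fiscal year of the report
    quarter: int  # assumption: 0 for the annual 10-K, 1 to 3 for 10-Qs
    text: str     # plain text of the report


def pair_with_previous_year(reports: list[Report]) -> list[tuple[Report, Report]]:
    """Match every report with the same form and quarter from one year earlier."""
    index = {(r.ticker, r.form, r.year, r.quarter): r for r in reports}
    pairs = []
    for report in reports:
        previous = index.get((report.ticker, report.form, report.year - 1, report.quarter))
        if previous is not None:  # skip reports without a predecessor
            pairs.append((report, previous))
    return pairs


# Placeholder texts, only to show the matching
reports = [
    Report("AAPL", "10-K", 2019, 0, "annual report text 2019"),
    Report("AAPL", "10-K", 2020, 0, "annual report text 2020"),
    Report("AAPL", "10-Q", 2020, 1, "first quarter report text 2020"),
]
pairs = pair_with_previous_year(reports)
for current, previous in pairs:
    print(current.ticker, current.form, current.year, "vs.", previous.year)
```

Here only the 2020 10-K finds a predecessor. The 2020 10-Q is skipped because no 10-Q for the first quarter of 2019 is in the list.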

To construct portfolios, I sort the 1,500 stocks in the index by their most recent copy-paste measure and divide them into 5 equal portfolios of 300 stocks each. But in contrast to what I presented in the post on the strategy, I don’t use market capitalization for weighting but go with simple equal weights. It is true that equal weighting can lead to liquidity issues. However, I do believe that the strategy can be more profitable with equal-weighted portfolios. Why? Because equal weights allocate more money to small- and mid-caps, which might trade in somewhat less efficient markets. But let’s have a look at the backtest.
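As a rough illustration of the sort and the two weighting schemes, the sketch below uses ten made-up tickers instead of 1,500 stocks. All tickers, scores, and market capitalizations are invented, and the function names are hypothetical.

```python
def sort_into_portfolios(scores: dict[str, float], n_portfolios: int = 5) -> list[list[str]]:
    """Rank tickers from most to least similar and cut the ranking into equal groups.

    If the number of tickers is not divisible by n_portfolios, the leftover
    tickers at the bottom of the ranking are dropped in this simple sketch.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    size = len(ranked) // n_portfolios
    return [ranked[i * size:(i + 1) * size] for i in range(n_portfolios)]


def equal_weights(tickers: list[str]) -> dict[str, float]:
    """Every stock gets the same share of the portfolio."""
    return {t: 1 / len(tickers) for t in tickers}


def value_weights(tickers: list[str], market_caps: dict[str, float]) -> dict[str, float]:
    """Every stock is weighted by its share of the portfolio's total market cap."""
    total = sum(market_caps[t] for t in tickers)
    return {t: market_caps[t] / total for t in tickers}


# Invented copy-paste scores and market caps (in billions)
scores = {"AAA": 0.95, "BBB": 0.90, "CCC": 0.85, "DDD": 0.80, "EEE": 0.75,
          "FFF": 0.70, "GGG": 0.65, "HHH": 0.60, "III": 0.55, "JJJ": 0.50}
market_caps = {"AAA": 300, "BBB": 100, "CCC": 50, "DDD": 20, "EEE": 10,
               "FFF": 80, "GGG": 40, "HHH": 5, "III": 60, "JJJ": 15}

portfolios = sort_into_portfolios(scores)
copy_paste = portfolios[0]  # the group with the most similar reports
print(copy_paste)
print(equal_weights(copy_paste))
print(value_weights(copy_paste, market_caps))
```

With value weights, the larger of the two stocks takes three quarters of the portfolio. With equal weights, the smaller stock gets the same allocation as the larger one, which is the tilt towards small- and mid-caps described above.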

Data and charts are from my master thesis [4]. “Copy-Paste” are the 20% of stocks with the most similar reports as measured by Jaccard similarity. Each portfolio consists of 300 stocks. Data ranges from January 1996 to June 2020. Depicted are price returns without dividends in USD. Portfolios are rebalanced monthly.

As expected, the equal weighted portfolios performed way better. However, I would never implement a strategy on that basis. I trust my methodology, but these are just hypothetical portfolios without transaction costs, bid-ask spreads, taxes, and other real-world problems. Therefore, this backtest is probably too optimistic about the magnitude of outperformance. Still, I do believe that the strategy will continue to outperform (but I don’t know by how much), and I think equal weights are at least worth trying. The gap to the value weighted portfolio is very large, so even if we lose something on liquidity, it is still fine.

But why should the “Copy-Paste” portfolios outperform? One idea is that copied reports are like an implicit confirmation: if the business is running fine and there are no new risks, firms just copy the previous year’s report and update the numbers. Therefore, a copied report can be a signal for stable or even improving fundamentals. What does the data say? In my thesis, I find that companies with more report copy-paste indeed tend to have higher margins and returns on invested capital in the near future. This is in line with the results of the original “Lazy Prices” paper.

To be more robust with the live implementation, I don’t use a single copy-paste measure, but take the average of two different ones. I think this is important to avoid data mining and overfitting since none of the measures is generally better than the other. But maybe, we can improve the strategy even further. As discussed above, the whole thing worked better with small- and mid-caps. Let’s divide the equal weighted “Copy-Paste” portfolio again, but now based on market capitalization. This gives us two different “Copy-Paste” portfolios of 150 stocks each, a “Large” and a “Small” one.

Data and charts are from my master thesis [5]. Portfolio construction is described below. Data ranges from January 1996 to June 2020. Depicted are price returns without dividends in USD. Portfolios are rebalanced monthly.

Both portfolios outperformed the index. However, the “Small” version was substantially better than the “Large” one (but also more volatile). Now, you can never be completely sure if such a strategy is a real phenomenon or just random noise. Therefore, it is most robust to do both and that is exactly what I did. You find the Report Analytics USA Large and Report Analytics USA Small live portfolios directly on Wikifolio and on this website. I will regularly update them and as soon as possible, I will also add an equal-weighted combination of the “Small” and “Large” versions. I am curious if the “Small” version survives real-world trading costs. Even just a fraction of the backtest would be a great result.

This was a lot of information. So let me briefly summarize the current investment process of the Report Analytics USA portfolios:

  • Universe: all 1,500 stocks in the S&P Composite 1500 Index that are tradable on Wikifolio.
  • Copy Paste: the 300 stocks with the most copy-pasted reports (relative to the previous year).
    • Large: the 150 largest stocks of this selection by market capitalization.
    • Small: the 150 smallest stocks of this selection by market capitalization.
  • Weighting: generally equal weighted.
  • Rebalancing: at least four times a year, optionally more often.
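The selection steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the production code: the `Stock` record and `build_portfolios` are hypothetical names, which second copy-paste measure is averaged with the first is not specified here, and the toy universe has 20 invented stocks instead of 1,500.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    ticker: str
    score_a: float     # first copy-paste measure, e.g. Jaccard similarity
    score_b: float     # second copy-paste measure (which one is an assumption)
    market_cap: float


def build_portfolios(universe: list[Stock], n_groups: int = 5):
    """Return equal weights for the "Large" and "Small" copy-paste portfolios."""
    # Step 1: rank by the average of the two copy-paste scores
    ranked = sorted(universe, key=lambda s: (s.score_a + s.score_b) / 2, reverse=True)
    # Step 2: keep the top group (300 out of 1,500 in the real portfolios)
    copy_paste = ranked[: len(ranked) // n_groups]
    # Step 3: split the selection in half by market capitalization
    by_size = sorted(copy_paste, key=lambda s: s.market_cap, reverse=True)
    half = len(by_size) // 2
    large, small = by_size[:half], by_size[half:]
    # Step 4: equal weights within each half
    weights = lambda group: {s.ticker: 1 / len(group) for s in group}
    return weights(large), weights(small)


# Invented universe: scores fall with i, market caps are scrambled
universe = [
    Stock(f"S{i:02d}", 1 - 0.01 * i, 1 - 0.02 * i, float((i * 7) % 20 + 1))
    for i in range(20)
]
large, small = build_portfolios(universe)
print("Large:", large)
print("Small:", small)
```

In the toy universe, the four stocks with the highest average score are selected and then split into the two largest and the two smallest by market capitalization.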

Implementing via Wikifolio

For professional investors with trading infrastructure, implementing such a strategy is pretty straightforward. For retail investors like me, however, it is not that easy. The first problem is taxes. I am based in Germany, so every investment profit is taxed at 25%. Since this strategy involves a fair amount of trading, these taxes would strongly reduce the compounding of returns. Of course, you can offset losses, ask for exceptions to pay taxes annually, or use allowances. But all of this is not really sustainable and takes a lot of effort. The next problem is direct transaction costs (commissions). Depending on the frequency of rebalancing, this strategy requires at least 500-800 transactions per year. Even with very low and fixed commissions of 1€ per trade, this is too expensive if you just have small amounts of money.
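A quick back-of-the-envelope calculation shows why commissions matter for small accounts. The 500 to 800 trades and the 1€ commission come from the paragraph above; the three portfolio sizes are assumptions picked for illustration.

```python
def annual_cost_drag(trades_per_year: int, commission_eur: float, portfolio_eur: float) -> float:
    """Yearly commissions as a fraction of the portfolio value."""
    return trades_per_year * commission_eur / portfolio_eur


# Assumed portfolio sizes in EUR
for portfolio in (5_000, 50_000, 500_000):
    low = annual_cost_drag(500, 1.0, portfolio)
    high = annual_cost_drag(800, 1.0, portfolio)
    print(f"{portfolio:>9,} EUR: {low:.2%} to {high:.2%} per year")
```

For a 5,000€ account, commissions alone would cost 10% to 16% per year, far more than any realistic outperformance. For a 500,000€ account, the same trades cost 0.10% to 0.16%.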

A good way to avoid these problems is Wikifolio. You can check the details for yourself, but the general idea is simple. You have something like a simulator where you can manage virtual portfolios (“Wikifolios”) with real financial data. Within those portfolios, trading is free of commission and there are obviously no taxes on profits. The cool thing about Wikifolio is that those virtual portfolios become investable if enough users are interested. To do this, Wikifolio cooperates with a German bank that issues certificates which exactly track the return of the desired portfolio [6]. This is not as intuitive as holding the stocks directly, but it makes the strategy investable with just one security that is easy to trade. That’s very efficient and, in my opinion, a great idea and a helpful service from Wikifolio (again, I don’t earn anything from referencing them either, although I would be happy to).

Beyond that, Wikifolio also makes my life easier in various other ways. First and foremost, they do most of the ongoing calculations and provide updates in real time. This saves me a lot of time and makes the whole thing fully transparent. I have no influence on the performance calculation, I cannot delete badly performing portfolios, and all my trades are reported and can even be downloaded as an Excel file (have fun analyzing them). Finally, Wikifolio is of course a great way to reach a community and make my strategy accessible to other people.

All of this is great, but obviously not for free. If the portfolio becomes investable, they charge a fixed annual fee of 0.95% and a performance fee (5-30%) set by the portfolio manager. The fixed fee goes completely to Wikifolio and the bank, while the performance fee is split between Wikifolio and the portfolio manager. Those fees are comparable to conventional investment funds and way higher than for passive ETFs. Therefore, after costs, it is not easy to beat a standard ETF that tracks the index. For this reason, I started my portfolios with the lowest possible performance fee of 5% and we will see how it develops. I think performance fees are fair, but I would love to do it with a lower fixed fee. However, Wikifolio is the only practical option at the moment.
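To get a feeling for the fee drag, here is a deliberately simplified sketch. It assumes that the fixed fee is deducted first and that the performance fee is charged on any positive return that remains. The actual fee mechanics (for example a high-watermark or intra-year accrual) may differ, so treat the numbers as rough orders of magnitude only.

```python
def net_return(gross: float, fixed_fee: float = 0.0095, performance_fee: float = 0.05) -> float:
    """One-year return after fees under the simplified assumptions above."""
    after_fixed = gross - fixed_fee
    # Assumption: performance fee only applies to a positive remaining return
    return after_fixed - performance_fee * max(after_fixed, 0.0)


# Hypothetical gross returns, compared for the lowest and highest performance fee
for gross in (-0.05, 0.05, 0.10):
    print(f"gross {gross:+.2%}: "
          f"net {net_return(gross):+.4%} at 5% fee, "
          f"net {net_return(gross, performance_fee=0.30):+.4%} at 30% fee")
```

Under these assumptions, a gross return of 10% turns into roughly 8.6% with a 5% performance fee and roughly 6.3% with a 30% performance fee.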

Another aspect is indirect trading costs. As mentioned above, there are no commissions on transactions, but that doesn’t mean trading is free. While implementing the portfolios, I occasionally noticed quite high bid-ask spreads. I am not yet sure if the spreads are actually higher than at the official exchange, so I will monitor this closely. I am also planning to do a more detailed analysis on fees, spreads, and their impact on performance. But for that, I need a longer history and a larger number of executed trades.

Even though I am convinced of the strategy, it is entirely possible that it does not survive the real world (i.e. it does not achieve meaningful outperformance after costs). In this case, there are only two choices consistent with the philosophy of AGNOSTIC INVESTING: shut it down or improve it until it works. But I don’t plan to wait for that. There are just too many interesting opportunities to improve the strategy already today.

Improving the strategy and other ideas

So far, everything is still very close to a simple academic backtest. I improved the strategy (at least I hope so) with equal weights and the additional size filter, but that is still not very sophisticated. All of this is just a starting point and more like a live test. I am working on improvements and as soon as I have reliable results you will find them on this website. So for the moment, let me try to give you kind of a roadmap.

More signals

Currently, the strategy is based on three variables: the two copy-paste scores (combined into their average) and market capitalization. I don’t need to tell you that these are very few. Using more variables is therefore an obvious starting point. For example, we can also use computers to determine the sentiment (positive/negative) or complexity (how hard is it to read something) of reports. In addition to that, standard data like financial statements or past returns is certainly relevant as well.

The problem with more variables is that we need some kind of model to incorporate them into a single forecast. Sorting stocks by one or two measures is no problem. But if we split the current Report Analytics USA portfolios two more times, they will shrink to 37 stocks. This is not sufficiently diversified for a systematic strategy. So it is a challenge, but there are ways to do it.
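The shrinking effect is plain integer arithmetic. The small helper below (a hypothetical function, just for illustration) halves a portfolio once per additional sorting variable.

```python
def stocks_after_splits(start: int, extra_splits: int, groups_per_split: int = 2) -> int:
    """Number of stocks left in one group after repeatedly splitting a portfolio."""
    size = start
    for _ in range(extra_splits):
        size //= groups_per_split  # integer division, a group cannot hold half a stock
    return size


print(stocks_after_splits(150, 1))  # 75
print(stocks_after_splits(150, 2))  # 37
```

Every additional variable halves the portfolio again, which is why plain sorting stops being practical after a few signals.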

More text data

Let’s recap the general idea behind the strategy: firms recognize new risks to their business and are legally required to disclose them. For really bad news, chances are high that firms don’t even wait for the next report but announce it immediately via ad-hoc reports (in the US, these are called 8-Ks). Considering those as well should therefore improve the stock selection.

There is a lot more text data available: news feeds, online chats, analyst reports, etc. Some of this is certainly relevant, but it is very hard to implement. Trading on news requires very fast execution, analyst reports are hard to collect, and most online data is fairly expensive. I don’t think my laptop and I have the slightest chance to compete with the large quant firms on speed and budget. So I won’t even try.

Better methodology

The strategy is essentially a little computer program. Like any piece of software, it will never be free from errors and needs continuous maintenance.

I try to improve the code wherever possible, but I am not a computer scientist. To put it in Buffett’s words: a finance guy who knows how to code may be a remarkable finance guy, but not a remarkable coder.

I am pretty sure that with more programming knowledge, even more is possible.

A horse that can count to ten is a remarkable horse – not a remarkable mathematician.

Warren E. Buffett [7]

There are two things I am currently working on: I try to split reports into their sections, and I look at different copy-paste measures. The “Lazy Prices” authors already did some work in this direction, but it is more difficult to implement. However, this is exactly the reason why it may be even more profitable. The other copy-paste measures are more of a robustness check: you don’t want to implement a strategy that fails if you use different measures for the same phenomenon. So we should check that.
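As an example of such a robustness check, a second measure can be computed on the same pair of texts and compared with the first. The sketch below uses cosine similarity on word counts, a common alternative to Jaccard similarity. Whether it is one of the measures used in the live portfolios is not stated here, so take it as an assumption; the tokenization and example sentences are also made up.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two word-count vectors (0 to 1)."""
    counts_a, counts_b = word_counts(text_a), word_counts(text_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty texts
    return dot / (norm_a * norm_b)


# Made-up sentences standing in for two versions of a risk section
old = "We face risks from competition and supply chain disruptions."
new = "We face risks from competition, supply chain disruptions and litigation."
unrelated = "The weather in Frankfurt was pleasant last week."

print(round(cosine_similarity(old, new), 3))
print(round(cosine_similarity(old, unrelated), 3))
```

Unlike Jaccard similarity, this measure also takes into account how often a word is used. If both measures rank the same reports as heavily copied, the signal is less likely to be an artifact of one particular formula.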

Finally, there is portfolio construction. The academic portfolio sort is not necessarily bad, but the standard quant playbook offers much more: we could eliminate industry bets, apply more sophisticated portfolio optimization instead of equal weights, use dynamic risk management, etc.

As you can see, there are more ideas than time to implement them. But I am working on it and as soon as I have shareable results, you will find them here on the website. Since the post is already longer than planned, this is all for the moment. You can use the buttons below to read the post on the strategy or to go directly to the Report Analytics USA portfolios. One final note: the US earnings season just kicked off, so I will rebalance the portfolios very soon.

This content is for educational and informational purposes only and is no substitute for professional or financial advice. The use of any information on this website is solely at your own risk and I do not take responsibility or liability for any damages that may occur. The views expressed on this website are solely my own and do not necessarily reflect the views of any organisation I am associated with. Income- or benefit-generating links are marked with a star (*). All content that is not my intellectual property is marked as such. If you own the intellectual property displayed on this website and do not agree with my use of it, please send me an e-mail and I will remedy the situation immediately. Please also read the Disclaimer.


1, 4, 5 Sadlo, Sven-Philip; 2020; Copy-Paste Outperformance: Lazy Investors and Copied Reports; Master Thesis, Goethe University; Frankfurt am Main; URL.
2 Cohen, Lauren H., Christopher J. Malloy, and Quoc Nguyen; 2020; Lazy Prices; The Journal of Finance 75, 1371–1415; URL.
3 Total US Market, S&P 500, S&P 400, and S&P 600.
6 Those certificates are collateralized. So even if the bank defaults, investors should get the current value of their holdings. But please inform yourself. I do not update this page and this is no advice or marketing.
7 Quote from goodreads, URL, accessed 2021-10-14.