Copy-Paste Outperformance – Summary

Almost every publicly traded company in the US must publish three quarterly reports and one annual report each year (10-Qs and 10-Ks, respectively).[1]Securities and Exchange Commission; 2021; Form 10-K; URL, accessed 2021-10-03. Those reports are obviously very important for investors. The firms preparing them, however, don’t get as much out of them: drafting reports takes a lot of effort, does not really improve operations, and gives competitors information that firms would rather keep private. Public companies are therefore confronted with a situation we all know from our tax returns: repetitive and tedious work that no one (I apologize for insulting the tax specialists) really likes but that still needs to be done every year. How to deal with this? Correct, spend the time to create one comprehensive template and reuse it as long as possible.

Lazy is not always bad

In the excellent research paper “Lazy Prices” (2020), Cohen, Malloy, and Nguyen show that US companies are no exception to this: many annual and quarterly reports are basically just updated copies from the previous year.[2]Cohen, Lauren H., Christopher J. Malloy, and Quoc Nguyen; 2020; Lazy Prices; The Journal of Finance 75, 1371–1415; URL. This sounds lazy and somehow reprehensible, but economically, it is rather efficient. Most firms probably do not change that much within just one year. So, it seems okay to just update the numbers and thereby confirm that everything is going as usual.

This behavior has important consequences for investors: since most of the current report is just copy and paste from the year before, the content itself is actually not that relevant. Investors should rather focus on differences between the current and previous year’s report. In many cases, such changes follow a clear logic: firms are legally required to disclose all knowable risks of their business (for US reports, this is the “Risk Factors” section). If they don’t do that properly and undisclosed risks materialize, shareholders may sue them for withholding information. To avoid such lawsuits, companies disclose a lot of potential risks, regardless of whether they are relevant or not (typically, you find everything from currency risks to the sudden death of the CEO). Many of them are rather unspecific and of limited use to investors, but when new ones are added, that may be important.

This logic does not hold for positive aspects. Companies have to inform appropriately about their prospects but don’t need to disclose every potential opportunity. Therefore, changes in reports should be regarded rather skeptically. Why should firms rewrite their report if the previous year’s version is just fine? Since there is no direct benefit, they will probably only do it if they have to. And that is usually when they have discovered a previously undisclosed risk.

Cohen, Malloy, and Nguyen – the authors of “Lazy Prices” – document this pattern and test whether investors can exploit it. They analyze all 10-Ks and 10-Qs published between 1995 and 2014 with respect to copy-paste. Since this is a sample of over 300,000 reports, they cannot do this manually. Instead, they use quantitative text-similarity measures to compare each report with its respective version one year before.

How does this work? Think about these measures like a correlation. You put two texts into a computer program which returns a value between 0 and 1. A value of 1 indicates that the texts are completely identical. 0 means that there are no similarities at all. For example, the Jaccard similarity (a very simple but robust measure) for Apple’s 2020 and 2019 10-K is about 0.92. This is close to 1 and thus indicates that large fractions of the report are indeed just copy-paste. In their paper, Cohen, Malloy, and Nguyen use four of these measures to predict future stock returns.
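To make this concrete, here is a minimal sketch of a word-level Jaccard similarity in Python. It divides the number of distinct words that appear in both texts by the number of distinct words that appear in either text. The tokenization (lowercase, alphanumeric words only) and the two example sentences are assumptions made for illustration; they are not the exact preprocessing used in the paper or in my thesis.

```python
import re


def tokenize(text: str) -> set:
    """Split a text into a set of lowercase alphanumeric words (assumed preprocessing)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical risk-factor sentences, invented for this example.
last_year = "Our business depends on key suppliers and currency risk"
this_year = "Our business depends on key suppliers, currency risk and pending litigation"

similarity = jaccard_similarity(last_year, this_year)
print(round(similarity, 3))
```

In this toy example, the new report adds only the words “pending” and “litigation”, so the similarity stays high but drops below 1. That small drop is exactly the kind of change the strategy looks for.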

When I found the paper in early 2020, I was hooked by the simple and reasonable idea. Since I needed a topic for my master’s thesis and wanted to do something with natural language processing, I decided to replicate the paper. As I am interested in turning such studies into tradable investment strategies, I updated the sample and used a more practical investment universe. Everything that follows is my own work, but obviously, it is based on the ideas of Cohen, Malloy, and Nguyen. Please give them full credit for originally discovering the effect.

Making money from copied reports

In my thesis[3]Sadlo, Sven-Philip; 2020; Copy-Paste Outperformance: Lazy Investors and Copied Reports; Master Thesis, Goethe University; Frankfurt am Main; URL., I looked at firms in the S&P Composite 1500 Index[4]S&P Dow Jones Indices; 2021; S&P Composite 1500; URL, accessed 2021-10-04. between 1995 and June 2020. This index captures a lot (but admittedly not everything) of the US stock market and should serve as a practical benchmark. Over the sample period, there are more than 3,600 firms which released about 207,000 reports.

To test whether investors can exploit this reporting behavior, I construct a 25-year backtest. For every month between January 1996 and June 2020, I look up the most recently published 10-K or 10-Q of each stock in the index. Subsequently, I use different text-similarity measures to determine how much of each firm’s report is just copy-paste. Based on these measures, I sort the 1,500 stocks by their degree of report copy-paste and divide them into 5 portfolios of 300 stocks each (more geeky: a standard portfolio sort with quintiles). To make it simple and transparent, I just weight stocks based on market capitalization. You can of course do more sophisticated things and optimize over sectors, industries or whatever else you believe in. But since academia is about being transparent and replicable, equal or market-cap weighting is just fine. The strategy worked even better with equal weights, but equal weighting would often allocate too much money to illiquid small caps.
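The sorting step for a single month can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code from my thesis: the function name `quintile_portfolios`, the tuple layout (ticker, similarity score, market capitalization), and the ten made-up stocks are all hypothetical.

```python
def quintile_portfolios(stocks):
    """Sort stocks by similarity score and split them into five cap-weighted portfolios.

    `stocks` is a list of (ticker, similarity, market_cap) tuples (assumed layout).
    Returns five dicts mapping ticker -> weight, ordered from the lowest
    similarity ("Changes") to the highest similarity ("Copy-Paste").
    """
    ranked = sorted(stocks, key=lambda stock: stock[1])
    n = len(ranked)
    portfolios = []
    for q in range(5):
        # Integer arithmetic spreads the stocks as evenly as possible over the quintiles.
        members = ranked[q * n // 5:(q + 1) * n // 5]
        total_cap = sum(cap for _, _, cap in members)
        portfolios.append({ticker: cap / total_cap for ticker, _, cap in members})
    return portfolios


# Ten made-up stocks: (ticker, similarity score, market capitalization).
stocks = [
    ("A", 0.95, 300.0), ("B", 0.40, 100.0), ("C", 0.90, 100.0), ("D", 0.45, 300.0),
    ("E", 0.70, 50.0), ("F", 0.75, 50.0), ("G", 0.60, 10.0), ("H", 0.65, 30.0),
    ("I", 0.80, 20.0), ("J", 0.85, 20.0),
]

portfolios = quintile_portfolios(stocks)
changes, copy_paste = portfolios[0], portfolios[-1]
print("Changes:", changes)
print("Copy-Paste:", copy_paste)
```

In the backtest, this sort is repeated every month with the most recent report of each stock, which is why the portfolios are rebalanced monthly.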

At this point, we have five portfolios that differ with respect to copy-paste in their reports. Let’s focus on the extremes: the 300 stocks with the highest copy-paste scores are those which basically just update the numbers (“Copy-Paste”). The 300 with the lowest copy-paste scores are those that change something (“Changes”). As discussed before, we should be cautious with those.

Data and charts are from my master thesis.[5]Sadlo, Sven-Philip; 2020; Copy-Paste Outperformance: Lazy Investors and Copied Reports; Master Thesis, Goethe University; Frankfurt am Main; URL. “Copy-Paste” (“Changes”) are the 20% of stocks with most (least) similar reports measured by Jaccard similarity. Given the S&P 1500 index, each portfolio consists of 300 stocks that are weighted by market capitalization. Data ranges from January 1996 to June 2020. Depicted are price returns without dividends in USD. Portfolios are rebalanced monthly.

You probably expect what is coming next, and the data actually supports the idea. The “Changes” portfolio massively underperformed both the benchmark index and the “Copy-Paste” portfolio over the last 25 years. It appears that changes in reports actually indicate problems that affect future returns. At the other extreme, the “Copy-Paste” portfolio generated meaningful outperformance. This is in line with the idea that firms just update the numbers when the business is running as usual.

Why should the strategy work?

Surprisingly, however, the same strategy also worked for the subset of S&P 500 firms. The “Changes” portfolio again massively underperformed both the index and the “Copy-Paste” portfolio. These results are not intuitive: this market is more competitive as there are more journalists[6]Hillert, Alexander, Heiko Jacobs, and Sebastian Müller; 2014; Media Makes Momentum; The Review of Financial Studies 27, 3467–3501; URL., analysts[7]Martineau, Charles, and Marius Zoican; 2019; Crowded Stock Coverage; Working Paper; URL., and institutional investors.[8]Chan, Kalok, Hung Wan Kot, and Gordon Y.N. Tang; 2013; A comprehensive long-term analysis of S&P 500 index additions and deletions; Journal of Banking & Finance 37, 4920–4930; URL. In addition to that, stocks are more liquid and cheaper to trade. So there should be someone who reads and compares those reports cover to cover and uses the embedded information to make money (again more geeky: the S&P 500 universe has a higher chance of being semi-strong form efficient).[9]Fama, Eugene F.; 1970; Efficient Capital Markets: A Review of Theory and Empirical Work; The Journal of Finance 25, 383–417; URL.

Data and charts are from my master thesis.[10]Sadlo, Sven-Philip; 2020; Copy-Paste Outperformance: Lazy Investors and Copied Reports; Master Thesis, Goethe University; Frankfurt am Main; URL. “Copy-Paste” (“Changes”) are the 20% of stocks with most (least) similar reports measured by Jaccard similarity. Given the S&P 500 index, each portfolio consists of 100 stocks that are weighted by market capitalization. Data ranges from January 1996 to June 2020. Depicted are price returns without dividends in USD. Portfolios are rebalanced monthly.

But nobody seems to have traded this information away. So why did this strategy work? It seems that most investors need some time to recognize the changes in reports and don’t trade on them immediately. A large literature argues that the attention and information-processing capacity of (human) investors are naturally limited. Cohen, Malloy, and Nguyen explain that the underperformance of the “Changes” portfolio is just another example of this. Investors don’t have the time, resources, or motivation to compare the current and previous year’s report in full detail. Therefore, they simply don’t recognize minor differences. But if something draws their attention (for example via press releases), they begin to trade on something that was already published in a report. Using systematic text-similarity measures helps to capture profits from this delayed reaction, at least in theory.

This is the simple story and idea. There are more details behind the things I mentioned, but I want to keep this post focused on the big picture. To monitor the strategy in real time (and within a more realistic trading setup), I implemented two versions of it on Wikifolio. The methodology is not completely identical to what I presented above, but more on that in another post.
