What this tool CAN do
This tool offers data from computer-based textual analysis of annual and quarterly reports of US companies. Much of it is based on results of the growing “Textual Analysis in Accounting & Finance” literature. With the app, I attempt to make those results easily accessible to everyone who is interested.
You can select up to 5 US stock tickers and look up some of the most important data of their annual and quarterly reports (10-Ks and 10-Qs). For the selected data, the tool also gives you the position of the stock in the current US equity universe. More details about this below.
What this tool CANNOT do
I am neither a data vendor nor an app programmer, so please don’t expect too much, and do read the disclaimer. I do my best to keep the data updated and correct (I use the app myself), but I cannot guarantee it. Also note that the app is currently hosted for free (so there can be loading times) and is not yet optimized for mobile devices.
In addition to the disclaimer, I personally don’t recommend making any decisions based solely on the data. Looking at the report analytics is a useful addition to any fundamental analysis, but it is certainly not sufficient. More about those limitations below.
How to use this dashboard?
Select up to 5 stocks, choose a variable, and pick your desired report type. You will find a brief description of each of these elements in the following boxes. And that’s it. Feel free to use the camera icon to export your chart.
Where does the data come from?
All reports are downloaded directly from EDGAR, the electronic filing system of the SEC. I calculate all offered variables myself with a self-developed Python program, so I have full control over the process and all assumptions. Many of these details are explained in my master’s thesis. For the tickers, I use the holdings of an ETF on the S&P Total Market Index. I also use the service of Financial Modeling Prep for some mapping beneath the surface.
What can you do with the data?
The app is designed for curious investors who want to include some form of alternative data in their process. You should not make any investment decision solely based on data from the app. But you can use it to supplement your analysis, to find new ideas, or just to look up data that is interesting to you.
What else do you need to know?
I do my best to keep the app updated, and I also want to add new features over time. But I am neither an app programmer nor a data vendor, so I can’t guarantee anything. If you discover a problem or have an idea to improve the app, please send me an email.
Settings and Variables
Report Types: All, 10-K or 10-Q
You can select all reports or filter by 10-K and 10-Q (annual and quarterly reports, respectively). Almost all US companies publish one 10-K and three 10-Qs per year. The dates in the first chart show when the reports were published on EDGAR, the electronic filing system of the SEC. Therefore, the data is free from any look-ahead bias.
US Equity Universe and Percentiles
I use an ETF on the S&P Total Market Index to estimate the investable US equity universe. I currently (October 2021) get about 3,400 tickers from the holdings of this ETF, so most US companies should be available in the app.
For the chosen data, the second chart gives you the relative position of stocks within this equity universe. How does this work? For each of the 3,400 stocks, I look up the selected variable in the most recently published report. Then I calculate the percentile ranks for the desired stocks and plot them in the chart. Let’s assume Apple has a Words Percentile value of 80. This means that 80% of the firms in the universe published shorter reports than Apple. Conversely, only about 20% published reports with more words. Depending on the variable, a larger percentile may be good or bad. I leave the interpretation to you, but there are some ideas in my blog.
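The percentile logic can be sketched as follows. This is a minimal illustration and not the app's actual code: the function name `percentile_rank`, the made-up word counts, the handling of ties (only strictly smaller values count), and the choice to keep the stock itself in the denominator are all assumptions.

```python
def percentile_rank(values, ticker):
    """Share (0-100) of the universe with a strictly smaller value than `ticker`.

    `values` maps ticker -> value of the selected variable in the most
    recently published report. Ties are not counted as "smaller" (assumption).
    """
    target = values[ticker]
    smaller = sum(1 for v in values.values() if v < target)
    return 100.0 * smaller / len(values)


# Hypothetical word counts for a five-stock universe (made-up numbers).
words = {"AAA": 40_000, "BBB": 55_000, "CCC": 62_000, "DDD": 71_000, "EEE": 90_000}

print(percentile_rank(words, "DDD"))  # 3 of 5 firms published shorter reports: 60.0
```

With roughly 3,400 tickers, the same function would simply receive a larger dictionary.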
Please note that the percentile rank depends on your selected report type. For All, I use the most recently published report for each stock, regardless of whether it is a 10-Q or a 10-K. As fiscal years differ among companies, this will almost certainly lead to a list of mixed report types. For variables like Words or File Size, you should therefore choose a specific report type to get meaningful percentile ranks (annual 10-Ks are obviously longer than quarterly 10-Qs).
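To see why All produces a mixed list, here is a small sketch of picking the latest report per stock. The record layout (`ticker`, `form`, `published`, `words`), the function name `latest_reports`, and the sample filings are hypothetical and only serve the illustration.

```python
from datetime import date


def latest_reports(filings, report_type="All"):
    """Return the most recently published filing per ticker.

    With report_type="All", every form type is eligible; otherwise only
    filings whose form matches (e.g. "10-K") are considered.
    """
    latest = {}
    for filing in filings:
        if report_type != "All" and filing["form"] != report_type:
            continue
        current = latest.get(filing["ticker"])
        if current is None or filing["published"] > current["published"]:
            latest[filing["ticker"]] = filing
    return latest


# Made-up filings for two companies with different fiscal years.
filings = [
    {"ticker": "AAA", "form": "10-K", "published": date(2021, 2, 1), "words": 60_000},
    {"ticker": "AAA", "form": "10-Q", "published": date(2021, 8, 1), "words": 20_000},
    {"ticker": "BBB", "form": "10-Q", "published": date(2021, 5, 1), "words": 25_000},
    {"ticker": "BBB", "form": "10-K", "published": date(2021, 9, 1), "words": 70_000},
]

mixed = latest_reports(filings)             # AAA -> 10-Q, BBB -> 10-K
annual = latest_reports(filings, "10-K")    # both tickers -> 10-K
```

In the mixed case, a quarterly report for one company is compared with an annual report for the other, which is why length-type variables are easier to interpret with a specific report type selected.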
File Size Raw (MB)
The size of the unedited report file in megabytes (MB). This includes exhibits, HTML formatting, and all other attachments. Loughran and McDonald (2014) successfully used this variable to measure how complicated reports are.
File Size Edited (MB)
The size of the report file in megabytes (MB) after deleting HTML formatting and exhibits. These two steps drastically reduce file size and put the focus on the plain text. Therefore, this variable may be an even better indicator of report readability than the previous one.
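As a rough illustration of the HTML step only, the sketch below strips tags with Python's standard `html.parser` and compares sizes. It is not the app's actual cleaning routine: removing exhibits is not covered, the helper names are made up, and defining 1 MB as 1,000,000 bytes is an assumption.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags and drops the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Return the plain text of an HTML string (tags removed, text kept)."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return "".join(parser.parts)


def size_mb(text):
    """Size of the UTF-8 encoded text in MB (1 MB = 1,000,000 bytes, an assumption)."""
    return len(text.encode("utf-8")) / 1_000_000


raw = '<p style="font-size:10pt">Net <b>sales</b> rose.</p>'
plain = strip_html(raw)          # "Net sales rose."
print(size_mb(raw), size_mb(plain))
```

Note that this simple extractor also keeps the contents of script and style elements, so a real filing would need additional filtering.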
The total number of tables in the report. I am not aware of any existing application of this data, but it is interesting to see how companies flood investors with information and how this changes over time.
Unfortunately, normal text is often mistakenly classified as a table. Therefore, I run an additional filter to identify “numeric” tables that are actually used to disclose numbers. I define a table as “numeric” if more than 20% of its characters are numbers.
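The 20% rule can be sketched like this. The function name `is_numeric_table` is hypothetical, and two details are assumptions on my part: "numbers" is read as digit characters, and whitespace is excluded from the character count.

```python
def is_numeric_table(table_text, threshold=0.20):
    """Return True if more than `threshold` of the table's characters are digits.

    Whitespace is ignored when counting characters (assumption).
    """
    chars = [c for c in table_text if not c.isspace()]
    if not chars:
        return False
    digits = sum(c.isdigit() for c in chars)
    return digits / len(chars) > threshold


# 16 of 25 non-space characters are digits, so this counts as numeric.
print(is_numeric_table("Revenue 2020 1,234 2019 1,100"))   # True
# Only 1 of 22 non-space characters is a digit: text laid out as a table.
print(is_numeric_table("Item 1. Business Overview"))        # False
```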
The total number of graphics in the report. I am not aware of any existing application of this data, but it certainly makes sense to check why and how companies use graphics in their reports.
The total number of exhibits in the report. Exhibits contain additional information that is attached to the report, and a certain number of them is perfectly normal. However, if the number suddenly changes, it may be worth checking why.
The total number of words in the report. To avoid errors, I only count words with at least two letters. In addition to file size, this variable is a straightforward indicator of report length.
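A minimal sketch of the two-letter rule, assuming a word is a run of at least two ASCII letters. The regular expression and the function name `count_words` are illustrative choices; under this definition a hyphenated term is counted as separate words.

```python
import re

# A "word" here is a run of two or more ASCII letters (assumption).
WORD = re.compile(r"[A-Za-z]{2,}")


def count_words(text):
    """Count words with at least two letters; single letters and numbers are skipped."""
    return len(WORD.findall(text))


# "A", "3" and "Q4" are skipped; "total", "of", "items", "in" are counted.
print(count_words("A total of 3 items in Q4."))  # 4
```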
The Jaccard measure of copy-paste between the report and its previous-year version (e.g. 10-K 2020 vs. 10-K 2019). The measure ranges from 0 to 1. The higher the value, the more copy-paste. Cohen, Malloy, and Nguyen (2020) show that this variable is a strong predictor of returns and fundamentals. For more information about the Jaccard measure and report copy-paste, please check this post and my master’s thesis.
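For intuition, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to the sets of distinct words in two texts; the tokenization (lowercase runs of at least two letters) and the function names are assumptions, and the app's preprocessing may differ.

```python
import re


def word_set(text):
    """Distinct lowercase words with at least two letters (assumed tokenization)."""
    return set(re.findall(r"[a-z]{2,}", text.lower()))


def jaccard_similarity(old_text, new_text):
    """|A intersect B| / |A union B| on the word sets of the two texts."""
    a, b = word_set(old_text), word_set(new_text)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical (assumption)
    return len(a & b) / len(a | b)


old = "We face risks from competition"
new = "We face risks from regulation"
# 4 shared words out of 6 distinct words overall, so about 0.67.
print(jaccard_similarity(old, new))
```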
The Cosine measure of copy-paste between the report and its previous-year version (e.g. 10-K 2020 vs. 10-K 2019). The measure ranges from 0 to 1. The higher the value, the more copy-paste. Cohen, Malloy, and Nguyen (2020) show that this variable is a strong predictor of returns and fundamentals. For more information about the Cosine measure and report copy-paste, please check this post and my master’s thesis.
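Cosine similarity compares two texts as word-count vectors and returns the cosine of the angle between them. The following sketch uses raw term frequencies; the tokenization and the function names are assumptions, and any weighting or preprocessing used in the app is not shown.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Word frequencies for lowercase words with at least two letters (assumed tokenization)."""
    return Counter(re.findall(r"[a-z]{2,}", text.lower()))


def cosine_similarity(old_text, new_text):
    """Dot product of the two term-frequency vectors divided by the product of their lengths."""
    a, b = term_counts(old_text), term_counts(new_text)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


old = "We face risks from competition"
new = "We face risks from regulation"
# Dot product 4, both vector lengths sqrt(5), so roughly 0.8.
print(cosine_similarity(old, new))
```

Because term counts are never negative, the result stays between 0 and 1, matching the range described above.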
The Side-by-Side measure of copy-paste between the report and its previous-year version (e.g. 10-K 2020 vs. 10-K 2019). The measure ranges from 0 to 1. The higher the value, the more copy-paste. Cohen, Malloy, and Nguyen (2020) show that this variable is a strong predictor of returns and fundamentals. For more information about the Side-by-Side measure and report copy-paste, please check this post and my master’s thesis.
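This page does not spell out how the Side-by-Side measure is computed. As a stand-in for the general idea of a diff-style comparison, the sketch below uses `difflib.SequenceMatcher` from the Python standard library on word sequences. This is explicitly a swapped-in technique, not the app's formula; the tokenization and the function name are assumptions.

```python
import re
from difflib import SequenceMatcher


def side_by_side_similarity(old_text, new_text):
    """Diff-style similarity in [0, 1]: 2 * matched words / total words in both texts.

    Uses difflib's matching-block heuristic as a stand-in for a
    "track changes" style comparison (assumption).
    """
    old_words = re.findall(r"[a-z]{2,}", old_text.lower())
    new_words = re.findall(r"[a-z]{2,}", new_text.lower())
    return SequenceMatcher(None, old_words, new_words, autojunk=False).ratio()


old = "We face risks from competition"
new = "We face risks from regulation"
# 4 matching words in order, 10 words in total, so roughly 0.8.
print(side_by_side_similarity(old, new))
```

Unlike the set- and vector-based measures, this kind of comparison is sensitive to word order, so moving a paragraph lowers the score even if no word changes.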
The simple average of the previous three copy-paste measures. Using the average yields a more robust indicator for copy-paste without having to choose a single measure.
All content on this website and within the embedded data dashboard is for informational and educational purposes only. You should not make any investment decisions based on the content of this website. I take no responsibility for accuracy, correctness, and timeliness of the data. There is no guarantee that this website or the embedded content will be updated, maintained or continued. I take no responsibility for any damages that may occur. This disclaimer also applies to the embedded data dashboard which is hosted on Heroku. Please also read the general Disclaimer of this website.