📊 Add Feature to Extract Record Holders Information from 10-K Filings #12

Open
JamesAlfonse opened this issue Sep 6, 2024 · 1 comment

Comments

@JamesAlfonse
Member

JamesAlfonse commented Sep 6, 2024

2600_yay brings attention to a tool that can be used to pull data from the SEC.

They link to this repository, which provides a robust framework for extracting text from SEC documents. It can potentially be adjusted to extract data concerning record holders, which is crucial for financial analysis and investor relations.

Proposed Feature:

  • License: Confirm that the repository's license permits our intended use.
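  • Data Identification: Implement a method to accurately identify and extract the section of 10-K filings that states the number of record holders. This usually appears in Part II, Item 5 ("Market for Registrant's Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities") rather than in the "Security Ownership of Certain Beneficial Owners and Management" section (Item 12), though exact headings and wording vary between filers.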
  • Data Extraction: Develop a parsing function that can read through the identified section and extract the number of record holders. The function should handle variations in document formatting and text structures.
  • Output Specification: The extracted data should be output in a structured format (e.g., JSON, CSV) that specifies the company name, ticker symbol, and the number of record holders.
  • Integration: Ensure this new feature integrates seamlessly with the existing database and confirm the numbers are accurate. Strive for 95-99.9% accuracy.
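To make the Output Specification item concrete, here is a minimal sketch of one possible record shape and its JSON/CSV serialization. The class name `RecordHolderEntry`, the field names, and the helper functions are illustrative assumptions, not something the linked repository is known to provide.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

FIELDS = ["company_name", "ticker", "record_holders"]


@dataclass
class RecordHolderEntry:
    """Hypothetical output record; the field names are assumptions."""
    company_name: str
    ticker: str
    record_holders: Optional[int]  # None when no count could be extracted


def to_json(entries: List[RecordHolderEntry]) -> str:
    """Serialize the entries as a JSON array of objects."""
    return json.dumps([asdict(e) for e in entries], indent=2)


def to_csv(entries: List[RecordHolderEntry]) -> str:
    """Serialize the entries as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()
```

Keeping `record_holders` nullable lets failed extractions stay visible in the output instead of being silently dropped, which should help when checking accuracy.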

Use Case: This feature will be particularly useful for analysts and investors looking to aggregate or compare shareholder data across different companies, providing a clearer picture of investor engagement and stock distribution.

@JamesAlfonse
Member Author

JamesAlfonse commented Sep 12, 2024

To add onto this, there are also extraction scripts available from @apes-on-parade here

Issuers are required to disclose the number of record holders each year in their 10-K filing, but there is no strict requirement for the language used.

Multiple phrasings and sentence structures that refer to record holders would therefore need to be identified and matched.
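As a rough illustration of handling several phrasings, the sketch below tries a short list of regular expressions in order and returns the first count found. The patterns and the function name `extract_record_holders` are assumptions made for illustration; real filings will need a longer, validated pattern list.

```python
import re
from typing import Optional

# A number with optional thousands separators, e.g. "1,234" or "567".
_NUM = r"(\d{1,3}(?:,\d{3})+|\d+)"

PATTERNS = [
    # "1,234 holders of record", "56 stockholders of record"
    re.compile(
        _NUM + r"\s+(?:registered\s+)?(?:share|stock)?holders\s+of\s+record",
        re.IGNORECASE,
    ),
    # "1,234 record holders"
    re.compile(_NUM + r"\s+record\s+holders", re.IGNORECASE),
    # "the number of record holders ... was 789"
    re.compile(
        r"(?:holders\s+of\s+record|record\s+holders)\b[^.]{0,80}?"
        r"\b(?:was|were|is|are|totaled)\s+(?:approximately\s+)?" + _NUM,
        re.IGNORECASE,
    ),
]


def extract_record_holders(text: str) -> Optional[int]:
    """Return the first record-holder count found in text, or None."""
    # Collapse line breaks and repeated spaces left over from extraction.
    normalized = " ".join(text.split())
    for pattern in PATTERNS:
        match = pattern.search(normalized)
        if match:
            return int(match.group(1).replace(",", ""))
    return None
```

Returning `None` when nothing matches makes it possible to log unmatched filings and grow the pattern list from the misses.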

Another challenge would be to identify issuers with multiple classes of stocks (Class A, Class B, etc.) and to be able to separate them accordingly. For now it may be useful to only extract record holder information of companies that have just one class of stock. Another issue can be created for a more refined extraction of multiple classes of stocks.
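One possible way to restrict the first version to single-class issuers is a simple heuristic that looks for "Class X" stock mentions in the filing text. This is only a sketch under that assumption: the function names are hypothetical, and a filing can describe share classes in ways this pattern will miss.

```python
import re
from typing import Set

# Matches "Class A common stock", "Class B shares", "Class C stock", etc.
_CLASS_RE = re.compile(
    r"\bClass\s+([A-Z])\b(?=\s+(?:common\s+)?(?:stock|shares))",
    re.IGNORECASE,
)


def share_classes(text: str) -> Set[str]:
    """Return the set of class letters mentioned in the text."""
    return {m.group(1).upper() for m in _CLASS_RE.finditer(text)}


def is_single_class(text: str) -> bool:
    """Heuristic: treat the issuer as single-class if at most one class letter appears."""
    return len(share_classes(text)) <= 1
```

Filings flagged as multi-class could simply be skipped and listed for the follow-up issue.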

Ideally, this would be an automated script that runs daily and writes the data to a JSON file. Integrating that data into the database would be the final step; the .db file has CIK as the primary key and Ticker as the secondary key, so either of those columns should be used for merging purposes.
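A minimal sketch of the merge step, assuming the .db file is SQLite and that the JSON objects carry a `cik` field. The table name `issuers`, the column name `record_holders`, and the function name are hypothetical; only the CIK key comes from the description above.

```python
import json
import sqlite3


def merge_record_holders(conn: sqlite3.Connection, json_text: str) -> int:
    """Update record_holders for each CIK in the JSON; return rows updated."""
    rows = json.loads(json_text)
    cur = conn.cursor()
    updated = 0
    for row in rows:
        # Table and column names are assumptions about the schema.
        cur.execute(
            "UPDATE issuers SET record_holders = ? WHERE CIK = ?",
            (row["record_holders"], row["cik"]),
        )
        updated += cur.rowcount  # 0 when the CIK is not in the database
    conn.commit()
    return updated
```

Comparing the returned count with the number of JSON entries gives a quick check for CIKs that failed to match.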
