SEC Filings

  SEC DATA -  https://github.com/uconnstamford/Extracting-Data-From-SEC


On Compute engine VM


In directory home/public/sec/Extracting-Data-From-SEC



SEC Extraction code

======================================================================


import csv

# pip install sec-api beautifulsoup4
from sec_api import ExtractorApi
from bs4 import BeautifulSoup

extractorApi = ExtractorApi("ENTER SEC API HERE")

# Tesla 10-K for fiscal year 2020
filing_url = "https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/tsla-10k_20201231.htm"

# Item 1A (Risk Factors), returned as plain text
section_text = extractorApi.get_section(filing_url, "1A", "text")

# Item 7 (Management's Discussion and Analysis), returned as HTML, then stripped of tags
section_html = extractorApi.get_section(filing_url, "7", "html")
soup = BeautifulSoup(section_html, 'html.parser')
section_text_html_stripped = soup.get_text()

# newline='' keeps the csv module from writing extra blank rows on Windows
with open('sec_data.csv', mode='w', encoding='utf-8', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['Section', 'Content'])
    writer.writerow(['1A', section_text])
    writer.writerow(['7', section_text_html_stripped])

print("Data extracted to sec_data.csv")


=======================================================================


The first statement, import csv, loads Python's csv module, which reads and writes CSV (comma-separated values) files.


https://docs.python.org/3/library/csv.html
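Filing sections are full of commas, quotes, and line breaks, which is why the code uses the csv module instead of joining strings by hand. Here is a minimal sketch of that idea, using an in-memory buffer instead of a file. The sample text is made up and the helper names are just for illustration.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize rows to CSV text, letting the csv module handle quoting."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_rows(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline='')))


# Made-up sample content with commas, quotes, and a line break.
sample_rows = [
    ['Section', 'Content'],
    ['1A', 'Risks include supply, demand, and "other" factors.\nSecond line.'],
]

csv_text = rows_to_csv_text(sample_rows)
round_trip = csv_text_to_rows(csv_text)

# The content survives the round trip even though it contains
# the same characters CSV uses as delimiters.
print(round_trip == sample_rows)
```

The same quoting happens automatically when writing to a real file opened with newline=''.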


The sec_api module gives access to filings in the SEC's EDGAR database, where public US companies are required to file information about their performance so shareholders can make informed investment decisions. For example, public US companies are required to file quarterly and yearly reports that give investors the financial results for the period: 10-Q for quarterly results (unaudited) and 10-K for yearly results (audited).


https://www.sec.gov/edgar/searchedgar/companysearch



The Python package sec-api provides ExtractorApi (from sec_api import ExtractorApi)

https://pypi.org/project/sec-api/


It makes API (application programming interface) calls to sec-api.io, which serves SEC filing data, so the information can be retrieved programmatically.


The line from bs4 import BeautifulSoup loads Beautiful Soup.


Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.


extractorApi = ExtractorApi("ENTER SEC API HERE")


https://sec-api.io/docs/sec-filings-item-extraction-api


The Extractor API extracts any text section from 10-Q, 10-K and 8-K SEC filings, and returns the extracted content in cleaned and standardized text or HTML format. Send the URL of the filing, the section name (e.g. Item 1A) and the return data type (e.g. raw text) to the Extractor API and the extracted content is returned.

You can programmatically extract one or multiple text sections from any 10-Q, 10-K and 8-K filing. The extracted section item is returned as clear-text without HTML tags or standardized HTML. There is no need to develop your own item extraction algorithm anymore. Amended filings, such as 10-Q/A, 10-K/A and 8-K/A as well as all 10-K form variants, such as 10-KT, 10KSB, are also supported.

Dataset size: all sections of all 10-K, 10-Q and 8-K filings including their variants filed since 1994.
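To show what extracting multiple sections could look like, here is a minimal sketch that loops over item codes and calls get_section once per item, the same call used in the extraction code above. The helper extract_sections and the FakeExtractor class are hypothetical; the fake only exists so the sketch runs without an API key. With the real ExtractorApi, each call presumably counts against the query limit.

```python
def extract_sections(extractor, filing_url, items, return_type="text"):
    """Call extractor.get_section once per item and collect the results by item code."""
    return {
        item: extractor.get_section(filing_url, item, return_type)
        for item in items
    }


class FakeExtractor:
    """Hypothetical stand-in for sec_api.ExtractorApi so this runs without a key."""

    def get_section(self, filing_url, item, return_type):
        return f"[{return_type} of item {item}]"


filing_url = "https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/tsla-10k_20201231.htm"

sections = extract_sections(FakeExtractor(), filing_url, ["1A", "7"])
for item, content in sections.items():
    print(item, content)
```

Swapping FakeExtractor() for ExtractorApi("your key") should be the only change needed, assuming the real object is used the same way as in the code above.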


SEC Query API - Obtaining URLs for the Latest 10-K Filings

 Specifications/Requirements/Notes:


Gets the latest 10-K filings from Tesla and Apple.


Why we need this: to run sentiment analysis on how a company talks about itself, we need to extract text from its SEC filings. The problem is that the SEC Extractor API requires URLs, and we cannot hard-code them because every new filing gets a new URL.

  • This code will get the most recent filings and a bunch of info about them, including a link to where each report is stored on the SEC website!
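Once the query results are in hand, picking the newest URL per company could look like the sketch below. It assumes each filing is a dict with "ticker", "filedAt", and "linkToHtml" keys and that filedAt is an ISO 8601 timestamp. The sample records, dates, and example.com URLs are made up. Which link field the Extractor API accepts should be checked against the sec-api.io docs, so the field name is left as a parameter.

```python
from datetime import datetime


def latest_filing_urls(filings, link_field="linkToHtml"):
    """Return {ticker: url} for the most recently filed entry per ticker."""
    latest = {}
    for filing in filings:
        ticker = filing.get("ticker")
        url = filing.get(link_field)
        if not ticker or not url:
            continue  # skip records without a ticker or a link
        filed_at = datetime.fromisoformat(filing["filedAt"])
        if ticker not in latest or filed_at > latest[ticker][0]:
            latest[ticker] = (filed_at, url)
    return {ticker: url for ticker, (_, url) in latest.items()}


# Made-up records shaped like the fields this post writes to CSV.
sample_filings = [
    {"ticker": "TSLA", "filedAt": "2020-02-13T07:12:18-05:00",
     "linkToHtml": "https://example.com/tsla-older"},
    {"ticker": "TSLA", "filedAt": "2021-02-08T06:06:31-05:00",
     "linkToHtml": "https://example.com/tsla-newer"},
    {"ticker": "AAPL", "filedAt": "2020-10-30T06:01:14-04:00",
     "linkToHtml": "https://example.com/aapl-newer"},
]

urls = latest_filing_urls(sample_filings)
print(urls)
```

The resulting URLs can then be passed to the extractor one at a time instead of being hard-coded.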



Questions and Breaking Down into Smaller Subtasks


In addition to tickers (like TSLA), SEC also assigns companies a CIK number. Ideally, we stick to only one form of identification. 

  • Research if we can make a query with the SEC Query API using tickers instead of CIK

  • Can we get a larger list of tickers, or work with other teams to see which tickers we need to collect?

  • Build a map between ticker and cik
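A rough sketch of the ticker-to-CIK map and of building the query string from a list of CIKs, so companies do not have to be typed into the query by hand. It assumes each filing returned by the Query API carries "ticker" and "cik" fields (both show up in the field list used later in this post). The helper names and the sample records are made up.

```python
def build_ticker_cik_map(filings):
    """Map each ticker to its CIK (stored without leading zeros)."""
    mapping = {}
    for filing in filings:
        ticker = filing.get("ticker")
        cik = filing.get("cik")
        if ticker and cik:
            mapping[ticker.upper()] = str(cik).lstrip("0")
    return mapping


def build_10k_query(ciks):
    """Build a query string in the same shape as the one used in this post."""
    cik_clause = " OR ".join(str(cik) for cik in ciks)
    return f'cik:({cik_clause}) AND formType:"10-K"'


# Made-up records; only the two fields we need are included.
sample_filings = [
    {"ticker": "AAPL", "cik": "320193"},
    {"ticker": "TSLA", "cik": "0001318605"},
    {"ticker": "", "cik": "123"},  # no ticker, so it is skipped
]

ticker_to_cik = build_ticker_cik_map(sample_filings)
query_string = build_10k_query(ticker_to_cik.values())

print(ticker_to_cik)
print(query_string)
```

Whether the Query API can also filter by ticker directly is still the open research item above, so this sketch sticks to CIKs.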


Code for accessing from master table


I got a key from https://sec-api.io/register, but I will be limited to 100 queries a month.

  • Figure out if we can get a key with unlimited queries / see if we can use the API without an API key.

  • Professor said this limit is okay.


How can we get the code to run on a regular basis?

  • Every week or something like that.

Google Scheduler / Cron


How to get a CSV file stored into Entity?

  • Talk to other teams

  • Data Store

  • Problem is already solved; look at the blog


Photos

I downloaded the CSV file generated by the script, and it looks like this in my Trio Office software:



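VERSION 1 - gives an error (most likely a ValueError from csv.DictWriter when a filing contains a field that is not in field_names) - look at Version 2 for correct code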

import csv

#pip install with this if sec is giving error (without the quotes): 'pip install sec-api'
from sec_api import QueryApi

myQuery = QueryApi(api_key = "key")

query = {
    "query": {
        "query_string": {
            "query": "cik:(320193 OR 1318605) AND formType:\"10-K\""
        }
    },
    "from": "0",
    "size": "5",
    "sort": [{ "filedAt": { "order": "desc" } }]
}

queryResponse = myQuery.get_filings(query)
filings = queryResponse["filings"]

#needed to look at documentation very helpful for next part
#https://sec-api.io/docs
field_names = ["id", "accessionNo", "companyName", "companyNameLong", "ticker",
               "cik", "filedAt", "items", "formType", "periodOfReport",
               "linkToHtml", "linkToFilingDetails", "linkToTxt", "description",
               "documentFormatFiles", "dataFiles", "seriesAndClassesContractsInformation",
               "linkToXbrl", "entities"]

with open('recent_10k_filings.csv', 'w', encoding='UTF8') as file:
    writer = csv.DictWriter(file, fieldnames=field_names)
    writer.writeheader()
    writer.writerows(filings)



VERSION 2

import csv

#pip install with this if sec is giving error (without the quotes): 'pip install sec-api'
from sec_api import QueryApi

myQuery = QueryApi(api_key = "YOUR KEY HERE")

# Latest 10-K filings for Apple (CIK 320193) and Tesla (CIK 1318605), newest first
query = {
    "query": {
        "query_string": {
            "query": "cik:(320193 OR 1318605) AND formType:\"10-K\""
        }
    },
    "from": "0",
    "size": "20",
    "sort": [{ "filedAt": { "order": "desc" } }]
}

queryResponse = myQuery.get_filings(query)
filings = queryResponse["filings"]

# The documentation was very helpful for the next part:
#https://sec-api.io/docs
field_names = ["id", "accessionNo", "companyName", "linkToHtml"]

# extrasaction='ignore' skips fields we did not list instead of raising an error;
# newline='' avoids extra blank rows on Windows
with open('recent_10k_filings.csv', 'w', encoding='UTF8', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=field_names, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(filings)
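The key difference between the two versions is extrasaction='ignore'. Here is a small sketch of what that option does, using an in-memory buffer and a made-up record so it runs without an API key. By default csv.DictWriter raises a ValueError when a row has keys that are not in fieldnames; with 'ignore' it just drops them. The helper name write_filings is only for illustration.

```python
import csv
import io


def write_filings(filings, field_names, extrasaction="raise"):
    """Write filings to CSV text, keeping only the listed fields."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=field_names,
                            extrasaction=extrasaction)
    writer.writeheader()
    writer.writerows(filings)
    return buffer.getvalue()


# Made-up record with one field that is not in the field list.
sample = [{"id": "abc", "companyName": "Example Corp", "unexpectedField": "x"}]
fields = ["id", "companyName"]

# Default behavior: an unlisted key makes DictWriter raise ValueError.
try:
    write_filings(sample, fields)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

# With extrasaction='ignore' the extra key is silently dropped.
ignored_output = write_filings(sample, fields, extrasaction="ignore")

print("strict mode failed:", strict_error is not None)
print(ignored_output)
```

This is why Version 2 can list only the four columns it cares about even though each filing has many more fields.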






                         

