Commit 5493ef3

Author: csanders-git
Message: updating readme and tools
1 parent 0790ea5 commit 5493ef3

155 files changed, +570028 -297609 lines changed

.gitignore

+132
@@ -0,0 +1,132 @@
+.DS_Store
+.ipynb_checkpoints
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/

README.md

+26 -1
@@ -1,8 +1,33 @@
 # Shmoocon 2020 update

+## Verifying the data
+
+The original, unenriched data is available from its respective sources. The unfiltered data is available in `dataset.csv` for further research.
+
+To generate the data used in the presentation, run the Jupyter notebook. This can be accomplished via the following:

-Install
 ```
 pip install -r requirements.txt
 jupyter notebook
 ```
+
+# Breach data
+The breach data comes from three different sources:
+
+* VERIS Community Database (VCDB) (https://github.com/vz-risk/VCDB)
+* Privacy Rights Clearinghouse (https://privacyrights.org/data-breaches)
+* Wikipedia (https://en.wikipedia.org/wiki/List_of_data_breaches)
+
+The original work on this effort was guided by Wikipedia. It became clear that this was not an exhaustive list, even of widely known breaches. As a result, more recent iterations put more work into leveraging additional sources. These sources were combined, enriched, and corrected where needed to produce `dataset.csv`. For our research, we identified breaches of publicly traded companies (NASDAQ or NYSE, although the data is enriched with other exchanges as well) and determined whether each breach affected over 100 customer records. Only records matching this description were included in the final 153 samples. We have isolated these samples as `dataset-samples.csv`.
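Review note: the selection rule described in the paragraph above (listed on NASDAQ or NYSE, and more than 100 customer records affected) can be sketched as a small filter. This is only an illustration under stated assumptions: the column names `company`, `exchange`, and `records_affected` are hypothetical and may not match the real header of `dataset.csv`, and the demo rows are made up.

```python
import csv
import io

# Hypothetical column names; the real header of dataset.csv may differ.
EXCHANGE_COL = "exchange"
RECORDS_COL = "records_affected"
TARGET_EXCHANGES = {"NASDAQ", "NYSE"}
MIN_RECORDS = 100  # the README says "over 100", so the comparison is strict


def select_samples(rows):
    """Keep breaches of NASDAQ/NYSE-listed companies affecting more than 100 records."""
    selected = []
    for row in rows:
        exchange = (row.get(EXCHANGE_COL) or "").strip().upper()
        try:
            records = int(row.get(RECORDS_COL) or "")
        except ValueError:
            continue  # an unknown record count cannot satisfy the threshold
        if exchange in TARGET_EXCHANGES and records > MIN_RECORDS:
            selected.append(row)
    return selected


# Tiny made-up table standing in for dataset.csv.
demo_csv = io.StringIO(
    "company,exchange,records_affected\n"
    "ExampleCo,NYSE,5000\n"
    "SmallCo,NASDAQ,100\n"
    "ForeignCo,LSE,90000\n"
    "UnknownCo,NYSE,\n"
)
samples = select_samples(csv.DictReader(demo_csv))
print([row["company"] for row in samples])  # ['ExampleCo']
```

In the demo, `SmallCo` is dropped because exactly 100 records is not "over 100", `ForeignCo` because of its exchange, and `UnknownCo` because its record count is missing.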
+
+## Stock information
+In this version of the release, stock information comes from Financial Modeling Prep (https://financialmodelingprep.com/), which provides a free API. Information on our usage of the API can be found in `tools/fetch_stock_info`. This script reads from our dataset sheet and downloads all NYSE and NASDAQ stocks with the symbols listed. The data will be available in the created `data` subfolder, broken down by stock symbol at a daily cadence from 1998 to 2020 (if available).
+
+Some of the information provided by this service is incomplete. The following companies were updated from data on Investing.com (https://www.investing.com). These were transformed to match the format of the data downloaded from Financial Modeling Prep.
+VWAP values on these stocks are just placeholders, as the information needed to perform the calculation wasn't present.
+
+* Express Scripts' stock info was downloaded from https://www.investing.com/equities/express-scripts-inc-historical-data
+* DirecTV's stock info was downloaded from https://www.investing.com/equities/directv-historical-data
+* Barnes and Noble's stock info was downloaded from https://www.investing.com/equities/barnes---noble-inc-historical-data
+* Aetna's stock info was downloaded from https://www.investing.com/equities/aetna-inc-historical-data
+* Time Warner's stock info was downloaded from https://www.investing.com/equities/time-warner-historical-data
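Review note: to illustrate why VWAP can only be a placeholder when inputs are missing, here is a minimal sketch of a volume-weighted average price over daily bars. It assumes the common approximation that uses each day's typical price, (high + low + close) / 3, weighted by volume. The README does not say which inputs were absent or how Financial Modeling Prep computes its own VWAP field, so the `Bar` layout and the choice of volume as the missing input are assumptions for illustration only.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical bar layout: (high, low, close, volume); volume may be missing.
Bar = Tuple[float, float, float, Optional[float]]


def vwap(bars: Sequence[Bar]) -> Optional[float]:
    """Volume-weighted average of typical prices, or None if it cannot be computed."""
    total_value = 0.0
    total_volume = 0.0
    for high, low, close, volume in bars:
        if volume is None:
            return None  # without volume there is nothing to weight by
        typical_price = (high + low + close) / 3.0
        total_value += typical_price * volume
        total_volume += volume
    if total_volume == 0:
        return None  # avoid dividing by zero on empty or zero-volume input
    return total_value / total_volume


complete = [(11.0, 9.0, 10.0, 100.0), (21.0, 19.0, 20.0, 300.0)]
incomplete = [(11.0, 9.0, 10.0, None)]

print(vwap(complete))    # 17.5
print(vwap(incomplete))  # None
```

Returning `None` rather than a made-up number keeps placeholder rows easy to spot downstream; whether the repository's data files use an empty value, zero, or something else for these placeholders is not stated in the README.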
