Changes from all commits
30 commits
84b389d
Draft versions of Data Science logo and individual elements used to c…
henrykaplan Nov 30, 2021
97d8d5e
Adding flattened version H4LA logo text
henrykaplan Apr 14, 2022
207e056
Link to logo files and add png and svg logo formats.
henrykaplan May 4, 2022
2d4c7c7
Merge pull request #164 from hackforla/160-survey-repo-labels
rbianchetti Jun 27, 2022
55273ea
Merge pull request #158 from henrykaplan/logo
henrykaplan Sep 15, 2022
c8e6a19
Add files via upload
willa-mannering Oct 24, 2022
1a60114
adding directory for la cime analysis
Lalla22 Oct 17, 2023
aaed528
git c
Lalla22 Oct 17, 2023
3e73404
Merge pull request #184 from hackforla/lalla_la_crime_analysis_2
Lalla22 Oct 17, 2023
17d7cf0
Create README.md
Lalla22 Oct 17, 2023
48f8683
Merge pull request #185 from hackforla/lalla_la_crime_analysis_2
Lalla22 Oct 17, 2023
dff8a8a
Adding bigram analysis on feedback from workshop
MalakH21 Jun 3, 2021
5d2a291
fix name of image to work with Windows
salice Feb 20, 2024
a7d62b5
Merge pull request #195 from hackforla/rename-images
salice Feb 20, 2024
e21ed3a
186 Adding Crime Data
dolla24 Feb 20, 2024
1280729
Merge pull request #196 from hackforla/186-Hack4LAcrime-
dolla24 Feb 20, 2024
ae6f789
DivyaPrakash_Commit1
MDivyaPrakash Mar 26, 2024
7b58207
Create CleaningRules
mru-hub Apr 8, 2024
5e892e4
Cleaning rules from the 311-data
mru-hub Apr 14, 2024
5b2c04f
Remove CleaningRules
mru-hub Apr 14, 2024
8ba5bce
Merge pull request #199 from hackforla/177-create-311-data-csv-files-…
salice Apr 16, 2024
b405d2e
Added data loading and cleaning Jupyter notebook
mru-hub Jun 18, 2024
d6a2aae
Create Data_csvfiles
mru-hub Jun 18, 2024
0acc68b
Pushing_Documentation
MDivyaPrakash Jun 24, 2024
57a3078
Merge pull request #205 from hackforla/177-create-311-data-csv-files-…
salice Jun 25, 2024
7b7b3fb
Adding Cleaning script version2
mru-hub Aug 19, 2024
9b84c5b
Update README.md
tpham16 Oct 29, 2024
b0bbb8e
Merge pull request #215 from hackforla/tpham16-patch-1
venkata-sai-swathi Jan 17, 2025
c2d1979
Update README.md
ExperimentsInHonesty Aug 11, 2025
73e8239
Temporary upload of webscraping tutorial along with other data collec…
parcheesime Sep 19, 2025
1,807 changes: 1,807 additions & 0 deletions 311-data/CSV_files/DataLoading_Script.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions 311-data/CSV_files/Data_csvfiles
@@ -0,0 +1 @@

Binary file added 311-data/CSV_files/Docs/CleaningRules.txt
Binary file not shown.
62 changes: 62 additions & 0 deletions 311-data/webscraping/scrape_more_info.py
@@ -0,0 +1,62 @@
"""Scrapes extra info for desired tech categories

Takes tech_table produced by Rajinders' scrape.py script

Extra info includes: url of tech found on builtwith, text description of tech, url for tech website, subcategories listed on builtwith,
number of live sites that use technology, list of top 5 competitors of tech
"""

import requests
from bs4 import BeautifulSoup, element
import pandas as pd
import numpy as np

def get_extra_info(url_list):
columns = ['builtwith_tech_url', 'tech_description', 'tech_website', 'subcategories','num_live_sites','competitors']
extra_info = pd.DataFrame(columns=columns)

for url in url_list:
print(url)
res = requests.get(url)
soup = BeautifulSoup(res.content, "html.parser")

#first card search for tech description, tech website, subcategories
div = soup.findAll("div", {"class": "col-9 col-md-10"})

ls = [url]
for d in div:
info = d.findAll("p")
for i in info:
ls.append(i.text)

#second search, for top competitors
div2 = soup.find("div", {"class": "list-group small"})
comp = div2.findAll("a", href=True)
links = []

try:
for i in range(5): #get top 5 competitors
if 'trends' in str(comp[i]): #some sites don't have competitors listed
links.append(comp[i]["href"][2:])
else:
continue
except:
continue

try:
ls.append(soup.find("dd", {"class": "col-6"}).text) #get number of live websites
except:
ls.append(np.NaN)#some urls don't have this info yet

ls.append(links)
extra_info.loc[len(extra_info)] = ls

return extra_info

def main():
tech = pd.read_csv('tech_table.csv')
more_info = tech[tech['category'].isin(['widgets', 'analytics', 'cms','copyright','framework','link','mobile','payment','ssl','widgets'])] #choose which tech categories to collect more data
urls = list(more_info['URL of tech'])
info = get_extra_info(urls)
info['subcategories'] = info['subcategories'].str.replace(' · ', ', ') #formatting for subcategories column
info.to_csv('techtable_extrainfo.csv', index=False)
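
A minimal usage sketch (not part of the PR) showing how get_extra_info could be exercised on its own. It assumes the script is importable as scrape_more_info from the working directory, and the URL shown is a placeholder rather than a real BuiltWith entry; in practice the URLs come from the 'URL of tech' column of tech_table.csv.

# Hypothetical usage sketch; the URL below is a placeholder, not a real BuiltWith page.
from scrape_more_info import get_extra_info

sample_urls = ["https://builtwith.com/example-tech"]  # real URLs come from tech_table.csv
extra = get_extra_info(sample_urls)
print(extra[["builtwith_tech_url", "num_live_sites"]])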
10 changes: 4 additions & 6 deletions README.md
@@ -7,18 +7,16 @@ Welcome to the Data Science Community of Practice. We are happy you are here!
If you have not read the [Guide for New Volunteers](https://www.hackforla.org/getting-started), please do so.

1. Join the [#data-science](https://hackforla.slack.com/archives/CGRATJCCF) Slack channel and introduce yourself.
1. Slack one of our Data Science Community of Practice leads [Ryan](https://hackforla.slack.com/team/UPB2FHJCX) or [Sophia](https://hackforla.slack.com/team/UN7V7L934) with your Gmail address.
1. Accept your Google Drive invite to access the [shared folder](https://drive.google.com/drive/u/0/folders/17VuPq--bK2RvBiAG87C0Vo1oM7nluuS7).
1. Add yourself to the [Team Roster](https://docs.google.com/spreadsheets/d/1QJltNh1gOybfebe-RkT-xS7m4OtxbuFfaJ4OujeA4h0/edit) and inform Ryan or Sophia after you have done so.
1. Join our meetings Thursday at 8 pm PST via [Zoom](https://us02web.zoom.us/j/81067015817?pwd=M3l6a0tQTWhLbnlTbEZNOWJ5UXN3QT09).
1. Check out the [open Data Science roles](https://github.com/hackforla/data-science/projects/2) we have available.
1. Come to one of the Data Science Community of Practice meetings and provide your Gmail address to one of the leads so that they can add you to the drive.
1. After you have been added to the drive, add yourself to the [Team Roster] and let the leads know you have added yourself.
1. Join our meetings Mondays from 7 to 8 pm PST via the [Zoom] link in the Slack channel. We do not meet during holidays or on the first Monday of the month.

The Data Science Community of Practice is one of many. [See all our Communities of Practice](https://github.com/hackforla/communities-of-practice/blob/main/README.md)


## Focus

The Hack For LA Data Science team is a group within the LA brigade seeking to make analytical and machine learning services available to local communities and organizations. We have three main goals as a team:
The Hack For LA Data Science team is a group within Hack for LA seeking to make analytical and machine learning services available to local communities and organizations. We have three main goals as a team:

1. To provide data science services to communities in the Los Angeles area. Please [contact us](mailto:[email protected]) if you have a proposal for a data science project. We have many talented data scientists with track records in industry and academia, and we are excited to be able to support our communities.
