CoP: Data Science: Active and Inactive Businesses of LA County #182

Open
10 tasks
akhaleghi opened this issue Aug 30, 2023 · 25 comments

Comments

@akhaleghi
Contributor

akhaleghi commented Aug 30, 2023

Prerequisite(s)

If you would like to work on this issue, please add a comment below and include the following information:

  • Your name
  • How many hours you can commit to working on this in the next week (minimum of 2)
  • Commit to providing an update with a comment before the next community of practice meeting

For example:

  • John Doe
  • I can commit to working on this issue 3 hours in the following week.
  • Yes, I will provide an update on my progress with a comment below.

Once you have done this, please add yourself to the “Assignees” section on the right and update the issue weekly to document your progress.

Overview

We want to create a usable dataset of active and inactive businesses to perform various time series analyses (e.g., visualizing business closures during the COVID-19 pandemic).

Action Items

Phase 1

  • Find available data sources and add to Resources section
  • Create data dictionary (EDA task)
  • Create issues required to fulfill project requirements, including exploratory data analysis, required tasks, and deliverables
    • Perform data cleaning (EDA task)
    • Understand and outline data context
  • Write one-sheet (see Resources below)
    • Define stakeholder
    • Summarize project, including value add
    • Define a 6-month project roadmap
    • Detail history (if any)

Resources/Instructions

Data source for business listings in LA County.

@prishapuri
Member

  • Prisha Puri
  • I can commit to working on this issue for 5 hours this week.
  • Yes, I will provide an update on my progress with a comment below.

@prishapuri prishapuri self-assigned this Oct 25, 2023
@xingstar97 xingstar97 self-assigned this Oct 27, 2023
@xingstar97
Member

Ting Ai
I can commit to working on this issue 4 hours in the following week.
Yes, I will provide an update on my progress with a comment below.

@prishapuri
Member

My Progress Updates

  • Utilized the dataset from the Office of Finance (link above)
  • Worked on data cleaning
  • Used Google Colab for code development

@prishapuri
Member

My Progress Updates for This Week:

  • Acquired information regarding time series analysis
  • Changed the way the dataset was retrieved in Google Colab
  • Worked on creating another data frame for time series analysis

Note: I will resume working on this issue in December.

@xingstar97
Member

My Progress Updates for last Week:

  • Cleaned data
  • Did EDA
  • Visualized the number of business starts and closures over time

@xingstar97
Member

My progress updates for this week:

  • Learned about time series analysis
  • Prepared data for time series analysis

@prishapuri
Member

My Progress Updates for this Week:

  • Replaced null values in one of the columns
  • Created a dictionary that will contain the number of active businesses in each year
  • Working on data cleaning and obtaining the number of active businesses in each year
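
One way the per-year dictionary described above could be built is sketched below. This is a minimal illustration using only the standard library; the record layout (name, start date, optional end date) and the sample businesses are assumptions made for the example, not the actual schema of the Office of Finance dataset.

```python
from datetime import date

# Hypothetical records: (business name, start date, end date or None if still active).
businesses = [
    ("Cafe A", date(2015, 3, 1), date(2020, 6, 30)),
    ("Shop B", date(2018, 1, 15), None),
    ("Studio C", date(2019, 9, 1), date(2019, 12, 31)),
]


def active_per_year(records, first_year, last_year):
    """Count businesses that were open at any point during each year."""
    counts = {}
    for year in range(first_year, last_year + 1):
        year_start, year_end = date(year, 1, 1), date(year, 12, 31)
        counts[year] = sum(
            1
            for _, start, end in records
            # Active if it started by year end and had not closed before the year began.
            if start <= year_end and (end is None or end >= year_start)
        )
    return counts


counts = active_per_year(businesses, 2017, 2021)
print(counts)
```

Here a business counts as active in a year if it was open on at least one day of that year. Other definitions (for example, open on December 31) would give different totals, so the choice is worth recording in the data dictionary.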

@xingstar97
Member

My progress updates for this week:
Building a SARIMA model (ran the Augmented Dickey-Fuller test, removed the trend to achieve stationarity, and computed the ACF and PACF)
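
To illustrate two of the steps mentioned here, the sketch below removes a trend by first differencing and computes the sample autocorrelation function (ACF) with plain Python. It is not the code used in this issue: the monthly counts are made up, and in practice these steps are usually done with a library such as statsmodels (for example, its `adfuller` function for the Augmented Dickey-Fuller test), which this sketch does not use.

```python
def difference(series, lag=1):
    """Return the lag-differenced series, a common way to remove a trend."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (lag 0 is always 1.0)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Hypothetical monthly counts of active businesses with a steady upward trend.
monthly_counts = [100, 104, 109, 113, 118, 121, 127, 130, 136, 139, 145, 148]
detrended = difference(monthly_counts)

acf_before = [autocorrelation(monthly_counts, k) for k in range(4)]
acf_after = [autocorrelation(detrended, k) for k in range(4)]
print(acf_before)
print(acf_after)
```

A trending series typically shows a large, slowly decaying ACF, while the differenced series does not. That contrast is one informal sign that differencing has removed the trend, alongside a formal stationarity test.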

@prishapuri
Member

My Progress Updates for this Week:

  • Dropped rows with one or more null values
  • Noticed that the number of rows in my pandas DataFrame dropped significantly after data cleaning
  • Created another pandas DataFrame that displays the number of active businesses per year between the years 2000 and 2021
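
The row loss noted above can be investigated by counting missing values per column before dropping anything. The sketch below uses a small made-up DataFrame; the column names are illustrative assumptions, not the dataset's real schema. One plausible cause (an assumption, not confirmed in this thread) is that the end-date column is empty for businesses that are still active, so dropping every row with any null discards exactly the active businesses.

```python
import pandas as pd

# Hypothetical sample; column names are illustrative, not the dataset's real schema.
df = pd.DataFrame(
    {
        "business_name": ["Cafe A", "Shop B", "Studio C", "Market D"],
        "start_date": ["2015-03-01", "2018-01-15", None, "2020-02-01"],
        "end_date": [None, None, "2019-12-31", None],
        "naics": ["722515", None, "541430", None],
    }
)

# Count missing values per column before dropping anything.
missing_per_column = df.isna().sum()

# Dropping rows with any null removes every row in this sample,
# because each row is missing at least one value.
dropped_any = df.dropna()

# Dropping only on the columns the analysis needs keeps more rows.
dropped_subset = df.dropna(subset=["start_date"])

print(missing_per_column)
print(len(df), len(dropped_any), len(dropped_subset))
```

Restricting `dropna` to the columns an analysis actually requires is usually a gentler alternative to dropping on every column.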

@prishapuri
Member

My Progress Updates for this Week:

  • Worked on data visualization using the Plotly library
  • Performed data analysis and data cleaning

@prishapuri
Member

My Progress Updates for this Week:

  • Looked at another individual’s work for this project
  • Learned about the Python library GeoPandas

@akhaleghi
Contributor Author

@prishapuri @xingstar97 @SachinChodavarapu Is this issue still being worked on? There have been no updates since April.

@SachinChodavarapu
Member

Sachin Chodavarapu
I can commit to working on this issue for 5 hours this week.
Yes, I will provide an update on my progress via comment

@SachinChodavarapu
Member

@prishapuri @xingstar97 @SachinChodavarapu Is this issue still being worked on? There have been no updates since April.

Hey Abe, I've been working on a different issue that should be completed this week. I'll be able to start working on this project on Thursday.

@max1million101
Member

Max Kasbar
I can commit at least 2 hours per week on this issue
I'm also willing to provide updates via the comment section

@max1million101 max1million101 self-assigned this Jun 16, 2024
@prishapuri
Member

@akhaleghi Hi Abe! I am currently working on this issue. I will send a message to you on Slack with more information.

@max1million101
Member

As of this comment, I'm working on a data dictionary. I'll post it either here, somewhere else, or at least a link to it on Slack.

@max1million101
Member

As of this comment, I'm working on a data dictionary. I'll post it either here, somewhere else, or at least a link to it on Slack.

Here is the link to the data dictionary. If anything is wrong with it, please let me know.
https://docs.google.com/spreadsheets/d/1tL11Ce6x_jYo3aitxbalo_NcoHaZ6o1CxQUHeHTUIDM/edit?usp=sharing

@SachinChodavarapu
Member

  • Used the old dataset (from the Office of Finance) for data cleaning and dropped the NAICS column for accuracy.
  • Performed EDA and concluded that a business survival analysis would give better insights from this dataset.
  • Created a new column, 'duration', which helps in analyzing patterns in business closures and identifying key factors that contribute to business success or failure.
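
A 'duration' column like the one described above could be derived roughly as follows. This is a sketch under stated assumptions: the field names, the sample records, and the snapshot date are hypothetical. It also adds a 'closed' flag, because survival analysis generally needs to distinguish businesses that actually closed from those that were still open when the data was collected.

```python
from datetime import date

# Hypothetical snapshot date used to measure businesses that are still open.
SNAPSHOT = date(2024, 6, 1)


def add_duration(records, snapshot=SNAPSHOT):
    """Return copies of the records with 'duration_days' and 'closed' fields."""
    result = []
    for rec in records:
        end = rec["end_date"]
        closed = end is not None
        # For businesses that are still open, measure up to the snapshot date.
        last_seen = end if closed else snapshot
        duration_days = (last_seen - rec["start_date"]).days
        result.append({**rec, "duration_days": duration_days, "closed": closed})
    return result


businesses = [
    {"name": "Cafe A", "start_date": date(2020, 1, 1), "end_date": date(2020, 12, 31)},
    {"name": "Shop B", "start_date": date(2024, 5, 1), "end_date": None},
]
with_duration = add_duration(businesses)
for row in with_duration:
    print(row["name"], row["duration_days"], row["closed"])
```

Treating still-open businesses as observed only up to the snapshot date (rather than dropping them or treating them as closed) avoids underestimating how long businesses survive.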

@max1million101
Member

A copy of the data dictionary for those unable to access Google Sheets:

BusinessDataDictionary.xlsx

@ExperimentsInHonesty ExperimentsInHonesty closed this as completed by moving to Filled in HfLA: Open Roles Jun 18, 2024
@github-project-automation github-project-automation bot moved this from In progress (actively working) to Done in CoP: Data Science: Project Board Jun 18, 2024
@github-project-automation github-project-automation bot moved this from Done to In progress (actively working) in CoP: Data Science: Project Board Jun 18, 2024
@ExperimentsInHonesty ExperimentsInHonesty changed the title from "Active and Inactive Businesses of LA County" to "CoP: Data Science: Active and Inactive Businesses of LA County" Jun 18, 2024
@max1million101
Member

An additional resource for the Resources section: a monthly listing of the businesses that registered with the Office of Finance during that month: https://finance.lacity.gov/new-monthly-business-listings

@rahul897 rahul897 self-assigned this Jun 24, 2024
@rahul897
Member

rahul897 commented Jun 24, 2024

@akhaleghi
Contributor Author

@rahul897 @prishapuri @max1million101 @xingstar97 @SachinChodavarapu @SathvikLingabathula There have been no updates on this issue since June. Are any of you still actively working on this?

@akhaleghi akhaleghi removed the status in HfLA: Open Roles Sep 23, 2024
@akhaleghi akhaleghi moved this to Filled in HfLA: Open Roles Sep 23, 2024
@akhaleghi akhaleghi closed this as completed by moving to Filled in HfLA: Open Roles Sep 23, 2024
@github-project-automation github-project-automation bot moved this from In progress (actively working) to Done in CoP: Data Science: Project Board Sep 23, 2024
@akhaleghi akhaleghi reopened this Sep 24, 2024
@prishapuri
Member

@rahul897 @prishapuri @max1million101 @xingstar97 @SachinChodavarapu @SathvikLingabathula There have been no updates on this issue since June. Are any of you still actively working on this?

@akhaleghi Hi Abe! I sent a message to you on Slack.

@prishapuri
Member

My Progress Updates for this Week:

  • Explored how to depict spatial data with Python to visualize the locations of the LA County businesses over time
  • Worked on eliminating null values for businesses that did not have latitude and longitude coordinates
