
sangje-lee/non-profit-organization

non-profit-organization

Final Result
https://github.com/sangje-lee/non-profit-org-employment

Process of this analysis

  • Import the CSV file into one large dataset.
  • Filter out unneeded columns/attributes and remove any null values found.
  • Split into separate datasets by Indicator (there should be seven datasets).
  • Split each into four datasets by year, each covering three years of data (2010-2012, 2013-2015, 2016-2018, 2019-2021).
  • Split by characteristic group (age, gender, education level, immigrant status, Indigenous status), one dataset per group.
  • Split by GEO (province).

Variable names used in the analysis

  • df - Whole dataset without any filtering or division.
  • df_sorted - Whole dataset after filtering, e.g. with unimportant attributes removed.
  • df_sorted_na - Whole dataset with null values removed.
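
The three stages above can be sketched with pandas as follows. This is only an illustration on a toy DataFrame: the column names (REF_DATE, GEO, Indicators, Characteristics, VALUE, STATUS) are assumptions about the CSV layout, not taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the raw CSV export; the column names are assumed.
df = pd.DataFrame({
    "REF_DATE": [2010, 2010, 2011, 2012],
    "GEO": ["Canada", "Ontario", "Ontario", "Quebec"],
    "Indicators": ["Number of jobs"] * 4,
    "Characteristics": ["15 to 24 years"] * 4,
    "VALUE": [100.0, None, 42.0, 37.0],
    "STATUS": [None, "x", None, None],
})

# Keep only the columns needed for the analysis (hypothetical selection).
keep = ["REF_DATE", "GEO", "Indicators", "Characteristics", "VALUE"]
df_sorted = df[keep]

# Drop every row that still contains a null value.
df_sorted_na = df_sorted.dropna().reset_index(drop=True)

print(df_sorted_na)
```

Selecting columns before calling dropna matters here: a mostly empty column such as STATUS would otherwise cause nearly every row to be dropped.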

Division into new datasets based on Indicator

  • df_AvgAnnHrsWrk - Average annual hours worked
  • df_AvgAnnWages - Average annual wages and salaries
  • df_AvgHrsWages - Average hourly wage
  • df_AvgWeekHrsWrked - Average weekly hours worked
  • df_Hrs_Wrked - Hours Worked
  • df_NumOfJob - Number of jobs
  • df_WagesAndSalaries - Wages and Salaries
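
One possible way to produce these per-indicator datasets is a single groupby over the indicator column. The sketch below assumes the column is named Indicators and uses made-up values; the notebook may filter each indicator separately instead.

```python
import pandas as pd

# Toy data; the "Indicators" column name is an assumption.
df_sorted_na = pd.DataFrame({
    "Indicators": ["Number of jobs", "Hours Worked",
                   "Number of jobs", "Average hourly wage"],
    "VALUE": [10.0, 20.0, 30.0, 40.0],
})

# One sub-DataFrame per indicator, keyed by the indicator label.
by_indicator = {
    name: group.reset_index(drop=True)
    for name, group in df_sorted_na.groupby("Indicators")
}

df_NumOfJob = by_indicator["Number of jobs"]
df_Hrs_Wrked = by_indicator["Hours Worked"]

print(df_NumOfJob)
```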

Division into new datasets based on the GEO/year

  • df_AvgAnnHrsWrk_2010 - Average annual hours worked for the block starting in 2010 (2010-2012)
  • df_AvgAnnHrsWrk_2013 - Average annual hours worked for the block starting in 2013 (2013-2015)
  • df_AvgAnnHrsWrk_2016 - Average annual hours worked for the block starting in 2016 (2016-2018)
  • df_AvgAnnHrsWrk_2019 - Average annual hours worked for the block starting in 2019 (2019-2021)
These are then merged into:
  • training_df_AvgAnnHrsWrk - Average annual hours worked for training set (2013-2018)
  • testing_df_AvgAnnHrsWrk - Average annual hours worked for testing set (2019-2021)
No longer used:
  • df_AvgAnnHrsWrk_below_2016 - Average annual hours worked below 2016
  • df_AvgAnnHrsWrk_above_2017 - Average annual hours worked above 2017
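
A minimal sketch of the year split and the training/testing merge, assuming the year is stored as an integer in a column named REF_DATE (an assumption) and that each block spans three consecutive years. The helper year_block is hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy data: one row per year from 2010 to 2021.
df_AvgAnnHrsWrk = pd.DataFrame({
    "REF_DATE": list(range(2010, 2022)),
    "VALUE": [float(v) for v in range(12)],
})

def year_block(frame, start):
    """Return the rows of the three-year block beginning at `start`."""
    # Series.between is inclusive on both ends by default.
    return frame[frame["REF_DATE"].between(start, start + 2)]

df_AvgAnnHrsWrk_2010 = year_block(df_AvgAnnHrsWrk, 2010)
df_AvgAnnHrsWrk_2013 = year_block(df_AvgAnnHrsWrk, 2013)
df_AvgAnnHrsWrk_2016 = year_block(df_AvgAnnHrsWrk, 2016)
df_AvgAnnHrsWrk_2019 = year_block(df_AvgAnnHrsWrk, 2019)

# Training set covers 2013-2018; testing set covers 2019-2021.
training_df_AvgAnnHrsWrk = pd.concat(
    [df_AvgAnnHrsWrk_2013, df_AvgAnnHrsWrk_2016], ignore_index=True
)
testing_df_AvgAnnHrsWrk = df_AvgAnnHrsWrk_2019.reset_index(drop=True)

print(training_df_AvgAnnHrsWrk["REF_DATE"].tolist())
```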

Division into new datasets based on the group of Characteristics

  • testing_df_WagesAndSalaries_ByAge - Wages and salaries by age group for the testing set
  • testing_df_WagesAndSalaries_ByGender - Wages and salaries by gender for the testing set
  • testing_df_WagesAndSalaries_ByEducation - Wages and salaries by education level for the testing set
  • testing_df_WagesAndSalaries_ByImmigrant - Wages and salaries by immigrant status for the testing set
  • testing_df_WagesAndSalaries_ByIndigenous - Wages and salaries by Indigenous status for the testing set
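
As a rough illustration, the characteristic groups can be separated by matching each row's label against a per-group list. The GROUPS mapping, the labels in it, and the Characteristics column name are all hypothetical placeholders; the real labels come from the dataset.

```python
import pandas as pd

# Hypothetical mapping from group name to the labels that belong to it.
GROUPS = {
    "ByAge": ["15 to 24 years", "25 to 34 years"],
    "ByGender": ["Female employees", "Male employees"],
}

# Toy testing set; the column names are assumptions.
testing_df_WagesAndSalaries = pd.DataFrame({
    "Characteristics": ["15 to 24 years", "Female employees",
                        "25 to 34 years", "Male employees",
                        "15 to 24 years"],
    "VALUE": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Build one DataFrame per characteristic group.
split = {
    name: testing_df_WagesAndSalaries[
        testing_df_WagesAndSalaries["Characteristics"].isin(labels)
    ].reset_index(drop=True)
    for name, labels in GROUPS.items()
}

testing_df_WagesAndSalaries_ByAge = split["ByAge"]
testing_df_WagesAndSalaries_ByGender = split["ByGender"]

print(testing_df_WagesAndSalaries_ByAge)
```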

Division into new datasets based on the provinces

  • testing_df_AvgAnnHrsWrk_ByAge_Provinces - Average annual hours worked for testing set by age group grouped by provinces
  • testing_df_AvgAnnHrsWrk_ByGender_Provinces - Average annual hours worked for testing set by gender grouped by provinces
  • testing_df_AvgAnnHrsWrk_ByEducation_Provinces - Average annual hours worked for testing set by education level grouped by provinces
  • testing_df_AvgAnnHrsWrk_ByImmigrant_Provinces - Average annual hours worked for testing set by immigrant status grouped by provinces
  • testing_df_AvgAnnHrsWrk_ByIndigenous_Provinces - Average annual hours worked for testing set by indigenous status grouped by provinces
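
One way to obtain the province-grouped datasets is to group on the GEO column and keep the groups in a list. The sketch below uses toy values and assumes the column names; it is not necessarily how the notebook builds them.

```python
import pandas as pd

# Toy data; the "GEO" and "VALUE" column names are assumptions.
testing_df_AvgAnnHrsWrk_ByAge = pd.DataFrame({
    "GEO": ["Ontario", "Alberta", "Ontario", "Canada"],
    "VALUE": [1700.0, 1800.0, 1650.0, 1720.0],
})

# groupby sorts its keys by default, so the list is ordered
# alphabetically by the GEO label.
testing_df_AvgAnnHrsWrk_ByAge_Provinces = [
    group.reset_index(drop=True)
    for _, group in testing_df_AvgAnnHrsWrk_ByAge.groupby("GEO")
]

# The position in this list can then serve as a numeric province code.
province_names = sorted(testing_df_AvgAnnHrsWrk_ByAge["GEO"].unique())

print(province_names)
```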

ProvinceAnalysis(df_AvgAnnHrsWrk_201x_ByAge, pd, np, pp) - Creates a new ProvinceAnalysis object from the dataset and the other required arguments.
Variables:

  • self.df - the imported dataset
  • self.provinces - array of provinces
  • self.indicators - array of indicators
  • self.characteristics - array of characteristics
  • self.year - array of years being analyzed
  • self.dfProvinces - array of per-province analyses, computed from the df dataset
Methods:
  • outputAnalysis(province_id) - Outputs a detailed analysis, including sum, mean, and skewness.
  • outputAnalysisSimple(province_id) - Outputs a summarized version of the analysis.
  • outputList(province_id, num) - Outputs the first "num" rows of the dataset.
  • outputPandaProfiling(province_id) - Runs pandas profiling for a specific province in a specific year.
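
To make the shape of such a class concrete, here is a much-reduced sketch. ProvinceAnalysisSketch is a hypothetical stand-in, not the repository's ProvinceAnalysis: it takes only a DataFrame, assumes GEO and VALUE columns, and returns the statistics instead of printing them.

```python
import pandas as pd

class ProvinceAnalysisSketch:
    """Hypothetical, simplified stand-in for ProvinceAnalysis."""

    def __init__(self, df):
        self.df = df
        # Province labels in alphabetical order; the index is the code.
        self.provinces = sorted(df["GEO"].unique())
        # One sub-DataFrame per province, in the same order.
        self.dfProvinces = [df[df["GEO"] == name] for name in self.provinces]

    def output_analysis(self, province_id):
        """Return the sum, mean, and skewness of VALUE for one province."""
        values = self.dfProvinces[province_id]["VALUE"]
        return {
            "province": self.provinces[province_id],
            "sum": values.sum(),
            "mean": values.mean(),
            "skewness": values.skew(),
        }

    def output_list(self, province_id, num):
        """Return the first `num` rows for one province."""
        return self.dfProvinces[province_id].head(num)

toy = pd.DataFrame({
    "GEO": ["Ontario", "Alberta", "Ontario", "Alberta", "Ontario", "Alberta"],
    "VALUE": [1.0, 10.0, 2.0, 20.0, 3.0, 60.0],
})
analysis = ProvinceAnalysisSketch(toy)
stats = analysis.output_analysis(1)  # index 1 is "Ontario" in this toy data
print(stats)
```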

Province Code [0-13]: ['Alberta', 'BC', 'GEO = Canada' , 'Manitoba' , 'New Brunswick', 'Newfoundland', 'Northwest Territories' , 'Nova Scotia' , 'Nunavut', 'Ontario' , 'PEI', 'Quebec', 'Saskatchewan', 'Yukon']

OutputProvinceAnalysis(df_AvgAnnHrsWrk_201x_ByAge_Provinces, ProCode, "201x", pd, np, pp) - Creates a new OutputProvinceAnalysis object from the province-grouped dataset and the other required arguments.

  • ProCode is the code for the province, as listed above.
  • "201x" is the year of the analysis.
  • self.df_output - the dataset being analyzed
  • self.ProCode - the province to analyze (as a numeric code)
  • self.YearOutput - the year being analyzed (mainly for pandas profiling)
  • OutputResult(self) - Displays the result of the analysis.
  • OutputPandaProfiling(self) - Runs pandas profiling for the specific province.

For custom output for provinces

For the first input (variable categorized_province),

Enter the province to analyze. The full province name is required; otherwise, an error is raised.

For the second input,

Choose a numeric code from 0 to 6 from the list below (variable list_indicator),

  • "0. Average annual hours worked"
  • "1. Average annual wages and salaries"
  • "2. Average hourly wage"
  • "3. Average weekly hours worked"
  • "4. Hours Worked"
  • "5. Number of jobs"
  • "6. Wages and Salaries"

Enter the indicator required. A numeric code is required; any other input raises an error.
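
The two input checks described above could look roughly like this. The function parse_inputs and the KNOWN_PROVINCES set are hypothetical (the set is only an illustrative subset); the indicator labels are the seven listed above.

```python
# Indicator labels, indexed by their numeric code (0-6).
list_indicator = [
    "Average annual hours worked",
    "Average annual wages and salaries",
    "Average hourly wage",
    "Average weekly hours worked",
    "Hours Worked",
    "Number of jobs",
    "Wages and Salaries",
]

# Illustrative subset only; the real check would cover every GEO value.
KNOWN_PROVINCES = {"Alberta", "Ontario", "Quebec"}

def parse_inputs(categorized_province, indicator_code):
    """Validate both inputs and return (province, indicator label)."""
    if categorized_province not in KNOWN_PROVINCES:
        raise ValueError(f"Unknown province: {categorized_province!r}")
    # int() raises ValueError when the text is not numeric.
    code = int(indicator_code)
    if not 0 <= code < len(list_indicator):
        raise ValueError(f"Indicator code out of range: {code}")
    return categorized_province, list_indicator[code]

print(parse_inputs("Ontario", "5"))
```

Raising ValueError in every failure case keeps the calling code to a single except clause.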

Contents of this page

  • Data_Analysis_x - Contains the most recently modified work. The latest is Data_Analysis_v07.
  • 36100651-eng.zip - Contains the original dataset on employment in non-profit organizations.
  • 36100651.csv - Contains the original dataset on employment in non-profit organizations as a CSV file.
  • EDA_Report_v00.pdf - Initial EDA report, produced before splitting the dataset.
  • data_analysis_categorized_technical_report.ipynb - Contains the technical report as a Jupyter Notebook.
  • data_analysis_categorized_technical_report.py - Contains the technical report as a Python file.
  • data_analysis_categorized_technical_report.html - Contains the technical report as an HTML file.
  • data_analysis_categorized_technical_report.pdf - Contains the technical report as a PDF file.

About

Analysis of the number of people working in non-profit organizations
