liudmylaru/data-science-portfolio

Data Science Portfolio

This repository contains my data science projects, presented as Jupyter Notebooks. The data used by each project is stored in that project's directory and is for demonstration purposes only.

Contents

  • Deep Learning with ResNet50

    • Food-101
      Goals: automatically recognize pictured dishes.
      Methods: Residual Network ResNet50, TensorFlow, Keras, image loading and display, image preprocessing, callbacks to control training, learning rate scheduling, data generators.
  • Postgres Database and Python

    • Storing Storm Data
      Goals: create a database to store hurricane data with an efficient way of sharing it, and insert the data into a table.
      Methods: design the table to store the data efficiently, create users with restricted access, insert data from a CSV file loaded directly from a web page.
  • Machine Learning in Python

    • Titanic competition
      Goals: explore a workflow for competing in the Kaggle Titanic competition.
      Methods: preprocess and explore the data, engineer new features, select the best-performing features, select and tune different algorithms, make a submission to Kaggle.

    • Building A Handwritten Digits Classifier
      Goals: build models that can classify handwritten digits.
      Methods: K-nearest neighbors model, neural network with one, two and three hidden layers.

    • Predicting Bike Rentals
      Goals: using detailed data on the number of bicycles people rent by the hour and day, predict the total number of bikes people rented in a given hour.
      Methods: explore correlated data, calculate features, linear regression, decision trees, random forests.

    • Predicting the stock market
      Goals: use historical data on the price of the S&P 500 Index to make predictions about future prices.
      Methods: handle datetime data, use rolling functions to generate indicators for the model, LinearRegression.

    • Predicting House Sale Prices
      Goals: predict house sale prices using housing data for the city of Ames, Iowa, United States, from 2006 to 2010.
      Methods: set up a pipeline of functions, feature engineering, feature selection, train and test a LinearRegression model.

    • Predicting Car Prices
      Goals: predict a car's market price using its attributes.
      Methods: k-nearest neighbors algorithm.

  • Probability and Statistics in Python

    • Winning Jeopardy
      Goals: work with a dataset of Jeopardy questions to figure out some patterns in the questions that could help to win.
      Methods: normalize text and columns, find overlapping answers and questions, apply a chi-squared test to low-value and high-value questions.

    • Finding the Best Markets to Advertise In
      Goals: find the two best markets in which to advertise an e-learning company's programming courses.
      Methods: summarize distributions, measure the variability of a distribution.

    • Investigating Fandango Movie Ratings
      Goals: analyze more recent movie ratings data to determine whether there has been any change in Fandango's rating system after Hickey's analysis.
      Methods: sampling, variables, scales of measurement, and frequency distributions.

  • SQLite and Python

    • Designing and Creating a Database
      Goals: use statistics on baseball games from the 1800s (from https://www.retrosheet.org/) to design and create a database.
      Methods: import data into SQLite, design a normalized database schema, create tables for schema, insert data into schema.

    • Answering Business Questions using SQL
      Goals: analyze the Chinook database, which contains information about a fictional digital music shop - like a mini-iTunes store.
      Methods: write SQL queries to extract the relevant data and create plots where necessary to visualize it.

    • Analyzing CIA Factbook Data
      Goals: use the Python SQLite workflow to explore, analyze, and visualize data from the CIA Factbook.
      Methods: select summary statistics, find outliers, use subqueries, cast results.

  • Python for Data Analysis

    • Star Wars Survey
      Goals: use survey data to declare that “The Empire Strikes Back” is the best of the Star Wars movies.
      Methods: clean and map yes/no, checkbox, and rank columns, use mean() and sum() on columns to find the highest-ranked and most-seen movies, use binary segments (gender) to analyze the data.

    • Analyzing NYC High School Data
      Goals: compare demographic factors such as race, income, and gender with SAT scores to determine whether the SAT is a fair test.
      Methods: bar plot of the correlations, scatter plots, map by district.

    • Clean And Analyze Employee Exit Surveys
      Goals: combine survey results from employees in two departments of an institute to answer questions about their reasons for resigning.
      Methods: clean data with vectorized string methods, transform data with apply() and applymap(), handle missing or unnecessary values with fillna(), dropna(), and drop(), combine data with concat().

    • Visualizing The Gender Gap In College Degrees
      Goals: visualize the gender gap across college degrees in STEM (science, technology, engineering, and mathematics) fields.
      Methods: subplot grid layout, hide x-axis labels, set y-axis labels, add a horizontal line, export to a file.

    • Visualizing Earnings Based On College Majors
      Goals: explore and analyze a dataset on the job outcomes of college graduates using visualizations.
      Methods: pandas, scatter plots, histograms, scatter matrix plot, bar plots.

    • Exploring Ebay Car Sales Data
      Goals: clean the data and analyze the included used car listings.
      Methods: map column names, convert string columns to numeric, work with outliers, explore datetime data, handle incorrect year values, aggregate and combine data to explore.

    • Exploring Hacker News Posts
      Goals: analyze "Ask" and "Show" posts from the Hacker News site to determine which are more popular and whether there is a better time to publish posts to get more reads.
      Methods: manipulate strings, work with the datetime data type, use object-oriented concepts.

    • App Profiles for the App Store and Google Play Markets
      Goals: analyze data about approximately 10,000 Android apps from Google Play and approximately 7,000 iOS apps from the App Store to understand what type of apps are likely to attract more users.
      Methods: clean data, select the data relevant to the goal, work with frequency tables.
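
To illustrate the frequency-table method used in the last project above, here is a minimal sketch in plain Python. The sample rows, the `genre` column name, and the `frequency_table` helper are hypothetical and are not taken from the notebooks, which may build their tables differently.

```python
from collections import Counter


def frequency_table(rows, column):
    """Return {value: percentage} for one column, largest share first."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    percentages = {value: 100 * n / total for value, n in counts.items()}
    # Sort by share so the most common values come first.
    return dict(sorted(percentages.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical sample rows standing in for an app dataset.
apps = [
    {"name": "A", "genre": "Games"},
    {"name": "B", "genre": "Games"},
    {"name": "C", "genre": "Education"},
    {"name": "D", "genre": "Tools"},
]

table = frequency_table(apps, "genre")
for genre, share in table.items():
    print(f"{genre}: {share:.1f}%")
```

Expressing counts as percentages makes tables from datasets of different sizes, such as the two app stores, directly comparable.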
