IronHack Logo

Project: API and Web Data Scraping

Project Description

The goal of this project is for you to collect data from open sources, putting into practice what you have learned in the APIs and Web Scraping chapter. For this project, you will choose both an API to obtain data from and a web page to scrape.

You will need to build a database from the chosen website. Depending on the structure of the site, you will make API calls, scrape web pages, or find a way to obtain a NoSQL database directly from the server. Be sure to have a clean and normalized database in the end. Finally, export the database to CSV file(s) and to a MySQL database.
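As a minimal sketch of the two collection paths, the snippet below pulls JSON from an API and parses repeated elements from an HTML page into DataFrames. The endpoint, page URL, and CSS selectors are placeholders; the real ones depend entirely on the sources you choose.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# --- API path: request JSON from a (placeholder) endpoint ---
API_URL = "https://api.example.com/v1/items"  # replace with your chosen API
response = requests.get(API_URL, params={"limit": 100}, timeout=10)
response.raise_for_status()
api_df = pd.json_normalize(response.json())   # flatten nested JSON into a table

# --- Scraping path: parse repeated elements from a (placeholder) page ---
PAGE_URL = "https://www.example.com/listings"  # replace with your chosen page
html = requests.get(PAGE_URL, timeout=10).text
soup = BeautifulSoup(html, "html.parser")
rows = [
    {
        "title": card.select_one("h2").get_text(strip=True),
        "price": card.select_one(".price").get_text(strip=True),
    }
    for card in soup.select("div.card")        # selectors depend on the real page
]
scraped_df = pd.DataFrame(rows)
```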

You will be working individually on this project, but we'll guide you along the way and help you as you go. Show us what you've got!


Project Goals

During this project you will:

  • Manage your own git repository.
  • Build your own code from scratch.
  • Get hands-on experience with data collection and writing efficient code.
  • Put into practice data processing concepts learned so far.
  • Practice public presentation skills.

Technical Requirements

The technical requirements for this project are as follows:

  • You must obtain data from both an API and a website or web app.
  • You must clean and normalize your database.
  • You must have at least 200 rows and 8 columns in the final clean database. More data is always welcome.
  • The result should be stored in CSV format and SQL format (see the sketch after this list).
  • Your code should be saved in a Jupyter Notebook and your results should be saved in a folder named output.
  • You should include a README.md file that describes the steps you took and your thought process for obtaining data from the API and web page.
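As a rough illustration of meeting the size and export requirements, the sketch below checks the 200-row / 8-column minimum and writes both output formats. It assumes a cleaned pandas DataFrame is passed in, and the file name, table name, and MySQL connection string are placeholder assumptions.

```python
import os

import pandas as pd
from sqlalchemy import create_engine


def export_results(clean_df: pd.DataFrame) -> None:
    """Check the minimum size requirement and write the CSV and SQL outputs."""
    # Requirement: at least 200 rows and 8 columns in the final clean database
    assert clean_df.shape[0] >= 200 and clean_df.shape[1] >= 8, clean_df.shape

    # CSV export into the required 'output' folder
    os.makedirs("output", exist_ok=True)
    clean_df.to_csv("output/clean_data.csv", index=False)

    # MySQL export -- credentials and database name are placeholders
    engine = create_engine("mysql+pymysql://user:password@localhost:3306/project_db")
    clean_df.to_sql("clean_data", con=engine, if_exists="replace", index=False)
```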

Necessary Deliverables

The following deliverables should be pushed to your GitHub repo for this chapter.

  • A Jupyter Notebook (.ipynb) file that contains the code used to get the data.
  • An output folder containing the outputs of your API and scraping efforts.
  • A README.md file containing a detailed explanation of your approach and code for retrieving data from the API and scraping the web page as well as your results, obstacles encountered, and lessons learned.

Presentation

The presentation time limit is 10 minutes! You will have 7 minutes to present your project to the class and then 3 minutes for Q&A.

The slides of your presentation must include the content listed below:

  • Title of the project + Student name
  • Description of your idea and project
  • Challenges
  • Process
  • Learnings
  • If I were to start from scratch...
  • Improvements
  • Highlights

Tip: you have only 7 minutes for this presentation so keep it simple!

Suggested Ways to Get Started

  • Define a problem - think about what exactly you want to study. Prices on Black Friday? The biggest discounts? The bee population of Paris? Global warming? Select a topic based on your interests and search for websites that contain useful information.
  • Confirm the difficulty level of your website with your teacher - some websites are tough to crack. We don't expect you to hack the CIA website, so validate your choice with your teacher.
  • Break the project down into different steps - note the steps covered in the API and web scraping lessons, try to follow them, and make adjustments as you encounter obstacles; they are inevitable because every API and web page is different.
  • Use the tools in your tool kit - your knowledge of intermediate Python as well as some of the things you've learned in previous chapters. This is a great way to start tying everything you've learned together!
  • Work through the lessons in class & ask questions when you need to! Think about adding relevant code to your project each night, instead of, you know... procrastinating.
  • Commit early, commit often, don’t be afraid of doing something incorrectly because you can always roll back to a previous version.
  • Consult documentation and resources provided to better understand the tools you are using and how to accomplish what you want.

Useful Resources