Crit Findings Log

ysc321 edited this page Nov 13, 2017 · 8 revisions

Crit #1 (9/26)

  • A persistent question in our project is audience: we began our project with a specified use case (historical, handwritten medical notes). During our ideation phase, we explored expanding our potential users to different industries: legal, administrative, etc. The crit advisors strongly recommended that we keep our focus on doctors.
  • Focus on the wedge: how are we going to do this project? Do we need to transcribe all the words? How are we going to tackle this problem differently than the N other groups working on it?
  • Specificity: we already know that our models will not give us 100% accuracy. The technical difficulty of this problem and our time constraints put hard boundaries on what is possible for our project. Therefore, our crit advisors recommended that we train our models on specific words and phrases, possibly building a tiering system for indexed documents.
  • To further the point on specificity, if there is a way to presort the documents, we should pursue that option.
  • Ask ourselves: what is step 2? The crit advisors told us to assume a ceiling on model performance (e.g., a 40% recall rate) and work from there.
  • Unlike many other HMW questions that the advisors saw, for our project, scale is not our friend. Focus on constraining the problem to make it manageable.
  • Focus on the steps after the algorithm: if the model does well with tasks X, Y, and Z, then tailor our solution around those tasks.
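The tiering idea the advisors suggested can be sketched as a simple routing rule. This is a minimal illustration, assuming each indexed document carries a model confidence score in [0, 1]; the thresholds are hypothetical placeholders, not tuned values.

```python
def tier_document(confidence: float) -> str:
    """Assign a review tier from the model's transcription confidence.

    Thresholds are illustrative: 0.40 loosely mirrors the assumed
    recall ceiling discussed in the crit, not a measured value.
    """
    if confidence >= 0.85:
        return "auto-index"      # trust the model's transcription as-is
    elif confidence >= 0.40:
        return "spot-check"      # sample a fraction for human review
    else:
        return "human-review"    # route the whole document to a human

# Example: tier a small batch of (hypothetical) scored documents.
docs = [("note_001", 0.92), ("note_002", 0.55), ("note_003", 0.10)]
tiers = {name: tier_document(score) for name, score in docs}
```

With a scheme like this, the "40% ceiling" stops being a blocker: the model only auto-indexes what it is confident about, and everything else degrades gracefully into human workflows.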

Crit #2 (10/24)

  • The critters emphasized the distinction between historical and present-day handwritten documents. Since writing trends change over time (due to handwriting education, advancements in writing utensils, etc.), our models may not be as useful for modern handwritten documents. Being more specific about our potential users may help direct our project.
  • The business idea of taking unstructured data and generating structured data for processing (especially via transcription) already exists--researching such companies may help motivate our business idea.
  • Ancestry.com does a ton of work using historical handwritten documents--the critters suggested that we study their methods and find out how much the company relies on human transcription.
  • Iron Mountain was another firm that was mentioned as a potential inspiration for the use of data.
  • Finding out what the highest-value dataset for hospitals is and directing our project toward catering to that niche may be a viable business solution.
  • The critters discussed the idea of "do things that won't scale"--i.e., embracing makeshift and temporary solutions to pain points in order to iterate quickly.

Crit #3 (11/7)

  • Investigate what groups that process massive amounts of handwritten text data (such as the United States Postal Service, the United States Patent and Trademark Office, or online pharmacies) are doing.
  • Gamify the human transcription element (with CAPTCHA-like logic) to entice human actors to transcribe text that the algorithms cannot. From a business standpoint, this might help create an in-house mTurk alternative, or at least create different incentive structures to solve issues that we cannot (immediately) solve via technology. HIPAA and feasibility issues signal danger ahead.
  • Ask ourselves whether we want to begin working from the data we have or from what the industry wants.
  • Follow current trends in healthcare--personalization, data democratization, etc.
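The CAPTCHA-like routing idea can be sketched as a split between model output and a human queue. This is a hypothetical sketch of the "in-house mTurk" concept, assuming (text, confidence) pairs from the model; the function name and threshold are illustrative, not part of any real pipeline.

```python
from collections import deque


def route_snippets(snippets, threshold=0.6):
    """Split (text, confidence) pairs into accepted transcriptions
    and a queue of snippets to serve to humans as CAPTCHA-like tasks.

    The 0.6 threshold is an arbitrary placeholder for illustration.
    """
    accepted, human_queue = [], deque()
    for text, confidence in snippets:
        if confidence >= threshold:
            accepted.append(text)        # keep the model's transcription
        else:
            human_queue.append(text)     # defer to a human transcriber
    return accepted, human_queue


# Example: one confident snippet passes through; one goes to the queue.
accepted, human_queue = route_snippets([("refill 30 days", 0.9),
                                        ("illegible scrawl", 0.2)])
```

Note that any real version of this would need HIPAA review before patient text could be shown to outside annotators, per the feasibility concerns raised in the crit.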