Add obfuscator - Second Draft #202

## State of the first-draft data obfuscation:
- We have a logging obfuscation function that simulates how consistently patients log their meals:
   1. All meals - keep all meals for now.

   2. Multiple meals per day (the 1-2 largest meals) - find a threshold so that, on average, 1.8 meals are logged per day.

   3. Once per day (largest meal) - keep only the largest meal of each day.

   4. A few times per week - find a threshold so that, on average, 3 meals are logged per week.

   5. Never - wipe all meal data.

- We have a logging timing habit function that simulates when patients actually log their meals relative to when they ate them:
   1. Temporally right-skewed -> forgetful loggers - a gamma distribution, which is right-skewed, so logs land after the meal. The distribution parameters are fixed, with minor randomness.

   2. Temporally left-skewed -> hasty loggers - a gamma distribution mirrored to give a left skew (a plain gamma distribution is always right-skewed), so logs land before the meal. It should be less skewed, because a patient probably won't log a meal too early most of the time. The distribution parameters are fixed, with minor randomness.

   3. Normal distribution - a Gaussian distribution with a fixed spread.

   4. Unchanged - log times are not shifted.
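The five modes of the logging obfuscation function (first bullet above) could look roughly like the sketch below. It assumes each meal is a timestamp plus a numeric size; the `Meal` class, its `size` field, the mode strings, and all function names are hypothetical. For the two threshold-based modes it picks a single global size threshold so that the number of kept meals equals the target rate times the number of days covered.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Meal:
    time: datetime  # when the meal was eaten
    size: float     # hypothetical meal-size measure, e.g. grams of carbohydrate


def days_covered(meals: list[Meal]) -> int:
    """Number of calendar days from the first to the last meal, inclusive."""
    dates = [m.time.date() for m in meals]
    return (max(dates) - min(dates)).days + 1


def keep_above_threshold(meals: list[Meal], target_per_day: float) -> list[Meal]:
    """Keep meals whose size reaches one global threshold, chosen so that
    roughly `target_per_day` meals are kept per day on average."""
    if not meals:
        return []
    n_keep = min(len(meals), round(target_per_day * days_covered(meals)))
    if n_keep == 0:
        return []
    # The threshold is the size of the n_keep-th largest meal; ties at the
    # threshold can push the kept count slightly above n_keep.
    threshold = sorted((m.size for m in meals), reverse=True)[n_keep - 1]
    return [m for m in meals if m.size >= threshold]


def keep_largest_per_day(meals: list[Meal]) -> list[Meal]:
    """Keep only the largest meal of each calendar day."""
    largest: dict[date, Meal] = {}
    for m in meals:
        day = m.time.date()
        if day not in largest or m.size > largest[day].size:
            largest[day] = m
    return sorted(largest.values(), key=lambda m: m.time)


def obfuscate_meals(meals: list[Meal], mode: str) -> list[Meal]:
    """Simulate one patient's meal-logging habit (hypothetical mode names)."""
    if mode == "all":
        return list(meals)
    if mode == "multiple_per_day":
        return keep_above_threshold(meals, 1.8)
    if mode == "once_per_day":
        return keep_largest_per_day(meals)
    if mode == "few_per_week":
        return keep_above_threshold(meals, 3 / 7)  # 3 meals per 7 days
    if mode == "never":
        return []
    raise ValueError(f"unknown logging mode: {mode!r}")
```

Because the threshold is global, a day with several large meals can keep more than two entries; capping each day at two would be a variation to consider if the "1-2 largest meals" bound matters.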

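For the logging timing habit function (second bullet above), a minimal sketch under stated assumptions: each habit draws an offset in minutes between eating and logging, and the hasty habit negates a gamma draw to get the left skew. The habit strings, function names, and every numeric parameter are hypothetical placeholders, not values fitted to patient data.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical parameters in minutes, not fitted to patient data.
# For a gamma distribution with shape k and scale theta, the mean is k * theta
# and the skewness is 2 / sqrt(k), so a larger shape means less skew.
FORGETFUL_SHAPE, FORGETFUL_SCALE = 2.0, 10.0  # mean delay of 20 min
HASTY_SHAPE, HASTY_SCALE = 4.0, 1.5           # mean lead of 6 min, less skewed
NORMAL_SIGMA = 5.0                            # spread of the Gaussian habit


def sample_offset_minutes(habit: str, rng: random.Random) -> float:
    """Offset between eating and logging; positive means the log is late."""
    if habit == "forgetful":
        return rng.gammavariate(FORGETFUL_SHAPE, FORGETFUL_SCALE)
    if habit == "hasty":
        # A gamma variate is non-negative and right-skewed; negating it mirrors
        # the distribution, giving a left skew (logs land before the meal).
        return -rng.gammavariate(HASTY_SHAPE, HASTY_SCALE)
    if habit == "normal":
        return rng.gauss(0.0, NORMAL_SIGMA)
    if habit == "unchanged":
        return 0.0
    raise ValueError(f"unknown timing habit: {habit!r}")


def shift_log_times(times: list[datetime], habit: str,
                    seed: Optional[int] = None) -> list[datetime]:
    """Return simulated logging timestamps for a list of meal times."""
    rng = random.Random(seed)  # a per-patient seed keeps runs reproducible
    return [t + timedelta(minutes=sample_offset_minutes(habit, rng)) for t in times]
```

Tuning the forgetful and hasty habits separately then comes down to choosing a different `(shape, scale)` pair for each, using the mean and skewness relations in the comments as a guide.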
## Data flow: 
`data/raw/sim` -> `logging obfuscation function` creates `msg_type_log` -> `logging timing habit function` creates `msg_type_log_shifted` from `msg_type_log` -> `data/raw/obfuscated`
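The data flow could be driven by a small loop like the one below, assuming one CSV per simulated patient in `data/raw/sim` and treating the two functions as pluggable stages that map a list of rows to a list of rows. `run_pipeline`, `DROP_COLUMNS`, and the one-file-per-patient layout are hypothetical; the real scripts may be organized differently. The sketch also drops the `Unnamed: 0` column mentioned in the improvement list.

```python
import csv
from pathlib import Path
from typing import Callable

Row = dict[str, str]
Stage = Callable[[list[Row]], list[Row]]

# Assumed leftover index column to drop on the way through.
DROP_COLUMNS = {"Unnamed: 0"}


def run_pipeline(src_dir: str, dst_dir: str, obfuscate: Stage, shift: Stage) -> list[Path]:
    """Obfuscate every CSV in src_dir and write the result to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.csv")):
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = [c for c in (reader.fieldnames or []) if c not in DROP_COLUMNS]
            rows = list(reader)
        msg_type_log = obfuscate(rows)              # which meals get logged
        msg_type_log_shifted = shift(msg_type_log)  # when they get logged
        out_path = dst / path.name
        with out_path.open("w", newline="") as f:
            # extrasaction="ignore" skips the dropped columns;
            # lineterminator="\n" overrides the csv module's default "\r\n".
            writer = csv.DictWriter(f, fieldnames=fieldnames,
                                    extrasaction="ignore", lineterminator="\n")
            writer.writeheader()
            writer.writerows(msg_type_log_shifted)
        written.append(out_path)
    return written
```

A call would look like `run_pipeline("data/raw/sim", "data/raw/obfuscated", obfuscate=..., shift=...)`. Writing each output under the input's file name means a rerun overwrites the previous output instead of leaving stale copies behind.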

## Improvements:
1. Determine the right proportion of each user type for both functions. For example, users who log all of their meals might make up 25% of patients rather than 30%.
2. Fine-tune the default distributions (the gamma distribution needs better parameters to reflect how patients actually behave) or find a better-suited distribution.
3. The left- and right-skewed distributions should differ. For example, hasty loggers might log their meals about 10 min early on average and rarely earlier than that, whereas forgetful loggers may log more than 40 min late.

4. Remove the original CSV file when a new file name is generated (bug).
5. Investigate the newline characters at the end of some files (bug?).
6. Clean up the columns produced by the `simulation_data_generation` script. There is an `Unnamed: 0` column (likely a pandas index written to the CSV) that we should probably drop.
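For item 1, one option is to make the user-type shares exact rather than only expected, by allocating patients to types by quota with the largest-remainder method and then shuffling. This is a sketch under assumptions: the type names are hypothetical, and only the 25% "all meals" share comes from the example in item 1; the other proportions are placeholders.

```python
import math
import random
from collections.abc import Hashable, Iterable

# Hypothetical share of patients per logging mode. Only the 25% "all" share
# comes from the example in item 1; the other values are placeholders.
LOGGING_MIX = {
    "all": 0.25,
    "multiple_per_day": 0.30,
    "once_per_day": 0.20,
    "few_per_week": 0.15,
    "never": 0.10,
}


def allocate_counts(n: int, mix: dict[str, float]) -> dict[str, int]:
    """Split n patients across types with the largest-remainder method."""
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("proportions must sum to 1")
    raw = {t: p * n for t, p in mix.items()}
    counts = {t: math.floor(v) for t, v in raw.items()}
    leftover = n - sum(counts.values())
    # Give the remaining slots to the types with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda t: raw[t] - counts[t], reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts


def assign_user_types(patient_ids: Iterable[Hashable], mix: dict[str, float],
                      seed=None) -> dict:
    """Randomly assign each patient a type so the shares match `mix`
    (up to rounding to whole patients)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    assignment = {}
    start = 0
    for user_type, count in allocate_counts(len(ids), mix).items():
        for pid in ids[start:start + count]:
            assignment[pid] = user_type
        start += count
    return assignment
```

The same helpers could split patients across the timing habits with a separate mix dictionary, which keeps the two proportion tables independent while they are being tuned.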