MTurk codebase for the study "SQL vs. Visual Diagrams on time and correctness matching relational query patterns"
Code and instructions for running the study "SQL vs. Visual Diagrams on time and correctness matching relational query patterns" using Amazon Mechanical Turk (MTurk), Heroku, and Postgres.
- Remarks
- MTurk Initial Setup and Overview
- Useful Commands
- Instructions for dealing with MTurk interactions
Note that some fields, such as DATABASE_URL, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY, need to be specified accordingly when setting up the Postgres database on Heroku and using AWS keys with MTurk.
!!Warning!! Tutorial time is not currently captured correctly due to a database bug.
- Register on https://requester.mturk.com/ for deployment and https://requester.mturk.com/developer/sandbox for testing.
- Deploy to Heroku by committing and pushing the repository with git push heroku master.
- Run post_hits.py to post the HITs on Amazon Mechanical Turk. MTurk will post your HIT and load your URL in an IFrame when a user accepts it.
- Once a user completes the HIT, it is logged in the database. For more options, check hit_manager.py.
- Update WSL:
  wsl --update
- Install Ubuntu from the Windows store to keep it current.
- With VSCode open from Windows, install https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack
- Within WSL Bash, run VSCode from the folder with:
  code .
  (See https://code.visualstudio.com/docs/remote/wsl for details.)
- Upgrade packages:
  sudo apt update
  sudo apt upgrade
- Upgrade Python to 3.11 (necessary for Heroku deployment):
  sudo apt install software-properties-common
  sudo add-apt-repository ppa:deadsnakes/ppa
  sudo apt update
  sudo apt install python3.11-full python3.11-dev python3.11-venv gcc
  python3.11 -m ensurepip
!Danger! Don't do the following unless you are willing to risk breaking your terminal! It does, however, let you set the default python3 to be python3.11:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
Then choose which one to use as python3 via the command:
sudo update-alternatives --config python3
- Install Postgres (and libssl-dev):
  sudo apt install postgresql postgresql-contrib libssl-dev
- Optional: Install pgadmin for managing the DB:
  - If using WSL, install on Windows by downloading from pgadmin.org. See details at StackOverflow.
  - If pure Ubuntu:
    sudo apt install pgadmin4
- Set a postgres Ubuntu user password:
  sudo passwd postgres
  E.g., it56uZ.
- Set a postgres database user password:
  sudo -u postgres psql
  Inside the psql shell, set the password. Make sure to set your own value for NEWPASSWORD before running:
  ALTER USER postgres PASSWORD 'NEWPASSWORD';
- Set up the postgres databases and user for the app. Still in the psql shell:
  - Create the database and list the ones present:
    CREATE DATABASE rdstudy;
    \l
  - Then, create the user flask. Make sure to set your own value for NEWPASSWORD before running:
    CREATE USER flask WITH PASSWORD 'NEWPASSWORD';
    GRANT ALL PRIVILEGES ON DATABASE rdstudy to flask;
  - Exit psql by running:
    \q
- See status with:
  service postgresql status
- Start the server with:
  sudo service postgresql start
- To avoid getting connection refused errors, edit the postgresql.conf file.
  - Locate the conf file:
    sudo -u postgres psql -c 'SHOW config_file'
  - Edit the file. E.g.:
    sudo nano /etc/postgresql/14/main/postgresql.conf
  - In the file, uncomment listen_addresses and change it like so:
    listen_addresses = '*'
  - Then restart postgres using:
    sudo service postgresql restart
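One way to confirm that Postgres is accepting connections after the restart is to try opening a TCP connection to it. The following is a minimal sketch, assuming the default Postgres port 5432 on localhost; the helper name `port_is_open` is hypothetical and not part of this repository.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: Postgres listens on the default port on this machine.
    print("Postgres reachable:", port_is_open())
```

This only checks that something is listening on the port; it does not verify credentials or that the rdstudy database exists.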
- Create a .env file that holds your environment variables.
  - Generate a FLASK_SECRET_KEY, e.g., by running this in the Python interpreter:
    import os
    os.urandom(24)
    '\xfd{H\xe5<\x95\xf9\xe3\x96.5\xd1\x01O<!\xd5\xa2\xa0\x9fR"\xa1\xa8'
    print(os.urandom(24).hex())
  - Then fill out the .env file something like this, ensuring that you fill in the values for XXXXX below. Use:
    - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from https://requestersandbox.mturk.com/developer for the Sandbox or https://requester.mturk.com/developer for live deployment.
    - The FLASK_SECRET_KEY you generated.
    - The password you set for the postgres flask account as part of LOCAL_SQLALCHEMY_DATABASE_URI.

    FLASK_DEBUG=True
    FLASK_APP=rd_study_server.py
    LOCAL=True
    TESTING=True
    AWS_SANDBOX=True
    AWS_ACCESS_KEY_ID=XXXXX
    AWS_SECRET_ACCESS_KEY=XXXXX
    AWS_CHECK_QUAL=True
    AWS_ALLOW_QUAL_ERROR=True
    FLASK_SECRET_KEY=XXXXX
    LOCAL_SQLALCHEMY_DATABASE_URI=postgresql://flask:XXXXX@localhost:5432/rdstudy
    SQLALCHEMY_TRACK_MODIFICATIONS=False
    WEB_CONCURRENCY=2
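Because a leftover XXXXX placeholder is easy to miss, it can help to check a .env file before using it. The sketch below is illustrative only: the helper names (`parse_env`, `unfilled_keys`, `new_secret_key`) and the choice of which keys count as required are assumptions, not code from this repository. It also shows `secrets.token_hex` as one alternative way to generate a secret key.

```python
import secrets

# Assumption: these are the keys the instructions above ask you to fill in.
REQUIRED_KEYS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "FLASK_SECRET_KEY",
    "LOCAL_SQLALCHEMY_DATABASE_URI",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unfilled_keys(values: dict) -> list:
    """Return required keys that are missing or still contain XXXXX."""
    return [k for k in REQUIRED_KEYS if "XXXXX" in values.get(k, "XXXXX")]


def new_secret_key() -> str:
    """Return 24 random bytes as a 48-character hex string."""
    return secrets.token_hex(24)
```

For example, `unfilled_keys(parse_env(open(".env").read()))` would list the entries that still need real values.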
- Install Python requirements in a virtual environment.
  - Install wheel for building packages and ensure libpq-fe.h is available from libpq-dev:
    python3.11 -m pip install wheel
    sudo apt-get install --reinstall libpq-dev
  - Create the virtual environment:
    sudo python3.11 -m venv env
    source env/bin/activate
  - Install the requirements:
    python3.11 -m pip install -r requirements.txt
Run db_create.py to populate the database. This currently only works when run under the VSCode debugger (i.e., debugpy); the reason is unclear.
sudo -u postgres psql -d rdstudy
In postgres, \dt shows you the tables and SELECT * FROM USERS; shows you an empty table with columns.
flask run
To view the running site, use, for example: http://127.0.0.1:5000/?workerId=AA&assignmentId=BB&hitId=CC
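The test URL carries the three query parameters workerId, assignmentId, and hitId. As a small sketch, the URL can be built programmatically so that values are escaped correctly; the function name `study_url` is hypothetical, and the placeholder IDs are the ones from the example above.

```python
from urllib.parse import urlencode


def study_url(base: str, worker_id: str, assignment_id: str, hit_id: str) -> str:
    """Build a study URL with the workerId, assignmentId, and hitId parameters."""
    query = urlencode(
        {"workerId": worker_id, "assignmentId": assignment_id, "hitId": hit_id}
    )
    return f"{base.rstrip('/')}/?{query}"


print(study_url("http://127.0.0.1:5000", "AA", "BB", "CC"))
# prints http://127.0.0.1:5000/?workerId=AA&assignmentId=BB&hitId=CC
```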
Create a pipeline on Heroku that deploys from the GitHub repository.
Locally:
heroku plugins:install heroku-config
heroku login
to switch to our app
heroku domains -a rd-study
where rd-study
is the app name on Heroku.
This opens the website:
heroku open -a rd-study
To overwrite existing values, use
heroku config:push --file=.env.live -a rd-study -o
Warning: This fails silently if the file doesn't exist.
Make sure to log in fresh to the latest deployment, then:
heroku run bash --app rd-study
python3 db_create.py
You can test it with gunicorn like so:
gunicorn --preload rd_study_server:app --log-file - --log-level=debug
To view the running site, use, for example:
- Local: http://127.0.0.1:8000/?workerId=AA&assignmentId=BB&hitId=CC
- Live https://rd-study.herokuapp.com?workerId=AA&assignmentId=BB&hitId=CC
For testing MTurk
heroku config:push --file=.env.sandbox -a rd-study -o
heroku ps:restart -a rd-study
For live MTurk
heroku config:push --file=.env.live -a rd-study -o
heroku ps:restart -a rd-study
Papertrail logging (paid). Note that this plan has a 65MB/day limit, which you can easily exceed even when running 60 participants. We recommend using a higher plan.
heroku addons:create papertrail:fixa
To export logs, you can use the scripts found in /logs/papertrail.
Access Papertrail through the Heroku site.
To see the database with, e.g., PGAdmin:
- Get the value of DATABASE_URL on Heroku:
  heroku config:get DATABASE_URL -a rd-study
  It is of the form postgres://USERNAME:PASSWORD@HOST:PORT/DATABASE
- Set under Connection:
  - Hostname/address: HOST
  - Port: PORT
  - Maintenance database: DATABASE
  - Username: USERNAME
  - Password: PASSWORD
- Set under Advanced:
  - DB restriction: DATABASE
- Click Save.
- Navigate to the database > Schemas > public > Tables > users. Right-click and select View/Edit Data > All Rows.
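Splitting DATABASE_URL into the individual connection fields by hand is error-prone, so here is a minimal sketch that does it with the standard library. The function name `pgadmin_fields` is hypothetical, and the URL in the usage note is made up for illustration.

```python
from urllib.parse import unquote, urlsplit


def pgadmin_fields(database_url: str) -> dict:
    """Split postgres://USERNAME:PASSWORD@HOST:PORT/DATABASE into its parts."""
    parts = urlsplit(database_url)
    return {
        "Hostname/address": parts.hostname,
        "Port": parts.port,
        "Maintenance database": parts.path.lstrip("/"),
        "Username": unquote(parts.username or ""),
        # Passwords may contain percent-encoded characters such as %40 for @.
        "Password": unquote(parts.password or ""),
    }
```

For instance, `pgadmin_fields("postgres://user:secret@db.example.com:5432/mydb")` yields the host `db.example.com`, port `5432`, and database `mydb`.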
Here are some options you can create:
- .env.local.sandbox for local development and sandbox grading
- .env.local.live for local development and MTurk live grading
- .env.sandbox.test to use for testing the MTurk Sandbox site.
- .env.sandbox for more production-ready testing on the MTurk Sandbox site. Turns off error display to users and requires qualifications.
- .env.live.test to use for the live MTurk website.
- .env.live to use for the live MTurk website. Turns off error display to users and requires qualifications.
Create your AWS account and an associated MTurk account.
- Ensure your environment variables are set. You can use a line like this to load one of the environment files into environment variables, in this case .env.sandbox.test:
set -o allexport && source .env.sandbox.test && set +o allexport
Likewise, for the actual grading of the submitted HITs:
set -o allexport && source .env.live && set +o allexport
Note! All your .env files need to have LF and not CRLF line endings for this to work properly. Otherwise, you'll get errors like botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: Invalid header value. You can check this with, e.g., cat -t .env.sandbox.test.
You can check variables in general with printenv | grep AWS.
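The line-ending check can also be scripted. This is a minimal sketch with hypothetical helper names (`has_crlf`, `to_lf`, `check_env_file`); it works on raw bytes so that no newline translation hides the CR characters.

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the data contains any CRLF line ending."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF."""
    return data.replace(b"\r\n", b"\n")


def check_env_file(path: str) -> bool:
    """Return True if the file at path uses only LF line endings."""
    return not has_crlf(Path(path).read_bytes())
```

A file that fails the check could be rewritten with `Path(p).write_bytes(to_lf(Path(p).read_bytes()))`.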
create_qualification.py creates a qualification using questions from qualification_questions.xml and answers from qualification_answers.xml. It uses the AWS_SANDBOX, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY environment variables.
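To illustrate how these three variables typically fit together, here is a sketch of selecting the MTurk endpoint and creating a boto3 client. How the scripts in this repository actually interpret AWS_SANDBOX is not shown here, so the parsing rule, the default, and the helper names `mturk_endpoint` and `make_client` are assumptions; the two endpoint URLs are the standard MTurk sandbox and production endpoints.

```python
import os

SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
LIVE_ENDPOINT = "https://mturk-requester.us-east-1.amazonaws.com"


def mturk_endpoint(env=os.environ) -> str:
    """Pick the endpoint; a missing AWS_SANDBOX is treated as sandbox to be safe."""
    flag = env.get("AWS_SANDBOX", "True").strip().lower()
    return SANDBOX_ENDPOINT if flag in ("1", "true", "yes") else LIVE_ENDPOINT


def make_client(env=os.environ):
    """Create an MTurk client from the environment (requires boto3)."""
    import boto3  # imported lazily so mturk_endpoint works without boto3

    return boto3.client(
        "mturk",
        region_name="us-east-1",
        endpoint_url=mturk_endpoint(env),
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
    )
```

Defaulting to the sandbox is a deliberate choice in this sketch: a misconfigured environment then cannot spend real money.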
!!!WARNING!!! Hard-coded text for the qualification details! Make sure to at least change the Name and the hard-coded bits in post_hits.py.
Run in the terminal. Pass in one of these arguments:
- test: Creates the basic qualification.
- custom: Creates a custom qualification for invited workers only, e.g., those who had errors taking the test.
- test_taken: Creates a test taken qualification to eliminate workers who have taken the test previously.
E.g., inside the virtual environment, you'll need to run both:
python ./create_qualification.py test
python ./create_qualification.py test_taken
Record the QualificationIds to use in post_hits.py for the qualification_id and taken_test_qualification_id variables.
If you get a RequestError about having a QualificationType with this name already, you need to change the hard-coded Name= part of the file or delete the existing qualification at https://requestersandbox.mturk.com/qualification_types or https://requester.mturk.com/qualification_types.
post_hits.py creates a HIT. It uses the AWS_SANDBOX, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY environment variables.
!!!WARNING!!! Hard-coded text!
!!!WARNING!!! The HITs you create programmatically here do NOT show up on the web management interface! Amazon has deprecated that feature. Aargh!
- Update the <ExternalURL> tag in external_question.xml to be the URL of your Heroku app.
- Update all these hard-coded elements in post_hits.py (some documentation is in the MTurk docs), and read the file!
  - qualification_id: The basic qualification.
  - custom_qualification_id: A custom qualification for invited workers.
  - taken_test_qualification_id: A test taken qualification to eliminate workers who previously took the test.
  - base_pay: The lowest level of reward.
  - approval_percentage
  - minimum_qualification_score
  - title_str
  - description-str
  - MaxAssignments
  - LifetimeInSeconds
  - AssignmentDurationInSeconds
Run inside the virtual environment with one of these arguments:
- full: Regular full-duration HIT.
- pilot: Shorter pilot HIT.
- custom WID QID: Post a custom HIT for the worker with ID WID who has been given a custom qualification with ID QID.
E.g.,
python ./post_hits.py full
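For the `<ExternalURL>` update mentioned above, it may help to see what an MTurk ExternalQuestion document generally looks like. The sketch below generates one with the standard library; the actual contents of external_question.xml in this repository are not shown here, so the frame height of 800 and the function name `external_question_xml` are assumptions. The namespace is the ExternalQuestion schema published by MTurk.

```python
import xml.etree.ElementTree as ET

NS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)


def external_question_xml(url: str, frame_height: int = 800) -> str:
    """Return an ExternalQuestion XML string pointing at the given URL."""
    root = ET.Element("ExternalQuestion", xmlns=NS)
    ET.SubElement(root, "ExternalURL").text = url
    # FrameHeight is the height in pixels of the frame shown to the worker.
    ET.SubElement(root, "FrameHeight").text = str(frame_height)
    return ET.tostring(root, encoding="unicode")


print(external_question_xml("https://rd-study.herokuapp.com"))
```

Generating the XML this way also escapes characters such as `&` in the URL, which is easy to get wrong when editing the file by hand.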
Has lots of code for various things. Make sure to read the code before running it! Run in the terminal. Pass in one of these arguments followed by parameters:
- summary: Provides a summary of the last 100 HITs.

!!!Warning!!! Everything below needs to be checked to see if it needs a paginator added to handle more than 100 records.
- balance: Gets current prepaid HIT balance.
- clear: Deletes all HITs except the ones in a !!!WARNING!!! hard-coded except_list. Will auto-reject all assignments pending in the HIT!
- extend NUM: Add NUM more assignments. !!!WARNING!!! hard-coded hit_id.
- hits_detail HID1 HID2: Get details for two HIT IDs.
- get_assignments HID STATUS: Get assignments for HIT ID HID with status STATUS one of ['Approved', 'Rejected', 'Submitted'].
- get_worker_id_list HID: Get worker IDs for HIT with ID HID that are Approved or Rejected.
- approve_qualifications QID: Approve qualifications for qualification ID QID. !!!WARNING!!! hard-coded accept_list in approve_qualifications definition.
- update_expiration HID: Update the expiration for HIT ID HID. !!!WARNING!!! hard-coded ExpireAt in update_expiration definition.
- give_worker_qualification QID WID: Give qualification with ID QID to worker with ID WID.
- set_taken_test_qualification QID WFILE: Read worker IDs from WFILE which has one ID per line and ADD to each worker the qualification with ID QID.
- remove_qualification QID WFILE: Read worker IDs from WFILE which has one ID per line and REMOVE from each worker the qualification with ID QID.
- get_workers_with_qualification QID: List the workers with qualification ID QID.
- get_qualification_score QID WID: Get the qualification score on qualification with ID QID for worker with ID WID.
- notify_workers_with_qualification QFILE TFILE: Notify all workers listed in the qualified workers file QFILE (one ID per line) that are in the file of workers that haven't taken the HIT TFILE (one ID per line). !!!WARNING!!! Hard-coded advertisement message.
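Regarding the paginator warning above: MTurk list operations return at most 100 records per call, plus a NextToken when more remain. The following is a generic sketch of following that token; `iter_pages` is a hypothetical helper, not a function from this repository, and it assumes the callable accepts MaxResults and NextToken keyword arguments as the boto3 MTurk list calls do.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch: Callable[..., dict], items_key: str, page_size: int = 100) -> Iterator:
    """Yield every item, following NextToken until no token is returned."""
    token: Optional[str] = None
    while True:
        kwargs = {"MaxResults": page_size}
        if token:
            kwargs["NextToken"] = token
        response = fetch(**kwargs)
        yield from response.get(items_key, [])
        token = response.get("NextToken")
        if not token:
            break


# Usage with a boto3 MTurk client (assumed to be created elsewhere):
#   for hit in iter_pages(client.list_hits, "HITs"):
#       print(hit["HITId"])
```

boto3 also ships built-in paginators for several MTurk list operations, which may be a simpler option where available.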
approve_hits.py deals with submissions.
!!!Warning!!! Hard-coded messages to workers here, including the reject_message variable.
Depends on the REMOTE_DATABASE_URI environment variable being set to point to the Heroku Postgres database. Note: This will change regularly! There are two ways to get this value:
- Access through the Heroku site, e.g., https://dashboard.heroku.com/apps/rd-study/settings
- Use the Heroku CLI:
  heroku config:get DATABASE_URL -a rd-study
  It is of the form postgres://USERNAME:PASSWORD@HOST:PORT/DATABASE
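One detail worth knowing when copying this value: SQLAlchemy 1.4 and later no longer accept the postgres:// scheme and expect postgresql:// instead. Whether the scripts here already handle this is not stated, so the following is only a sketch of a normalization step, with a hypothetical function name.

```python
def normalize_db_uri(uri: str) -> str:
    """Rewrite a leading postgres:// to postgresql://; leave other URIs as-is."""
    prefix = "postgres://"
    if uri.startswith(prefix):
        return "postgresql://" + uri[len(prefix):]
    return uri
```

The result could then be used as the value of REMOTE_DATABASE_URI.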
Ensure the environment variables are set. E.g., for live payment:
set -o allexport && source .env.local.live && set +o allexport
!!!Warning!!! Currently these commands claim to update the DB to record who was paid, but they do not actually do so. Use the contents of the /logs folder to check status.
Pass in one of these arguments:
- batch_grade HID: Check Submitted assignments for a given HIT ID and Approve them.
- batch_grade_test HID: Check Submitted assignments for a given HIT ID and Approve them.
- send_manual_bonus WID AID: Send a bonus to a given worker ID for a given assignment ID (because we accepted but didn't send a bonus the first time).
- reject AID FEEDBACK: Reject the given assignment ID with the given feedback. E.g., rejecting speeders.
- approve AID: Approve the given assignment ID like normal.
- grade AID WID: Grade and approve HITs as necessary for a given assignment ID and worker ID.
E.g., using your HID:
python ./approve_hits.py batch_grade HID