This is the software for the Early Detection Research Network (EDRN) public portal and knowledge environment. It nominally runs the site at https://edrn.nci.nih.gov/
First, clone the repository and then run `pre-commit install` to get the pre-commit hooks installed.
To develop the portal software for the Early Detection Research Network, you'll need Python, PostgreSQL, Elasticsearch, Redis, and a couple of environment variables. These environment variables should be provided in the development environment, by the continuous integration, by the containerization system, and so on. They must always be set:
Variable Name | Use | Value |
---|---|---|
DATABASE_URL | URL to the database where the portal persists data | postgresql://:@/edrn |
LDAP_BIND_PASSWORD | Credential for the EDRN Directory service user | Contact the directory administrator |
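As a rough sketch of how a wrapper script might check these two variables before starting the portal, consider the following. The function name `missing_required` is hypothetical and not part of the portal's code; it only illustrates the "must always be set" rule from the table above.

```python
import os

# The two variables the table above says must always be set.
REQUIRED = ("DATABASE_URL", "LDAP_BIND_PASSWORD")


def missing_required(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a mapping to check something
    other than the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example = {"DATABASE_URL": "postgresql://:@/edrn"}
print(missing_required(example))  # ['LDAP_BIND_PASSWORD']
```

Calling `missing_required()` with no argument checks the real environment, so a startup script could refuse to continue when the returned list is not empty.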
Next, set up a PostgreSQL database:
$ createdb edrn
This only has to be done the first time, or again if you ever get rid of the database with `dropdb edrn`. Then, set up the software and database schema and content:
$ python3 -m venv .venv
$ .venv/bin/pip install --quiet --upgrade pip setuptools wheel build
$ .venv/bin/pip install --editable 'src/eke.geocoding[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.streams[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.controls[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.content[dev]'
$ .venv/bin/pip install --editable 'src/edrn.collabgroups[dev]'
$ .venv/bin/pip install --editable 'src/eke.knowledge[dev]'
$ .venv/bin/pip install --editable 'src/eke.biomarkers[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.search[dev]'
$ .venv/bin/pip install --editable 'src/edrn.theme[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.ploneimport[dev]'
$ .venv/bin/pip install --editable 'src/edrn.metrics[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.policy[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.test[dev]'
$ .venv/bin/django-admin makemigrations --pythonpath . --settings local
$ .venv/bin/django-admin migrate --pythonpath . --settings local
$ .venv/bin/django-admin createsuperuser --pythonpath . --settings local --username root --email [email protected]
When prompted for a password, enter a suitably secure root-level password for the Django super user (twice).
👉 Note: This password is for the application server's "manager" or "root" superuser and is unrelated to any usernames or passwords used with the EDRN Directory Service. But because it affords such deep and penetrative access, it must be kept double-plus super-secret probationary secure.
Then, to run a local server so you can point your browser at http://localhost:8000/ simply do:
$ .venv/bin/django-admin runserver --pythonpath . --settings local
You can also visit the Wagtail admin at http://localhost:8000/admin/ and the Django admin at http://localhost:8000/django-admin/
To see all the commands besides `runserver` and `migrate` that Django supports:
$ .venv/bin/django-admin help --pythonpath . --settings local
This repository provides a `Taskfile.yaml` which lets you use Taskfile.dev to simplify a lot of commands. For example, much of the above can be done with `task run`, which builds the virtual environment, installs the software, and starts the server. Run `task --list` to see more. The environment variables used by the Taskfile (below) should be in a `.env` file.
Here is a table of the environment variables that may affect the portal server (some of these have explicit values depending on context, such as containerization):
Variable | Use | Default |
---|---|---|
ALLOWED_HOSTS | What valid hostnames to serve the site on (comma-separated) | .nci.nih.gov,.cancer.gov |
AWS_ACCESS_KEY_ID | Amazon Location Service account access key | Unset |
AWS_SECRET_ACCESS_KEY | Amazon Location Service secret access key | Unset |
BASE_URL | Full URL base for generating URLs in notification emails | https://edrn.nci.nih.gov/ |
CACHE_URL | URL to the caching & message brokering service | redis:// |
CSRF_TRUSTED_ORIGINS | Comma-separated list of origins we implicitly trust in form requests | http://*.nci.nih.gov,https://*.nci.nih.gov |
DATABASE_URL | URL to persistence | Unset |
ELASTICSEARCH_URL | Where the search engine's ReST API is | http://localhost:9200/ |
FORCE_SCRIPT_NAME | Base URI path (Apache "script name") if app is not on / | Unset |
LDAP_BIND_DN | Distinguished name to use for looking up users in the directory | uid=service, dc=edrn, dc=jpl, dc=nasa, dc=gov |
LDAP_BIND_PASSWORD | Password for the LDAP_BIND_DN | Unset |
LDAP_CACHE_TIMEOUT | How many seconds to cache directory lookups | 3600 seconds (1 hour) |
LDAP_URI | URI to locate the EDRN Directory Service | ldaps://edrn-ds.jpl.nasa.gov |
MEDIA_ROOT | Where to save media files | Current dir + /media |
MEDIA_URL | URL prefix of media files; must end with / | /media/ |
MQ_URL | URL to the message queuing service | redis:// |
RECAPTCHA_PRIVATE_KEY | Private key for reCAPTCHA | Unset |
RECAPTCHA_PUBLIC_KEY | Public key for reCAPTCHA | Unset |
SECURE_COOKIES | True for secure handling of session and CSRF cookies | True |
SIGNING_KEY | Cryptographic key to protect sessions, messages, tokens, etc. | Unset in operations; set to a known bad value in development |
STATIC_ROOT | Where to collect static files | Current dir + /static |
STATIC_URL | URL prefix of static files; must end with / | /static/ |
Note that `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` can be set through-the-web; the environment variables are just a fallback. Sadly, neither `RECAPTCHA_PRIVATE_KEY` nor `RECAPTCHA_PUBLIC_KEY` can, due to limitations of wagtail-django-recaptcha.
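To illustrate how the defaults in the table combine with the environment, here is a minimal sketch of reading a comma-separated variable and a numeric one. The helper names `comma_list` and `seconds` are hypothetical; the portal's actual settings modules may well read these values differently.

```python
import os

# Defaults copied from the table above.
DEFAULT_ALLOWED_HOSTS = ".nci.nih.gov,.cancer.gov"
DEFAULT_LDAP_CACHE_TIMEOUT = "3600"


def comma_list(name, default, environ=None):
    """Read a comma-separated variable, falling back to the default."""
    env = os.environ if environ is None else environ
    raw = env.get(name, default)
    # Drop surrounding whitespace and skip empty items such as a trailing comma.
    return [item.strip() for item in raw.split(",") if item.strip()]


def seconds(name, default, environ=None):
    """Read a variable holding a whole number of seconds."""
    env = os.environ if environ is None else environ
    return int(env.get(name, default))


print(comma_list("ALLOWED_HOSTS", DEFAULT_ALLOWED_HOSTS, {}))
# ['.nci.nih.gov', '.cancer.gov']
print(seconds("LDAP_CACHE_TIMEOUT", DEFAULT_LDAP_CACHE_TIMEOUT, {"LDAP_CACHE_TIMEOUT": "60"}))
# 60
```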
This section describes how you'd use Apache HTTPD with Jenkins in order to make the site accessible to the world.
First up, the HTTPD configuration:
WSGIDaemonProcess edrnportal user=edrn group=edrn python-home=/usr/local/edrn/portal/p5-renaissance/venv
WSGIProcessGroup edrnportal
WSGIScriptAlias /portal/renaissance /usr/local/edrn/portal/p5-renaissance/jenkins.wsgi process-group=edrnportal
Alias /portal/renaissance/media/ /usr/local/edrn/portal/p5-renaissance/media/
Alias /portal/renaissance/static/ /usr/local/edrn/portal/p5-renaissance/static/
<Directory "/usr/local/edrn/portal/p5-renaissance">
<IfVersion < 2.4>
Order allow,deny
Allow from all
</IfVersion>
<IfVersion >= 2.4>
Require all granted
</IfVersion>
</Directory>
<Directory "/usr/local/edrn/portal/p5-renaissance/static/">
Options FollowSymLinks
</Directory>
Next, here's the `jenkins.wsgi` that was referenced in the HTTPD configuration above (Jenkins should generate this with each build):
from django.core.wsgi import get_wsgi_application
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'edrnsite.policy.settings.ops')
os.environ.setdefault('LDAP_BIND_DN', 'uid=service,dc=edrn,dc=jpl,dc=nasa,dc=gov')
os.environ.setdefault('LDAP_BIND_PASSWORD', 'REDACTED')
os.environ.setdefault('SIGNING_KEY', 'REDACTED')
os.environ.setdefault('DATABASE_URL', 'postgresql://:@/edrn')
os.environ.setdefault('ALLOWED_HOSTS', '.jpl.nasa.gov')
os.environ.setdefault('STATIC_ROOT', '/usr/local/edrn/portal/p5-renaissance/static')
os.environ.setdefault('MEDIA_ROOT', '/usr/local/edrn/portal/p5-renaissance/media')
os.environ.setdefault('BASE_URL', 'https://edrn-dev.jpl.nasa.gov/portal/renaissance/')
os.environ.setdefault('STATIC_URL', '/portal/renaissance/static/')
os.environ.setdefault('MEDIA_URL', '/portal/renaissance/media/')
os.environ.setdefault('SECURE_COOKIES', 'False')
os.environ.setdefault('ELASTICSEARCH_URL', 'http://localhost:9200/')
os.environ.setdefault('CACHE_URL', 'redis://')
# We don't need FORCE_SCRIPT_NAME here since Apache's WSGIScriptAlias does the right thing
application = get_wsgi_application()
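Every setting in the script above is applied with `os.environ.setdefault`, so a value already present in the process environment takes precedence over the one written in the script. The short demonstration below uses made-up variable names (prefixed with `EXAMPLE_`) and made-up values so that it does not touch any real portal setting.

```python
import os

# Hypothetical variable names, prefixed so the demo leaves real settings alone.
os.environ.pop("EXAMPLE_BASE_URL", None)
os.environ["EXAMPLE_CACHE_URL"] = "redis://cache.example:6379"

# Not set beforehand: setdefault stores the given value and returns it.
first = os.environ.setdefault("EXAMPLE_BASE_URL", "https://portal.example/")

# Already set: setdefault leaves the existing value alone and returns it.
second = os.environ.setdefault("EXAMPLE_CACHE_URL", "redis://")

print(first)   # https://portal.example/
print(second)  # redis://cache.example:6379
```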
Finally, this needs to be run on each deployment:
$ venv/bin/django-admin collectstatic --settings edrnsite.policy.settings.ops --clear --link
$ mkdir media
To use this software in a Docker container environment, first collect the wheels by running:
support/build-wheels.sh
Or by hand:
.venv/bin/python -m build --outdir dist src/eke.geocoding
.venv/bin/python -m build --outdir dist src/edrnsite.streams
.venv/bin/python -m build --outdir dist src/edrnsite.controls
.venv/bin/python -m build --outdir dist src/edrnsite.content
.venv/bin/python -m build --outdir dist src/edrn.collabgroups
.venv/bin/python -m build --outdir dist src/edrn.theme
.venv/bin/python -m build --outdir dist src/edrnsite.search
.venv/bin/python -m build --outdir dist src/eke.knowledge
.venv/bin/python -m build --outdir dist src/eke.biomarkers
.venv/bin/python -m build --outdir dist src/edrnsite.ploneimport
.venv/bin/python -m build --outdir dist src/edrnsite.policy
You don't need `src/edrnsite.test` since it's just used for testing. Repeat this for any other source directory in `src`. Then build the image:
docker image build --build-arg user_id=NUMBER --tag edrndocker/edrn-portal:latest --file docker/Dockerfile .
Replace `NUMBER` with the numeric ID of the user under which to run the software in the container. Typically you'll want
- 500 for running at the Jet Propulsion Laboratory.
- 26013 for running at the National Cancer Institute.
Or do it all at once with Taskfile:
task image
That builds the wheels and the image for you.
Spot check: see if the image is working by running:
docker container run --rm --env LDAP_BIND_PASSWORD='[REDACTED]' --env SIGNING_KEY='s3cr3t' \
--env ALLOWED_HOSTS='*' --publish 8000:8000 edrndocker/edrn-portal:latest
Then visit http://localhost:8000/ and you should get `Server Error (500)`, since the database connection isn't established.
For a Docker Composition, the accompanying `docker/docker-compose.yaml` file enables you to run the orchestrated set of needed processes in production, including the portal, maintenance worker, search engine, cache and message queue, and a database. You can launch all the processes at once with `docker compose up` when at the National Cancer Institute. For local development, use `task compose-up`.
Next, establish an SSH tunnel to the EDRN Directory Service `ldaps://edrn-ds.jpl.nasa.gov` to localhost port 1636, so that `ldaps://localhost:1636` will be the EDRN Directory Service, and therefore `ldaps://host.docker.internal:1636` is also the EDRN Directory Service, but from a container's point of view.
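Before starting the composition, it can help to confirm that something is actually listening on the tunnel's local port. The sketch below only tests that a TCP connection to localhost port 1636 succeeds; it does not perform a TLS handshake or an LDAP bind, and the function name `tunnel_is_listening` is hypothetical.

```python
import socket


def tunnel_is_listening(host="localhost", port=1636, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


print(tunnel_is_listening())
```

A result of `False` usually means the tunnel is not up (or is bound to a different port).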
Finally, pick a convenient directory to contain the database (the PostgreSQL database plus the media blobs and static files); @nutjob4life uses `~/dockerdata/renaissance`. Call this the `$EDRN_DATA_DIR`. Copy the daily database copies from production to tumor.jpl.nasa.gov as follows:
$ mkdir ${HOME}/dockerdata/renaissance
$ export EDRN_DATA_DIR=${HOME}/dockerdata/renaissance
$ env WORKSPACE=$EDRN_DATA_DIR support/sync-from-dev.sh
Feel free to do this as often as you like. Monthly is more than sufficient unless you're looking for some specific items recently added to the production database.
You'll also need to ensure your Docker environment can support the EDRN P5 application stack, which is large. If you're using Docker Desktop, you might want to adjust the settings as follows (under Settings → Resources):
- CPU limit: 20
- Memory limit: 48 GB
- Swap: 2 GB
- Disk usage limit: 96 GB
After setting the needed variables and resource limits, start the composition as follows:
task compose-up
You can now proceed to set up the database, search engine, and populate the portal with its content.
Next, we need to set up the database with initial structure and content. This section tells you how.
To set up the initial database and its schema inside a Docker Composition, we start by deleting and then creating the database and loading the current production data:
rm -rf $EDRN_DATA_DIR/postgresql
mkdir $EDRN_DATA_DIR/postgresql
task compose -- exec db createdb --username=postgres --encoding=UTF8 --owner=postgres edrn
bzip2 --decompress --stdout $EDRN_DATA_DIR/edrn.sql.bz2 | task compose -- \
exec --no-tty db psql --username=postgres --dbname=edrn --echo-errors
We can then upgrade the database schema to the latest software, fix any database tree issues, and collect the static files:
task compose -- exec portal /app/bin/django-admin migrate
task compose -- exec portal /app/bin/django-admin fixtree
task compose -- exec portal /app/bin/django-admin collectstatic --no-input --clear
Normally we'd also sync the LDAP groups, but this command will fail until the number of groups in EDRN is reduced; so just carry right on:
task compose -- exec portal /app/bin/django-admin ldap_group_sync
Then you can point a browser at https://localhost:2348/ (or whatever the `HTTPS_PORT` is) and see if it worked. Note that this uses a self-signed certificate, so ignore any certificate warnings.
The Docker Composition itself is not enough, of course. The last step is to set up an actual web server to accept requests, serve static and media files, reverse-proxy to the portal container, handle TLS/SSL encryption, load balancing, and so forth.
The web server is also responsible for serving media files and static assets. This is for efficiency: there's no need to involve the backend content management system for such files (which can be large). Furthermore, by giving the server direct filesystem access, it can use the `sendfile` system call, which is blazingly efficient.
In a nutshell, the web server must serve MEDIA_URL requests to the MEDIA_ROOT directory (which is EDRN_DATA_DIR/media), STATIC_URL requests to the STATIC_ROOT directory (which is EDRN_DATA_DIR/static), and all other requests reverse-proxied to the EDRN_PUBLISHED_PORT (or EDRN_TLS_PORT if you're using it) TCP socket.
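The routing rule in the paragraph above can be sketched as a small function. This is only an illustration of the mapping, not something the portal or any web server runs; the name `resolve` and the data directory `/srv/edrn-data` are hypothetical, and a real web server also guards against things like path traversal.

```python
import posixpath


def resolve(path, data_dir, media_url="/media/", static_url="/static/"):
    """Decide how a request path should be handled.

    Returns ("file", filesystem_path) for media and static requests and
    ("proxy", path) for everything else, which goes to the portal container.
    """
    if path.startswith(media_url):
        # MEDIA_URL requests are served from MEDIA_ROOT, i.e. data_dir/media.
        return ("file", posixpath.join(data_dir, "media", path[len(media_url):]))
    if path.startswith(static_url):
        # STATIC_URL requests are served from STATIC_ROOT, i.e. data_dir/static.
        return ("file", posixpath.join(data_dir, "static", path[len(static_url):]))
    return ("proxy", path)


print(resolve("/media/documents/sentinel.dat", "/srv/edrn-data"))
# ('file', '/srv/edrn-data/media/documents/sentinel.dat')
print(resolve("/about/", "/srv/edrn-data"))
# ('proxy', '/about/')
```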
How you configure an Elastic Load Balancer, Application Load Balancer, Nginx, Apache HTTPD, or other web server to handle reverse-proxying to the portal container as well as serving static and media files depends on the software in use. In the interests of including a working example, though, see the following Nginx configuration:
server {
listen …;
location /media/ { # Request = http://whatever/media/documents/sentinel.dat
root /local/web/content/edrn; # Response = /local/web/content/edrn/media/documents/sentinel.dat
}
location /static/ { # Request = http://whatever/static/edrn.theme/css/edrn-overlay.css
root /local/web/content/edrn; # Response = /local/web/content/edrn/static/edrn.theme/css/edrn-overlay.css
}
location / { # All other requests go to the portal container
proxy_pass http://localhost:4135; # EDRN_PUBLISHED_PORT = 4135
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_redirect default;
}
}
Normally, the EDRN portal is hosted on the root path `/` of a host; for example, in production it's at `https://edrn.nci.nih.gov/`. However, for certain demonstrations and other expositions, it may be necessary to host it on a "subpath", such as `https://edrn.jpl.nasa.gov/portal/renaissance/`. Here, the subpath is `/portal/renaissance/`.
Depending on the web server, you may not need to do anything to support such a configuration, because the web server recognizes the subpath and sets the `SCRIPT_NAME` environment variable to the subpath (this is the case for mod_wsgi). Others, such as reverse-proxies, make no assumptions and make no such setting. When this is the case, you can set the `FORCE_SCRIPT_NAME` environment variable in the Docker composition to force the portal to believe a `SCRIPT_NAME` was set even when it wasn't.
As an example, if the web server is reverse-proxying to the Docker composition for URLs such as `https://edrn.jpl.nasa.gov/portal/renaissance/`, then we'd set `FORCE_SCRIPT_NAME` when starting the composition to `/portal/renaissance/`.
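The effect of a script name on generated links can be pictured with a tiny sketch. This is not Django's implementation; the function `public_path` is hypothetical and only shows how a subpath ends up in front of every application path.

```python
def public_path(script_name, app_path):
    """Join a script name (the subpath) and an application-relative path."""
    # Normalize so exactly one slash separates the two parts.
    return script_name.rstrip("/") + "/" + app_path.lstrip("/")


# Hosted on a subpath: links gain the /portal/renaissance prefix.
print(public_path("/portal/renaissance/", "/admin/"))  # /portal/renaissance/admin/

# Hosted on the root path: links are unchanged.
print(public_path("", "/admin/"))  # /admin/
```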
To develop for this system, you'll need
- PostgreSQL 17 or later
- Python 3.12 or later, but not 4.0 or later
- Elasticsearch 7.17 or later, but not 8.0 or later
- Redis 7.0 or later, but not 8.0 or later
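Each requirement above is a half-open range: at least some minimum version, but below the next major version. A minimal sketch of that comparison follows, assuming plain dotted numeric versions (no pre-release suffixes); the helper names `parse` and `in_range` are hypothetical.

```python
import sys


def parse(version):
    """Turn a dotted version string such as '7.17.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, minimum, below):
    """True when minimum <= version < below, compared component by component."""
    return parse(minimum) <= parse(version) < parse(below)


print(in_range("7.17.3", "7.17", "8.0"))  # True: an acceptable Elasticsearch
print(in_range("8.0.1", "7.0", "8.0"))    # False: too new for the Redis range

# Check the Python interpreter running this script against "3.12 or later, but not 4.0".
running = "{}.{}".format(*sys.version_info[:2])
print(in_range(running, "3.12", "4.0"))
```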
You can start by looking at the open issues, forking the project, and submitting a pull request. You can also contact us by email with suggestions.
We use the SemVer philosophy for versioning this software. For versions available, see the releases made on this project. We're starting off with version 5 because reasons.
The principal developer is:
The QA team consists of:
To contact the team as a whole, email the Informatics Center.
The project is licensed under the Apache version 2 license.
- Image by OpenClipart-Vectors from Pixabay
- Image by OpenClipart-Vectors from Pixabay