EDRN/P5

Early Detection Research Network Portal

This is the software for the Early Detection Research Network (EDRN) public portal and knowledge environment. It nominally runs the site at https://edrn.nci.nih.gov/

🤓 Development

First, clone the repository and then run pre-commit install to get the pre-commit hooks installed.

To develop the portal software for the Early Detection Research Network, you'll need Python, PostgreSQL, Elasticsearch, Redis, and a couple of environment variables. These environment variables must always be set, whether by the development environment, the continuous integration system, the containerization system, or some other means:

| Variable Name | Use | Value |
|---|---|---|
| DATABASE_URL | URL to the database where the portal persists data | postgresql://:@/edrn |
| LDAP_BIND_PASSWORD | Credential for the EDRN Directory service user | Contact the directory administrator |
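As a quick sanity check before running anything, here's a minimal sketch that reports which of the two required variables are missing and pulls the database name out of DATABASE_URL. The helper names `missing_variables` and `database_name` are made up for illustration; they are not part of the portal software.

```python
import os
from urllib.parse import urlsplit

# The two variables the table above says must always be set.
REQUIRED_VARIABLES = ("DATABASE_URL", "LDAP_BIND_PASSWORD")


def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


def database_name(database_url):
    """Return the database name, which is the path part of the URL."""
    return urlsplit(database_url).path.lstrip("/")


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Database:", database_name(os.environ["DATABASE_URL"]))
```

With the example value from the table, `database_name("postgresql://:@/edrn")` gives `edrn`, which matches the database created in the next step.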

Next, set up a PostgreSQL database:

$ createdb edrn

You only need to do this the first time, or again if you ever get rid of the database with dropdb edrn. Then, set up the software and database schema and content:

$ python3 -m venv .venv
$ .venv/bin/pip install --quiet --upgrade pip setuptools wheel build
$ .venv/bin/pip install --editable 'src/eke.geocoding[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.streams[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.controls[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.content[dev]'
$ .venv/bin/pip install --editable 'src/edrn.collabgroups[dev]'
$ .venv/bin/pip install --editable 'src/eke.knowledge[dev]'
$ .venv/bin/pip install --editable 'src/eke.biomarkers[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.search[dev]'
$ .venv/bin/pip install --editable 'src/edrn.theme[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.ploneimport[dev]'
$ .venv/bin/pip install --editable 'src/edrn.metrics[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.policy[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.test[dev]'
$ .venv/bin/django-admin makemigrations --pythonpath . --settings local
$ .venv/bin/django-admin migrate --pythonpath . --settings local
$ .venv/bin/django-admin createsuperuser --pythonpath . --settings local --username root --email [email protected]

When prompted for a password, enter a suitably secure root-level password for the Django super user (twice).

👉 Note: This password is for the application server's "manager" or "root" superuser and is unrelated to any usernames or passwords used with the EDRN Directory Service. But because it affords such deep and penetrative access, it must be kept double-plus super-secret probationary secure.

Then, to run a local server so you can point your browser at http://localhost:8000/ simply do:

$ .venv/bin/django-admin runserver --pythonpath . --settings local

You can also visit the Wagtail admin at http://localhost:8000/admin/ and the Django admin at http://localhost:8000/django-admin/

To see all the commands besides runserver and migrate that Django supports:

$ .venv/bin/django-admin help --pythonpath . --settings local

📋 Taskfile.dev

This repository provides a Taskfile.yaml which lets you use Taskfile.dev to simplify a lot of commands. For example, much of the above can be done with

task run

which builds the virtual environment, installs the software, and starts the server. Run

task --list

to see more. The environment variables used by the Taskfile (below) should be in a .env file.
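If you're unsure what a .env file should look like, it's just KEY=VALUE lines. The following is a deliberately simplified sketch of reading one; it is not how Taskfile itself parses the file, and the `parse_dotenv` name is hypothetical. Real dotenv loaders handle more cases (escapes, `export` prefixes, multi-line values).

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        variables[key.strip()] = value
    return variables


EXAMPLE = """\
# Development settings (placeholder values)
DATABASE_URL=postgresql://:@/edrn
SIGNING_KEY='s3cr3t'
"""

if __name__ == "__main__":
    for name, value in parse_dotenv(EXAMPLE).items():
        print(name, "=", value)
```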

🍃 Environment Variables

Here is a table of the environment variables that may affect the portal server (some of these have explicit values depending on context, such as containerization):

| Variable | Use | Default |
|---|---|---|
| ALLOWED_HOSTS | What valid hostnames to serve the site on (comma-separated) | .nci.nih.gov,.cancer.gov |
| AWS_ACCESS_KEY_ID | Amazon Location Service account access key | Unset |
| AWS_SECRET_ACCESS_KEY | Amazon Location Service secret access key | Unset |
| BASE_URL | Full URL base for generating URLs in notification emails | https://edrn.nci.nih.gov/ |
| CACHE_URL | URL to the caching & message brokering service | redis:// |
| CSRF_TRUSTED_ORIGINS | Comma-separated list of origins we implicitly trust in form requests | http://*.nci.nih.gov,https://*.nci.nih.gov |
| DATABASE_URL | URL to persistence | Unset |
| ELASTICSEARCH_URL | Where the search engine's ReST API is | http://localhost:9200/ |
| FORCE_SCRIPT_NAME | Base URI path (Apache "script name") if app is not on / | Unset |
| LDAP_BIND_DN | Distinguished name to use for looking up users in the directory | uid=service, dc=edrn, dc=jpl, dc=nasa, dc=gov |
| LDAP_BIND_PASSWORD | Password for the LDAP_BIND_DN | Unset |
| LDAP_CACHE_TIMEOUT | How many seconds to cache directory lookups | 3600 seconds (1 hour) |
| LDAP_URI | URI to locate the EDRN Directory Service | ldaps://edrn-ds.jpl.nasa.gov |
| MEDIA_ROOT | Where to save media files | Current dir + /media |
| MEDIA_URL | URL prefix of media files; must end with / | /media/ |
| MQ_URL | URL to the message queuing service | redis:// |
| RECAPTCHA_PRIVATE_KEY | Private key for reCAPTCHA | Unset |
| RECAPTCHA_PUBLIC_KEY | Public key for reCAPTCHA | Unset |
| SECURE_COOKIES | True for secure handling of session and CSRF cookies | True |
| SIGNING_KEY | Cryptographic key to protect sessions, messages, tokens, etc. | Unset in operations; set to a known bad value in development |
| STATIC_ROOT | Where to collect static files | Current dir + /static |
| STATIC_URL | URL prefix of static files; must end with / | /static/ |

Note that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY can also be set through the web; the environment variables are just a fallback. Sadly, neither RECAPTCHA_PRIVATE_KEY nor RECAPTCHA_PUBLIC_KEY can be, due to limitations of wagtail-django-recaptcha.
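Several of the variables above are comma-separated lists (ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS) or booleans (SECURE_COOKIES). Here's a rough sketch of how such values could be turned into Python settings. This is not the portal's actual settings code; the `env_list` and `env_bool` helpers and the accepted spellings of "true" are assumptions made for illustration.

```python
import os


def env_list(name, default, environ=os.environ):
    """Split a comma-separated variable into a list, ignoring empty items."""
    raw = environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


def env_bool(name, default, environ=os.environ):
    """Interpret a variable as a boolean; the accepted spellings are assumed."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


if __name__ == "__main__":
    # Defaults are copied from the table above.
    print(env_list("ALLOWED_HOSTS", ".nci.nih.gov,.cancer.gov"))
    print(env_bool("SECURE_COOKIES", True))
```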

🪶 Apache HTTPD Configuration with Jenkins

This section describes how you'd use Apache HTTPD with Jenkins in order to make the site accessible to the world.

First up, the HTTPD configuration:

WSGIDaemonProcess edrnportal user=edrn group=edrn python-home=/usr/local/edrn/portal/p5-renaissance/venv 
WSGIProcessGroup edrnportal
WSGIScriptAlias /portal/renaissance /usr/local/edrn/portal/p5-renaissance/jenkins.wsgi process-group=edrnportal
Alias /portal/renaissance/media/ /usr/local/edrn/portal/p5-renaissance/media/
Alias /portal/renaissance/static/ /usr/local/edrn/portal/p5-renaissance/static/
<Directory "/usr/local/edrn/portal/p5-renaissance">
    <IfVersion < 2.4>
        Order allow,deny
        Allow from all
    </IfVersion>
    <IfVersion >= 2.4>
        Require all granted
    </IfVersion>
</Directory>
<Directory "/usr/local/edrn/portal/p5-renaissance/static/">
    Options FollowSymLinks
</Directory>

Next, here's the jenkins.wsgi that was referenced in the HTTPD configuration above (Jenkins should generate this with each build):

from django.core.wsgi import get_wsgi_application
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'edrnsite.policy.settings.ops')
os.environ.setdefault('LDAP_BIND_DN', 'uid=service,dc=edrn,dc=jpl,dc=nasa,dc=gov')
os.environ.setdefault('LDAP_BIND_PASSWORD', 'REDACTED')
os.environ.setdefault('SIGNING_KEY', 'REDACTED')
os.environ.setdefault('DATABASE_URL', 'postgresql://:@/edrn')
os.environ.setdefault('ALLOWED_HOSTS', '.jpl.nasa.gov')
os.environ.setdefault('STATIC_ROOT', '/usr/local/edrn/portal/p5-renaissance/static')
os.environ.setdefault('MEDIA_ROOT', '/usr/local/edrn/portal/p5-renaissance/media')
os.environ.setdefault('BASE_URL', 'https://edrn-dev.jpl.nasa.gov/portal/renaissance/')
os.environ.setdefault('STATIC_URL', '/portal/renaissance/static/')
os.environ.setdefault('MEDIA_URL', '/portal/renaissance/media/')
os.environ.setdefault('SECURE_COOKIES', 'False')
os.environ.setdefault('ELASTICSEARCH_URL', 'http://localhost:9200/')
os.environ.setdefault('CACHE_URL', 'redis://')
# We don't need FORCE_SCRIPT_NAME here since Apache's WSGIScriptAlias does the right thing
application = get_wsgi_application()
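One thing worth knowing about the script above: `setdefault` only fills in a variable that isn't already present, so any value already in the process environment wins over the one in jenkins.wsgi. A small sketch of that behavior, using a plain dict in place of `os.environ` (the `apply_defaults` helper and the `.example.org` value are hypothetical):

```python
def apply_defaults(environ, defaults):
    """Fill in missing variables without overriding ones already set."""
    for name, value in defaults.items():
        environ.setdefault(name, value)
    return environ


if __name__ == "__main__":
    # Pretend ALLOWED_HOSTS was already set before the WSGI script ran.
    environ = {"ALLOWED_HOSTS": ".example.org"}
    apply_defaults(environ, {
        "ALLOWED_HOSTS": ".jpl.nasa.gov",
        "CACHE_URL": "redis://",
    })
    print(environ["ALLOWED_HOSTS"])  # the pre-existing value is kept
    print(environ["CACHE_URL"])      # the default is filled in
```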

Finally, this needs to be run on each deployment:

$ venv/bin/django-admin collectstatic --settings edrnsite.policy.settings.ops --clear --link
$ mkdir media

🚢 Container Setup

To use this software in a Docker container environment, first collect the wheels by running:

support/build-wheels.sh

Or by hand:

.venv/bin/python -m build --outdir dist src/eke.geocoding
.venv/bin/python -m build --outdir dist src/edrnsite.streams
.venv/bin/python -m build --outdir dist src/edrnsite.controls
.venv/bin/python -m build --outdir dist src/edrnsite.content
.venv/bin/python -m build --outdir dist src/edrn.collabgroups
.venv/bin/python -m build --outdir dist src/edrn.theme
.venv/bin/python -m build --outdir dist src/edrnsite.search
.venv/bin/python -m build --outdir dist src/eke.knowledge
.venv/bin/python -m build --outdir dist src/eke.biomarkers
.venv/bin/python -m build --outdir dist src/edrnsite.ploneimport
.venv/bin/python -m build --outdir dist src/edrnsite.policy

You don't need src/edrnsite.test since it's just used for testing.
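The by-hand approach amounts to "one build command per source directory, except edrnsite.test". A minimal sketch of generating those commands follows; it is not what support/build-wheels.sh actually does, and the `build_commands` helper is hypothetical.

```python
# Packages that are only used for testing and don't need wheels.
SKIP = {"edrnsite.test"}


def build_commands(package_names, python=".venv/bin/python", outdir="dist"):
    """Return one `python -m build` argument list per package to build."""
    return [
        [python, "-m", "build", "--outdir", outdir, f"src/{name}"]
        for name in sorted(package_names)
        if name not in SKIP
    ]


if __name__ == "__main__":
    for command in build_commands(["eke.geocoding", "edrnsite.test", "edrn.theme"]):
        print(" ".join(command))
```

To cover every directory under src, you could pass in the names from `pathlib.Path("src").iterdir()` and hand each resulting list to `subprocess.run`.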

Repeat this for any other source directory in src. Then build the image:

docker image build --build-arg user_id=NUMBER --tag edrndocker/edrn-portal:latest --file docker/Dockerfile .

Replace NUMBER with the numeric user ID of the user under which to run the software in the container. Typically you'll want

  • 500 for running at the Jet Propulsion Laboratory.
  • 26013 for running at the National Cancer Institute.
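To avoid mistyping the user ID, the build command can be assembled from a small lookup. This is only a sketch: the site labels `jpl` and `nci` and the `image_build_command` helper are made up here, while the IDs, tag, and Dockerfile path come from the text above.

```python
# User IDs from the list above, keyed by hypothetical site labels.
SITE_USER_IDS = {"jpl": 500, "nci": 26013}


def image_build_command(site):
    """Return the docker image build argument list for the given site."""
    user_id = SITE_USER_IDS[site]
    return [
        "docker", "image", "build",
        "--build-arg", f"user_id={user_id}",
        "--tag", "edrndocker/edrn-portal:latest",
        "--file", "docker/Dockerfile",
        ".",
    ]


if __name__ == "__main__":
    print(" ".join(image_build_command("nci")))
```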

Or do it all at once with Taskfile:

task image

That builds the wheels and the image for you.

Spot check: see if the image is working by running:

docker container run --rm --env LDAP_BIND_PASSWORD='[REDACTED]' --env SIGNING_KEY='s3cr3t' \
    --env ALLOWED_HOSTS='*' --publish 8000:8000 edrndocker/edrn-portal:latest

Then visit http://localhost:8000/; you should get Server Error (500), since the database connection isn't established.

For a Docker Composition, the accompanying docker/docker-compose.yaml file enables you to run the orchestrated set of needed processes in production, including the portal, maintenance worker, search engine, cache and message queue, and a database. You can launch all the processes at once with docker compose up when at the National Cancer Institute. For local development, use task compose-up.

Next, establish an SSH tunnel from localhost port 1636 to the EDRN Directory Service at ldaps://edrn-ds.jpl.nasa.gov, so that ldaps://localhost:1636 will be the EDRN Directory Service. From a container's point of view, ldaps://host.docker.internal:1636 is then also the EDRN Directory Service.

Finally, pick a convenient directory to contain the database (the PostgreSQL database plus the media blobs and static files); @nutjob4life uses ~/dockerdata/renaissance. Call this the $EDRN_DATA_DIR. Copy the daily database copies from production to tumor.jpl.nasa.gov as follows:

$ mkdir ${HOME}/dockerdata/renaissance
$ export EDRN_DATA_DIR=${HOME}/dockerdata/renaissance
$ env WORKSPACE=$EDRN_DATA_DIR support/sync-from-dev.sh

Feel free to do this as often as you like. Monthly is more than sufficient unless you're looking for some specific items recently added to the production database.

You'll also need to ensure your Docker environment can support the EDRN P5 application stack, which is large. If you're using Docker Desktop, you might want to adjust the settings as follows (under Settings → Resources):

  • CPU limit: 20
  • Memory limit: 48 GB
  • Swap: 2 GB
  • Disk usage limit: 96 GB

After setting the needed variables and resource limits, start the composition as follows:

task compose-up

You can now proceed to set up the database, search engine, and populate the portal with its content.

📀 Containerized Database Setup

Next, we need to set up the database with initial structure and content. This section tells you how.

🏛 Database Structure

To set up the initial database and its schema inside a Docker Composition, we start by deleting and then creating the database and loading the current production data:

rm -rf $EDRN_DATA_DIR/postgresql
mkdir $EDRN_DATA_DIR/postgresql
task compose -- exec db createdb --username=postgres --encoding=UTF8 --owner=postgres edrn
bzip2 --decompress --stdout $EDRN_DATA_DIR/edrn.sql.bz2 | task compose -- \
    exec --no-tty db psql --username=postgres --dbname=edrn --echo-errors
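The last command above streams the decompressed SQL dump straight into psql without writing it to disk. If you'd rather drive that from Python, a sketch of the same idea follows. The `stream_decompressed` helper is hypothetical; `sink` can be any writable binary stream, such as the stdin pipe of a `subprocess.Popen` running the psql command.

```python
import bz2
import shutil


def stream_decompressed(dump_path, sink, chunk_size=1 << 16):
    """Copy a bzip2-compressed file to sink, decompressing chunk by chunk."""
    with bz2.open(dump_path, "rb") as source:
        # copyfileobj reads and writes fixed-size chunks, so the whole
        # dump is never held in memory at once.
        shutil.copyfileobj(source, sink, chunk_size)
```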

We can then upgrade the database schema to the latest software, fix any database tree issues, and collect the static files:

task compose -- exec portal /app/bin/django-admin migrate
task compose -- exec portal /app/bin/django-admin fixtree
task compose -- exec portal /app/bin/django-admin collectstatic --no-input --clear

Normally we'd also sync the LDAP groups, but this command will fail until the number of groups in EDRN is reduced; so just carry right on:

task compose -- exec portal /app/bin/django-admin ldap_group_sync

Then you can point a browser at https://localhost:2348/ (or whatever the HTTPS_PORT is) and see if it worked. Note that this uses a self-signed certificate so ignore any certificate warnings.

🕸 Reverse Proxy: ELB, ALB, Nginx, Apache HTTPD, etc.

The Docker Composition itself is not enough, of course. The last step is to set up an actual web server to accept requests, serve static and media files, reverse-proxy to the portal container, handle TLS/SSL encryption, load balancing, and so forth.

The web server is also responsible for serving media files and static assets. This is for efficiency: there's no need to involve the backend content management system for such files (which can be large). Furthermore, by giving the server direct filesystem access, it can use the sendfile system call, which is blazingly efficient.

In a nutshell, the web server must serve MEDIA_URL requests to the MEDIA_ROOT directory (which is EDRN_DATA_DIR/media), STATIC_URL requests to the STATIC_ROOT directory (which is EDRN_DATA_DIR/static), and all other requests reverse-proxied to the EDRN_PUBLISHED_PORT (or EDRN_TLS_PORT if you're using it) TCP socket.
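The routing rule in that paragraph can be sketched as a small function. This only illustrates the decision the web server makes; it isn't something you'd deploy, and it ignores concerns such as path traversal. The `route` helper is hypothetical, and the default data directory and upstream port are placeholder values.

```python
import posixpath


def route(path, data_dir="/local/web/content/edrn", upstream="http://localhost:4135"):
    """Decide whether a request path is served from disk or reverse-proxied."""
    # MEDIA_URL and STATIC_URL requests map onto files under the data directory.
    for prefix in ("/media/", "/static/"):
        if path.startswith(prefix):
            return ("file", posixpath.join(data_dir, path.lstrip("/")))
    # Everything else goes to the portal container.
    return ("proxy", upstream + path)


if __name__ == "__main__":
    print(route("/media/documents/sentinel.dat"))
    print(route("/static/edrn.theme/css/edrn-overlay.css"))
    print(route("/about/"))
```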

How you configure an Elastic Load Balancer, Application Load Balancer, Nginx, Apache HTTPD, or other web server to handle reverse-proxying to the portal container as well as serving static and media files depends on the software in use. In the interests of including a working example, though, see the following Nginx configuration:

server {
    listen 80;                         # Assumed port; listen needs an address or port argument
    location /media/ {                 # Request = http://whatever/media/documents/sentinel.dat
        root /local/web/content/edrn;  # Response = /local/web/content/edrn/media/documents/sentinel.dat
    }
    location /static/ {                # Request = http://whatever/static/edrn.theme/css/edrn-overlay.css
        root /local/web/content/edrn;  # Response = /local/web/content/edrn/static/edrn.theme/css/edrn-overlay.css
    }
    location / {                           # All other requests go to the portal container
        proxy_pass http://localhost:4135;  # EDRN_PUBLISHED_PORT = 4135
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect default;        
    }
}

🔻 Subpath Serving

Normally, the EDRN portal is hosted on the root path / of a host; for example, in production it's at https://edrn.nci.nih.gov/. However, for certain demonstrations and other expositions, it may be necessary to host it on a "subpath", such as https://edrn.jpl.nasa.gov/portal/renaissance/. Here, the subpath is /portal/renaissance/.

Depending on the web server, you may not need to do anything to support such a configuration, because the web server recognizes the subpath and sets the SCRIPT_NAME environment variable to the subpath (this is the case for mod_wsgi). Others, such as reverse-proxies, make no assumptions and make no such setting. When this is the case, you can set the FORCE_SCRIPT_NAME environment variable in the Docker composition to force the portal to believe a SCRIPT_NAME was set even when it wasn't.

As an example, if the web server is reverse-proxying to the Docker composition for URLs such as https://edrn.jpl.nasa.gov/portal/renaissance/, then we'd set FORCE_SCRIPT_NAME when starting the composition to /portal/renaissance/.
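To make the effect concrete: in WSGI, the public path of a page is the script name followed by the path inside the application. The sketch below mimics that with two hypothetical helpers (`effective_script_name` and `public_path`); it illustrates the idea and is not the code Django or the portal runs.

```python
def effective_script_name(wsgi_environ, force_script_name=None):
    """Prefer a forced script name; else use what the web server supplied."""
    if force_script_name is not None:
        return force_script_name
    return wsgi_environ.get("SCRIPT_NAME", "")


def public_path(script_name, path_info):
    """Join the script name and the in-application path with one slash."""
    return script_name.rstrip("/") + "/" + path_info.lstrip("/")


if __name__ == "__main__":
    # A reverse proxy that sets no SCRIPT_NAME, with the subpath forced instead.
    script_name = effective_script_name({}, "/portal/renaissance/")
    print(public_path(script_name, "/admin/"))
```

With no script name at all, `public_path("", "/admin/")` is just `/admin/`, which is the normal root-hosted case.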

👩‍💻 Software Environment

To develop for this system, you'll need

  • PostgreSQL 17 or later
  • Python 3.12 or later, but not 4.0 or later
  • Elasticsearch 7.17 or later, but not 8.0 or later
  • Redis 7.0 or later, but not 8.0 or later
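The version ranges above can be checked mechanically. Here's a small sketch that encodes them as (minimum, exclusive upper bound) pairs; the `parse_version` and `is_supported` helpers are hypothetical, and the parser only understands plain dotted numbers.

```python
# (minimum version, exclusive upper bound or None) from the list above.
REQUIREMENTS = {
    "PostgreSQL": ((17,), None),
    "Python": ((3, 12), (4, 0)),
    "Elasticsearch": ((7, 17), (8, 0)),
    "Redis": ((7, 0), (8, 0)),
}


def parse_version(text):
    """Turn '3.12.1' into (3, 12, 1), ignoring non-numeric parts."""
    return tuple(int(part) for part in text.split(".")[:3] if part.isdigit())


def is_supported(name, version_text):
    """Check a version string against the documented range for name."""
    minimum, below = REQUIREMENTS[name]
    version = parse_version(version_text)
    return version >= minimum and (below is None or version < below)


if __name__ == "__main__":
    import platform
    print("Python OK:", is_supported("Python", platform.python_version()))
```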

👥 Contributing

You can start by looking at the open issues, forking the project, and submitting a pull request. You can also contact us by email with suggestions.

🔢 Versioning

We use the SemVer philosophy for versioning this software. For versions available, see the releases made on this project. We're starting off with version 5 because reasons.

👩‍🎨 Creators

The principal developer is:

The QA team consists of:

To contact the team as a whole, email the Informatics Center.

📃 License

The project is licensed under the Apache version 2 license.

🎨 Art Credits

About

EDRN Production Program for the Public/Private Portal (P5)
