Merge branch 'pre-prod' into prod
pascalaldo committed Dec 17, 2024
2 parents 000f508 + 1c5737c commit 4abc5bf
Showing 12 changed files with 505 additions and 43 deletions.
@@ -11,6 +11,8 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
+with:
+persist-credentials: false

- name: Copy files using SCP
uses: appleboy/[email protected]
@@ -22,7 +24,7 @@ jobs:
target: "/projects/pankb_web/django_project"

- name: Create the .env file and (re-)start containers over SSH
-uses: appleboy/ssh-action@v0.1.7
+uses: appleboy/ssh-action@v1.2.0
with:
host: ${{ secrets.PANKB_PREPROD_HOST }}
username: ${{ secrets.PANKB_PREPROD_SSH_USERNAME }}
@@ -59,6 +61,6 @@ jobs:
echo "## URL address of the separately deployed AI Assistant Web Application" >> .env
echo AI_ASSISTANT_APP_URL="${{vars.PANKB_PREPROD_AI_ASSISTANT_APP_URL}}" >> .env
cat .env
-docker compose down
-docker compose up -d --build --force-recreate --remove-orphans
+docker compose --profile dev down
+docker compose --profile dev up -d --build --force-recreate --remove-orphans
docker system prune --all --force
8 changes: 5 additions & 3 deletions .github/workflows/deploy-prod-to-azurevm.yml
@@ -11,6 +11,8 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
+with:
+persist-credentials: false

- name: Copy files using SCP
uses: appleboy/[email protected]
@@ -22,7 +24,7 @@ jobs:
target: "/projects/pankb_web/django_project"

- name: Create the .env file and (re-)start containers over SSH
-uses: appleboy/ssh-action@v0.1.7
+uses: appleboy/ssh-action@v1.2.0
with:
host: ${{ secrets.PANKB_PROD_HOST }}
username: ${{ secrets.PANKB_PROD_SSH_USERNAME }}
@@ -59,6 +61,6 @@ jobs:
echo "## URL address of the separately deployed AI Assistant Web Application" >> .env
echo AI_ASSISTANT_APP_URL="${{vars.PANKB_PROD_AI_ASSISTANT_APP_URL}}" >> .env
cat .env
-docker compose down
-docker compose up -d --build --force-recreate --remove-orphans
+docker compose --profile prod down
+docker compose --profile prod up -d --build --force-recreate --remove-orphans
docker system prune --all --force
9 changes: 5 additions & 4 deletions README.md
@@ -2,11 +2,12 @@
<b>The dynamic Python-based version of the website. The Django framework is used as the back-end. Data about organisms, genes, genomes, locus_tags and KEGG pathways are stored in a database (in a cloud-based Azure Cosmos DB for MongoDB). The Microsoft Azure Blob Storage is still used as a data lake to store static unstructured or semi-structured data, e.g., plots, bibliome and phylogenetic trees (i.e., data that are not used by search or any other scripts generating dynamic content).</b>

## Contributors
-- Front-end, analytics, LLM, data processing via a bioinformatics pipeline: Pascal A. Pieters, [email protected]; Binhuan Sun, [email protected]
-- Back-end, ETL pipeline, the website and vector databases, CI/CD pipeline, the github repo maintenance, versioning and backup systems, infrastructure, DevOps: Pascal A. Pieters, [email protected]
+- Front-end, analytics, LLM, data processing via a bioinformatics pipeline: Binhuan Sun (v1.0.0), Pascal A. Pieters (>=v2.0.0)
+- Back-end, ETL pipeline, the website and vector databases, CI/CD pipeline, the github repo maintenance, versioning and backup systems, infrastructure, DevOps: Liubov Pashkova (v2.0.0) Pascal A. Pieters (>=v3.0.0)

+For more info, contact Pascal A. Pieters, [email protected]
## Server Configuration
-Tested on Linux Ubuntu 20.04 (may need tweaks for other systems).
+Tested on Linux Ubuntu 20.04 and 24.04 (may need tweaks for other systems).

Min hardware requirements solely for the PanKB website deployment (excl. the PanKB DB, ETL and AI Assistant app):
- 4GB RAM
@@ -132,4 +133,4 @@ CONTAINER ID IMAGE COMMAND CREATED
6523c2afddd3 pankb_nginx:latest "/docker-entrypoint.…" About an hour ago Up About an hour 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp pankb-nginx
c3bbd55e070d pankb_llm:latest "streamlit run strea…" 2 hours ago Up 2 hours 0.0.0.0:8501->8501/tcp, :::8501->8501/tcp pankb-llm
```
-After the Github Actions deployment job has successfully run, the web-application must be available at <a href="pankb.org" target="_blank">pankb.org</a>.
+After the Github Actions deployment job has successfully run, the web-application must be available at <a href="pankb.org" target="_blank">pankb.org</a>.
10 changes: 10 additions & 0 deletions django_project/urls.py
@@ -148,6 +148,11 @@
name="pathway_info_genes_json",
),
path("search/", search_views.search_results, name="search_results"),
+path(
+    "search/genomes_json/",
+    search_views.genomes_json,
+    name="search_genomes_json",
+),
path(
"search/genes_json/",
search_views.gene_annotation_json,
@@ -168,6 +173,11 @@
search_views.download_search_pathway_csv,
name="download_search_pathway_csv",
),
+path(
+    "search/genomes/csv/",
+    search_views.download_search_genomes_csv,
+    name="download_search_genomes_csv",
+),
path(
"search/genes/csv/",
search_views.download_search_genes_csv,
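Both new routes sit next to the existing `search/` routes, and the views they point to read the search term from the `q` query parameter. A minimal sketch of how a client might build request URLs for them, assuming these patterns are mounted at the site root; `BASE_URL` and `build_search_url` are hypothetical names used only for illustration and do not appear in the commit:

```python
from urllib.parse import urlencode, urljoin

# Hypothetical host, for illustration only; substitute the real deployment URL.
BASE_URL = "https://example.org/"

# Route names and paths as they appear in the urls.py diff.
NEW_ROUTES = {
    "search_genomes_json": "search/genomes_json/",
    "download_search_genomes_csv": "search/genomes/csv/",
}


def build_search_url(route_name: str, query: str, base_url: str = BASE_URL) -> str:
    """Return the full URL for one of the new routes with the search term in `q`."""
    path = NEW_ROUTES[route_name]
    return urljoin(base_url, path) + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_search_url("search_genomes_json", "K-12"))
    print(build_search_url("download_search_genomes_csv", "K-12"))
```

Inside the Django project itself, `django.urls.reverse` with the route name would be the more usual way to obtain the path.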
97 changes: 96 additions & 1 deletion search/views.py
@@ -2,7 +2,7 @@
from django.template import loader
from organisms.models import Organisms
from pangenome_analyses.models import GeneAnnotations
-from gene_function.models import PathwayInfo
+from gene_function.models import PathwayInfo, GenomeInfo
from common import csv_export
import json, time, re

@@ -187,13 +187,60 @@ def download_search_genes_csv(request):
return response


+def download_search_genomes_csv(request):
+    q_orig = request.GET.get("q")
+    q = clean_query(q_orig)
+    genome_keys = [
+        "pangenome_analysis",
+        "genome_id",
+        "species",
+        "strain",
+        "phylo_group",
+        "gc_content",
+        "country",
+        "broad_context",
+        "local_context",
+        "extra_context",
+        "isolation_source",
+    ]
+    if len(q) >= 2 and not re.search(
+        q, "Missing", flags=re.IGNORECASE
+    ):  # Prevent too many results:
+        genomes = GenomeInfo.objects.aggregate(
+            GenomeInfo.get_genome_and_isolation_info_pipeline({})
+            + build_multi_search_aggregation(
+                q, ["genome_id", "strain", "country", "iso_cat", "isolation_source"]
+            )
+            + [
+                {
+                    "$addFields": {
+                        "broad_context": {"$arrayElemAt": ["$iso_cat", 0]},
+                        "local_context": {"$arrayElemAt": ["$iso_cat", 1]},
+                        "extra_context": {"$slice": ["$iso_cat", 2, 10]},
+                    }
+                },
+                {"$project": {gk: int(gk != "_id") for gk in ["_id"] + genome_keys}},
+            ]
+        )
+    else:
+        genomes = []
+    downloaded_file_name = (
+        "Search__genomes__" + time.strftime("%Y-%m-%d_%H-%M") + ".csv"
+    )
+    response = csv_export.dict_writer_response(
+        downloaded_file_name, genome_keys, genomes
+    )
+    return response
+
+
# JSON data for gene datatable
def gene_annotation_json(request):
q_orig = str(request.GET["q"])
q = clean_query(q_orig)

gene_keys = [
"gene",
"species",
"cog_category",
"cog_name",
"description",
@@ -214,3 +261,51 @@ def gene_annotation_json(request):
else:
genes = []
return JsonResponse({"results": genes})
+
+# JSON data for genome datatable
+def genomes_json(request):
+    q_orig = str(request.GET["q"])
+    q = clean_query(q_orig)
+
+    genome_keys = [
+        "pangenome_analysis",
+        "genome_id",
+        "species",
+        "strain",
+        "phylo_group",
+        "gc_content",
+        "country",
+        "broad_context",
+        "local_context",
+        "extra_context",
+        "isolation_source",
+    ]
+    if len(q) >= 2 and not re.search(
+        q, "Missing", flags=re.IGNORECASE
+    ):  # Prevent too many results
+        genomes = GenomeInfo.objects.aggregate(
+            GenomeInfo.get_genome_and_isolation_info_pipeline({})
+            + build_multi_search_aggregation(
+                q, ["genome_id", "strain", "country", "iso_cat", "isolation_source"]
+            )
+            + [
+                {
+                    "$addFields": {
+                        "broad_context": {"$arrayElemAt": ["$iso_cat", 0]},
+                        "local_context": {"$arrayElemAt": ["$iso_cat", 1]},
+                        "extra_context": {"$slice": ["$iso_cat", 2, 10]},
+                    }
+                },
+                {"$project": {gk: int(gk != "_id") for gk in ["_id"] + genome_keys}},
+            ]
+        )
+        genomes = [
+            [
+                (str(g.get(gk, None)) if gk == "strain" else g.get(gk, None))
+                for gk in genome_keys
+            ]
+            for g in genomes
+        ]
+    else:
+        genomes = []
+    return JsonResponse({"results": genomes})
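
The two genome views share the same query guard, `$project` specification and `iso_cat` split. The sketch below reproduces those three pieces in plain Python so they can be checked without a database. It is an illustration under assumptions, not the project's code: `is_searchable`, `project_spec` and `split_iso_cat` are hypothetical helper names, `clean_query` is not shown in the diff and is assumed to have already sanitized the query, and reading `"Missing"` as a placeholder value that would otherwise match a large number of rows is an inference from the inline comment.

```python
import re

GENOME_KEYS = [
    "pangenome_analysis", "genome_id", "species", "strain", "phylo_group",
    "gc_content", "country", "broad_context", "local_context",
    "extra_context", "isolation_source",
]


def is_searchable(q: str) -> bool:
    """Mirror the guard in the diff.

    The query must have at least 2 characters and, used as a regex pattern,
    must not match anywhere inside the literal string "Missing".
    Assumes q was sanitized beforehand; an invalid pattern would raise re.error.
    """
    return len(q) >= 2 and not re.search(q, "Missing", flags=re.IGNORECASE)


def project_spec(keys):
    """Same comprehension as in the diff: 0 excludes `_id`, 1 includes each key."""
    return {k: int(k != "_id") for k in ["_id"] + keys}


def split_iso_cat(iso_cat):
    """Plain-Python analogue of the $addFields stage.

    $arrayElemAt picks one element by index; {"$slice": [array, 2, 10]} takes
    up to 10 elements starting at index 2. None stands in here for a field
    that MongoDB would leave unset when the index is out of range.
    """
    return {
        "broad_context": iso_cat[0] if len(iso_cat) > 0 else None,
        "local_context": iso_cat[1] if len(iso_cat) > 1 else None,
        "extra_context": iso_cat[2:12],
    }


if __name__ == "__main__":
    for query in ["coli", "mi", "SING", "a"]:
        print(query, is_searchable(query))
    print(project_spec(GENOME_KEYS))
    print(split_iso_cat(["host", "human", "gut", "feces"]))
```

Because the guard and the pipeline are duplicated verbatim in `download_search_genomes_csv` and `genomes_json`, factoring them into shared helpers along these lines would keep the CSV export and the datatable in step.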
105 changes: 103 additions & 2 deletions static/phylotree.js/js/phylotree.js

Some generated files are not rendered by default.