Merge branch 'pre-prod' into prod

biosustain · Dec 17, 2024 · 4abc5bf
2 parents 000f508 + 1c5737c

Showing 12 changed files with 505 additions and 43 deletions.
diff --git a/.../workflows/deploy-preprod-to-azurevm.yaml → ...b/workflows/deploy-preprod-to-azurevm.yml
@@ -11,6 +11,8 @@ jobs:
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
+        with:
+          persist-credentials: false
 
       - name: Copy files using SCP
         uses: appleboy/[email protected]
@@ -22,7 +24,7 @@ jobs:
           target: "/projects/pankb_web/django_project"
 
       - name: Create the .env file and (re-)start containers over SSH
-        uses: appleboy/ssh-action@v0.1.7
+        uses: appleboy/ssh-action@v1.2.0
         with:
           host: ${{ secrets.PANKB_PREPROD_HOST }}
           username: ${{ secrets.PANKB_PREPROD_SSH_USERNAME }}
@@ -59,6 +61,6 @@ jobs:
             echo "## URL address of the separately deployed AI Assistant Web Application" >> .env
             echo AI_ASSISTANT_APP_URL="${{vars.PANKB_PREPROD_AI_ASSISTANT_APP_URL}}" >> .env
             cat .env
-            docker compose down
-            docker compose up -d --build --force-recreate --remove-orphans
+            docker compose --profile dev down
+            docker compose --profile dev up -d --build --force-recreate --remove-orphans
             docker system prune --all --force
diff --git a/.github/workflows/deploy-prod-to-azurevm.yml b/.github/workflows/deploy-prod-to-azurevm.yml
@@ -11,6 +11,8 @@ jobs:
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
+        with:
+          persist-credentials: false
 
       - name: Copy files using SCP
         uses: appleboy/[email protected]
@@ -22,7 +24,7 @@ jobs:
           target: "/projects/pankb_web/django_project"
 
       - name: Create the .env file and (re-)start containers over SSH
-        uses: appleboy/ssh-action@v0.1.7
+        uses: appleboy/ssh-action@v1.2.0
         with:
           host: ${{ secrets.PANKB_PROD_HOST }}
           username: ${{ secrets.PANKB_PROD_SSH_USERNAME }}
@@ -59,6 +61,6 @@ jobs:
             echo "## URL address of the separately deployed AI Assistant Web Application" >> .env
             echo AI_ASSISTANT_APP_URL="${{vars.PANKB_PROD_AI_ASSISTANT_APP_URL}}" >> .env
             cat .env
-            docker compose down
-            docker compose up -d --build --force-recreate --remove-orphans
+            docker compose --profile prod down
+            docker compose --profile prod up -d --build --force-recreate --remove-orphans
             docker system prune --all --force
diff --git a/README.md b/README.md
@@ -2,11 +2,12 @@
 <b>The dynamic Python-based version of the website. The Django framework is used as the back-end. Data about organisms, genes, genomes, locus_tags and KEGG pathways are stored in a database (in a cloud-based Azure Cosmos DB for MongoDB). The Microsoft Azure Blob Storage is still used as a data lake to store static unstructured or semi-structured data, e.g., plots, bibliome and phylogenetic trees (i.e., data that are not used by search or any other scripts generating dynamic content).</b>
 
 ## Contributors
-- Front-end, analytics, LLM, data processing via a bioinformatics pipeline: Pascal A. Pieters, [email protected]; Binhuan Sun, [email protected]
-- Back-end, ETL pipeline, the website and vector databases, CI/CD pipeline, the github repo maintenance, versioning and backup systems, infrastructure, DevOps: Pascal A. Pieters, [email protected]
+- Front-end, analytics, LLM, data processing via a bioinformatics pipeline: Binhuan Sun (v1.0.0), Pascal A. Pieters (>=v2.0.0)
+- Back-end, ETL pipeline, the website and vector databases, CI/CD pipeline, the github repo maintenance, versioning and backup systems, infrastructure, DevOps: Liubov Pashkova (v2.0.0) Pascal A. Pieters (>=v3.0.0)
 
+For more info, contact Pascal A. Pieters, [email protected]
 ## Server Configuration
-Tested on Linux Ubuntu 20.04 (may need tweaks for other systems).
+Tested on Linux Ubuntu 20.04 and 24.04 (may need tweaks for other systems).
 
 Min hardware requirements solely for the PanKB website deployment (excl. the PanKB DB, ETL and AI Assistant app):
 - 4GB RAM
@@ -132,4 +133,4 @@ CONTAINER ID   IMAGE                COMMAND                  CREATED
 6523c2afddd3   pankb_nginx:latest   "/docker-entrypoint.…"   About an hour ago   Up About an hour   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp   pankb-nginx
 c3bbd55e070d   pankb_llm:latest     "streamlit run strea…"   2 hours ago         Up 2 hours         0.0.0.0:8501->8501/tcp, :::8501->8501/tcp                                  pankb-llm
 ```
-After the Github Actions deployment job has successfully run, the web-application must be available at <a href="pankb.org" target="_blank">pankb.org</a>.
+After the Github Actions deployment job has successfully run, the web-application must be available at <a href="pankb.org" target="_blank">pankb.org</a>.
diff --git a/django_project/urls.py b/django_project/urls.py
@@ -148,6 +148,11 @@
         name="pathway_info_genes_json",
     ),
     path("search/", search_views.search_results, name="search_results"),
+    path(
+        "search/genomes_json/",
+        search_views.genomes_json,
+        name="search_genomes_json",
+    ),
     path(
         "search/genes_json/",
         search_views.gene_annotation_json,
@@ -168,6 +173,11 @@
         search_views.download_search_pathway_csv,
         name="download_search_pathway_csv",
     ),
+    path(
+        "search/genomes/csv/",
+        search_views.download_search_genomes_csv,
+        name="download_search_genomes_csv",
+    ),
     path(
         "search/genes/csv/",
         search_views.download_search_genes_csv,

diff --git a/search/views.py b/search/views.py
@@ -2,7 +2,7 @@
 from django.template import loader
 from organisms.models import Organisms
 from pangenome_analyses.models import GeneAnnotations
-from gene_function.models import PathwayInfo
+from gene_function.models import PathwayInfo, GenomeInfo
 from common import csv_export
 import json, time, re
 
@@ -187,13 +187,60 @@ def download_search_genes_csv(request):
     return response
 
 
+def download_search_genomes_csv(request):
+    q_orig = request.GET.get("q")
+    q = clean_query(q_orig)
+    genome_keys = [
+        "pangenome_analysis",
+        "genome_id",
+        "species",
+        "strain",
+        "phylo_group",
+        "gc_content",
+        "country",
+        "broad_context",
+        "local_context",
+        "extra_context",
+        "isolation_source",
+    ]
+    if len(q) >= 2 and not re.search(
+        q, "Missing", flags=re.IGNORECASE
+    ):  # Prevent too many results:
+        genomes = GenomeInfo.objects.aggregate(
+            GenomeInfo.get_genome_and_isolation_info_pipeline({})
+            + build_multi_search_aggregation(
+                q, ["genome_id", "strain", "country", "iso_cat", "isolation_source"]
+            )
+            + [
+                {
+                    "$addFields": {
+                        "broad_context": {"$arrayElemAt": ["$iso_cat", 0]},
+                        "local_context": {"$arrayElemAt": ["$iso_cat", 1]},
+                        "extra_context": {"$slice": ["$iso_cat", 2, 10]},
+                    }
+                },
+                {"$project": {gk: int(gk != "_id") for gk in ["_id"] + genome_keys}},
+            ]
+        )
+    else:
+        genomes = []
+    downloaded_file_name = (
+        "Search__genomes__" + time.strftime("%Y-%m-%d_%H-%M") + ".csv"
+    )
+    response = csv_export.dict_writer_response(
+        downloaded_file_name, genome_keys, genomes
+    )
+    return response
+
+
 # JSON data for gene datatable
 def gene_annotation_json(request):
     q_orig = str(request.GET["q"])
     q = clean_query(q_orig)
 
     gene_keys = [
         "gene",
+        "species",
         "cog_category",
         "cog_name",
         "description",
@@ -214,3 +261,51 @@ def gene_annotation_json(request):
     else:
         genes = []
     return JsonResponse({"results": genes})
+
+# JSON data for genome datatable
+def genomes_json(request):
+    q_orig = str(request.GET["q"])
+    q = clean_query(q_orig)
+
+    genome_keys = [
+        "pangenome_analysis",
+        "genome_id",
+        "species",
+        "strain",
+        "phylo_group",
+        "gc_content",
+        "country",
+        "broad_context",
+        "local_context",
+        "extra_context",
+        "isolation_source",
+    ]
+    if len(q) >= 2 and not re.search(
+        q, "Missing", flags=re.IGNORECASE
+    ):  # Prevent too many results
+        genomes = GenomeInfo.objects.aggregate(
+            GenomeInfo.get_genome_and_isolation_info_pipeline({})
+            + build_multi_search_aggregation(
+                q, ["genome_id", "strain", "country", "iso_cat", "isolation_source"]
+            )
+            + [
+                {
+                    "$addFields": {
+                        "broad_context": {"$arrayElemAt": ["$iso_cat", 0]},
+                        "local_context": {"$arrayElemAt": ["$iso_cat", 1]},
+                        "extra_context": {"$slice": ["$iso_cat", 2, 10]},
+                    }
+                },
+                {"$project": {gk: int(gk != "_id") for gk in ["_id"] + genome_keys}},
+            ]
+        )
+        genomes = [
+            [
+                (str(g.get(gk, None)) if gk == "strain" else g.get(gk, None))
+                for gk in genome_keys
+            ]
+            for g in genomes
+        ]
+    else:
+        genomes = []
+    return JsonResponse({"results": genomes})
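
Note on the `search/views.py` hunks above: `genomes_json` and `download_search_genomes_csv` repeat the same query guard and the same `$project` stage. The sketch below isolates those two pieces in plain Python so their behaviour can be checked without Django or MongoDB. The helper names `is_searchable` and `build_projection` are hypothetical and do not appear in the commit. The query is assumed to arrive already processed by `clean_query`, whose behaviour is not shown in this diff. The guard uses the query as a regex pattern against the literal `"Missing"`, so any query that is a fragment of that word is rejected, presumably because placeholder values would otherwise match a very large number of rows.

```python
import re

# Same column list as in both views in the diff.
GENOME_KEYS = [
    "pangenome_analysis",
    "genome_id",
    "species",
    "strain",
    "phylo_group",
    "gc_content",
    "country",
    "broad_context",
    "local_context",
    "extra_context",
    "isolation_source",
]


def is_searchable(q: str) -> bool:
    """Hypothetical helper mirroring the guard used in both views.

    A query passes only if it has at least two characters and, when used
    as a case-insensitive regex, does not match anywhere in "Missing".
    """
    return len(q) >= 2 and not re.search(q, "Missing", flags=re.IGNORECASE)


def build_projection(keys: list[str]) -> dict[str, int]:
    """Hypothetical helper building the same $project spec as the diff.

    "_id" is mapped to 0 (excluded); every listed key is mapped to 1.
    """
    return {k: int(k != "_id") for k in ["_id"] + keys}


if __name__ == "__main__":
    for query in ["coli", "ss", "MISS", "a"]:
        print(query, is_searchable(query))
    print(build_projection(GENOME_KEYS[:2]))
```

Because the query is passed to `re.search` as a pattern, a string that is not a valid regular expression would raise `re.error` unless `clean_query` already escapes or strips such characters; that is an assumption worth verifying against the actual helper.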
diff --git a/static/phylotree.js/js/phylotree.js b/static/phylotree.js/js/phylotree.js