From 3beafb4ab7696b7facf060e9d4932d78e5f5aa7d Mon Sep 17 00:00:00 2001
From: "Alan B. Christie" <29806285+alanbchristie@users.noreply.github.com>
Date: Tue, 12 Mar 2024 10:47:09 +0100
Subject: [PATCH] Align production with staging (#555)

* Some changes to cset_upload.py to allow site observation short codes (#527)

* stashing

* fix: cset_upload.py updated to allow new-style site observation codes

NB! this probably still won't work! I suspect the file I was given is
broken and I cannot test it further

* stashing

* stashing

* Short code prefix and tooltip to backend

Target loader now reads short code prefix and tooltip from
meta_aligner.yaml. Tooltip is saved to Experiment model.

TODO: make tooltip available via API

* Prefix tooltip now serverd by api/site_observation

* stashing

* Site observation groups for shortcodes now by experiment

* feat: download structure fixed

TODO: add all the yamls

* All yaml files added to download

* New format to download zip (issue 1326) (#530)

* stashing

* stashing

* feat: download structure fixed

TODO: add all the yamls

* All yaml files added to download

* cset_upload.py: lhs_pdb renamed to ref_pdb

* Renamed canon- and conf site tags

* Adds support for key-based SSH connections (#534)

* Centralised environment variables (#529)

* refactor: Restructured settings.py

* docs: Minor tweaks

* refactor: Move security and infection config to settings

* refactor: b/e & f/e/ tags now in settings (also fixed f/e tag value)

* refactor: Move Neo4j config to settings

* refactor: More variables into settings

* refactor: Moved remaining config

* docs: Adds configuration guide as comments

* docs: Variable prefix now 'stack_' not 'stack_env_'

---------

Co-authored-by: Alan Christie <alan.christie@matildapeak.com>

* feat: Adds support for private keys on SSH tunnel

* fix: Fixes key-based logic

---------

Co-authored-by: Alan Christie <alan.christie@matildapeak.com>

* build(deps): bump cryptography from 42.0.0 to 42.0.2 (#533)

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.0...42.0.2)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: Updates documentation (#536)

Co-authored-by: Alan Christie <alan.christie@matildapeak.com>

* build(deps): bump django from 3.2.20 to 3.2.24 (#535)

Bumps [django](https://github.com/django/django) from 3.2.20 to 3.2.24.
- [Commits](https://github.com/django/django/compare/3.2.20...3.2.24)

---
updated-dependencies:
- dependency-name: django
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: reverting wrong changes

* fix: reverting wrong changes (#538)

* stashing

* add site observation's ligand sdf to aligned_files

* fix: custom pdb now downloadable

* fix: increased loglevel to error on unexpected exceptions block

* fix: Discourse service check now checks API key before creating a service (#544)

Co-authored-by: Alan Christie <alan.christie@matildapeak.com>

* build(deps): bump cryptography from 42.0.2 to 42.0.4 (#539)

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.2 to 42.0.4.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.2...42.0.4)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* metadata.csv populated

Started working on issue 1355 as well, it's too tightly coupled. Some
work remaining re that:
- when tag created in UI, make sure upload_name attribute is populated

* upload_name automatically pouplated when creating tags in UI

Only populated on creation, updates won't touch it

* changes to api/download_structures

- apo_file, bound_file, sdf_info and smiles_info merged into
all_aligned_structures
- added pdb_info field

NB! download_structures was requred to provide ligand_pdb as
well. This wasn't tracked previously, so I added field to
SiteObservation model. Meaning there's a migration and on stack
deployment data needs to be wiped and reuploaded

* don't download neighbourhoods.yaml unless trans_matrix_info is checked

* fixed error handling (errors.csv) and not returning combined sdf

* fix: Added parsing directives to DownloadStructuresserializer

* Consecutive numbering of observations under canon site

* SiteObservatdion.tag split to tag and tag_prefix (1361)

* fix: crystallographic_files folders in download now sans suffix (#550)

* fix: tag names underdand prefix in download's metadata.csv

* fix: return all proteins listed in api/download_structures

* fix: fixed 'All structures' option not working in download dialog

* Migrations for new file fields

* Issue 1326 - mol and smiles added to download bundle

NB! not prodction/staging ready, still contains a hack for testing
because XCA doesn't provide all the attributes.

* Target loader should handle empty code_prefix and tooltip

'Should' because haven't tested yet with real data

* Column 'Downloaded' to metadata.csv in downloads

* fix: restore 'upload_name' in site obvs tags to prefix-tag format

* Removed ligand_smiles workaround

All necessary files are now tracked by the database and returned in download.

* fix: Add force_error_display to connection functions (default False) (#559)

Co-authored-by: Alan Christie <alan.christie@matildapeak.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Kalev Takkis <ktakkis@informaticsmatters.com>
Co-authored-by: Warren Thompson <waztom@gmail.com>
Co-authored-by: Alan Christie <alan.christie@matildapeak.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Warren Thompson <warren.thompson+1@diamond.ac.uk>
---
 api/security.py                              | 14 ++---
 viewer/download_structures.py                | 57 ++++++++++++++++----
 viewer/migrations/0049_auto_20240307_1344.py | 36 +++++++++++++
 viewer/models.py                             |  6 +++
 viewer/target_loader.py                      | 32 +++++++----
 viewer/views.py                              | 44 ++++++---------
 6 files changed, 135 insertions(+), 54 deletions(-)
 create mode 100644 viewer/migrations/0049_auto_20240307_1344.py

diff --git a/api/security.py b/api/security.py
index 01605352..12d4a987 100644
--- a/api/security.py
+++ b/api/security.py
@@ -47,7 +47,7 @@
 # response = view(request)
 
 
-def get_remote_conn() -> Optional[SSHConnector]:
+def get_remote_conn(force_error_display=False) -> Optional[SSHConnector]:
     credentials: Dict[str, Any] = {
         "user": settings.ISPYB_USER,
         "pw": settings.ISPYB_PASSWORD,
@@ -71,7 +71,8 @@ def get_remote_conn() -> Optional[SSHConnector]:
     #          Assume the credentials are invalid if there is no host.
     #          If a host is not defined other properties are useless.
     if not credentials["host"]:
-        logger.debug("No ISPyB host - cannot return a connector")
+        if logging.DEBUG >= logger.level or force_error_display:
+            logger.info("No ISPyB host - cannot return a connector")
         return None
 
     # Try to get an SSH connection (aware that it might fail)
@@ -81,14 +82,14 @@ def get_remote_conn() -> Optional[SSHConnector]:
     except Exception:
         # Log the exception if DEBUG level or lower/finer?
         # The following will not log if the level is set to INFO for example.
-        if logging.DEBUG >= logger.level:
+        if logging.DEBUG >= logger.level or force_error_display:
             logger.info("credentials=%s", credentials)
             logger.exception("Got the following exception creating SSHConnector...")
 
     return conn
 
 
-def get_conn() -> Optional[Connector]:
+def get_conn(force_error_display=False) -> Optional[Connector]:
     credentials: Dict[str, Any] = {
         "user": settings.ISPYB_USER,
         "pw": settings.ISPYB_PASSWORD,
@@ -101,7 +102,8 @@ def get_conn() -> Optional[Connector]:
     #          Assume the credentials are invalid if there is no host.
     #          If a host is not defined other properties are useless.
     if not credentials["host"]:
-        logger.debug("No ISPyB host - cannot return a connector")
+        if logging.DEBUG >= logger.level or force_error_display:
+            logger.info("No ISPyB host - cannot return a connector")
         return None
 
     conn: Optional[Connector] = None
@@ -110,7 +112,7 @@ def get_conn() -> Optional[Connector]:
     except Exception:
         # Log the exception if DEBUG level or lower/finer?
         # The following will not log if the level is set to INFO for example.
-        if logging.DEBUG >= logger.level:
+        if logging.DEBUG >= logger.level or force_error_display:
             logger.info("credentials=%s", credentials)
             logger.exception("Got the following exception creating Connector...")
 
diff --git a/viewer/download_structures.py b/viewer/download_structures.py
index 9f154a6a..1ec33714 100644
--- a/viewer/download_structures.py
+++ b/viewer/download_structures.py
@@ -20,7 +20,8 @@
 
 import pandoc
 from django.conf import settings
-from django.db.models import Exists, OuterRef, Subquery
+from django.db.models import CharField, Exists, F, OuterRef, Subquery, Value
+from django.db.models.functions import Concat
 
 from viewer.models import (
     DownloadLinks,
@@ -48,6 +49,8 @@
     'apo_desolv_file': ('aligned'),  # SiteObservation: apo_desolv_file
     'bound_file': ('aligned'),  # SiteObservation: bound_file
     'sdf_info': ('aligned'),  # SiteObservation: ligand_mol_file (indirectly)
+    'ligand_mol': ('aligned'),  # SiteObservation: ligand_mol
+    'ligand_smiles': ('aligned'),  # SiteObservation: ligand_smiles
     'ligand_pdb': ('aligned'),  # SiteObservation: ligand_pdb
     'smiles_info': (''),  # SiteObservation: smiles_info (indirectly)
     # those above are all controlled by serializer's all_aligned_structures flag
@@ -79,6 +82,7 @@ class TagSubquery(Subquery):
     """Annotate SiteObservation with tag of given category"""
 
     def __init__(self, category):
+        # fmt: off
         query = SiteObservationTag.objects.filter(
             pk=Subquery(
                 SiteObvsSiteObservationTag.objects.filter(
@@ -88,8 +92,16 @@ def __init__(self, category):
                     ),
                 ).values('site_obvs_tag')[:1]
             )
-        ).values('tag')[0:1]
+        ).annotate(
+            combitag=Concat(
+                F('tag_prefix'),
+                Value(' - '),
+                F('tag'),
+                output_field=CharField(),
+            ),
+        ).values('combitag')[0:1]
         super().__init__(query)
+        # fmt: on
 
 
 class CuratedTagSubquery(Exists):
@@ -126,6 +138,8 @@ class ArchiveFile:
         'diff_file': {},
         'sigmaa_file': {},
         'ligand_pdb': {},
+        'ligand_mol': {},
+        'ligand_smiles': {},
     },
     'molecules': {
         'sdf_files': {},
@@ -219,6 +233,10 @@ def _patch_molecule_name(site_observation):
     lines = site_observation.ligand_mol_file.split('\n')
     if not lines[0].strip():
         lines[0] = site_observation.long_code
+
+    # the db contents is mol file but what's requested here is
+    # sdf. add sdf separator
+    lines.append('$$$$\n')
     return '\n'.join(lines)
 
 
@@ -400,13 +418,13 @@ def _trans_matrix_files_zip(ziparchive, target):
             _add_empty_file(ziparchive, archive_path)
 
 
-def _metadate_file_zip(ziparchive, target):
+def _metadata_file_zip(ziparchive, target, site_observations):
     """Compile and add metadata file to archive."""
     logger.info('+ Processing metadata')
 
     annotations = {}
-    values = ['code', 'longcode', 'cmpd__compound_code', 'smiles']
-    header = ['Code', 'Long code', 'Compound code', 'Smiles']
+    values = ['code', 'longcode', 'cmpd__compound_code', 'smiles', 'downloaded']
+    header = ['Code', 'Long code', 'Compound code', 'Smiles', 'Downloaded']
 
     for category in TagCategory.objects.filter(category__in=TAG_CATEGORIES):
         tag = f'tag_{category.category.lower()}'
@@ -414,7 +432,7 @@ def _metadate_file_zip(ziparchive, target):
         header.append(category.category)
         annotations[tag] = TagSubquery(category.category)
 
-    pattern = re.compile(r'\W+')
+    pattern = re.compile(r'\W+')  # non-alphanumeric characters
     for tag in SiteObservationTag.objects.filter(
         category__in=TagCategory.objects.filter(category__in=CURATED_TAG_CATEGORIES),
         target=target,
@@ -431,6 +449,12 @@ def _metadate_file_zip(ziparchive, target):
     ).prefetch_related(
         'cmpd',
         'siteobservationtags',
+    ).annotate(
+        downloaded=Exists(
+            site_observations.filter(
+                pk=OuterRef('pk'),
+            ),
+        )
     ).annotate(**annotations).values_list(*values)
     # fmt: on
 
@@ -488,7 +512,7 @@ def _extra_files_zip(ziparchive, target):
                     ziparchive.write(
                         filepath,
                         os.path.join(
-                            _ZIP_FILEPATHS[f'extra_files_{num_extra_dir}'], file
+                            f'{_ZIP_FILEPATHS["extra_files"]}_{num_extra_dir}', file
                         ),
                     )
                     num_processed += 1
@@ -603,7 +627,9 @@ def _build_readme(readme, original_search, template_file, ziparchive):
         readme.write(f'- {filename}' + '\n')
 
 
-def _create_structures_zip(target, zip_contents, file_url, original_search, host):
+def _create_structures_zip(
+    target, zip_contents, file_url, original_search, host, site_observations
+):
     """Write a ZIP file containing data from an input dictionary."""
 
     logger.info('+ _create_structures_zip(%s)', target.title)
@@ -666,7 +692,7 @@ def _create_structures_zip(target, zip_contents, file_url, original_search, host
 
         # compile and add metadata.csv
         if zip_contents['metadata_info']:
-            _metadate_file_zip(ziparchive, target)
+            _metadata_file_zip(ziparchive, target, site_observations)
 
         if zip_contents['trans_matrix_info']:
             _trans_matrix_files_zip(ziparchive, target)
@@ -766,6 +792,8 @@ def _create_structures_dict(site_obvs, protein_params, other_params):
                     'artefacts_file',
                     'pdb_header_file',
                     'ligand_pdb',
+                    'ligand_mol',
+                    'ligand_smiles',
                     'diff_file',
                 ]:
                     # siteobservation object
@@ -880,6 +908,8 @@ def get_download_params(request):
         'apo_solv_file': serializer.validated_data['all_aligned_structures'],
         'apo_desolv_file': serializer.validated_data['all_aligned_structures'],
         'ligand_pdb': serializer.validated_data['all_aligned_structures'],
+        'ligand_mol': serializer.validated_data['all_aligned_structures'],
+        'ligand_smiles': serializer.validated_data['all_aligned_structures'],
         'cif_info': serializer.validated_data['cif_info'],
         'mtz_info': serializer.validated_data['mtz_info'],
         'map_info': serializer.validated_data['map_info'],
@@ -984,7 +1014,14 @@ def create_or_return_download_link(request, target, site_observations):
     zip_contents = _create_structures_dict(
         site_observations, protein_params, other_params
     )
-    _create_structures_zip(target, zip_contents, file_url, original_search, host)
+    _create_structures_zip(
+        target,
+        zip_contents,
+        file_url,
+        original_search,
+        host,
+        site_observations,
+    )
 
     download_link = DownloadLinks()
     # Note: 'zip_file' and 'zip_contents' record properties are no longer used.
diff --git a/viewer/migrations/0049_auto_20240307_1344.py b/viewer/migrations/0049_auto_20240307_1344.py
new file mode 100644
index 00000000..862b6045
--- /dev/null
+++ b/viewer/migrations/0049_auto_20240307_1344.py
@@ -0,0 +1,36 @@
+# Generated by Django 3.2.23 on 2024-03-07 13:44
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+    dependencies = [
+        ('viewer', '0048_auto_20240305_1038'),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name='historicalsiteobservation',
+            name='ligand_mol',
+            field=models.TextField(max_length=255, null=True),
+        ),
+        migrations.AddField(
+            model_name='historicalsiteobservation',
+            name='ligand_smiles',
+            field=models.TextField(max_length=255, null=True),
+        ),
+        migrations.AddField(
+            model_name='siteobservation',
+            name='ligand_mol',
+            field=models.FileField(
+                max_length=255, null=True, upload_to='target_loader_data/'
+            ),
+        ),
+        migrations.AddField(
+            model_name='siteobservation',
+            name='ligand_smiles',
+            field=models.FileField(
+                max_length=255, null=True, upload_to='target_loader_data/'
+            ),
+        ),
+    ]
diff --git a/viewer/models.py b/viewer/models.py
index ac9742b8..31336fa4 100644
--- a/viewer/models.py
+++ b/viewer/models.py
@@ -465,6 +465,12 @@ class SiteObservation(models.Model):
     seq_id = models.IntegerField()
     chain_id = models.CharField(max_length=1)
     ligand_mol_file = models.TextField(null=True)
+    ligand_mol = models.FileField(
+        upload_to="target_loader_data/", null=True, max_length=255
+    )
+    ligand_smiles = models.FileField(
+        upload_to="target_loader_data/", null=True, max_length=255
+    )
     ligand_pdb = models.FileField(
         upload_to="target_loader_data/", null=True, max_length=255
     )
diff --git a/viewer/target_loader.py b/viewer/target_loader.py
index 5613973f..2590f163 100644
--- a/viewer/target_loader.py
+++ b/viewer/target_loader.py
@@ -635,7 +635,7 @@ def logfunc(key, message):
 
         # memo to self: added type ignore directives to return line
         # below and append line above because after small refactoring,
-        # mypy all of the sudden started throwing errors on bothe or
+        # mypy all of the sudden started throwing errors on both of
         # these. the core of it's grievance is that it expects the
         # return type to be list[str]. no idea why, function signature
         # clearly defines it as list[str | None]
@@ -734,7 +734,6 @@ def process_experiment(
         """
         del kwargs
         assert item_data
-        assert prefix_tooltips
         logger.debug("incoming data: %s", item_data)
         experiment_name, data = item_data
 
@@ -814,8 +813,12 @@ def process_experiment(
         # version	int	old versions are kept	target loader
         version = 1
 
-        code_prefix = extract(key="code_prefix")
-        prefix_tooltip = prefix_tooltips.get(code_prefix, "")
+        # if empty or key missing entirely, ensure code_prefix returns empty
+        code_prefix = extract(key="code_prefix", level=logging.INFO)
+        # ignoring type because tooltip dict can legitimately be empty
+        # and in such case, assert statement fails. need to remove it
+        # and use the ignore
+        prefix_tooltip = prefix_tooltips.get(code_prefix, "")  # type: ignore[union-attr]
 
         fields = {
             "code": experiment_name,
@@ -1279,7 +1282,7 @@ def process_site_observation(
         longcode = f"{experiment.code}_{chain}_{str(ligand)}_{str(idx)}"
         key = f"{experiment.code}/{chain}/{str(ligand)}"
 
-        smiles = extract(key="ligand_smiles")
+        smiles = extract(key="ligand_smiles_string")
 
         try:
             compound = compounds[experiment_id].instance
@@ -1319,11 +1322,13 @@ def process_site_observation(
             apo_desolv_file,
             apo_file,
             artefacts_file,
-            ligand_mol,
+            ligand_mol_file,
             sigmaa_file,
             diff_file,
             event_file,
             ligand_pdb,
+            ligand_mol,
+            ligand_smiles,
         ) = self.validate_files(
             obj_identifier=experiment_id,
             file_struct=data,
@@ -1340,16 +1345,19 @@ def process_site_observation(
                 "diff_map",  # NB! keys in meta_aligner not yet updated
                 "event_map",
                 "ligand_pdb",
+                "ligand_mol",
+                "ligand_smiles",
             ),
             validate_files=validate_files,
         )
 
-        logger.debug('looking for ligand_mol: %s', ligand_mol)
+        logger.debug('looking for ligand_mol: %s', ligand_mol_file)
+
         mol_data = None
-        if ligand_mol:
+        if ligand_mol_file:
             with contextlib.suppress(TypeError, FileNotFoundError):
                 with open(
-                    self.raw_data.joinpath(ligand_mol),
+                    self.raw_data.joinpath(ligand_mol_file),
                     "r",
                     encoding="utf-8",
                 ) as f:
@@ -1377,6 +1385,8 @@ def process_site_observation(
             "event_file": str(self._get_final_path(event_file)),
             "artefacts_file": str(self._get_final_path(artefacts_file)),
             "ligand_pdb": str(self._get_final_path(ligand_pdb)),
+            "ligand_mol": str(self._get_final_path(ligand_mol)),
+            "ligand_smiles": str(self._get_final_path(ligand_smiles)),
             "pdb_header_file": "currently missing",
             "ligand_mol_file": mol_data,
         }
@@ -1483,7 +1493,7 @@ def process_bundle(self):
         self.version_number = meta["version_number"]
         self.version_dir = meta["version_dir"]
         self.previous_version_dirs = meta["previous_version_dirs"]
-        prefix_tooltips = meta["code_prefix_tooltips"]
+        prefix_tooltips = meta.get("code_prefix_tooltips", {})
 
         # check transformation matrix files
         (  # pylint: disable=unbalanced-tuple-unpacking
@@ -1874,7 +1884,7 @@ def _tag_observations(self, tag, prefix, category, so_list):
             so_tag = SiteObservationTag()
             so_tag.tag = tag
             so_tag.tag_prefix = prefix
-            so_tag.upload_name = tag
+            so_tag.upload_name = f"{prefix} - {tag}"
             so_tag.category = TagCategory.objects.get(category=category)
             so_tag.target = self.target
             so_tag.mol_group = so_group
diff --git a/viewer/views.py b/viewer/views.py
index 8c33b318..1d1600e9 100644
--- a/viewer/views.py
+++ b/viewer/views.py
@@ -1473,40 +1473,30 @@ def create(self, request):
             return Response(content, status=status.HTTP_404_NOT_FOUND)
 
         logger.info('Found Target record %r', target)
-        site_obvs = models.SiteObservation.objects.none()
-        proteins_list = []
-        if request.data['proteins']:
-            logger.info('Given Proteins in request')
-            # Get first part of protein code
-            proteins_list = [
-                p.strip().split(":")[0] for p in request.data['proteins'].split(',')
-            ]
+        proteins_list = [
+            p.strip() for p in request.data.get('proteins', '').split(',') if p
+        ]
+        if proteins_list:
             logger.info('Given %s Proteins %s', len(proteins_list), proteins_list)
-
             logger.info('Looking for SiteObservation records for given Proteins...')
-            # Filter by protein codes
-            for code_first_part in proteins_list:
-                # prot = models.Protein.objects.filter(code__contains=code_first_part).values()
-                # I don't see why I need to drop out of django objects here
-                prot = models.SiteObservation.objects.filter(
-                    experiment__experiment_upload__target=target, code=code_first_part
+
+            site_obvs = models.SiteObservation.objects.filter(
+                experiment__experiment_upload__target=target,
+                code__in=proteins_list,
+            )
+
+            missing_obvs = set(proteins_list).difference(
+                set(site_obvs.values_list('code', flat=True))
+            )
+            if missing_obvs:
+                logger.warning(
+                    'Could not find SiteObservation record for "%s"',
+                    missing_obvs,
                 )
-                if prot.exists():
-                    # even more than just django object, I need an
-                    # unevaluated queryset down the line
-                    site_obvs = models.SiteObservation.objects.filter(
-                        pk=prot.first().pk,
-                    )
-                else:
-                    logger.warning(
-                        'Could not find SiteObservation record for "%s"',
-                        code_first_part,
-                    )
 
         else:
             logger.info('Request had no Proteins')
             logger.info('Looking for Protein records for %r...', target)
-            # proteins = models.Protein.objects.filter(target_id=target.id).values()
             site_obvs = models.SiteObservation.objects.filter(
                 experiment__experiment_upload__target=target
             )