From 5560517a5517936d73ca4f29c92619bd5c71154d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tibor=20=C5=A0imko?= Date: Fri, 20 Sep 2024 08:30:44 +0200 Subject: [PATCH 1/2] fix(scripts): update fixture checking after repository split --- scripts/check_fixtures.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/check_fixtures.py b/scripts/check_fixtures.py index 4355616c70..741d40b12d 100755 --- a/scripts/check_fixtures.py +++ b/scripts/check_fixtures.py @@ -25,7 +25,7 @@ def print_warning(filename, recid="", field="", message="missing-field"): def main(): """Check record fixtures for basic fields.""" problems_found = False - fixtures_directory = "cernopendata/modules/fixtures/data/records" + fixtures_directory = "data/records" for filename in os.listdir(fixtures_directory): records = json.loads(open(fixtures_directory + os.sep + filename, "r").read()) for record in records: From 0f99b6b691ab675bec21d4a0612ed3888baae8e8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tibor=20=C5=A0imko?= Date: Wed, 18 Sep 2024 08:48:31 +0200 Subject: [PATCH 2/2] docs(developing): adapt documentation after infrastructure code split Adapts documentation for data curators following up the split of the portal's infrastructure code from the content in two repositories. Implements comments from the #3590 pull request review. Upgrades the `cernopendata-portal` image version to 0.2.9. Closes #3678. --- CONTRIBUTING.rst | 45 ++-- DEVELOPING.rst | 493 +++++++++++++++++++++++++++++++----- README.rst | 26 +- docker-compose-override.yml | 4 +- 4 files changed, 461 insertions(+), 107 deletions(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 7a71ff3f54..31a5b959a3 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -19,33 +19,26 @@ portal developments, you can `watch ongoing discussions `become part of the team `_. -Code contributions ------------------- +Contributions +------------- We follow typical `GitHub flow -`_. +`_. 1. Fork this repository into your personal space. -2. Start a new topical branch for any contribution. Name it sensibly, - say ``fix-event-display-icons``. -3. Test your branch on a local site. If everything works as expected, - please `sign your commits - `_ - to indicate its quality. -4. Create `logically separate commits for logically separate things - `_. - Check out our usual `development practices - `_. -5. Please add any ``(closes #123)`` directives in your commit log - message if your pull request closes an open issue. -6. Issue a pull request. If the branch is not quite ready yet, please - indicate ``WIP`` (=work in progress) in the pull request title. - -For more information on how we work with branches, see our `developing -guide `_. - -Chatroom --------- - -Our chatroom is `on gitter -`_. +2. Start a new topical branch for any contribution. Name it descriptively, for + example ``fix-cms-2012-collision-energy``. +3. Ideally, test your branch on a local development site to see if everything + works correctly. +4. Open a pull request against this repository. Verify that all the continuous + integration checks are passing. If not, please amend the pull request + accordingly. + +For more information on the process, please see our `developing guide +`_. + +Support +------- + +You can also get in touch via our `Mattermost +`_ chat room. diff --git a/DEVELOPING.rst b/DEVELOPING.rst index 3c5d7f320b..eda9942ca9 100644 --- a/DEVELOPING.rst +++ b/DEVELOPING.rst @@ -5,98 +5,306 @@ .. 
contents::
   :backlinks: none
 
+This document describes how you can run a local instance of the CERN Open Data
+portal in order to work with the content records and associated documentation.
+
+Prerequisites
+=============
+
+You will need to fork and clone two repositories:
+
+- `opendata.cern.ch `_ which
+  contains the open data content;
+
+- `cernopendata-portal `_
+  which contains the portal infrastructure code.
+
+Please make sure to also install `Docker
+`_ and `Docker Compose
+`_ (version 2), which are used for local
+development.
+
+You will also need to pull Docker container images from `CERN Harbor registry
+`_. If this is your first time using CERN Harbor,
+you may need to log in using a command-line secret obtained at `registry.cern.ch
+`_:
+
+.. code-block:: console
+
+    $ docker login registry.cern.ch
+
 Installation
 ============
 
-This module contains the content of the CERN Open Data instance. It does not contain the code of the portal itself.
-For local development, it is recommended to have an installation of the CERN Open Data Portal. For detail instructions
-on how to install the portal, please follow `these instructions `_.
-
-Quick start
--------------------
-For a quickstart guide, do the following:
+In order to create a local CERN Open Data portal instance, please proceed as
+follows:
 
 .. code-block:: console
 
-   $ # Checkout this repository
-   $ git clone https://github.com/cernopendata/opendata.cern.ch.git
-   $ # Checkout the module with the portal
-   $ git clone https://github.com/cernopendata/cernopendata-portal.git
-   $ # Move to the directory of the content
-   $ cd opendata.cern.ch
-   $ # Make sure that the latest images are available
-   $ docker compose pull
-   $ # Start the services
-   $ docker compose up -d
-   $ # Give enough time to the containers to start properly. Note that there are some dependencies among them,
-   $ # and the web container starts by setting up the development environment
-   $ sleep 120
-   $ # Create the basic structure
-   $ docker exec -i -t opendatacernch-web-1 /code/scripts/populate-instance.sh --skip-records --skip-docs
-   $ docker exec -i -t opendatacernch-web-1 cernopendata fixtures records \
-       --mode insert-or-replace \
-       -f /content/data/records/cms-primary-datasets.json
-..
+    $ git clone https://github.com/cernopendata/opendata.cern.ch
+    $ git clone https://github.com/cernopendata/cernopendata-portal
+    $ cd cernopendata-portal && git checkout -B stable v0.2.9
+    $ cd ../opendata.cern.ch
+    $ docker compose pull
+    $ docker compose up -d
+    $ sleep 120  # give enough time for the containers to start properly
+    $ docker exec -i -t opendatacernch-web-1 /code/scripts/populate-instance.sh \
+        --skip-records --skip-docs --skip-glossary
 
+This will create a running instance of the CERN Open Data portal with
+relatively empty content. The portal will be accessible locally at
+`http://127.0.0.1:5000 `_.
 
-At this point, all the services should be up and running. If you go to a web browser to http://0.0.0.0:5000/, you should
-see the web portal, with the vocabularies and some documents about the portal itself.
+If you would like to stop and delete your local instance, you can do:
 
-From this moment on,
+.. code-block:: console
 
-Defining new entries
+    $ docker compose down -v
+
+Working with records
 ====================
 
-This repository has the following data structure:
+If you would like to work with certain data records and test your edits on your
+local instance, you can proceed as follows.
+
+Edit a record file, for example the CMS 2012 collision dataset records:
+
+.. code-block:: console
+
+    $ vim data/records/cms-primary-datasets.json
 
-* data:
-   * records: Put here the entries that should be inserted as records
-   * docs: This folder will be for the documents
-   * images: And this is for static images that might be needed
-* scripts: Directory with shell scripts to help the development
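+Before uploading, it can be handy to check that the edited file is still valid
+JSON. This is not part of the official workflow, but assuming a local Python
+installation is available, one quick way is:
+
+.. code-block:: console
+
+    $ python -m json.tool data/records/cms-primary-datasets.json > /dev/null
+
+If the command prints nothing and exits without an error, the JSON syntax is
+fine.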
+Upload the locally-modified file into your instance:
 
-If you want to modify the json schema, mappings or templates, you will find these folders in the
-`cernopendata portal `_ repository
+.. code-block:: console
 
-Working with docs/records
--------------------------
+    $ docker exec -i -t opendatacernch-web-1 cernopendata fixtures records \
+        --mode insert-or-replace \
+        -f /content/data/records/cms-primary-datasets.json
 
-The recommended development process is the following:
+You can then check your changes at `http://127.0.0.1:5000 `_.
 
-1. Create the entries under data/(records/docs)
-2. Validate that the yaml syntax is correct
+Note that you can take advantage of shell scripting if you would like to upload
+all experiment records locally, for example for ATLAS:
 
 .. code-block:: console
 
-   $ my_docker exec -it web /content/scripts/check_fixtures.py
+    $ for file in data/records/atlas-*; do \
+          docker exec -i -t opendatacernch-web-1 cernopendata fixtures records \
+          --mode insert-or-replace -f $file; \
+      done
+
+Understanding metadata fields
+=============================
+
+When working with data records, there are several fields such as
+`collision_energy` that you can use to store the content. The list of all
+available record fields, together with their semantic meaning, is described in
+the JSON Schema files. You can find the `record schema
+`_
+in the portal infrastructure repository.
+
+If you would like to modify the JSON schema, for example to add a new field,
+this would require working with the `cernopendata-portal` repository. Please
+see its own `documentation
+`_ about how to add new
+metadata fields. We would be happy to assist with the process.
+
+Understanding output templates
+==============================
+
+If you would like to change the way the data records are displayed on the
+web, for example to introduce a new section displaying a newly added field,
+this is something that is governed by the `Jinja templating language
+`_ in the
+`cernopendata-portal` repository. Please see its own `documentation
+`_ about how to amend the
+look and feel of the record metadata. We would be happy to assist with the
+process.
+
+Verifying metadata conformance
+==============================
+
+You can use the provided helper script `check_fixtures.py` to check the
+conformance of record files to the required minimal standard:
 
-..
+.. code-block:: console
+
+    $ ./scripts/check_fixtures.py
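+
+If you prefer, the same check should also work from inside the running web
+container, where this repository is mounted under ``/content`` (shown here only
+as a convenience; the command above is the canonical one):
+
+.. code-block:: console
+
+    $ docker exec -i -t opendatacernch-web-1 /content/scripts/check_fixtures.py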
+
+Working with documents: metadata
+================================
 
-3. Load the entries in the system. To reload all the entries defined in this repo, do:
+If you would like to work with certain documents and test your edits on your
+local instance, you can proceed as follows.
+
+Edit the document files, for example the About LHCb documentation:
 
 .. code-block:: console
 
-   $ my_docker exec -it web /content/scripts/load-fixtures.sh
+    $ vim data/docs/lhcb-about/lhcb-about.json
+    $ vim data/docs/lhcb-about/lhcb-about.md
+
+Upload the locally-modified file into your instance:
 
-..
+.. code-block:: console
 
-4. If you want to load only some records/docs
+    $ docker exec -i -t opendatacernch-web-1 cernopendata fixtures docs \
+        --mode insert-or-replace \
+        -f data/docs/lhcb-about/lhcb-about.json
+
+Note that, similarly to records, we upload the document JSON files using the
+`fixtures docs` command. Even if you only change the document content that
+lives in the associated Markdown files, it is the document JSON file that needs
+to be uploaded.
+
+You can then check your changes at `http://127.0.0.1:5000 `_.
+
+Working with documents: Markdown
+================================
+
+The portal uses `Python-markdown `_ for
+Markdown rendering. There are `some differences
+`_ between this implementation
+and the `syntax rules `_,
+mainly concerning lists:
+
+* You must always use 4 spaces (or a tab) for indentation and the same
+  character (-, \*, +, numbers) for list items.
+* To add a Table Of Contents to a document, please place the identifier
+  ``[TOC]`` where you want it to appear.
+
+The following extensions are enabled:
+
+* `markdown.extensions.attr_list `_
+* `markdown.extensions.tables `_
+* `markdown.extensions.toc `_
+* `pymdownx.magiclink `_
+* `pymdownx.betterem `_
+* `pymdownx.tilde `_
+* `pymdownx.emoji `_
+* `pymdownx.tasklist `_
+* `pymdownx.superfences `_
+* `mdx_math `_
+
+Working with documents: LaTeX
+=============================
+
+LaTeX is enabled with the `mdx_math` extension. Inline equations are enclosed
+in single ``$`` signs, e.g. ``$E = m c^2$``. For standalone math, use
+``\[...\]``.
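+
+As a purely illustrative snippet (not taken from any existing page), a document
+body combining these Markdown and LaTeX conventions might look like this:
+
+.. code-block:: markdown
+
+    [TOC]
+
+    ## Example section
+
+    - first item
+        - nested item, indented with 4 spaces
+    - second item with an inline equation $E = m c^2$
+
+    A standalone equation:
+
+    \[ E^2 = (p c)^2 + (m c^2)^2 \]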
+
+Working with documents: images
+==============================
+
+Sometimes the document pages may include illustrative images. The images should
+be placed into the `data/images` directory following the document slug. They
+can then be referred to in your Markdown content by means of links. Please
+check an existing documentation page such as
+``totem-releases-first-set-of-open-data.md`` to see where it stores and how it
+loads the illustrative image ``totem-roman-pots-in-the-lhc-tunnel.jpeg``.
+
+After you add an image and reference it in your Markdown source file, you
+should load the image into your local portal instance:
 
 .. code-block:: console
 
-   $ my_docker exec -it web cernopendata fixtures records --file /content/data/records/
-   $ my_docker exec -it web cernopendata fixtures docs --file /content/data/docs/
-
-..
-
-5. Finally, if there are new images, ensure that they appear in the correct folder
-
-.. code-block:: console
-
-   $ my_docker exec -it web /content/scripts/load-images.sh
+    $ docker compose exec -it web /content/scripts/load-images.sh
 
-..
+You should now be able to see the image locally in the document record.
+
+Appendix A: repository structure
+================================
+
+This repository holds the sources behind the CERN Open Data portal content. The
+bibliographic records live as JSON files, and the documentation records live as
+JSON files with Markdown content and any associated images. The repository is
+structured as follows:
+
+- ``data/docs``: This directory contains the source of the documentation pages.
+  Each documentation page is identified by a slug under which it is exposed in
+  the portal web interface. The documentation sources usually live in a
+  dedicated directory named after the slug. The documentation page lives as a
+  JSON file with the appropriate metadata describing title, authors, short
+  abstract, etc. The documentation page body usually lives as a separate
+  Markdown file that is linked from the JSON file.
+
+- ``data/images``: This directory contains any illustrative images that the
+  documentation pages may use. The images are usually stored in similar
+  slug-based directories to link them to the documentation page where they
+  are used.
+
+- ``data/records``: This directory contains the source of the bibliographic
+  records representing the main open data content (collision data, simulated
+  data, derived data, software, examples, configuration files, etc.). The
+  master format is JSON following the schema of allowed optional and required
+  fields. It is usually in this directory that you would prepare new records
+  for inclusion into the open data portal.
+
+- ``data/skeletons``: This is a special directory that holds only "skeletons"
+  of bibliographic records, i.e. snippets of record JSON files containing only
+  persistent identifiers such as record IDs, DOIs, and record titles. This is
+  used only in cases where the record content is huge, such as the 40k CMS 2016
+  simulated data records, which would not be practical to store in a git
+  repository. You could consider record skeletons to serve as a sort of
+  "git lfs" pointer to where the record JSON tarball is hosted, all the while
+  keeping persistent identifiers in this repository in order to avoid any
+  mishap with "reserved" identifiers. Usually, you would not work in this
+  directory. (A purely illustrative sketch of a skeleton entry is shown after
+  this list.)
+
+- ``scripts``: This directory contains helper scripts assisting in record
+  preparation, such as metadata formatters and checkers. This helps to make
+  sure that the record JSON files are correct, and that they are formatted in
+  a uniform way regardless of the different text editors that collaborators may
+  be using, preventing their subsequent reformatting.
+
+- ``run-tests.sh``: This helper script is used to perform all the metadata
+  checks in the Continuous Integration process. You can also run it locally
+  prior to submitting your pull requests.
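+
+As a purely illustrative sketch (the field names and values below are made up;
+the actual files in ``data/skeletons`` are the authoritative reference), a
+record skeleton entry might look roughly like this:
+
+.. code-block:: json
+
+    [
+        {
+            "recid": "99999",
+            "doi": "10.0000/OPENDATA.EXAMPLE.0000",
+            "title": "Example simulated dataset record title"
+        }
+    ]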
+
+Appendix B: Git workflow
+========================
+
+Here is a detailed example of our `GitHub flow
+`_.
+
+Setting up repository
+---------------------
+
+Let's assume your GitHub account name is ``johndoe``.
+
+Firstly, fork the `opendata.cern.ch repository
+`_ by using the "Fork"
+button on the top right. This will give you your personal repository:
+
+.. code-block:: console
+
+    http://github.com/johndoe/opendata.cern.ch
+
+Secondly, clone this repository onto your laptop and set up remotes so that
+``origin`` points to your repository and ``upstream`` points to the
+canonical location:
+
+.. code-block:: console
+
+    $ cd ~/private/src
+    $ git clone git@github.com:johndoe/opendata.cern.ch
+    $ cd opendata.cern.ch
+    $ git remote add upstream git@github.com:cernopendata/opendata.cern.ch
+
+Optionally, if you are also going to integrate the work of others, you may want
+to set up `special PR branches
+`_
+like this:
+
+.. code-block:: console
+
+    $ vim .git/config
+    $ cat .git/config
+    [remote "upstream"]
+    url = git@github.com:cernopendata/opendata.cern.ch
+    fetch = +refs/heads/*:refs/remotes/upstream/*
+    fetch = +refs/pull/*/head:refs/remotes/upstream/pr/*
 
 Understanding repository branches
 ---------------------------------
@@ -104,21 +312,172 @@ Understanding repository branches
 
 We use three official base branches:
 
 master
-    What is installed on the `development server `_.
+    What is installed on the bleeding-edge `development server `_.
 
 qa
-    What is installed on the `pre-production server `_.
+    What is installed on the pre-production `quality assurance server `_.
 
 production
     What is installed on the `production server `_.
 
-The life-cycle of a typical releasing new content is therefore:
-(1) development starts on a personal laptop in a new topical branch stemming from the
-``master`` branch;
-(2) when the new content is ready, the developer issues a pull request against master, the branch is reviewed by the system
-integrator, and merged if appropriate;
-(3) If there are no issues with development, it will also be merged into the ``qa`` branch, and deployed on the pre-production
-server;
-(3) after sufficient testing time on the pre-publication
-server, the new content is merged into the ``production`` branch and
-deployed on the production server.
+The life-cycle of a typical new feature is therefore: (1) development starts on
+a personal laptop in a new topical branch stemming from the ``master`` branch;
+(2) when the feature is ready, the developer issues a pull request, the branch
+is reviewed by the system integrator, merged into the ``qa`` branch, and
+deployed on the pre-production server; (3) after sufficient testing time on the
+pre-publication server, the feature is merged into the ``production`` branch
+and deployed on the production server.
+
+The following sections document the development life cycle in fuller detail.
+
+Working on topical branches
+---------------------------
+
+You are now ready to work on something. You should always create
+separate topical branches for separate issues.
+
+Here is an example:
+
+.. code-block:: console
+
+    $ git checkout master
+    $ git checkout -b fix-cms-about-page-content-typos
+    $ vim data/docs/cms-about/cms-about.md
+    $ git commit -a -m 'fix(docs): correct About CMS page typos'
+    $ vim data/docs/cms-about/cms-about.md
+    $ git commit -a -m 'fix(docs): more About CMS grammatical fixes'
+
+When everything is ready, you may want to rebase your topical branch
+to get rid of unnecessary commits:
+
+.. code-block:: console
+
+    $ git checkout fix-cms-about-page-content-typos
+    $ git rebase master -i # squash commits here
+
+Making pull requests
+--------------------
+
+You are now ready to issue a pull request: just push your branch to
+your personal repository:
+
+.. code-block:: console
+
+    $ git push origin fix-cms-about-page-content-typos
+
+and use GitHub's "Pull request" button to make the pull request.
+
+Watch the GitHub Actions build status report to see whether your pull request
+is OK or whether there are some troubles.
+
+Updating pull requests
+----------------------
+
+Suppose the integrator had some remarks about your branch and you
+have to update your pull request.
+
+Firstly, update to the latest upstream "master" branch, in case it may
+have changed in the meantime:
+
+.. code-block:: console
+
+    $ git checkout master
+    $ git fetch upstream
+    $ git merge upstream/master --ff-only
+
+Secondly, make any required changes on your topical branch:
+
+.. code-block:: console
+
+    $ git checkout fix-cms-about-page-content-typos
+    $ vim data/docs/cms-about/cms-about.md
+    $ git commit -a --no-edit
+
+Thirdly, when done, interactively rebase your topical branch into
+nicely organised commits:
+
+.. code-block:: console
+
+    $ git rebase master -i # squash commits here
+
+Finally, re-push your topical branch with a force option in order to
+update your pull request:
+
+.. code-block:: console
+
+    $ git push origin fix-cms-about-page-content-typos -f
+
+Finishing pull requests
+-----------------------
+
+If your pull request has been merged upstream, you should update your
+local sources:
+
+.. code-block:: console
+
+    $ git checkout master
+    $ git fetch upstream
+    $ git merge upstream/master --ff-only
+
+You can now delete your topical branch locally:
+
+.. code-block:: console
+
+    $ git branch -d fix-cms-about-page-content-typos
+
+and remove it from your repository as well:
+
+.. code-block:: console
+
+    $ git push origin master
+    $ git push origin :fix-cms-about-page-content-typos
+
+This concludes your work on the ``fix-cms-about-page-content-typos`` branch.
+
+Appendix C: Git commit messages
+===============================
+
+We are using the `conventional commits
+`_ style, which is also checked
+by the continuous integration process.
+
+The commit message structure is as follows:
+
+.. code-block:: text
+
+    <type>(<scope>): <subject>
+
+    [optional body line 1]
+    [optional body line 2]
+    [optional body line 3]
+
+    [optional footer: BREAKING CHANGE: foo bar blah]
+    [optional footer: Closes #<issue number>]
+
+Examples of commit message headlines:
+
+.. code-block:: text
+
+    feat(skeletons): add CMS 2016 SIM record skeletons
+    fix(docs): remove trailing slash from TOTEM release image URL
+    fix(records): improve description of DELPHI full DST manuals
+    build(docker): upgrade cernopendata-portal to 0.2.5
+
+The commit message types are:
+
+- **build** for changes affecting the build process or external dependencies (e.g. docker)
+- **chore** for miscellaneous tasks not affecting source code or tests (e.g. release)
+- **ci** for changes affecting continuous integration (e.g. linting)
+- **docs** for documentation-only changes
+- **feat** for changes introducing new features or backwards-compatible improvements to existing features
+- **fix** for changes fixing bugs
+- **perf** for changes improving performance without changing functionality
+- **refactor** for changes that do not fix bugs or add features
+- **style** for changes not affecting the meaning (e.g. formatting)
+- **test** for adding missing tests or correcting existing tests
+
+The commit message scope refers to an internal module of this repository, which
+is typically ``records`` or ``docs``, and occasionally something else such as
+``scripts``. So, in the vast majority of cases, you may be writing
+``feat(records)`` when adding new records, ``fix(docs)`` when fixing existing
+docs, etc.
diff --git a/README.rst b/README.rst
index 5734b2ec09..3abf53a8e4 100644
--- a/README.rst
+++ b/README.rst
@@ -8,34 +8,36 @@
 .. image:: https://img.shields.io/badge/licence-GPL_2-green.svg?style=flat
    :target: https://raw.githubusercontent.com/cernopendata/opendata.cern.ch/master/LICENSE
 
-.. image:: https://badges.gitter.im/Join%20Chat.svg
-   :target: https://gitter.im/cernopendata/opendata.cern.ch?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge
-
 About
 -----
 
-This is the source code behind CERN Open Data portal. You can access the portal
-at `http://opendata.cern.ch/ `_. The source code uses
-`Invenio `_ digital repository framework.
+This repository is part of the `CERN Open Data portal
+`_, hosting the source versions of the content,
+notably the data records and associated documentation pages. The CERN Open Data
+portal's infrastructure code lives in the `cernopendata-portal
+`_ repository. The CERN
+Open Data portal is built upon the `Invenio `_
+digital repository framework.
 
 Developing
 ----------
 
-If you'd like to install a demo site locally for personal developments, please
-see `developing guide `_ for more information.
+If you'd like to install a demo site of the CERN Open Data portal locally for +personal developments, please see the `developing guide `_ for +more information. Contributing ------------ -Bug reports, feature requests and code contributions are encouraged and -welcome! Please see `contributing guide `_ for more -information. +Bug reports, feature requests and contributions are encouraged and welcome! +Please see the `contributing guide `_ for more information. Support ------- You can ask questions at our `Forum `_ or get -in touch via our `Chatroom `_. +in touch via our `Mattermost +`_ chat room. Authors ------- diff --git a/docker-compose-override.yml b/docker-compose-override.yml index 5b0f670af3..d497ca7c4f 100644 --- a/docker-compose-override.yml +++ b/docker-compose-override.yml @@ -1,7 +1,7 @@ # -*- coding: utf-8 -*- # # This file is part of CERN Open Data Portal. -# Copyright (C) 2015, 2016, 2017, 2018, 2021, 2022, 2023, 2024 CERN. +# Copyright (C) 2015, 2016, 2017, 2018, 2021, 2022, 2023, 2024, 2025 CERN. # # CERN Open Data Portal is free software; you can redistribute it # and/or modify it under the terms of the GNU General Public License as @@ -30,7 +30,7 @@ services: - TEMPLATES_AUTO_RELOAD=True command: bash -c "/content/scripts/start-server-debug.sh" restart: "unless-stopped" - image: registry.cern.ch/cernopendata/cernopendata-portal:0.2.5 + image: registry.cern.ch/cernopendata/cernopendata-portal:0.2.9 volumes: - ../opendata.cern.ch:/content - ./cernopendata:/code/cernopendata