Timing tests for HSC processing using the LSST science pipelines and the third generation Butler with S3 storage

joshuadkitenge/LSST-RAL-ECHO-EXP

Processing survey images and testing the overall time using various endpoint storages

Configuration

  • This benchmarking code must be cloned into your home space
  • The scripts now use multiprocessing; to change the number of processes, edit the "-j 8" option in the gen.sh scripts for all of the pipetask operations (multiband, processCcd and coaddition)
  • Step 1 : Setting up the latest lsst environment

    • Installing the newinstall.sh and eups distrib

      • The eups distrib is updated weekly and the newinstall script less frequently; if you notice that an eups install doesn't fully execute, redownload the latest version of newinstall.sh
      • Make an installation directory:
        • run: mkdir -p lsst_stack
        • run: cd lsst_stack

      • Run newinstall.sh

      • Install Science Pipelines packages
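The steps above can be sketched as shell commands. This is a sketch, not the repo's exact procedure: the newinstall.sh URL and the v22_0_1 tag are assumptions based on the 22.0.1 package versions referenced later in this README, so check the LSST Science Pipelines install documentation for the current ones.

```shell
# Hypothetical sketch of Step 1, assuming the v22_0_1 release tag.
mkdir -p ~/lsst_stack && cd ~/lsst_stack

# Fetch and run newinstall.sh (the URL may change between releases)
curl -OL https://raw.githubusercontent.com/lsst/lsst/main/scripts/newinstall.sh
bash newinstall.sh -ct            # flags per the v22-era install guide

# Activate the new environment and install the Science Pipelines packages
source loadLSST.bash
eups distrib install -t v22_0_1 lsst_distrib
setup lsst_distrib
```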

  • Step 2 : Changing the configuration of the obs_subaru and meas_base packages

    • Bright object mask error solution

      • Path to the config directory (note: your path will likely differ slightly, e.g. different version numbers): cd ~/lsst_stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/obs_subaru/22.0.1-20-g904645ea+7c6b33a4e9/config/
      • Within this directory are the "measureCoaddSources.py" and "forcedPhotCoadd.py" scripts
      • Comment out these lines in both scripts:

        • config.measurement.plugins['base_PixelFlags'].masksFpCenter.append('BRIGHT_OBJECT')
        • config.measurement.plugins['base_PixelFlags'].masksFpAnywhere.append('BRIGHT_OBJECT')
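After the edit, the relevant region of each config file looks like this (a sketch of the intended end state only, not the full file):

```python
# measureCoaddSources.py / forcedPhotCoadd.py (obs_subaru config/ directory)
# The BRIGHT_OBJECT mask plane lines are disabled to avoid the bright
# object mask error:
# config.measurement.plugins['base_PixelFlags'].masksFpCenter.append('BRIGHT_OBJECT')
# config.measurement.plugins['base_PixelFlags'].masksFpAnywhere.append('BRIGHT_OBJECT')
```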

    • Fixing the "large exposure ID" id factory error

      • Path to the package (note: your path will likely differ slightly, e.g. different version numbers): cd lsst_stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/meas_base/22.0.1-10-gba590ab+1f0801fda2/python/lsst/meas/base/
      • Comment out this line:
        • idFactory = lsst.afw.table.IdFactory.makeSource(expId, 64 - expBits)

      • Add the import below to the import section and the remaining two lines to generateMeasCat:
      • from lsst.obs.base import ExposureIdInfo

        exposureIdInfo = ExposureIdInfo.fromDataId(exposureDataId, "tract_patch_band")
        idFactory = exposureIdInfo.makeSourceIdFactory()

  • Step 3 : Installing all of the profiling tools

    • iostat: (disk profiling)
      • run: yum -y install sysstat

    • IFDATA: (network profiling)
      • run: sudo yum install moreutils

  • Step 4 : Installing and configuring the S3 tools

    • Rclone

      • More information about rclone is within the rclone_docs directory:
        • run: cd ~/LSST-RAL-ECHO-EXP/lsst/rclone_docs

      • Installation:
      • Configuration:
        • run: rclone config
          • Type of storage to configure = "s3"
            Choose your S3 provider = "Ceph"
            AWS Access Key ID = "access_key"
            AWS Secret Access Key ="secret_access_key"
            Endpoint for S3 API ="s3.echo.stfc.ac.uk"
            All of the other configs settings can be set to default
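The resulting remote is stored in ~/.config/rclone/rclone.conf and should look roughly like this (a sketch: the remote name "echo" is an example, and the key values are placeholders for your credentials):

```ini
[echo]
type = s3
provider = Ceph
access_key_id = access_key
secret_access_key = secret_access_key
endpoint = s3.echo.stfc.ac.uk
```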

    • S3cmd

      • Installation:
        • run: yum install s3cmd
      • Configuration:
        • s3cmd's default configuration file is located at ~/.s3cfg. To set up s3cmd to use Echo, replace the contents of the file with:
          • [default]
            access_key =
            secret_key =
            host_base = s3.echo.stfc.ac.uk
            host_bucket = s3.echo.stfc.ac.uk/%(bucket)
          • Fill the access_key and secret_key fields with your Echo S3 credentials. Ensure this file is only readable by your user if you are on a shared system.

  • Step 5 : CephFS configuration

    • Mount your CephFS instance in your home space (e.g. ~/cephfs_lsst)
    • Make sure the mount point is named "cephfs_lsst"

Downloading the raw HSC Data

  • You must be in the LSST conda environment for this step (see the beginning of "Running the test")

  • Step 1: run: git lfs install (in the sourced conda environment)

  • Step 2: Downloading the sample data

  • Step 3: Moving the data so it works with the testing

    • run: setup lsst_distrib
    • run: setup lsst_apps
    • run: butler ingest-raws ~/LSST-RAL-ECHO-EXP/lsst/DATA_gen3 ~/LSST-RAL-ECHO-EXP/lsst/testdata_ci_hsc/raw
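The download and ingest steps above can be sketched as follows. This is an assumption-laden sketch: the sample data is assumed to be the lsst/testdata_ci_hsc Git LFS repository on GitHub (the ingest path in this README points at a testdata_ci_hsc checkout), so verify the source before relying on it.

```shell
# Source the LSST environment first (see "Running the test")
source ~/lsst_stack/loadLSST.bash
setup lsst_distrib

# Git LFS is required to fetch the large raw files
git lfs install

# Assumed location of the sample data repository
cd ~/LSST-RAL-ECHO-EXP/lsst
git clone https://github.com/lsst/testdata_ci_hsc.git

# Ingest the raws into the gen3 Butler repository
butler ingest-raws ~/LSST-RAL-ECHO-EXP/lsst/DATA_gen3 ~/LSST-RAL-ECHO-EXP/lsst/testdata_ci_hsc/raw
```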

Running the test

  • Before running the test you have to source the LSST environment:

    • run: cd ~/lsst_stack
    • run: source loadLSST.bash
    • run: setup lsst_distrib
    • run: setup lsst_apps
    • run: export S3_ENDPOINT_URL=S3_endpoint (only needed if you are going to test S3 storage)

  • To run all of them:

    • cd ~/LSST-RAL-ECHO-EXP/lsst/pipeline_runners
    • run: source all_runner.sh

  • To run one of them, source the appropriate script in the pipeline_runners directory

    • run: source <>_runner.sh
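As a rough illustration of what a runner script measures, a minimal timing wrapper might look like this. It is a sketch only: the real runners invoke pipetask with this repo's configuration, and the log file name and "task,seconds" format here are hypothetical.

```shell
# Record wall-clock seconds around a pipeline step and append to a log.
# "sleep 2" stands in for the real pipetask command.
logfile=time_test.log
start=$(date +%s)
sleep 2                          # placeholder for: pipetask run -j 8 ...
end=$(date +%s)
echo "processccd,$((end - start))" >> "$logfile"
cat "$logfile"
```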

Analysing the data

  • The data can be found in the time_test directory under its respective endpoint storage and command-line task / pipetask
  • The notebook directory has examples of how I analysed the data; feel free to use these or create your own analysis code
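If you prefer not to use the notebooks, a small script can summarise the timings. This sketch assumes each timing log is a CSV-like file of "task,seconds" lines; the real files in time_test may use a different format.

```python
# Hypothetical sketch: compute the mean wall-clock time per task from
# "task,seconds" records, using only the standard library.
from collections import defaultdict

def mean_times(lines):
    """Group 'task,seconds' records and return {task: mean seconds}."""
    totals = defaultdict(list)
    for line in lines:
        task, seconds = line.strip().split(",")
        totals[task].append(float(seconds))
    return {task: sum(v) / len(v) for task, v in totals.items()}

# Example records, standing in for the contents of a time_test log file
sample = [
    "processccd,120.5",
    "processccd,118.3",
    "coaddition,340.0",
]
print({task: round(t, 1) for task, t in mean_times(sample).items()})
# → {'processccd': 119.4, 'coaddition': 340.0}
```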

Useful links
