Timing tests for HSC processing using the LSST science pipelines and the third generation Butler with S3 storage

joshuadkitenge/LSST-RAL-ECHO-EXP

Processing survey images and testing the overall time using various endpoint storages

Configuration

  • This benchmarking code must be cloned into your home space
  • The scripts now use multiprocessing; to change the number of processes, edit the "-j 8" option in the gen.sh scripts for all of the pipetask operations (multiband, processCcd and coaddition)
  • Step 1 : Setting up the latest lsst environment

    • Installing the newinstall.sh and eups distrib

      • The eups distrib is updated weekly and the newinstall script less frequently; if you notice that an eups install doesn't fully execute, redownload the latest version of newinstall.sh
      • Make an installation directory:
        • run: mkdir -p lsst_stack
        • run: cd lsst_stack

      • Run newinstall.sh

      • Install Science Pipelines packages
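The steps above can be sketched as shell commands. This is a sketch, not the repo's exact procedure: the newinstall.sh URL and the v22_0_1 tag are assumptions based on the 22.0.1 package versions referenced later in this README, so check the LSST Science Pipelines install documentation for the current ones.

```shell
# Hypothetical sketch of Step 1, assuming the v22_0_1 release tag.
mkdir -p ~/lsst_stack && cd ~/lsst_stack

# Fetch and run newinstall.sh (the URL may change between releases)
curl -OL https://raw.githubusercontent.com/lsst/lsst/main/scripts/newinstall.sh
bash newinstall.sh -ct            # flags per the v22-era install guide

# Activate the new environment and install the Science Pipelines packages
source loadLSST.bash
eups distrib install -t v22_0_1 lsst_distrib
setup lsst_distrib
```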

  • Step 2 : Changing the configuration of the obs_subaru and meas_base packages

    • Bright object mask error solution

      • Path to the config directory (note: your path will likely differ slightly, e.g. different version numbers): cd ~/lsst_stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/obs_subaru/22.0.1-20-g904645ea+7c6b33a4e9/config/
      • Within this directory are the "measureCoaddSources.py" and "forcedPhotCoadd.py" scripts
      • Comment out these lines in both scripts:

        • config.measurement.plugins['base_PixelFlags'].masksFpCenter.append('BRIGHT_OBJECT')
        • config.measurement.plugins['base_PixelFlags'].masksFpAnywhere.append('BRIGHT_OBJECT')
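After the edit, the relevant region of each config file looks like this (a sketch of the intended end state only, not the full file):

```python
# measureCoaddSources.py / forcedPhotCoadd.py (obs_subaru config/ directory)
# The BRIGHT_OBJECT mask plane lines are disabled to avoid the bright
# object mask error:
# config.measurement.plugins['base_PixelFlags'].masksFpCenter.append('BRIGHT_OBJECT')
# config.measurement.plugins['base_PixelFlags'].masksFpAnywhere.append('BRIGHT_OBJECT')
```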

    • Fixing the "large exposure ID" id factory error

      • Path to the package (note: your path will likely differ slightly, e.g. different version numbers): cd lsst_stack/stack/miniconda3-py38_4.9.2-0.6.0/Linux64/meas_base/22.0.1-10-gba590ab+1f0801fda2/python/lsst/meas/base/
      • Comment out this line:
        • idFactory = lsst.afw.table.IdFactory.makeSource(expId, 64 - expBits)

      • Add the import below to the import section and the remaining two lines to generateMeasCat:
      • from lsst.obs.base import ExposureIdInfo

        exposureIdInfo = ExposureIdInfo.fromDataId(exposureDataId, "tract_patch_band")
        idFactory = exposureIdInfo.makeSourceIdFactory()

  • Step 3 : Installing all of the profiling tools

    • iostat: (disk profiling)
      • run: yum -y install sysstat

    • IFDATA: (network profiling)
      • run: sudo yum install moreutils

  • Step 4 : Installing and configuring the S3 tools

    • Rclone

      • More information about rclone is within the rclone_docs directory:
        • run: cd ~/LSST-RAL-ECHO-EXP/lsst/rclone_docs

      • Installation:
      • Configuration:
        • run: rclone config
          • Type of storage to configure = "s3"
            Choose your S3 provider = "Ceph"
            AWS Access Key ID = "access_key"
            AWS Secret Access Key ="secret_access_key"
            Endpoint for S3 API ="s3.echo.stfc.ac.uk"
            All of the other configs settings can be set to default
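The resulting remote is stored in ~/.config/rclone/rclone.conf and should look roughly like this (a sketch: the remote name "echo" is an example, and the key values are placeholders for your credentials):

```ini
[echo]
type = s3
provider = Ceph
access_key_id = access_key
secret_access_key = secret_access_key
endpoint = s3.echo.stfc.ac.uk
```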

    • S3cmd

      • Installation:
        • run: yum install s3cmd
      • Configuration:
        • s3cmd's default configuration file is located at ~/.s3cfg. To set up s3cmd to use Echo, replace the contents of the file with:
          • [default]
            access_key =
            secret_key =
            host_base = s3.echo.stfc.ac.uk
            host_bucket = s3.echo.stfc.ac.uk/%(bucket)
          • Fill the access_key and secret_key fields with your Echo S3 credentials. Ensure this file is only readable by your user if you are on a shared system.

  • Step 5 : CephFS configuration

    • Mount your CephFS instance in your home space (e.g. ~/cephfs_lsst)
    • Make sure the mount point is named "cephfs_lsst"

Downloading the raw HSC Data

  • You must be in the LSST conda environment for this step (see the beginning of "Running the test")

  • Step 1: run: git lfs install (in the sourced conda environment)

  • Step 2: Downloading the sample data

  • Step 3: Moving the data so it works with the testing

    • run: setup lsst_distrib
    • run: setup lsst_apps
    • run: butler ingest-raws ~/LSST-RAL-ECHO-EXP/lsst/DATA_gen3 ~/LSST-RAL-ECHO-EXP/lsst/testdata_ci_hsc/raw
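The download and ingest steps above can be sketched as follows. This is an assumption-laden sketch: the sample data is assumed to be the lsst/testdata_ci_hsc Git LFS repository on GitHub (the ingest path in this README points at a testdata_ci_hsc checkout), so verify the source before relying on it.

```shell
# Source the LSST environment first (see "Running the test")
source ~/lsst_stack/loadLSST.bash
setup lsst_distrib

# Git LFS is required to fetch the large raw files
git lfs install

# Assumed location of the sample data repository
cd ~/LSST-RAL-ECHO-EXP/lsst
git clone https://github.com/lsst/testdata_ci_hsc.git

# Ingest the raws into the gen3 Butler repository
butler ingest-raws ~/LSST-RAL-ECHO-EXP/lsst/DATA_gen3 ~/LSST-RAL-ECHO-EXP/lsst/testdata_ci_hsc/raw
```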

Running the test

  • Before running the test you have to source the LSST environment:

    • run: cd ~/lsst_stack
    • run: source loadLSST.bash
    • run: setup lsst_distrib
    • run: setup lsst_apps
    • run: export S3_ENDPOINT_URL=S3_endpoint (only needed if you are going to test S3 storage)

  • To run all of them:

    • cd ~/LSST-RAL-ECHO-EXP/lsst/pipeline_runners
    • run: source all_runner.sh

  • To run one of them, source the appropriate script in the pipeline_runners directory

    • run: source <>_runner.sh
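As a rough illustration of what a runner script measures, a minimal timing wrapper might look like this. It is a sketch only: the real runners invoke pipetask with this repo's configuration, and the log file name and "task,seconds" format here are hypothetical.

```shell
# Record wall-clock seconds around a pipeline step and append to a log.
# "sleep 2" stands in for the real pipetask command.
logfile=time_test.log
start=$(date +%s)
sleep 2                          # placeholder for: pipetask run -j 8 ...
end=$(date +%s)
echo "processccd,$((end - start))" >> "$logfile"
cat "$logfile"
```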

Analysing the data

  • The data can be found in the time_test directory under its respective endpoint storage and command-line task / pipetask
  • The notebook directory has examples of how I analysed the data; feel free to use these or create your own analysis code
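If you prefer not to use the notebooks, a small script can summarise the timings. This sketch assumes each timing log is a CSV-like file of "task,seconds" lines; the real files in time_test may use a different format.

```python
# Hypothetical sketch: compute the mean wall-clock time per task from
# "task,seconds" records, using only the standard library.
from collections import defaultdict

def mean_times(lines):
    """Group 'task,seconds' records and return {task: mean seconds}."""
    totals = defaultdict(list)
    for line in lines:
        task, seconds = line.strip().split(",")
        totals[task].append(float(seconds))
    return {task: sum(v) / len(v) for task, v in totals.items()}

# Example records, standing in for the contents of a time_test log file
sample = [
    "processccd,120.5",
    "processccd,118.3",
    "coaddition,340.0",
]
print({task: round(t, 1) for task, t in mean_times(sample).items()})
# → {'processccd': 119.4, 'coaddition': 340.0}
```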

Useful links
