Improvements to THEMIS GMAG RMD file download and processing #364

@DCarSunDr

Description

The recent data gap with THEMIS GBO data revealed some areas where the download and processing scripts could be improved.

In the case of the download script, the actual wget calls complete relatively quickly; however, the script seems to spend a long time completing database updates, calling gbo_uc_rmd_mirror.php and updating a single file per query. We need to:

  • Determine if this database is still being used by any other scripts/if it's still needed.
  • If we still want to keep this database, we need to write the queries to a single .sql file and then pass that file to a single mysql database call, minimizing repeated connections.
    • If we prefer, we can have a separate script/cronjob actually process the .sql file in a type of "mysql workdir", minimizing the number of scripts which require password access and enabling the database logging to function even if the original script encounters a fatal error.
  • If not, we can remove the database and database logging calls.
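
As a rough illustration of the single-.sql-file option above, here is a minimal Python sketch that collects every update from one download run into one batch file inside a "mysql workdir". The table name `rmd_mirror_log`, its columns, and the function names are hypothetical placeholders, not the schema used by gbo_uc_rmd_mirror.php, and the quoting helper is deliberately simplified.

```python
import os
import tempfile
from pathlib import Path


def sql_quote(value: str) -> str:
    """Quote a string as a MySQL literal (simplified; assumes the default sql_mode)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def write_sql_batch(records, workdir, batch_name):
    """Write one .sql file holding every update for a download run.

    records: iterable of (site, filename, size_bytes) tuples.
    The table and column names below are hypothetical placeholders.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)

    # One transaction per batch: either every row is logged or none is.
    lines = ["START TRANSACTION;"]
    for site, filename, size_bytes in records:
        lines.append(
            "REPLACE INTO rmd_mirror_log (site, filename, size_bytes) "
            f"VALUES ({sql_quote(site)}, {sql_quote(filename)}, {int(size_bytes)});"
        )
    lines.append("COMMIT;")

    # Write under a temporary name, then rename, so a separate consumer
    # never picks up a half-written batch.
    fd, tmp_path = tempfile.mkstemp(dir=workdir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    final_path = workdir / f"{batch_name}.sql"
    os.replace(tmp_path, final_path)
    return final_path
```

A separate cron job could then feed each finished .sql file to the mysql client on standard input and delete it on success, so that only that job needs database credentials.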

In the case of the processing script, completing the backlog took several days. This is fine on its own, but because the script works by checking a single location for files containing directory information, a long reprocessing job interferes with routine processing not just for GBO sites, but for all networks from which we retrieve data via RMD files. Even if the processing script were run on a different machine, the directory files and the directory that the script checks would need to be different as well (to prevent the routine RMD processing script from trying to process the same files).

Additionally, processing a single month's worth of data seems to take a long time, but the exact reason is currently unknown. We need to:

  • Reconfigure the RMD scripts to allow routine processing while a reprocessing job is going on.
  • Analyze and improve the performance of RMD processing for individual directories/files.
  • Enable the processing script to be run in parallel and allow script options of start and end dates to aid future reprocessing jobs.
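
One possible shape for these options is sketched below in Python: start and end dates, a separate queue directory so a reprocessing job does not touch the routine one, and a configurable number of workers. All option names and `process_day` are hypothetical, and the per-day work is a placeholder rather than the actual RMD processing.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from pathlib import Path


def days_in_range(start, end):
    """Return every date from start to end, inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def process_day(day, queue_dir):
    """Placeholder for the real per-day RMD processing (hypothetical)."""
    # A real implementation would read the directory files for `day`
    # from queue_dir and process the data they point to.
    return f"{day.isoformat()} via {queue_dir}"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Reprocess RMD data for a date range.")
    parser.add_argument("--start", type=date.fromisoformat, required=True,
                        help="first day to process, YYYY-MM-DD")
    parser.add_argument("--end", type=date.fromisoformat, required=True,
                        help="last day to process, YYYY-MM-DD (inclusive)")
    parser.add_argument("--queue-dir", type=Path, required=True,
                        help="directory of directory files, separate from the routine one")
    parser.add_argument("--workers", type=int, default=4,
                        help="number of days to process concurrently")
    args = parser.parse_args(argv)
    if args.end < args.start:
        parser.error("--end must not be earlier than --start")
    return args


def main(argv=None):
    args = parse_args(argv)
    days = days_in_range(args.start, args.end)
    # Each day is independent, so days can be handed to a pool of workers.
    with ThreadPoolExecutor(max_workers=args.workers) as pool:
        return list(pool.map(lambda day: process_day(day, args.queue_dir), days))
```

Calling `main(["--start", "2024-01-01", "--end", "2024-01-31", "--queue-dir", "/path/to/reprocess_queue"])` would process one month against its own queue directory. Because the cause of the slowness is not yet known, a thread pool is used here as a neutral starting point; if profiling shows the work is CPU-bound, `ProcessPoolExecutor` from the same module could be swapped in.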

Metadata

Labels

  • GMAG-Other: Issues relating to non-THEMIS GMAG networks supported by SPEDAS, PySPEDAS, or the THEMIS SOC
  • GMAG-THEMIS: Issues related to THEMIS GMAG sites and data products
  • SOC
  • THEMIS
  • download
  • enhancement
  • performance
