Overview

Alvin Sebastian edited this page Jul 8, 2013 · 3 revisions

Application Description

The Semaphore tools consist of a web application for running biogeochemical modelling tools online and Microsoft Excel add-ins that help import the modelling output. For those who want to run the biogeochemical models with the Kepler scientific workflow system, an example Kepler workflow bundled with custom actors is also provided.

Background

Predictive modelling of carbon and nitrogen dynamics in vegetation and soils is critical for understanding and sustainably managing Australian ecosystems. However, the value of these predictions depends on how well the models can simulate the real world, and model outputs can vary markedly. One reason is that each model focuses on a different part of the carbon and nitrogen cycle, which leads to different assumptions and parameterisations.

Modelling Process

The following diagram illustrates a typical modelling process:

[Figure: modelling process]

1. Collect site/environment and observation data

Two types of data need to be made available:

  • Site/environment data: Various tabular and non-tabular data needed as input parameters for the simulation.
  • Observation data: Mostly tabular data needed to validate the simulation result.

In general, there is no standard format for recording or documenting these original data as observed or measured in the field.

2. Set up initial configurations using site/environment data

This step starts by making a copy of the initial directory that contains the model executables and their parameter files (with default values). The site/environment data required as input parameters can be grouped per input file as follows:

  • Weather data (SITE.wth): In Australia, the de facto weather data provider is the Bureau of Meteorology (BOM). The data need to be downloaded as a CSV file and transformed to CENTURY or DayCENT format. The required format differs between the two: CENTURY needs monthly data while DayCENT needs daily data. Current practice is to convert the data manually in Microsoft Excel.
  • Model parameters (*.100): Parameters that describe the site environment, elements, and management activities such as crops, trees, and fertilizer. Some parameters need to be adjusted to reflect field conditions, while others can be left at their defaults.
  • Site management events/activities (*.sch): These data mainly describe the duration of the simulation and the events that will be simulated on the site. Creating or editing this file in a text editor is tedious because it involves calculating Julian dates and referencing other parameters in the *.100 files.
  • Output file parameters (outfiles.in): A list of 0 or 1 values that mark whether the model should produce each additional output file. This file is only required by DayCENT.
  • Soil data (soils.in): A description of the soil layer structure. Some values for this table can be measured in the field.
  • Additional site parameter (sitepar.in): Additional site information needed by DayCENT.
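Two of the manual chores in the list above, turning daily weather records into monthly values and working out Julian dates for the schedule file, can be sketched in Python. This is only an illustration under stated assumptions: the CSV column names (date, rain_mm, tmax_c, tmin_c) are hypothetical rather than the real BOM layout, "Julian date" is taken to mean the day of the year (1 to 366), and the monthly values are assumed to be rainfall totals and mean temperatures. Writing the actual SITE.wth layout is not shown.

```python
import csv
import io
from collections import defaultdict
from datetime import date


def day_of_year(year, month, day):
    """Return the day of the year (1-366) for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


def monthly_summary(daily_csv):
    """Aggregate daily rows into per-month rainfall totals and mean temperatures.

    Assumes hypothetical columns: date (YYYY-MM-DD), rain_mm, tmax_c, tmin_c.
    Returns a dict keyed by (year, month).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(daily_csv):
        year, month, _ = (int(part) for part in row["date"].split("-"))
        groups[(year, month)].append(row)

    summary = {}
    for key in sorted(groups):
        rows = groups[key]
        summary[key] = {
            "rain_mm": sum(float(r["rain_mm"]) for r in rows),
            "tmax_c": sum(float(r["tmax_c"]) for r in rows) / len(rows),
            "tmin_c": sum(float(r["tmin_c"]) for r in rows) / len(rows),
        }
    return summary


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = io.StringIO(
        "date,rain_mm,tmax_c,tmin_c\n"
        "2013-01-01,0.0,30.0,20.0\n"
        "2013-01-02,5.0,32.0,22.0\n"
        "2013-02-01,1.5,28.0,18.0\n"
    )
    print(day_of_year(2013, 3, 1))
    print(monthly_summary(sample))
```

Using the standard library's date handling avoids hand-counting days across leap years, which is the error-prone part of the manual approach.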

2a. Modify model parameters

To achieve simulation results that match the observation data, the input files described in Step 2 must be modified accordingly. This requires human experts to adjust some values among hundreds of parameters, based on their experience. To keep track of the changes, a user can make a copy of the working directory (and all its files) before modifying the input files. However, this takes extra effort and can result in many unmanageable folders, so the usual practice is to record the changes manually (e.g. by making a note).
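As an alternative to hand-written notes, the changes between two versions of a parameter file could be captured automatically. The sketch below uses Python's standard difflib module to list the lines that were removed and added. The function name describe_changes, the file name example.100, and the parameter lines are hypothetical and only serve the illustration; this is not how Semaphore itself tracks changes.

```python
import difflib


def describe_changes(before_text, after_text, name):
    """Return the removed ('-') and added ('+') lines between two file versions."""
    diff = list(
        difflib.unified_diff(
            before_text.splitlines(),
            after_text.splitlines(),
            fromfile=f"{name} (before)",
            tofile=f"{name} (after)",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # Skip the two file header lines and the '@@' hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


if __name__ == "__main__":
    # Made-up parameter lines, for illustration only.
    before = "1.0  'PARAM_A'\n2.0  'PARAM_B'\n"
    after = "1.0  'PARAM_A'\n2.5  'PARAM_B'\n"
    for line in describe_changes(before, after, "example.100"):
        print(line)
```

Saving this short list next to each run gives a record of what was changed without keeping a full copy of the working directory.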

3. Run the model

The simulation is run by executing a batch file from the command-line console. The batch file mainly automates the following tasks:

  • deleting previously generated output files,
  • running the model with the required schedule file,
  • running a tool that produces tabular text output files from the binary output files.
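The three tasks above can be sketched as a small Python wrapper instead of a batch file. The commands and file patterns are passed in by the caller because the real executable names and arguments are not given here; the names site.bin and site.lis and the stand-in commands in the demo are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_model(work_dir, model_cmd, convert_cmd, stale_patterns):
    """Delete stale outputs, run the model, then run the output converter.

    model_cmd and convert_cmd are argument lists supplied by the caller.
    """
    work_dir = Path(work_dir)
    # 1. Delete previously generated output files.
    for pattern in stale_patterns:
        for old_file in work_dir.glob(pattern):
            old_file.unlink()
    # 2. Run the model; check=True raises an error if the command fails.
    subprocess.run(model_cmd, cwd=work_dir, check=True)
    # 3. Convert the binary output to tabular text.
    subprocess.run(convert_cmd, cwd=work_dir, check=True)


if __name__ == "__main__":
    # Stand-in commands that only imitate a model and a converter.
    fake_model = [sys.executable, "-c", "open('site.bin', 'w').write('42')"]
    fake_convert = [
        sys.executable,
        "-c",
        "open('site.lis', 'w').write(open('site.bin').read())",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        run_model(tmp, fake_model, fake_convert, ["*.bin", "*.lis"])
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Stopping on the first failed command means a stale or partial output file is never mistaken for a fresh result.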

4. Format output files

Current practice is to import the tabular text output files into Microsoft Excel sheets. The process is tedious because it involves many mouse movements and clicks to import, copy, and paste a table multiple times in Excel.
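One way to reduce the manual import work is to convert the text table to CSV, which Excel opens directly. The sketch below assumes the output is a whitespace-delimited table whose first non-blank line holds the column names; the real output files may need extra handling. The column names in the demo are made up.

```python
import csv
import io


def table_to_rows(text):
    """Split a whitespace-delimited table into (header, data rows)."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("the table is empty")
    return lines[0], lines[1:]


def to_csv(text):
    """Return the table as CSV text, one output row per input row."""
    header, rows = table_to_rows(text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up table, for illustration only.
    sample = (
        "time      var_a    var_b\n"
        "1990.00   100.5    0.0\n"
        "1990.08   101.2    3.4\n"
    )
    print(to_csv(sample))
```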

5. Compare simulation result with observation data

Selected values from the output and observation data are plotted on a chart and visually compared by a user. Current practice is to use Microsoft Excel to plot the chart and to keep track of the results. Based on the user's judgement, the process can be repeated from step 2a until a satisfactory result is achieved.
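The comparison described here is visual. As a complement, and not as part of the process described above, a simple numeric measure such as the root-mean-square error (RMSE) can be computed once simulated and observed values are paired by time. The sketch below assumes both series are dictionaries keyed by the same time values; the function names and sample numbers are hypothetical.

```python
import math


def paired_values(simulated, observed):
    """Return (time, simulated, observed) for times present in both series."""
    return [(t, simulated[t], observed[t]) for t in sorted(observed) if t in simulated]


def rmse(pairs):
    """Root-mean-square error over (time, simulated, observed) tuples."""
    if not pairs:
        raise ValueError("no overlapping time points to compare")
    squared = [(sim - obs) ** 2 for _, sim, obs in pairs]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up values, for illustration only.
    simulated = {1990: 10.0, 1991: 12.0, 1992: 15.0}
    observed = {1990: 11.0, 1992: 13.0}
    pairs = paired_values(simulated, observed)
    print(pairs)
    print(rmse(pairs))
```

The paired list is also the table one would plot, so the same step serves both the chart and the numeric check.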

Using Semaphore

Normally, to proceed with step 3, scientists need to obtain their own copy of the biogeochemical modelling programs (in this case, CENTURY and DayCENT). Setting up the programs on a local computer so that experiments can be run quickly and in an organised manner is not trivial. Semaphore comes with an execution service, a component that allows the CENTURY and DayCENT models to be run remotely. Scientists only need to upload their configuration files using a web browser and run the model with a single click.

To calibrate the model, steps 2a, 3, 4, and 5 need to be repeated many times, which makes it difficult to keep track of the input parameters and output data. With the Semaphore web application, users can create, manage, and run their experiments in the cloud without having to download or install the software. All input and output data associated with each experiment run are stored persistently in the cloud. All experiment runs are recorded and shown in a history list so that they can be tracked easily.

To validate the modelling result, Microsoft Excel is often used to quickly transform data, generate charts, and compare results visually. However, getting the modelling output into Excel spreadsheets is a tedious manual process involving many clicks and much copying and pasting. With the Semaphore Excel add-ins, modelling output can be imported with a single click.

Acknowledgements

This project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS) Program and the Education Investment Fund (EIF) Super Science Initiative. The software is developed in conjunction with Queensland University of Technology (QUT) and The Australian Centre for Ecological Analysis and Synthesis (ACEAS).