Home
Welcome to the pyOmeroUpload wiki!
The PyOmeroUpload toolkit software architecture comprises three main components: the data transfer manager, data broker and metadata parser. The data transfer manager provides a high-level interface for transferring data in a specified directory structure to a remote OMERO server, initiating the core functions of extracting metadata and uploading data by delegating to the appropriate service. The data broker service makes extensive use of the OMERO Python API modules, exposing core functions for administering HTTP sessions, creating OMERO datasets and linking metadata objects, retrieving objects with HQL queries, and creating multi-dimensional images by compositing individual images together according to pre-defined folder structure and filename convention rules. The metadata parser implementation is designed to build internal representations, in the form of nested Python dictionaries, of relevant biological metadata extracted from microscope log and acquisition files. This is achieved using regular expressions to match against text discovered in the log files as they are read iteratively. These internal representations are then converted into the appropriate OMERO annotation object – either a key-value pairs map [17], a table [18] or a tag [19] – and uploaded via the broker. The data broker class encapsulates all direct interoperation with the OMERO API, while the data transfer manager does not import any OMERO API modules and is thus resilient to any changes in the OMERO service APIs.
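The parsing idea described above can be sketched as follows. This is a minimal illustration, not the toolkit's actual parser: the log format, the pattern names and the dictionary layout are all assumptions made for the example.

```python
import re

# Illustrative sketch: regular expressions are matched against log lines as
# they are read, and hits are accumulated into a nested Python dictionary.
# The patterns and field names below are assumptions, not the toolkit's
# actual conventions.
PATTERNS = {
    "microscope": re.compile(r"^Microscope name is:\s*(?P<value>.+)$"),
    "channel": re.compile(r"^Channel:\s*(?P<value>\w+)$"),
}

def parse_log_lines(lines):
    """Build a nested metadata dictionary from an iterable of log lines."""
    metadata = {"acquisition": {}, "channels": []}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.match(line.strip())
            if not match:
                continue
            if key == "channel":
                metadata["channels"].append(match.group("value"))
            else:
                metadata["acquisition"][key] = match.group("value")
    return metadata
```

A dictionary of this shape is what the data broker would then convert into the appropriate OMERO annotation object.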
The OMERO Python API is the core dependency for connecting to and managing sessions with the OMERO server, as well as for retrieving data objects and executing Object-Relational Mapping (ORM) queries using the Hibernate Query Language (HQL). The OMERO API can be used with the Blitz Gateway, “a Python client-side library that facilitates working with the OMERO API, handling connection to the server, loading of data objects and providing convenience methods to access the data” [20], or it can be used via a number of lower ‘service’ levels that provide stateless access. Although accessing the OMERO Python API with the Blitz Gateway as a context manager is encouraged, the stateless service-level APIs proved more robust to version differences between the client library and the OMERO server (for example, we could interact with the public demo OMERO server running version 5.5.1 even though no official 5.5.1 library is available in Bioconda), and less sensitive to SSL issues than the Blitz Gateway session management. While the Blitz Gateway provides very convenient connection and object wrappers, the power and flexibility of the underlying service APIs is a significant consideration. At the time of writing, the OMERO Python API supports only Python 2, with a Python 3 implementation planned for early 2020; consequently the toolkit is currently implemented in Python 2.
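For orientation, a Blitz Gateway connection typically looks like the sketch below. The host, credentials and port are placeholders, and the helper degrades gracefully when the OMERO library is not installed (it is Python 2-only at the time of writing):

```python
def connect_blitz(host, username, password, port=4064):
    """Open a Blitz Gateway session; return None if the OMERO Python
    library is not available in the current environment."""
    try:
        # BlitzGateway is part of the OMERO Python API client library
        from omero.gateway import BlitzGateway
    except ImportError:
        return None
    conn = BlitzGateway(username, password, host=host, port=port)
    if not conn.connect():
        raise RuntimeError("failed to connect to %s:%d" % (host, port))
    return conn
```

The stateless service-level APIs bypass these wrappers and talk to the individual OMERO services directly, which is what gave the toolkit its resilience to client/server version mismatches.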
The uploader toolkit software is extensible in the sense that the metadata parser implementations can be replaced as required for application to different log file data; as long as the output from the parsers is in a generic Python dictionary format, then it can be processed by the data broker into the appropriate OMERO annotation type. Likewise, the function utilized by the data broker to iterate through the provided directory structure and composite the multi-dimensional images from sub-directories according to location and filename, can be replaced with a custom implementation easily. The toolkit is available as a Conda package through the Bioconda channel, and once installed these modifications and function overrides can be realized by the user with some extra configuration as shall be explained in a later section.
To cater for Windows OS users, and to minimize complexity, the OMERO Python library and PyOmeroUpload package have been integrated into portable Docker image definitions. These images are specified by a hierarchy of Dockerfiles that build upon one another, and all inherit from a base image which incorporates the library in a ready-baked Conda environment. The base image itself inherits from a parent OpenJDK 8 [21] Docker official image, which includes the required Java version for compatibility with version 5.4.10 of the OMERO library. There are five Docker images in total, as described in the table below. This system of inheritance has been implemented because it minimizes the effort required to upgrade the version of the OMERO library that is specified: only the Dockerfile for the base image must be modified, and all successive images automatically include the new library when they are rebuilt. Likewise, the first child image incorporating the PyOmeroUpload library can be easily updated to accommodate new versions of the uploader library. Concomitantly, the ongoing maintenance of each image in the hierarchy is easily managed because individual Dockerfiles have clearly delineated roles with particular dependencies that can be updated independently of one another.
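The inheritance chain can be sketched as below. The image names and tags are illustrative assumptions, not the repository's actual ones; the point is that only the base Dockerfile pins the OMERO library version, so an upgrade there propagates to every child on rebuild.

```dockerfile
# Dockerfile.base (illustrative sketch)
# Java 8 parent image, required by OMERO 5.4.10
FROM openjdk:8
# ... install Conda here and create the environment containing omero-py ...

# Dockerfile.uploader (a separate file; child image name is an assumption)
# FROM omero-base:latest
# ... conda install pyomero-upload into the same Conda environment ...
```

Each subsequent image in the hierarchy repeats this pattern, adding only its own layer of dependencies on top of its parent.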
The PyOmeroUpload library is designed to support users with a range of software experience levels, as explored in later sections. At the least complex end of the usage spectrum, the library can be installed in a Conda environment, or the Docker images downloaded from DockerHub.
The images need to be pulled from DockerHub [23] [TODO: Sign-up and deploy pre-built images to DockerHub and provide URL here] or built in sequence and available in the system’s local Docker image repository so that the inheritance chain completes successfully. Once the images are present, the Docker containers are ready to be instantiated with the run command and any desired configuration parameters, as specified in the README.md [24]. To run commands within a Docker container, an interactive shell can be established with the standard ‘exec’ command:
$ docker exec -it omero-uploader bash
For users running 64-bit Linux and Mac OS systems, the library can be installed in a Conda environment. Installation through Conda follows the typical usage pattern, except that the OMERO Python library requires Python 2.7, so a corresponding Conda environment must be created with this configuration. Additionally, the OMERO Python and PyOmeroUpload packages are only available through the Bioconda channel, so Conda must be instructed to use this channel explicitly, as demonstrated in the environment setup commands [25] below:
$ conda create --name omero_upload python=2.7
$ source activate omero_upload
$ conda install -c bioconda pyomero-upload
With the PyOmeroUpload Conda package installed, or the desired containers in a running state, the user is ready to issue commands in the appropriate environment. The user may initiate a Python shell from their CLI terminal (either in Linux, Mac OS or within the Docker container on Windows), and then import the OMERO Python API and PyOmeroUpload libraries. The PyOmeroUpload modules can be executed to upload the accompanying sample data [TODO: provide data repository URL for sample data] with an invocation of the data transfer manager, as shown in the figure below. The user specifies the designated metadata parser(s), the target directory and, if desired, an alternative custom data transformation function with which to process collections of single images into n-dimensional images. If no data transformation function is specified but the hypercube option is still present, the uploader will attempt to transform the data into five-dimensional hypercubes according to the following rules:
- Target directory contains sub-directories named ‘pos{xxx}’, each of which corresponds to a microscope position, where ‘{xxx}’ is a unique numeric identifier for that position
- Within each sub-directory, there are multiple image files per z-section, time point and channel
- Each image file adheres to a naming convention of ‘{abc}{z-section}{channel}_{timepoint}’, where ‘{abc}’ can be any arbitrary string [TODO: confirm filename convention]
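The rules above can be sketched as a small classifier. Since the filename convention is still marked TODO in the text, the regular expressions here are assumptions (a non-digit prefix for ‘{abc}’, digits for the z-section and timepoint, letters for the channel), intended only to illustrate the kind of matching the default transformation performs.

```python
import os
import re

# Assumed patterns for the default compositing rules (illustrative only):
# position directories 'pos{xxx}' and filenames
# '{abc}{z-section}{channel}_{timepoint}'.
POS_DIR = re.compile(r"^pos(?P<position>\d+)$")
IMAGE_NAME = re.compile(
    r"^(?P<prefix>\D*?)(?P<z>\d+)(?P<channel>[A-Za-z]+)_(?P<timepoint>\d+)$"
)

def classify(dirname, filename):
    """Return (position, z, channel, timepoint) for a file, or None if the
    directory or filename does not match the assumed convention."""
    d = POS_DIR.match(dirname)
    name, _ext = os.path.splitext(filename)
    f = IMAGE_NAME.match(name)
    if not d or not f:
        return None
    return (int(d.group("position")), int(f.group("z")),
            f.group("channel"), int(f.group("timepoint")))
```

A custom data transformation function supplied to the uploader would replace logic of this kind with the user's own directory and filename rules.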
Depending on network bandwidth and latency, the upload process can take some time. Once data are uploaded to the OMERO server, standard data curation procedures can be applied.
[TODO: This is a placeholder image; to be replaced with a figure showing CLI usage of PyOmeroUpload, to be produced when the Conda package build is fixed]
Slightly more advanced usage of the toolkit can be achieved with the Jupyter Notebook server Docker image. Users can invoke the PyOmeroUpload data transfer manager functions directly by importing the relevant packages. Assuming the omero-jupyter container has been initiated with the required volume mounts (see the OMEROConnect repository README.md), the sample Jupyter Notebook ‘test_omero_upload.ipynb’ in the work directory can be executed to demonstrate upload to the OME demonstration OMERO server. Users must provide their own login credentials for access to this server, and this requires registration [26]. Alternatively, if users have access to another OMERO server, the relevant configuration parameters can be adjusted in the notebook’s global variables. Other notebooks in the image are ‘test_omero_query.ipynb’ and ‘test_omero_api.ipynb’, for data retrieval from an OMERO server via the service-level APIs and the JSON API respectively. These notebooks demonstrate interactive querying, and can be adapted for deeper analysis and visualization using the included Python libraries such as Pandas [27], NumPy [28], Matplotlib [29] and seaborn [30].
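When adapting the JSON API notebook, it helps to know that OMERO's JSON API lists model objects under URLs of the form `/api/v{version}/m/{type}/`. The helper below only composes such a URL; the actual HTTP request and the server address are left to the notebook's configuration, and the base URL in the usage note is an example.

```python
def json_api_url(base_url, obj_type, api_version=0):
    """Compose an OMERO JSON API listing URL for a model object type,
    e.g. 'projects' or 'datasets'."""
    return "%s/api/v%d/m/%s/" % (base_url.rstrip("/"), api_version, obj_type)
```

For example, `json_api_url("https://demo.openmicroscopy.org", "projects")` yields `https://demo.openmicroscopy.org/api/v0/m/projects/`, which can then be fetched with any HTTP client and parsed into a Pandas DataFrame for analysis.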
For developers, the omero_ide image provides a fully-fledged IDE (Integrated Development Environment) featuring JetBrains’ PyCharm [31] Community edition. Like the Jupyter image, the IDE image comes with the same pre-built Conda environment and required libraries installed. In addition, the IDE container runs an OpenSSH [32] server that enables users to establish an X11 [33] SSH connection so that the IDE GUI can be displayed as if it were running on the host system. For Linux and Mac OS users, the connection can be established simply by entering the standard `ssh -X jovyan@127.0.0.1 -p 2222` in the command terminal. For Windows users, an X Server application must be installed, such as MobaXTerm [34] or XMing [35], followed by the appropriate instructions to create an X11-enabled SSH session with username ‘jovyan’, host ‘127.0.0.1’ and port ‘2222’.