
Using IBMStreams from Streams Studio

hildrum edited this page Jul 17, 2015 · 1 revision

If you use Streams Studio to develop your toolkits and applications, you will want to use the Eclipse Git tool (EGit) to share your source code in IBMStreams repositories. This page offers step-by-step instructions for doing that.

Note: This page is a work in progress. Comments, criticisms, additions, and alternatives are all welcome. Discussion is here.


Understanding how Git works

Git is a scheme for managing source code that allows developers to collaborate on a software project without any central coordination. It is described in loving detail by the Pro Git book. You really should read at least "Chapter 2. The Basics", but if you're in a hurry to get started, here is a very brief summary:

Git keeps all of the source files related to a software project in a repository. Each developer involved in a software project keeps a complete copy of the repository on their own machine. Initially, you will clone some existing copy of the repository onto your machine. As you work, Git will keep track of every change you make to your copy. Meanwhile, Git will be keeping track of every change other developers make to their copies of the repository.

If you are new to using Git and GitHub and think you will be making edits, we suggest that you fork the repository you want to work in and clone that fork, rather than the IBMStreams repository. There will then be three copies: the IBMStreams one, your fork on GitHub, and your local repository. You do your work (add operators, change applications, and edit) in your local repository.

When you're ready to share your application, you commit and push to your fork on GitHub. You can have multiple people contributing to your fork, and since you have total control over your fork, you can make whatever changes you like here.

When (and if) you think your changes should be incorporated into the IBMStreams repository, you issue a pull request via the GitHub web interface. Then the people with push permission on the IBMStreams repository will review your code to decide whether to include it or not. They may ask for additional comments or code changes before accepting the code.
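The fork, clone, commit, and push steps described above can be sketched at a Linux prompt. This is only an illustration under stated assumptions: the two GitHub copies are simulated with local bare repositories in a scratch directory so the script runs offline, and the file name, commit message, and developer identity are hypothetical. With a real fork you would clone its GitHub URL instead. The pull request itself has no Git command; it is made in the GitHub web interface.

```shell
# Scratch directory holding all three copies (simulation only).
work=$(mktemp -d)

# Stand-ins for the two copies that would normally live on GitHub.
git init --quiet --bare "$work/ibmstreams.git"                     # the IBMStreams repository
git clone --quiet --bare "$work/ibmstreams.git" "$work/fork.git"   # your fork

# Your local repository is a clone of the fork, not of IBMStreams.
git clone --quiet "$work/fork.git" "$work/local"

# Do some work locally: add a (hypothetical) file and commit it.
echo "// placeholder operator" > "$work/local/Example.spl"
git -C "$work/local" add Example.spl
git -C "$work/local" -c user.name="Example Developer" -c user.email="dev@example.com" \
    commit --quiet -m "Add example operator"

# Share the work by pushing the current branch to the fork ("origin").
git -C "$work/local" push --quiet origin HEAD

# The fork now has the commit; the IBMStreams stand-in is untouched
# until a pull request is accepted.
git --git-dir="$work/fork.git" log --all --oneline
```

EGit issues the equivalent of these commands for you when you use its commit and push actions in Streams Studio.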

Git also allows branching within a single repository, which is a useful way to keep work on different features separate, but is more complex than many people need.

Git is implemented as a set of commands that can be typed at a Linux prompt, but if you use Streams Studio, you may never type them directly. The Eclipse Git tool (EGit) provides a perspective and views that will show you the contents of your repository and issue the commands for you. The exhaustive EGit User's Guide and EGit Tutorial will tell you everything there is to know about it.

The instructions on this page will get you started with Git and EGit, but then you really do need to read the documentation.


Understanding what IBMStreams is

IBMStreams is a collection of Git repositories containing toolkits for the IBM InfoSphere Streams product. They extend its capabilities beyond the standard toolkit and specialized toolkits that are packaged with the product.

The IBMStreams repositories are hosted by GitHub, and provided as open source under the Apache License, version 2.0. We encourage you to use the toolkits in these repositories, collaborate with us on improving them, and contribute your own toolkits to the community of Streams developers.


Create a GitHub account

Note: You can skip this step if you just want to create a copy for yourself and do not plan to contribute any changes.

You will need a GitHub account to fork an IBMStreams repository or contribute changes to it. If you already have one, you can use it; if not, you can create one for free.

You may use a Secure Shell (SSH) public/private key pair for your GitHub account and Streams Studio. If you already have SSH keys on your machine, perhaps created by the "First Steps" application when you installed InfoSphere Streams, you can use them with GitHub. If you do not already have SSH keys, you should create them now before proceeding. Follow these instructions to create SSH keys on your machine using a Linux command prompt.

You can create your GitHub account by filling in the Join GitHub form. After you have signed into your account, add your public SSH key by filling in the Add SSH key form. You should copy your public key from the .pub file in the $HOME/.ssh directory on your machine (for example, $HOME/.ssh/id_dsa.pub for a DSA key or $HOME/.ssh/id_rsa.pub for an RSA key) into the "Key" field of the form. Keep your private key a secret.
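If you need to create a key pair, the following is a minimal sketch using the standard ssh-keygen tool. Several things here are assumptions rather than part of the original instructions: it generates an RSA key (to my knowledge GitHub no longer accepts DSA keys), it writes to a scratch directory so that any existing keys in $HOME/.ssh are not overwritten, the e-mail comment is a placeholder, and the empty passphrase is only there to keep the example non-interactive.

```shell
# Scratch location so that existing keys are never overwritten (assumption).
keydir=$(mktemp -d)

# Generate a 4096-bit RSA key pair. -C sets a comment (placeholder address),
# -N "" sets an empty passphrase, -f names the private key file.
ssh-keygen -q -t rsa -b 4096 -C "you@example.com" -N "" -f "$keydir/id_rsa"

# The .pub file is the public half: its contents go into the "Key" field.
cat "$keydir/id_rsa.pub"

# The file without the .pub suffix is the private half; keep it secret.
ls -l "$keydir/id_rsa"
```

For real use you would normally choose a passphrase and let ssh-keygen write to its default location under $HOME/.ssh.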


Install the Eclipse Git tool (EGit) in Streams Studio

To add the Eclipse Git tool (EGit) to Streams Studio, launch Studio and select 'Help > Install New Software ...' from the Eclipse menu. In the 'Work with' field of the 'Install' dialog, enter 'http://download.eclipse.org/egit/updates'. When package names appear below it, expand the 'Eclipse Git Team Provider' feature and select the 'Eclipse Git Team Provider' plugin, like this:

image

Then click through the remaining dialog panels, click 'Finish', and restart Eclipse when prompted. After Eclipse restarts, you should open the new 'Git' perspective to prepare for cloning an IBMStreams repository.


Clone an IBMStreams repository

To clone a copy of an IBMStreams repository into Streams Studio on your machine, first go to its home page by clicking on its title at https://github.com/IBMStreams. On the repository's home page, copy its 'clone URL' to your clipboard by clicking this button:

image

Then, in Streams Studio, start the 'Clone Git Repository' dialog by clicking this button in the 'Git Repositories' view:

image

Most of the dialog's fields will be filled in automatically from the 'clone URL' in your clipboard, but you may need to select the SSH protocol, like this:

image

Click 'Next' twice in the dialog. When you reach its 'Local Destination' panel, select 'Import all projects after clone finishes', like this:

image

When you click 'Finish', EGit will copy the IBMStreams repository onto your local machine, import all of the projects in the repository into your Eclipse workspace, and keep track of all the changes you make to them. When you switch back from the Git perspective to the InfoSphere Streams perspective, you should see all of the projects imported from the repository in the 'Project Explorer' view.

Note: By default, EGit creates a directory named $HOME/git on your machine and keeps cloned repository directories and files there, and it puts links to them in your Eclipse workspace. You may find this confusing if you are accustomed to looking at your Eclipse workspace directory with Linux tools such as Nautilus, but it all works out. That's the magic of Git.
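To see which repositories have been cloned into that directory, you can look for subdirectories that contain a .git directory. The helper below is a hypothetical sketch (list_repos is not a Git or EGit command); it defaults to $HOME/git and accepts a different directory as its argument.

```shell
# Print the name of every Git repository found directly under a directory.
# Defaults to $HOME/git when no directory is given.
list_repos() {
    root="${1:-$HOME/git}"
    for d in "$root"/*/; do
        # A working repository has a .git directory at its top level.
        [ -d "${d}.git" ] && basename "$d"
    done
    return 0
}

list_repos
```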


Using Maven

Many of the IBMStreams toolkits have dependencies on other open source projects. To manage these dependencies, we use Apache Maven. Each of the toolkits that uses Maven in this way has a pom.xml file listing the packages it depends on. For these toolkits, the toolkit build calls Apache Maven.

For Maven to be invoked successfully:

  1. Install Maven
  2. Once Maven is installed, and before building, run export M2_HOME="Maven_Install_Location", replacing Maven_Install_Location with the directory where Maven is installed.

If you rely on Streams Studio to build your toolkit, you'll generally need to run ant maven-deps manually in your toolkit directory (the one with pom.xml) to download the necessary files. They'll appear in opt/downloaded. You can then add those files to your Streams Studio classpath.
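The Maven setup and the manual dependency download can be sketched as a small shell helper. Treat this as an illustration under assumptions: the default install location $HOME/apache-maven is a guess you should replace, adding Maven's bin directory to PATH is common practice rather than something this page requires, and fetch_maven_deps is a hypothetical helper name.

```shell
# Assumed install location; replace with the directory where Maven is installed.
export M2_HOME="${M2_HOME:-$HOME/apache-maven}"
export PATH="$M2_HOME/bin:$PATH"

# Download a toolkit's Maven dependencies.
# The argument is the toolkit directory, i.e. the one that contains pom.xml.
fetch_maven_deps() {
    toolkit_dir="$1"
    if [ ! -f "$toolkit_dir/pom.xml" ]; then
        echo "no pom.xml in $toolkit_dir" >&2
        return 1
    fi
    # Run the ant target in a subshell so the caller's directory is unchanged.
    (cd "$toolkit_dir" && ant maven-deps) || return 1
    # The downloaded files are the ones to add to the Streams Studio classpath.
    ls "$toolkit_dir/opt/downloaded"
}
```

You would call it with the path to a toolkit inside your cloned repository, for example fetch_maven_deps "$HOME/git/<repository>/<toolkit>", substituting the real directory names.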