
Hello world pipeline

Even though BatchX is a platform for running batch jobs in the cloud and does not provide direct mechanisms to run pipelines, this tutorial shows how BatchX jobs can be orchestrated on the client side.

In this tutorial we will create a very simple pipeline that connects two Docker images running in BatchX:

  1. tutorial/hello-world:1.0.0, which outputs a .txt file.
  2. tutorial/word-count:1.0.0, which takes the output generated by the hello-world image and counts the number of words in that file.

This pipeline will be implemented in Python3 to show how quickly you can integrate your images and build workflows. However, any programming language can be used to build pipelines just as easily.

Requirements

Python3

Since the pipeline will be implemented in Python 3, the first step is to verify that it is installed on your system. Type the following command to find out:

$ python3 --version
Python 3.7.6

If the command prints a Python 3 version as shown above, no action is required. If it returns a "command not found" error, you will need to install Python 3.
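The pipeline script in this tutorial relies on subprocess.run, which was added in Python 3.5. If you want the script to fail with a clear message on an older Python 3 interpreter, a version guard like the minimal sketch below could go at its top. This is an optional addition, not part of the tutorial's script: the (3, 5) minimum is inferred from that API rather than stated by BatchX, and python_is_supported is a hypothetical helper name.

```python
import sys

# Assumed minimum: subprocess.run, used by the pipeline script,
# was introduced in Python 3.5 (not a documented BatchX requirement).
MIN_VERSION = (3, 5)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given interpreter version meets the assumed minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    sys.exit("Python %d.%d or newer is required" % MIN_VERSION)
```

This guard only helps on Python 3 versions older than 3.5. Python 2 rejects the script's print(..., file=sys.stderr) syntax before any check can run.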

Using yum, type the following command:

sudo yum install -y python3

Using apt-get, type the following command:

sudo apt-get update && sudo apt-get install -y python3

On macOS, use Homebrew. If you don't have Homebrew installed, follow the instructions described here.

brew install python3

hello-world image

You probably already have the hello-world image in your BatchX registry, but it's a good idea to make sure. Use the bx images command and check that batchx@tutorial/hello-world:1.0.0 appears in the list.

If you have not added the hello-world image to your registry yet, follow the instructions from the Quick start section.

word-count image

This image can be imported from Docker Hub using its Docker coordinates, batchx/wordcount:1.0.0:

bx import batchx/wordcount:1.0.0

The image will be imported to your BatchX environment using the coordinates defined in the image manifest. According to the word-count manifest, the image will be stored in BatchX as tutorial/word-count:1.0.0.
Check that the image has been correctly added to your account using the bx images command.
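If you prefer to check for both images from a script rather than by eye, a minimal sketch follows. It assumes that bx images prints each image's coordinates as plain text (for example batchx@tutorial/hello-world:1.0.0), so a substring match is sufficient. The helper names missing_images and check_registry are hypothetical, not part of the bx CLI.

```python
import subprocess

# Image coordinates used by this tutorial.
REQUIRED_IMAGES = ("tutorial/hello-world:1.0.0", "tutorial/word-count:1.0.0")


def missing_images(images_listing, required=REQUIRED_IMAGES):
    """Return the required coordinates that do not appear in the listing text.

    Assumption: `bx images` prints each image's coordinates as plain text,
    so a substring check is enough for this sketch.
    """
    return [image for image in required if image not in images_listing]


def check_registry():
    """Run `bx images` (requires the bx CLI on PATH) and report missing images."""
    process = subprocess.run(["bx", "images"], stdout=subprocess.PIPE, check=True)
    return missing_images(process.stdout.decode("UTF-8"))
```

An empty list from check_registry() means both images are available. Any coordinates it returns still need to be added: hello-world through the Quick start, word-count through bx import.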

Source code

To build the hello-world pipeline you will need to create two files: a pipeline script and a configuration JSON file.

Create a separate directory to store these files. It can have any name you prefer; for this tutorial we will name it batchx-hw-pipeline. The final file structure should look like this:

batchx-hw-pipeline/
├── hello-world-pipeline.py
└── the-hulk.json

Pipeline Script

This script is responsible for orchestrating the workflow, so it contains the code needed to:

  1. Parse the content of the input configuration JSON file.
  2. Configure and launch the first image (hello-world).
  3. Fetch the (lightweight) output message of the first image and use it as input for the second image (word-count). Note that only this message is fetched: the files it references stay in BatchX and are not transferred to your machine.
  4. Output the result of the workflow.
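Step 3 is where the two images are connected. As a sketch of just that step, the hypothetical helper word_count_input below takes the JSON printed by hello-world and builds the input JSON for word-count. Only the bx:// reference is copied, so the text file itself never leaves BatchX. The compact separators reproduce the command format that appears in the logs later in this tutorial.

```python
import json


def word_count_input(hello_world_output):
    """Turn the hello-world output JSON into the word-count input JSON.

    Only the bx:// reference in "responseFile" is passed along; the text
    file itself stays in BatchX.
    """
    response_file = json.loads(hello_world_output)["responseFile"]
    # Compact separators give {"textFile":"..."} with no spaces.
    return json.dumps({"textFile": response_file}, separators=(",", ":"))


# Example using the hello-world output format shown in this tutorial's logs.
print(word_count_input('{"responseFile":"bx://jobs/1230/output/response/response.txt"}'))
# prints {"textFile":"bx://jobs/1230/output/response/response.txt"}
```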
info

The pipeline script launches images that run in BatchX, but the script itself runs on your local system, not in BatchX. Therefore, it's not necessary to upload this script or the JSON file to BatchX.

Below is the code for the pipeline script. Copy its content and save it inside the batchx-hw-pipeline folder as hello-world-pipeline.py.

tip

Comments have been added at each step of this script, so even if you are not a Python programmer, you can read through it and understand what is happening.

#!/usr/bin/env python3
import sys
import json
import shlex
import subprocess

# Check if the correct input has been provided
if len(sys.argv) != 2:
    print("Only one argument is allowed: input JSON", file=sys.stderr)
    sys.exit(1)

# Load and parse the JSON configuration input file
with open(sys.argv[1], "r") as inputFile:
    parsedJson = json.load(inputFile)

# Parse yourName from the input JSON
yourName = parsedJson["yourName"]

try:
    # Start of the first image: hello-world
    # Define image coordinates
    helloWorldImage = "tutorial/hello-world:1.0.0"
    # Define the run command using parameters from the input JSON file
    # (json.dumps + shlex.quote keep the command valid even if yourName contains quotes)
    helloWorldInput = json.dumps({"yourName": yourName}, separators=(",", ":"))
    helloWorldCommand = "bx run " + helloWorldImage + " " + shlex.quote(helloWorldInput)
    # Print the run details to STDERR
    print("Running " + helloWorldImage + " image:", file=sys.stderr)
    print(helloWorldCommand, file=sys.stderr)
    # Run the image and store STDOUT (check=True raises CalledProcessError if bx fails)
    helloWorldProcess = subprocess.run(helloWorldCommand, shell=True, stdout=subprocess.PIPE, check=True)
    # Fetch the image output
    helloWorldOutput = helloWorldProcess.stdout.decode("UTF-8")
    # Load the output JSON content
    helloWorldJson = json.loads(helloWorldOutput)
    # Parse responseFile content from the output JSON
    helloWorldResponseFile = helloWorldJson["responseFile"]
    # End of the first image: hello-world

    # Start of the second image: word-count
    # Define image coordinates
    wordCountImage = "tutorial/word-count:1.0.0"
    # Define the run command using the output file from the previous image
    wordCountInput = json.dumps({"textFile": helloWorldResponseFile}, separators=(",", ":"))
    wordCountCommand = "bx run " + wordCountImage + " " + shlex.quote(wordCountInput)
    # Print the run details to STDERR
    print("Running " + wordCountImage + " image:", file=sys.stderr)
    print(wordCountCommand, file=sys.stderr)
    # Run the image and store STDOUT
    wordCountProcess = subprocess.run(wordCountCommand, shell=True, stdout=subprocess.PIPE, check=True)
    # Fetch the image output
    wordCountOutput = wordCountProcess.stdout.decode("UTF-8")
    # End of the second image: word-count

    # Print output summary
    print("Output from " + helloWorldImage + " image: " + helloWorldOutput)
    print("Output from " + wordCountImage + " image: " + wordCountOutput)
except subprocess.CalledProcessError as e:
    # A bx command exited with a non-zero status
    print(e, file=sys.stderr)
    sys.exit(e.returncode)
except Exception:
    print("Unexpected error:", sys.exc_info()[0], file=sys.stderr)
    raise
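Both halves of the script repeat the same pattern: build the input JSON, call bx run, and parse the JSON that the image prints on STDOUT. The sketch below factors that pattern into a hypothetical run_image helper, so adding a third image becomes a single call. It passes the arguments to subprocess.run as a list instead of a shell string, so no shell quoting is involved. The runner parameter exists only so the helper can be exercised without the bx CLI. This is an alternative structure, not the tutorial's reference script.

```python
import json
import shlex
import subprocess
import sys


def run_image(image, params, runner=subprocess.run):
    """Run one BatchX image with `bx run` and return its parsed output JSON.

    `runner` defaults to subprocess.run; it is a parameter only so that the
    helper can be exercised without the bx CLI installed.
    """
    # Compact JSON matches the command format shown in the tutorial's logs.
    command = ["bx", "run", image, json.dumps(params, separators=(",", ":"))]
    # Print the run details to STDERR, quoted the way a shell would need them.
    print("Running " + image + " image:", file=sys.stderr)
    print(" ".join(shlex.quote(part) for part in command), file=sys.stderr)
    # No shell involved; check=True raises CalledProcessError if bx fails.
    process = runner(command, stdout=subprocess.PIPE, check=True)
    return json.loads(process.stdout.decode("UTF-8"))


def hello_world_pipeline(your_name, runner=subprocess.run):
    """Chain hello-world and word-count and return the final word count."""
    hello = run_image("tutorial/hello-world:1.0.0", {"yourName": your_name}, runner)
    count = run_image(
        "tutorial/word-count:1.0.0", {"textFile": hello["responseFile"]}, runner
    )
    return count["wordCount"]
```

Calling hello_world_pipeline("The Hulk") would print the same two command lines to STDERR as the script above and return the wordCount value from the word-count output.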

Configuration JSON file

The configuration JSON file will be provided as input for the pipeline script. This file contains the parameters needed to run the images used by the pipeline.

For this example, the JSON file is very simple: only one parameter is needed to run the pipeline.
As your pipelines grow more complex, they will also require more elaborate JSON configuration files. But for now we can keep things simple.

Copy the content below and save it as the-hulk.json in the batchx-hw-pipeline folder.

{"yourName":"The Hulk"}
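The pipeline script reads parsedJson["yourName"] directly, so a missing or misspelled key surfaces as a bare KeyError. As configurations gain parameters, a check that reports every missing key at once can be more helpful. The following is a minimal sketch; REQUIRED_KEYS, validate_config, and load_config are hypothetical names, and REQUIRED_KEYS would grow along with your pipeline.

```python
import json

# Hypothetical list of parameters the hello-world pipeline expects.
REQUIRED_KEYS = ("yourName",)


def validate_config(config, required=REQUIRED_KEYS):
    """Raise ValueError if the parsed configuration lacks a required parameter."""
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    missing = [key for key in required if key not in config]
    if missing:
        raise ValueError("missing parameter(s): " + ", ".join(missing))
    return config


def load_config(path):
    """Read and validate a pipeline configuration file such as the-hulk.json."""
    with open(path, "r") as config_file:
        return validate_config(json.load(config_file))
```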
tip

You are free to use any other format as input for your pipeline script. You might prefer to use a YAML file or to specify the parameters directly on the command line.
Choose the option you are most comfortable with.
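As an illustration of the command-line option, the sketch below uses Python's standard argparse module to accept either a configuration file or the name directly. The flag names --config and --your-name are hypothetical. The function returns the same dictionary shape as parsedJson, so the rest of the pipeline script would not need to change.

```python
import argparse
import json


def parse_arguments(argv=None):
    """Read pipeline parameters from a JSON file or directly from the command line.

    Hypothetical usage:
      python3 hello-world-pipeline.py --config the-hulk.json
      python3 hello-world-pipeline.py --your-name "The Hulk"
    """
    parser = argparse.ArgumentParser(description="Hello world BatchX pipeline")
    # Exactly one of the two input styles must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--config", help="path to a configuration JSON file")
    group.add_argument("--your-name", help="name passed to the hello-world image")
    args = parser.parse_args(argv)
    if args.config:
        with open(args.config, "r") as config_file:
            return json.load(config_file)
    # argparse stores --your-name as args.your_name
    return {"yourName": args.your_name}
```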

Running the pipeline

Now that you have created the script and the configuration JSON file, you are ready to run your first pipeline in BatchX. From inside the batchx-hw-pipeline folder, use the command below to launch the workflow.

python3 hello-world-pipeline.py the-hulk.json

Logs similar to the ones below will start showing on your screen, displaying updates on the status and on the outputs generated by each image.

Running tutorial/hello-world:1.0.0 image:
bx run tutorial/hello-world:1.0.0 '{"yourName":"The Hulk"}'
[batchx] [2020/04/08 16:00:55] Submitting job...
[batchx] [2020/04/08 16:00:56] Job submitted with id 1230
[batchx] [2020/04/08 16:00:56] Attaching to job 1230
[batchx] [2020/04/08 16:00:57] Job status: SUBMITTED
[batchx] [2020/04/08 16:01:12] Job status: STARTING
[batchx] [2020/04/08 16:01:12] Job status: DOWNLOADING_INPUT
[batchx] [2020/04/08 16:01:12] Job status: RUNNING
[batchx@tutorial/hello-world:1.0.0] [2020/04/08 16:01:16] Environment:
[batchx@tutorial/hello-world:1.0.0] [2020/04/08 16:01:16] environ({'HOSTNAME': 'batchx-container', 'PYTHON_PIP_VERSION': '19.1.1', 'SHLVL': '1', 'HOME': '/root', 'GPG_KEY': '0D96DF4D4110E5C43FBFB17F2D347EA6AA65421D', 'PATH': '/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'BX_MEMORY': '1000', 'LANG': 'C.UTF-8', 'BX_GPUS': '0', 'PYTHON_VERSION': '3.6.8', 'BX_VCPUS': '1', 'PWD': '/batchx'})
[batchx] [2020/04/08 16:01:24] Job status: UPLOADING_OUTPUT
[batchx] [2020/04/08 16:01:25] Job status: SUCCEEDED
Running tutorial/word-count:1.0.0 image:
bx run tutorial/word-count:1.0.0 '{"textFile":"bx://jobs/1230/output/response/response.txt"}'
[batchx] [2020/04/08 16:01:25] Submitting job...
[batchx] [2020/04/08 16:01:26] Job submitted with id 1231
[batchx] [2020/04/08 16:01:26] Attaching to job 1231
[batchx] [2020/04/08 16:01:27] Job status: SUBMITTED
[batchx] [2020/04/08 16:01:34] Job status: STARTING
[batchx] [2020/04/08 16:01:34] Job status: DOWNLOADING_INPUT
[batchx] [2020/04/08 16:01:34] Job status: RUNNING
[batchx@tutorial/word-count:1.0.0] [2020/04/08 16:01:38] {'textFile': '/batchx/input/file0/response.txt'}
[batchx] [2020/04/08 16:01:45] Job status: UPLOADING_OUTPUT
[batchx] [2020/04/08 16:01:46] Job status: SUCCEEDED
Output from tutorial/hello-world:1.0.0 image: {"responseFile":"bx://jobs/1230/output/response/response.txt"}

Output from tutorial/word-count:1.0.0 image: {"wordCount":6}

If the execution was successful, the last line should match the one shown above, displaying the result of this pipeline: a simple word count.

You can verify that the pipeline submitted two jobs to BatchX, one using the hello-world image and one using the word-count image. To do so, use the bx jobs command as shown below.

$ bx jobs -a -l 2
ID    IMAGE                   AGO   USER   BX-UNITS   INPUT   STATUS      COST/H     RUNTIME   COST
1231  b@t/word-count:1.0.0    10 m  david  1x1000x0   35 B    SUCCEEDED   0.34 USD   12 s      0.01 USD
1230  b@t/hello-world:1.0.0   11 m  david  1x1000x0           SUCCEEDED   0.34 USD   13 s      0.01 USD
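The IDs in this table (1230 and 1231) are the same ones announced in the logs by the "Job submitted with id" lines. If you want your pipeline to record them, one option is to capture the bx log output and extract those IDs. The sketch below assumes that the log line format shown earlier in this tutorial stays stable, which the tutorial does not guarantee. It also assumes the logs are written to STDERR, inferred from the fact that they still appear on screen while the script captures only STDOUT. submitted_job_ids is a hypothetical helper.

```python
import re

# Matches log lines such as:
# [batchx] [2020/04/08 16:00:56] Job submitted with id 1230
JOB_ID_PATTERN = re.compile(r"Job submitted with id (\d+)")


def submitted_job_ids(log_text):
    """Return the job IDs announced in bx run log output, in order of appearance."""
    return [int(job_id) for job_id in JOB_ID_PATTERN.findall(log_text)]
```

To feed it, you would add stderr=subprocess.PIPE to the subprocess.run calls and decode process.stderr, at the cost of no longer seeing the logs live on screen.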

Congratulations! You completed this tutorial!
You now know one easy way to build pipelines using Docker images that run in BatchX.
Now it's your turn to try and build your own pipelines!