Guest demo
This tutorial guides you through the steps needed to run a bioinformatics image in the guest account.
CLI installation
Once you have received a guest access token from BatchX support, follow the steps described at Installation.
When asked for the user name, press Enter so that you can then enter the provided token.
Enter user name (empty to enter token):
Enter token:
The token allows you to connect to the guest user environment.
Verify identity
Before we start with the demo, verify your identity by running bx whoami. You should see something similar to:
$ bx whoami
User id: guest
Name: Guest user
Email: support@batchx.io
Created: Fri Sep 18 14:39:22 CEST 2020
Demo
In this tutorial we will run bwa mem, a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. This tool has already been imported into BatchX and is offered as part of our bioinformatics tools catalogue.
BatchX image
Search for the bioinformatics/bwa/mem image in the BatchX catalogue by using the bx images command as follows:
bx images -e=batchx
This will display the whole list of ready-to-use images that BatchX provides for its users.
You can limit the output to bwa/mem by using grep:
$ bx images -e=batchx | grep bwa/mem
batchx@bioinformatics/bwa/mem:1.2.1 7 weeks ago 487.2 MB
Now, let's get that image into the guest environment by running the bx clone command:
$ bx clone batchx@bioinformatics/bwa/mem
$ bx images
IMAGE CREATED SIZE
batchx@bioinformatics/bwa/mem:1.2.1 7 weeks ago 487.2 MB
The above command returned the highest version that has been created for the bioinformatics/bwa/mem image. Copy the coordinates for the 1.2.1 version (the latest version at the time of writing) and use the bx image command to see its details:
bx image batchx@bioinformatics/bwa/mem:1.2.1
This should display the contents of the bioinformatics/bwa/mem manifest, including a description of the inputs and outputs associated with this image. Read more about the contents of the manifest here.
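The image coordinates used above appear to follow an `<environment>@<name>:<version>` layout. The following minimal sketch splits such a string into its parts, which can help when scripting around the CLI. The layout is inferred only from the strings shown in this tutorial, and `ImageCoordinates` and `parse_coordinates` are hypothetical helper names, not part of the bx CLI.

```python
from typing import NamedTuple, Optional


class ImageCoordinates(NamedTuple):
    """Hypothetical container; the field names are illustrative only."""
    environment: str
    name: str
    version: Optional[str]


def parse_coordinates(coordinates: str) -> ImageCoordinates:
    """Split '<environment>@<name>[:<version>]' into its parts.

    Assumption: this layout is inferred from the examples in this tutorial.
    """
    environment, at_sign, rest = coordinates.partition("@")
    if not at_sign or not environment or not rest:
        raise ValueError(f"expected '<environment>@<name>', got {coordinates!r}")
    name, colon, version = rest.partition(":")
    if not name or (colon and not version):
        raise ValueError(f"malformed image name or version in {coordinates!r}")
    # An absent version is reported as None.
    return ImageCoordinates(environment, name, version or None)


coords = parse_coordinates("batchx@bioinformatics/bwa/mem:1.2.1")
print(coords.environment, coords.name, coords.version)
```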
Input data
Now that we have inspected the details of the bioinformatics/bwa/mem image, use the bx ls command to see the files used in this example, which will be passed to the image when the job is launched.
bx ls
The previous command displays the contents of the file system at the root level. You should see a directory named readonly. Use the bx tree command to display the contents of this directory as a depth-indented list of files:
$ bx tree readonly
readonly
├── bwa
│   └── hg38
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.dict
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.amb
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.ann
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.bwt
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.dict
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.fai
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.gzi
│       ├── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.pac
│       └── Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.sa
└── fastqs
    ├── sample.R1.fq.gz
    └── sample.R2.fq.gz
info
readonly here is just the name of a folder (which happens to be read-only for the provided access token).
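The files under readonly/bwa/hg38 share a common prefix followed by the suffixes that `bwa index` writes (.amb, .ann, .bwt, .pac and .sa). As a minimal sketch, a listing can be checked for these files before a job is launched. The helper name `missing_index_files` is hypothetical, and whether the BatchX image requires exactly this set is an assumption rather than something stated in this tutorial.

```python
# Suffixes written by `bwa index` next to the reference FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(folder_listing, ref_base_name):
    """Return the expected BWA index files that are absent from the listing.

    Hypothetical helper: it only compares file names and does not
    contact BatchX or inspect file contents.
    """
    present = set(folder_listing)
    expected = [ref_base_name + suffix for suffix in BWA_INDEX_SUFFIXES]
    return [name for name in expected if name not in present]


# File names copied from the `bx tree readonly` output above.
base = "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
hg38_listing = [
    "Homo_sapiens.GRCh38.dna.primary_assembly.dict",
    base,
    base + ".amb",
    base + ".ann",
    base + ".bwt",
    base + ".dict",
    base + ".fai",
    base + ".gzi",
    base + ".pac",
    base + ".sa",
]

print(missing_index_files(hg38_listing, base))  # prints []
```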
Submit job
Now that we have examined the image and the inputs used in this tutorial, it’s time to submit a job to BatchX. For this, we will use the bx submit command as follows:
bx submit -n batchx@bioinformatics/bwa/mem:1.2.1 '{"fastqFileR1":"readonly/fastqs/sample.R1.fq.gz","fastqFileR2":"readonly/fastqs/sample.R2.fq.gz","refFolder":"readonly/bwa/hg38","refBaseName":"Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz","outputBamName":"sample.bam"}'
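Writing the JSON argument by hand is error-prone, so the following sketch builds it from a dictionary and quotes it for the shell. It assumes only the command shape shown above (`bx submit -n <image> '<json>'`); `build_submit_command` is a hypothetical helper, and the script prints the command rather than running it.

```python
import json
import shlex

IMAGE = "batchx@bioinformatics/bwa/mem:1.2.1"

# Input values copied from the command above.
inputs = {
    "fastqFileR1": "readonly/fastqs/sample.R1.fq.gz",
    "fastqFileR2": "readonly/fastqs/sample.R2.fq.gz",
    "refFolder": "readonly/bwa/hg38",
    "refBaseName": "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz",
    "outputBamName": "sample.bam",
}


def build_submit_command(image, job_inputs):
    """Return a shell-safe `bx submit` command line as a single string."""
    # Compact separators keep the payload on one line without extra spaces.
    payload = json.dumps(job_inputs, separators=(",", ":"))
    parts = ["bx", "submit", "-n", image, payload]
    return " ".join(shlex.quote(part) for part in parts)


command = build_submit_command(IMAGE, inputs)
print(command)
```

Serializing with `json.dumps` guarantees valid JSON, and `shlex.quote` wraps the payload in single quotes so the shell passes it through as one argument.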
You have now submitted a job to BatchX to “align query sequences using bwa-mem”, a very common task performed in bioinformatics when dealing with NGS (Next Generation Sequencing) data.
Job details
Use the bx jobs command to see the list of active jobs submitted to BatchX. Feel free to check the extended details for this command.
bx jobs
You can also use the bx attach command to receive live streams from this job. This command requires the id of the job you want to attach to (returned by the submit command).
bx attach <job-id>
info
If this is not the first job you have submitted, type bx jobs -al 1 to see the id of the last submitted job, which should correspond to the job from this tutorial.
Ultimately, if everything went well, the job will finish with a Job status: SUCCEEDED message and an additional line displaying the path to the output file.
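When the job output is captured by a script, the final status can be picked out of the text. This is a minimal sketch that relies only on the `Job status: SUCCEEDED` message mentioned above; the exact format of the rest of the output is not documented here, so the sample string is a placeholder and `extract_job_status` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the status message described in this tutorial.
STATUS_PATTERN = re.compile(r"Job status:\s*(\w+)")


def extract_job_status(output: str) -> Optional[str]:
    """Return the last reported job status, or None if none is found."""
    status = None
    for match in STATUS_PATTERN.finditer(output):
        status = match.group(1)  # keep the most recent occurrence
    return status


# Placeholder text, not real CLI output.
sample_output = "Job status: SUCCEEDED"
print(extract_job_status(sample_output))
```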
In a real bioinformatics scenario you could use this output file as part of an analysis workflow to identify chromosome alterations, point mutations or other types of variants.
Continue exploring the details of this job with the bx job and bx logs commands.