Hoffman2:Job Array: Difference between revisions

Revision as of 23:34, 19 December 2019

A job array makes it possible to process different subjects or files with the same script on multiple Hoffman2 compute nodes at the same time.

Here, we use this template code to show how it can be done:

#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
#$ -pe shared 2
#$ -l h_rt=8:00:00,h_data=4G
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
#$ -t 1-50:1

The only difference from the single-subject version is the line

 #$ -t 1-50:1

The -t option takes a range, given as a lower and an upper number, together with a step interval, in the following format:

-t lower-upper:interval

where

lower: is replaced with the starting number
upper: is replaced with the ending number
interval: is replaced with the step interval

So adding the argument

-t 10-100:5

will step through the numbers 10, 15, 20, 25, ..., 100, submitting a job for each one.
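To preview which task IDs a given range produces before submitting, the same three numbers can be passed to seq. This is a minimal sketch: seq only mirrors the arithmetic of lower-upper:interval and is not part of the scheduler, and the variable names are illustrative.

```shell
#!/bin/bash
# Preview the task IDs that "-t lower-upper:interval" would step through.
# seq takes FIRST INCREMENT LAST, i.e. lower, interval, upper.
lower=10
upper=100
interval=5

task_ids=$(seq "$lower" "$interval" "$upper")

# Left unquoted on purpose so the IDs print on a single line
echo $task_ids

# Number of tasks in the range
task_count=$(( (upper - lower) / interval + 1 ))
echo "Number of tasks: $task_count"
```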

Each job in the array gets an environment variable called SGE_TASK_ID, whose value steps through the range you specified. The Hoffman2 job scheduler runs one job for each SGE_TASK_ID value, so your work is parallelized.
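When a script is run by hand or submitted without -t, SGE_TASK_ID is not a usable number (Grid Engine documents the literal value "undefined" for non-array jobs). A minimal sketch of a guard follows; resolve_task_id is a hypothetical helper, and falling back to task 1 is an assumption made here for local testing.

```shell
#!/bin/bash
# resolve_task_id: hypothetical helper that prints a usable task ID.
# Falls back to 1 when SGE_TASK_ID is unset, empty, or the literal
# string "undefined" (reported for jobs that are not array jobs).
resolve_task_id() {
    local id="${SGE_TASK_ID:-}"
    if [ -z "$id" ] || [ "$id" = "undefined" ]; then
        id=1   # assumption: default to the first task for local testing
    fi
    echo "$id"
}

task_id=$(resolve_task_id)
echo "Running task $task_id"
```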

When to use it?

Let's see how a job array can replace a loop that can run on only one node.

#!/bin/bash
# myFuncSlowWrapper.sh
# Processes items 1 to 100 one after another on a single node
for i in {1..100}
do
    myFunc.sh "$i"
done

With a job array, the computations are split among many processors and can finish much more quickly. Instead of the loop, you would write a wrapper script called myFuncFastWrapper.sh:

#!/bin/bash
# myFuncFastWrapper.sh
# Each job in the array handles the one item selected by SGE_TASK_ID
echo "$SGE_TASK_ID"
myFunc.sh "$SGE_TASK_ID"
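Before submitting, the wrapper logic can be checked locally by setting SGE_TASK_ID by hand, which is what the scheduler does for every task in the range. The sketch below is a dry run under stated assumptions: my_func is a hypothetical stand-in for myFunc.sh, and wrapper holds the body of myFuncFastWrapper.sh so the example runs anywhere.

```shell
#!/bin/bash
# Hypothetical stand-in for myFunc.sh so the sketch runs without the real script
my_func() {
    echo "processing item $1"
}

# Body of the wrapper: one item per task, selected by SGE_TASK_ID
wrapper() {
    echo "$SGE_TASK_ID"
    my_func "$SGE_TASK_ID"
}

# Local dry run: set SGE_TASK_ID by hand for three tasks.
# On the cluster the scheduler sets it, once per task in the -t range.
dry_run_output=$(for id in 1 2 3; do SGE_TASK_ID=$id wrapper; done)
echo "$dry_run_output"
```

On the cluster, the loop disappears: the range goes in the script header as #$ -t 1-100:1, matching the 100 iterations of the slow version.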

Example

In this sample code, each SGE_TASK_ID is an index into the array of subjects, so each job, whichever node it runs on, knows which subject it should process.

# Set up the subjects list
declare -a subjects

subjects[1]="su3v3hkaykw2"
subjects[2]="wxg5mk5u5xbz"
subjects[3]="6q2bgkqu5grp"
subjects[4]="whjue68jmwyh"
subjects[5]="pfx3ju9wz8rr"

echo "This is sub-job $SGE_TASK_ID"
echo "This is subject ${subjects[$SGE_TASK_ID]}"

Call your script to process the subject:

# Your script content goes here...
myFunc.sh "${subjects[$SGE_TASK_ID]}"
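If the -t range is larger than the subjects list, ${subjects[$SGE_TASK_ID]} expands to an empty string for the extra tasks, and the processing script is called with an empty argument. A minimal sketch of a guard follows; pick_subject is a hypothetical helper, not part of the original script, and the default task ID of 1 is only for running the sketch locally.

```shell
#!/bin/bash
# Same subjects as above, written as one indexed-array assignment
subjects=([1]="su3v3hkaykw2" [2]="wxg5mk5u5xbz" [3]="6q2bgkqu5grp"
          [4]="whjue68jmwyh" [5]="pfx3ju9wz8rr")

# pick_subject: hypothetical helper. Prints the subject for a task ID,
# or returns 1 if the ID is not a number or has no subject assigned.
pick_subject() {
    local id="$1"
    case "$id" in
        ''|*[!0-9]*)
            echo "Task ID '$id' is not a number" >&2
            return 1
            ;;
    esac
    local subject="${subjects[$id]:-}"
    if [ -z "$subject" ]; then
        echo "No subject defined for task $id" >&2
        return 1
    fi
    echo "$subject"
}

# Default of 1 is an assumption for local testing only
if subject=$(pick_subject "${SGE_TASK_ID:-1}"); then
    echo "This is subject $subject"
fi
```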

Another approach is to read the list of subjects from a file into an array, so there is no need to assign indexes to your subjects manually.
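A minimal sketch of this approach, assuming a plain-text file with one subject per line. The temporary file created here is a hypothetical stand-in for your own list, and the default task ID of 1 is only for running the sketch outside the scheduler.

```shell
#!/bin/bash
# Demo input: a temporary file standing in for your real subjects list
# (hypothetical; on the cluster, point subjects_file at your own file).
subjects_file=$(mktemp)
printf '%s\n' su3v3hkaykw2 wxg5mk5u5xbz 6q2bgkqu5grp > "$subjects_file"

# Read one subject per line into the array, starting at index 1 so that
# array indexes line up with SGE_TASK_ID values.
declare -a subjects
index=1
while IFS= read -r line; do
    [ -n "$line" ] || continue       # skip blank lines
    subjects[$index]="$line"
    index=$((index + 1))
done < "$subjects_file"

task_id="${SGE_TASK_ID:-1}"          # default of 1 is for local testing only
echo "This is sub-job $task_id"
echo "This is subject ${subjects[$task_id]}"

rm -f "$subjects_file"               # clean up the demo file
```

With this layout, the upper bound of the -t range should match the number of non-blank lines in the file.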
