Hoffman2:Job Array: Difference between revisions
Revision as of 23:34, 19 December 2019
Job array makes it possible to process different subjects/files using the same script on multiple Hoffman2 working nodes at the same time.
Here we use this template script to show how it is done:
#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
#$ -pe shared 2
#$ -l h_rt=8:00:00,h_data=4G
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
#$ -t 1-50:1
The only difference from the single-subject version is the line
#$ -t 1-50:1
The -t option takes a range, given as a lower and an upper number, together with a step interval, in the following format:
-t lower-upper:interval
where
lower
- is replaced with the starting number
upper
- is replaced with the ending number
interval
- is replaced with the step interval
So adding the argument
-t 10-100:5
will step through the numbers 10, 15, 20, 25, ..., 100, submitting one job for each of them.
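To check a range before submitting, the expansion of lower-upper:interval can be reproduced locally. The sketch below is only an illustration: list_task_ids is a hypothetical helper, not part of the scheduler, and it assumes that seq counts the same way the -t option does for the range described above.

```shell
#!/bin/bash
# list_task_ids is a hypothetical helper (not an SGE command).
# It prints the task IDs that "-t lower-upper:interval" is expected to generate.
list_task_ids() {
    local lower=$1 upper=$2 interval=$3
    # seq takes its arguments in the order: first, step, last
    seq "$lower" "$interval" "$upper"
}

# Preview the range used in the text: -t 10-100:5
list_task_ids 10 100 5 | head -n 4   # the first four task IDs
list_task_ids 10 100 5 | wc -l       # how many tasks would be submitted
```

For -t 10-100:5 this gives 19 task IDs, so 19 jobs would be submitted.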
Each job gets an environment variable called SGE_TASK_ID, which is set to one of the numbers in the range you specified. The Hoffman2 job scheduler submits one job for each SGE_TASK_ID value, so your work is parallelized.
When to use it?
Let's see how a job array can replace a loop, which can only run on a single node.
#!/bin/bash
# myFuncSlowWrapper.sh
for i in {1..100}; do
    myFunc.sh $i
done
With a job array, the computations are split among many processors and can finish much more quickly. Instead of the loop, you would write a wrapper script called myFuncFastWrapper.sh, submitted with -t 1-100:1, as
#!/bin/bash
# myFuncFastWrapper.sh
echo $SGE_TASK_ID
myFunc.sh $SGE_TASK_ID
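A wrapper like this depends on SGE_TASK_ID holding a number. As far as we know, Grid Engine sets the variable to the string "undefined" when the script is submitted without -t, and it is not set at all when the script is run by hand. The following is a minimal sketch of a guard for that case; resolve_task_id is a hypothetical helper name, and the "undefined" behaviour is an assumption you should verify on Hoffman2.

```shell
#!/bin/bash
# resolve_task_id is a hypothetical helper, not part of SGE.
# It prints SGE_TASK_ID when it is a plain number and fails otherwise.
resolve_task_id() {
    local id=${SGE_TASK_ID:-}
    case $id in
        ''|*[!0-9]*)
            # Empty, "undefined", or anything else that is not all digits
            echo "SGE_TASK_ID is not a number (got '$id'); was the job submitted with -t?" >&2
            return 1
            ;;
        *)
            echo "$id"
            ;;
    esac
}
```

In the wrapper it could be used as task_id=$(resolve_task_id) || exit 1 before calling myFunc.sh with $task_id. To try the wrapper outside the scheduler, set the variable by hand, for example SGE_TASK_ID=3 ./myFuncFastWrapper.sh.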
Example
In this sample code, SGE_TASK_ID is used as the index into the array of subjects, so each job, running on a different node, knows which subject it should process.
# Set up the subjects list
declare -a subjects
subjects[1]="su3v3hkaykw2"
subjects[2]="wxg5mk5u5xbz"
subjects[3]="6q2bgkqu5grp"
subjects[4]="whjue68jmwyh"
subjects[5]="pfx3ju9wz8rr"
echo "This is sub-job $SGE_TASK_ID"
echo "This is subject ${subjects[$SGE_TASK_ID]}"
Call your script to process the subject
# Your script content goes here...
myFunc.sh ${subjects[$SGE_TASK_ID]}
Here's another example, which reads a list of subjects from a file into an array.
This way, there's no need to assign indexes to your subjects manually.
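One possible way to do this is sketched below. It is not the linked sample code: subject_for_task is a hypothetical helper, and subjects.txt is an assumed file name holding one subject ID per line. It relies on the bash builtin mapfile, so it needs bash 4 or later.

```shell
#!/bin/bash
# subject_for_task is a hypothetical helper; subjects.txt is an assumed file
# name (one subject ID per line). Requires bash 4+ for mapfile.
subject_for_task() {
    local file=$1 task_id=$2
    local -a subjects
    # -t strips the trailing newline of each line; -O 1 stores the first line
    # at index 1, so the array index matches a task range that starts at 1.
    mapfile -t -O 1 subjects < "$file"
    echo "${subjects[$task_id]}"
}

# Inside a job array, look up the subject for this task
if [ -n "${SGE_TASK_ID:-}" ] && [ -f subjects.txt ]; then
    subject=$(subject_for_task subjects.txt "$SGE_TASK_ID")
    echo "This is sub-job $SGE_TASK_ID"
    echo "This is subject $subject"
fi
```

With this approach the upper bound of the -t range should equal the number of lines in the file, for example -t 1-5:1 for a file with five subjects.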