Hoffman2:Job Array

From Center for Cognitive Neuroscience


Revision as of 21:30, 19 December 2019

Back to Hoffman2 Batch Mode

A job array (or array job) makes it possible to process different subjects or files with the same script on multiple Hoffman2 compute nodes at the same time.

Here we use the following template job script to show how it can be done:

#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
#$ -pe shared 2
#$ -l h_rt=8:00:00,h_data=4G
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
#$ -t 1-50:1

The only difference compared with the single-subject job script is this part:

 #$ -t 1-50:1

The -t option should be followed by a number range and a step interval, in the following format:

-t lower-upper:interval

where

lower: replaced with the starting number
upper: replaced with the ending number
interval: replaced with the step interval

So adding the argument

-t 10-100:5

will step through the numbers 10, 15, 20, 25, ..., 100, creating one task for each value.
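You can preview the task IDs a given -t range will generate with seq (start, step, end), which is a quick sanity check before submitting:

```shell
# seq START STEP END lists the same values that -t 10-100:5 generates,
# one per line: 10, 15, 20, ..., 100 (19 task IDs in total).
seq 10 5 100
```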

Within each task, an environment variable called SGE_TASK_ID holds that task's value from the range you specified. The Hoffman2 job scheduler runs one task for each SGE_TASK_ID value, so your work is parallelized.
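A common pattern is to use SGE_TASK_ID to pick one subject out of a list, so that each task processes a different subject. A minimal sketch, assuming a hypothetical subject_list.txt with one subject name per line:

```shell
#!/bin/bash
# Hypothetical example: map SGE_TASK_ID to one line of a subject list.
# When testing outside the scheduler, default SGE_TASK_ID to 1.
SGE_TASK_ID=${SGE_TASK_ID:-1}

# Build a small subject list for demonstration purposes.
printf 'sub-01\nsub-02\nsub-03\n' > subject_list.txt

# sed -n "Np" prints only line N, so task N gets subject N.
subject=$(sed -n "${SGE_TASK_ID}p" subject_list.txt)
echo "Task ${SGE_TASK_ID} processes ${subject}"
```

Submitted with -t 1-3:1, each of the three tasks would select a different line of the file.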


Example

Let's see how a job array can replace a loop that runs on only a single node. Say you have a script, myFunc.sh, that takes one numerical input and computes a set of values from it, and you need to run myFunc.sh for the input values 1 to 100. One solution would be to write a wrapper script, myFuncSlowWrapper.sh:

#!/bin/bash
# myFuncSlowWrapper.sh
for i in {1..100};
do
    myFunc.sh $i;
done

The drawback is that this will take quite a while, since all 100 iterations run sequentially on a single processor. With a job array, the computations are split among many processors and can finish much more quickly. Instead, write a wrapper script called myFuncFastWrapper.sh:

#!/bin/bash
# myFuncFastWrapper.sh
echo $SGE_TASK_ID
myFunc.sh $SGE_TASK_ID

And submit it with

qsub -cwd -V -N PJ -l h_data=1024M,h_rt=01:00:00 -M $USER@mail -m bea -t 1-100:1 myFuncFastWrapper.sh
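Before submitting, you can dry-run the wrapper on a login node by setting SGE_TASK_ID yourself. The stub myFunc.sh below is only a stand-in for your real script (and the wrapper here calls it as ./myFunc.sh so it is found in the current directory):

```shell
#!/bin/bash
# Create a stub myFunc.sh so the wrapper can be exercised outside the scheduler.
cat > myFunc.sh <<'EOF'
#!/bin/bash
echo "myFunc received input $1"
EOF
chmod +x myFunc.sh

# Recreate the wrapper from the text above, calling the stub from the
# current directory.
cat > myFuncFastWrapper.sh <<'EOF'
#!/bin/bash
# myFuncFastWrapper.sh
echo $SGE_TASK_ID
./myFunc.sh $SGE_TASK_ID
EOF
chmod +x myFuncFastWrapper.sh

# Simulate what the scheduler does for task 7 of the array.
SGE_TASK_ID=7 ./myFuncFastWrapper.sh
```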