Hoffman2:Job Array

From Center for Cognitive Neuroscience

Revision as of 21:38, 19 December 2019

Back to Hoffman2 Batch Mode

A job array makes it possible to process different subjects or files with the same script on multiple Hoffman2 worker nodes at the same time.

Here, we use this template code to show how it can be done (see Submit jobarray):

#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
#$ -pe shared 2
#$ -l h_rt=8:00:00,h_data=4G
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
# Job array: task IDs 1 to 50 in steps of 1
#$ -t 1-50:1

The only difference from the single-subject version is this line:

 #$ -t 1-50:1

The -t option should be followed by a number range together with the step interval in the following format:

-t lower-upper:interval

where

lower is replaced with the starting number
upper is replaced with the ending number
interval is replaced with the step interval

So adding the argument

-t 10-100:5

will step through the numbers 10, 15, 20, 25, ..., 100, submitting a job for each one.
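
To preview which task IDs a range produces before submitting, the same lower, upper, and interval values can be passed to seq. This is only a sketch: task_ids is a hypothetical helper name, not a scheduler command, and it assumes the scheduler steps from lower to upper in increments of interval, as described above.

```shell
#!/bin/bash
# task_ids is a hypothetical helper (not a scheduler command).
# Usage: task_ids LOWER UPPER INTERVAL
# Prints the IDs that "-t LOWER-UPPER:INTERVAL" is expected to step through.
task_ids() {
    local lower=$1 upper=$2 interval=$3
    seq "$lower" "$interval" "$upper"
}

# Preview the IDs for "-t 10-100:5" on one line, then count them.
task_ids 10 100 5 | tr '\n' ' '
echo
echo "number of tasks: $(task_ids 10 100 5 | wc -l)"
```

Checking the count this way is a quick guard against submitting far more (or fewer) tasks than intended.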

Each task in the array gets an environment variable called SGE_TASK_ID, whose value steps through the range you specified. The Hoffman2 job scheduler runs one job for each SGE_TASK_ID value, so your work is parallelized.
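
One common way to use SGE_TASK_ID for the subjects/files case is to treat it as a line number in a list of subjects. The sketch below assumes a hypothetical file with one subject ID per line. The subject IDs, the temporary list file, and the subject_for_task helper are all illustrative and not part of the Hoffman2 template.

```shell
#!/bin/bash
# subject_for_task is a hypothetical helper: prints line TASK_ID of LIST_FILE.
subject_for_task() {
    local task_id=$1 list_file=$2
    sed -n "${task_id}p" "$list_file"
}

# Demo data (made up): on the cluster this would be your own subject list.
list_file=$(mktemp)
printf '%s\n' sub01 sub02 sub03 > "$list_file"

# SGE_TASK_ID is set by the scheduler inside an array job; default to 2 here
# so the sketch can also be run by hand outside the scheduler.
task_id=${SGE_TASK_ID:-2}
subject=$(subject_for_task "$task_id" "$list_file")
echo "task $task_id processes subject: $subject"

rm -f "$list_file"
```

With a range such as -t 1-3:1, each of the three tasks would pick a different line of the list.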

When to use it?

Let's see how a job array can replace a loop that can only run on one node.

#!/bin/bash
# myFuncSlowWrapper.sh
# Runs all 100 cases one after another on a single node
for i in {1..100}; do
    myFunc.sh "$i"
done

With job arrays, the computations are split among many processors and can finish much more quickly. You would instead write a wrapper script called myFuncFastWrapper.sh as follows:

#!/bin/bash
# myFuncFastWrapper.sh
# Run tasks 1 to 100, matching the range of the loop above
#$ -t 1-100:1
echo "$SGE_TASK_ID"
myFunc.sh "$SGE_TASK_ID"
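
Because SGE_TASK_ID is only set by the scheduler, it can help to check the wrapper logic by hand before submitting. The sketch below emulates a few array tasks in an ordinary shell. Here my_func is a hypothetical stand-in for myFunc.sh, and the tasks run one after another only for checking. On the cluster the scheduler would run them as separate jobs.

```shell
#!/bin/bash
# my_func is a hypothetical stand-in for myFunc.sh.
my_func() {
    echo "processing case $1"
}

# Same body as the wrapper: everything depends only on SGE_TASK_ID.
run_task() {
    echo "$SGE_TASK_ID"
    my_func "$SGE_TASK_ID"
}

# Emulate tasks 1 to 3 by setting SGE_TASK_ID by hand.
for id in 1 2 3; do
    SGE_TASK_ID=$id
    run_task
done
```

If each emulated task prints its own ID and handles its own case, the same body should behave the same way under the scheduler, where only the source of SGE_TASK_ID changes.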

Example

In our sample code Submit jobarray