Hoffman2:Job Array
Revision as of 21:27, 19 December 2019
Job arrays (or array jobs) make it possible to process different subjects or files with the same script on multiple Hoffman2 compute nodes at the same time.
Here we use this template code to show how it can be done.
Submit job
#!/bin/bash
#$ -cwd
# error = Merged with joblog
#$ -o joblog.$JOB_ID
#$ -j y
#$ -pe shared 2
#$ -l h_rt=8:00:00,h_data=4G
# Email address to notify
#$ -M $USER@mail
# Notify when
#$ -m bea
#$ -t 1-50:1
The only difference compared with the single-subject version is this line:
#$ -t 1-50:1
The -t option should be followed by a number range together with the step interval in the following format:
-t lower-upper:interval
where
lower is replaced with the starting number,
upper is replaced with the ending number, and
interval is replaced with the step interval.
So adding the argument
-t 10-100:5
will step through the numbers 10, 15, 20, 25, ..., 100, submitting a job for each one.
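The task IDs that a given range specification generates can be previewed on a login shell with seq (a sketch; note that seq takes its arguments in the order first, increment, last, whereas -t puts the interval after the colon):

```shell
# Preview the task IDs that -t 10-100:5 would generate.
# seq FIRST INCREMENT LAST prints the same arithmetic sequence
# the scheduler steps through when submitting the array tasks.
seq 10 5 100
```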
There will be an environment variable called SGE_TASK_ID
whose value will be incremented over the range you specified. The Hoffman2 job scheduler will run one task for each value of SGE_TASK_ID, so your work will be parallelized.
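To see what an individual task observes, you can simulate the scheduler by setting SGE_TASK_ID yourself before running a command (a sketch for local testing only; on the cluster the scheduler exports this variable automatically and it should not be set by hand in a real job):

```shell
# Simulate one array task: the scheduler exports SGE_TASK_ID before
# running your job script; here we set it manually to demonstrate.
export SGE_TASK_ID=7
echo "task ${SGE_TASK_ID}"
```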
Example
Let's say you have a script, myFunc.sh, that takes one numerical input and computes a bunch of values based on that input. But you need to run myFunc.sh
for input values 1 to 100. One solution would be to write a wrapper script myFuncSlowWrapper.sh as
#!/bin/bash
# myFuncSlowWrapper.sh
for i in {1..100};
do
  myFunc.sh $i;
done
The only drawback is that this will take quite a while since all 100 iterations will be done on a single processor. With job arrays, the computations will be split among many processors and can finish much more quickly. You would instead write a wrapper script called myFuncFastWrapper.sh as
#!/bin/bash
# myFuncFastWrapper.sh
echo $SGE_TASK_ID
myFunc.sh $SGE_TASK_ID
And submit it with
qsub -cwd -V -N PJ -l h_data=1024M,h_rt=01:00:00 -M $USER@mail -m bea -t 1-100:1 myFuncFastWrapper.sh
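The task ID does not have to be used as a numeric input directly. A common pattern (a sketch; filelist.txt and the subject names here are hypothetical, not from the original page) maps each task ID to one line of a file list, so task N processes the Nth subject:

```shell
# Build an example list of subjects, one name per line (hypothetical).
printf 'subj01\nsubj02\nsubj03\n' > filelist.txt
# In a real job the scheduler sets SGE_TASK_ID; set it here to demo.
SGE_TASK_ID=2
# sed -n "Np" prints only line N, selecting this task's subject.
subject=$(sed -n "${SGE_TASK_ID}p" filelist.txt)
echo "processing ${subject}"
```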