Difference between revisions of "Submitting Matlab scripts to the SGE cluster"

From Center for Cognitive Neuroscience

Latest revision as of 03:09, 16 January 2014

Submitting MATLAB jobs to the SGE cluster is a breeze. Just use the following command:

matlab.q [-t HOURS] [-d MEMORY] [-m MESSAGING] MY_SCRIPT.m

HOURS
    (optional) The number of hours your script will need to run. Choose a generous value, but note that a large overestimate can leave you waiting in the queue longer than necessary.
MEMORY
    (optional) The number of megabytes of memory to dedicate to your job (e.g. a value of 1024 gives you 1 GB).
MESSAGING
    (optional) The messaging flags you want:
        b = beginning
        e = end
        a = abort
        n = none
    For example, "-m bea" sends messages to the email account on file for you with Hoffman2: a separate message when the job begins, when it ends, and if it is aborted.
MY_SCRIPT.m
    The m-file you want to execute.
Other options
    Type "matlab.q" at the command prompt on Hoffman2 and use the Info page to see the other available options.

Keep in mind that you can only run one matlab.q job at a time, since each one uses a full MATLAB license. If you are already running MATLAB on Hoffman2 when you submit a job with matlab.q, the job will fail because your license is already in use.

matlab.q first tries to compile your code and then runs the compiled version. If compilation fails, it runs the script as an ordinary, uncompiled job. To compile your code directly, see the instructions below.
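
The matlab.q options described above can be combined in one command. The following is a minimal sketch, not an official helper: build_matlab_q and my_analysis.m are hypothetical names, and the function only prints the command so it can be reviewed before being run on Hoffman2. It also shows the gigabyte-to-megabyte conversion that the -d option requires.

```shell
#!/bin/sh
# build_matlab_q HOURS MEM_GB FLAGS SCRIPT
# Hypothetical helper: prints a matlab.q command line, it does not submit anything.
build_matlab_q() {
    hours=$1
    mem_mb=$(( $2 * 1024 ))   # -d expects megabytes, so 2 GB becomes 2048
    flags=$3                  # any combination of b, e, a, or n
    script=$4
    echo "matlab.q -t $hours -d $mem_mb -m $flags $script"
}

# 24 hours, 2 GB of memory, mail at beginning, end, and abort
build_matlab_q 24 2 bea my_analysis.m
```

Copy the printed line to the Hoffman2 command prompt once the values look right.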


Submitting Multiple Jobs without Using Multiple Licenses (aka Compiling)

The key to this approach is to compile your code with the MATLAB Compiler (MCC) and then run the compiled executable through the queue. The MATLAB Compiler has some limitations:

  • Each job must have a single associated .m file.
  • You CANNOT use the path() function inside the .m file.

The basic commands are:

qrsh
module load matlab
mcc -m my_m_file.m -R -singleCompThread -a top_path
matexe.q my_m_file

Alternative commands (create the .cmd file without submitting it, then submit it yourself with qsub):

qrsh
module load matlab
mcc -m my_m_file.m -R -singleCompThread -a top_path
matexe.q -ns my_m_file
qsub my_m_file.cmd

mcc -m my_m_file.m -a top_path
    This creates an executable called my_m_file.
    "-a top_path" adds all files inside top_path and its subfolders.
    Note that the -a option can make compilation take a long time.
        Consider using "-I folder1 -I folder2" instead.
        folder1 and folder2 must contain ALL of the dependencies of the compiled function, excluding the default base that MATLAB gives you.
    Compilation takes time, but do not try to run several compilations in parallel in the same directory. This will lead to errors.
    -R -singleCompThread
        This specifies that the job needs only one CPU to run instead of the default 8.
        With this option each job takes more time, but it does not need to wait for 8 CPUs to be free on a single node.
matexe.q my_m_file
    This submits the executable to the queue and also runs "module load matlab" before your code is executed.
    This is the ONLY way to submit an mcc-compiled job. Other approaches, such as loading the module manually, will fail.
    -t <hours>
        The time limit, which is 8 hours by default. Values from 1 to 336 are allowed for highp.
        If you think your job may take longer than 8 hours, you MUST include this option, or else the job will fail at precisely 8 hours.
    -ns
        This stands for "not submit": the .cmd file is created but not submitted, so you can edit it as you wish before submitting it with qsub. This is particularly useful for job arrays.
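
As an illustration of the job-array use mentioned for -ns: SGE runs an array job when the submission carries a task range (the qsub option "-t 1-N", or a "#$ -t 1-N" line in the job file), and it sets the environment variable SGE_TASK_ID in each task. Do not confuse qsub's -t (a task range) with matexe.q's -t (hours). The sketch below shows one way a task could pick its own input from that ID. The layout of the generated .cmd file is not shown on this page, so where such a fragment would go is an assumption, and pick_input and the subj*.mat names are hypothetical.

```shell
#!/bin/sh
# pick_input TASK_ID FILE...
# Hypothetical helper: prints the FILE selected by a 1-based task ID.
pick_input() {
    id=$1
    shift
    # Outside an array job SGE sets SGE_TASK_ID to "undefined"; treat that as task 1
    case "$id" in
        ''|undefined) id=1 ;;
    esac
    if [ "$id" -lt 1 ] || [ "$id" -gt "$#" ]; then
        echo "task $id has no input file" >&2
        return 1
    fi
    shift $(( id - 1 ))   # drop the files that belong to earlier tasks
    echo "$1"
}

# Inside an array task this prints the file for that task; elsewhere, the first file
pick_input "${SGE_TASK_ID:-undefined}" subj01.mat subj02.mat subj03.mat
```

The selected name could then be passed as an argument to the compiled executable, which receives command-line arguments as strings.
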
Without using -a
    qrsh
    module load matlab
    mcc -m my_m_file.m -I folder1 -I folder2
    matexe.q my_m_file
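
Because a job submitted without -t fails at the 8-hour default, it can help to decide on the flag before submitting. The following is a minimal sketch under stated assumptions: matexe_cmd is a hypothetical helper that only prints the command, it places -t before the executable name in the same way the -ns example does, and it treats the 1 to 336 range quoted above for highp as the valid range.

```shell
#!/bin/sh
# matexe_cmd HOURS NAME
# Hypothetical helper: prints the matexe.q command for a job expected to run HOURS hours.
matexe_cmd() {
    hours=$1
    name=$2
    # Range taken from the "[1,336] is allowed for highp" note
    if [ "$hours" -lt 1 ] || [ "$hours" -gt 336 ]; then
        echo "hours must be between 1 and 336" >&2
        return 1
    fi
    if [ "$hours" -gt 8 ]; then
        # Longer than the 8-hour default: -t is required
        echo "matexe.q -t $hours $name"
    else
        echo "matexe.q $name"
    fi
}

matexe_cmd 6 my_m_file    # fits within the default limit
matexe_cmd 30 my_m_file   # needs an explicit time limit
```
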
L33T tip
    While your jobs are waiting in the queue, request up to 8 CPUs' worth of qrsh sessions at a time. This ensures that you have at least 8 CPUs of processing power at any one time.
    ATS limits you to eight 1-CPU sessions or one 8-CPU session.
    To request a single CPU, use "qrsh -l i,time=24:00:00".
    Running in a qrsh session is also a good way to check that your compiled code works as you expect.

If you have more questions or want example scripts, ask Wesley (wesleytk@ucla.edu) or Hong-jing.