Hoffman2:Submitting Jobs

If you remember from [[Hoffman2:Introduction#Sun Grid Engine|Anatomy of the Computing Cluster]], the Sun Grid Engine on Hoffman2 is the scheduler for all computing jobs.  It takes your computing job request, considers what resources you are asking for and then puts your job in a line waiting for those resources to become available.


Ask for a simple 1GB of memory and a single computing core with a short time window, and your job will likely get placed at the front of the line and start running soon if not immediately.  And for the vast majority of people, this will be the case.

Ask for a lot of memory or many computing cores, and your job will get put further back in the line because it will have to wait for more things to become available.  If your job needs these types of resources, you are probably at a level where reading this tutorial isn't very helpful.


So how does one submit a computing job request?  You've got some options:
# job.q - Use a simple yet effective tool that ATS wrote.  It has a great menu and walks you through submitting things.
# qsub - Get under the hood and do it yourself.  It can get messy but it can also be faster and you have more flexibility with options.
 
 


==job.q==
Once you've identified or written a script you'd like to run, [[Hoffman2:Accessing the Cluster#SSH - Command Line|SSH into Hoffman2]] and enter <code>job.q</code>.  Then it is just a matter of following its step-by-step instructions.
 
From the tool's main menu, you can type ''Info'' to read up on how to use it, and we highly encourage you to do so.
 
But we know patience is a virtue that most of us aren't blessed with.  So we'll walk you through submitting a basic job so you can hit the ground running.
 
===Example===
# Once on Hoffman2, you'll need to edit one configuration file, so pull out your favorite [[Text Editors|text editor]] and open the file
#: <code>~/.queuerc</code>
# Add the line
#: <code>set qqodir = ~/job-output</code>
# You've just set the default directory where your job command files will be created. Save the configuration file and close your text editor.
# Make that directory using the command
#: <code>mkdir ~/job-output</code>
# Now run
#:<code>job.q</code>
# Press enter to acknowledge the message about some files that get created (READ IT FIRST THOUGH).
# Type ''Build <ENTER>'' to begin creating an SGE command file.
# The program now asks which script you'd like to run; enter the following path to use our example script
#: <code>/u/home/FMRI/apps/examples/qsub/gather.sh</code>
# The program now asks how much memory the job will need (in [http://en.wikipedia.org/wiki/Megabyte megabytes]).  This script is really simple, so let's go with the minimum and enter ''64''.
# The program now asks how long the job will take (in hours).  Go with the minimum of 1 hour; it will complete in well under that time.
# The program now asks if your job should be limited to only your resource group's cores.  Answer ''n'' because you do not need to limit yourself here.
# Soon, the program will tell you that ''gather.sh.cmd'' has been built and saved (there is a sketch of what such a file can look like just after this list).
# When it asks you if you would like to submit your job, say no.  Then type ''Quit'' to leave the program.
# Now you should be able to run
#: <code>ls ~/job-output</code>
#: and see ''gather.sh.cmd''.  This file will stay there until you delete it and can be submitted again and again.  Saving a command file like this is especially useful if there is a task you'll be running repeatedly on Hoffman2.
# The time has come to actually submit the job (thought we'd never get to that, didn't you?).  Start the <code>job.q</code> program again, but this time use the ''Submit'' menu option.
# When it asks you where the command file is, type
#: <code>~/job-output/gather.sh.cmd</code>
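A couple of extras before moving on.  First, the configuration edit and directory creation in steps 1 through 4 can also be done in one shot from the command line (only do this once, or you will keep appending the same line to <code>~/.queuerc</code>):

: <code>echo 'set qqodir = ~/job-output' >> ~/.queuerc && mkdir -p ~/job-output</code>

Second, the command file that job.q builds is just a shell script with the SGE options embedded as <code>#$</code> comment lines, followed by the command to run.  We haven't copied the real ''gather.sh.cmd'' here, but a file built with the answers above would look roughly like the sketch below; the exact directives, resource names and log file paths that job.q writes may differ:

<pre>
#!/bin/bash
# Sketch of an SGE command file -- illustrative only, not job.q's exact output.
# Run the job from the directory it was submitted from:
#$ -cwd
# Name the output log after the script and the job number, and merge the
# error stream into it:
#$ -o gather.sh.joblog.$JOB_ID
#$ -j y
# Resources: 64 MB of memory and a 1-hour run-time limit (shown here with the
# standard SGE resource names; the names job.q actually writes may differ):
#$ -l h_data=64M,h_rt=1:00:00
# The actual work: run the example script.
/u/home/FMRI/apps/examples/qsub/gather.sh
</pre>

You can open a saved command file in a text editor and adjust these values by hand before submitting it again.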
 




==qsub==
Everything that job.q did can be done on the command line.

===Example===
I have a script called ''gather.sh'' which can take a list of directories and aggregate the contents of a specific file in each directory into a single text file.  This file actually exists and can be found in <code>/u/home/FMRI/apps/examples/gather.sh</code>
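We won't reproduce the real ''gather.sh'' here, but a minimal shell sketch makes the idea concrete.  The file names <code>results.txt</code> and <code>combined.txt</code> below are made up for illustration and are not taken from the actual script:

<pre>
#!/bin/bash
# Minimal sketch of a "gather"-style script: for every directory passed as an
# argument, append the contents of one specific file to a single combined file.
for dir in "$@"; do
    if [ -f "$dir/results.txt" ]; then
        cat "$dir/results.txt" >> combined.txt
    fi
done
</pre>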




If it needs to go through a bunch of directories and the files are large, this would be a good job to submit to the queue.  The command to do this would be:
: <code>qsub -cwd -V -N J1 -l express,time=0:05:00 /u/home/FMRI/apps/examples/gather.sh</code>

And something like the following will be printed out:
: <code>Your job 1875395 ("J1") has been submitted</code>

Where the number is your JOBID, a unique numerical identifier for your job.

Let's now break down the arguments in the command:

;<code>-cwd</code>
:Change working directory.  When your script runs, the working directory is changed to wherever you were in the filesystem when you submitted the job.  e.g. If you were in the directory <code>/u/home/mscohen/data/</code> when you ran the command, the queue will change directories to that location and then execute the script you gave it.  This means the output and error files for that job will be placed there.
;<code>-V</code>
:Exports all the environment variables from your current shell to the context of the job.  Useful if you passed a variable to the script.
;<code>-N J1</code>
:Names your job "J1".  When you look at the queue, this will be the text that shows up in the "name" column.  This will also be the beginning of the output (<code>J1.o[JOBID]</code>) and error (<code>J1.e[JOBID]</code>) files for your job.
;<code>-l</code>
:This is the resources flag, meaning that the text immediately after it asks for things like:
:* a certain amount of memory (<code>mem=1024MB</code>)
:* a certain number of processors (<code>pe=8</code>), or
:* a certain length of time (<code>time=HH:MM:SS</code>)
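For example, putting those resource options together, a bigger request might look like the line below.  The job name ''J2'' is arbitrary, and the resource names are the <code>mem</code> and <code>time</code> forms listed above; your queue may expect different names, so treat this as a template rather than a command guaranteed to run as-is:

: <code>qsub -cwd -V -N J2 -l mem=1024MB,time=2:00:00 /u/home/FMRI/apps/examples/gather.sh</code>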