Hoffman2:R: Difference between revisions
Latest revision as of 20:12, 22 November 2021
R is a great statistics and graphics tool. Here's how to run it on the cluster. See the official info from IDRE at https://www.hoffman2.idre.ucla.edu/software/r/.
Interactively
- On the cluster, check out an interactive node.
- Execute the following so the node knows how to speak R
$ module load R
- Execute
$ R
- And you'll now be in an interactive R session. If you have no idea what to do with R, we suggest looking at http://www.r-project.org/. To see all the installed packages, execute
> library()
- To use a package that is not already installed, install it first
> install.packages("tidyverse", dependencies=TRUE)
- Then load the package (R is case-sensitive, so the function is library, not Library)
> library(tidyverse)
Batch
- On the cluster, check out an interactive node.
- Execute the following so the node knows how to speak R
$ module load R
- Execute an R script using the following commands
$ R CMD BATCH /path/to/R/script /path/to/output/file
/path/to/R/script
- This argument is required; it is the R script you want to run.
/path/to/output/file
- This argument is optional. If you omit it, the output is written to a file in the current working directory named after the script, with "out" appended (e.g. if you ran the script sampleRscript.R, the default output file would be named sampleRscript.Rout).
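The default naming rule above can be sketched in a few lines of shell. The function name default_rout is made up for this illustration and is not part of R or Hoffman2; it only mimics the rule as described here (take the script's file name, drop a trailing .R, append .Rout).

```shell
#!/bin/bash
# Hypothetical helper (not part of R or Hoffman2): predict the default
# output file name R CMD BATCH uses when no output file is given.
default_rout() {
    local script_name
    script_name=$(basename "$1")             # drop the directory part
    printf '%s.Rout\n' "${script_name%.R}"   # strip a trailing .R, append .Rout
}

default_rout /path/to/R/sampleRscript.R
```

This is handy in wrapper scripts that need to know where to look for the output after a batch run.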
Job
R.q
Similar to job.q, there is an R.q for building command files for jobs that use R. It's a fairly simple step-by-step program that will guide you through making an SGE command file.
But for the less than patient, we'll run through an example case now.
Example
- Once on Hoffman2, you'll need to edit one file so pull out your favorite text editor and edit the file
~/.queuerc
- Add the line (if it isn't already there)
set qqodir = ~/job-output
- You've just set the default directory where your job command files will be created. Save the configuration file and close your text editor.
- Make that directory using the command
$ mkdir ~/job-output
- Now execute
$ R.q
- Press enter to acknowledge the message that appears (READ IT FIRST THOUGH).
- Type Build <ENTER> to begin creating an SGE command file.
- The program now asks you which script you'd like to run, enter the following text to use our example script
/u/project/CCN/apps/examples/qsub/sampleR.R
- The program now asks how much memory the job will need (in Megabytes). This script is really simple, but go ahead with the default value.
- The program now asks how long will the job take (in hours). Go with the minimum 1 hour; it will complete in much less than this.
- The program now asks if your job should be limited to only your resource group's cores. Answer n because you do not need to be limiting yourself here and the job is not going to be running for more than 24 hours.
- Soon, the program will tell you that sampleR.cmd has been built and saved.
- When it asks you if you would like to submit your job, say no. Then type Quit <ENTER> to leave the program.
- Now you should be able to run
$ ls ~/job-output
- and see R.cmd. This file will stay there until you delete it and can be run over and over again. Making a command file like this is especially useful if there is a task you'll be running repeatedly on Hoffman2. But if this is something you only need to run once, you should delete the file so you don't needlessly approach your quota.
- The time has come to actually run the program (thought we'd never get to that, didn't you?). Type
$ qsub ~/job-output/R.cmd
- and after hitting enter, a message similar to this will pop up:
Your job 1882940 ("R.cmd") has been submitted
- where the number is your JobID, a unique numerical identifier for the computer job you have submitted to the queue.
- Now you can check if the job has finished running by doing
$ ls ~/job-output
- When two files named R.out.[JOBID] and R.joblog.[JOBID] (where JOBID is your job's unique identifier) appear, your job has run.
- R.out.[JOBID]
- This file has all the standard output generated by your script.
- R.joblog.[JOBID]
- This file has all the details about when, where, and how your job was processed. Useful information if you are going to be running this job over and over and need to fine tune the resources it uses.
- Better ways of checking on your job can be found here.
- The script you ran is an example taken from http://www.mayin.org/ajayshah/KB/R/html/b1.html which we found by Googling "R example scripts."
- Finally, go check the inbox of the email you used to sign up for your Hoffman2 account. There will be two emails from "root@mail.hoffman2.idre.ucla.edu" that indicate when the job was started and when it was completed. This is one of the neat features of the queue: you can be alerted about the progress of your job without having to stay logged into Hoffman2 and check on it constantly.
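If you'd rather not eyeball the directory listing, the "has my job run yet?" check from the steps above can be sketched as a small script. The helper names job_id_from_message and job_finished are invented for this example and are not part of SGE or Hoffman2; the message text and file names are the ones shown in the walkthrough.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; they are not part of SGE or Hoffman2.

# Pull the numeric JobID out of the message qsub prints on submission.
job_id_from_message() {
    printf '%s\n' "$1" | awk '/^Your job /{ print $3 }'
}

# Succeed once both output files for the job exist in the given directory.
job_finished() {
    local out_dir=$1 id=$2
    [ -e "$out_dir/R.out.$id" ] && [ -e "$out_dir/R.joblog.$id" ]
}

message='Your job 1882940 ("R.cmd") has been submitted'
job_id=$(job_id_from_message "$message")
if job_finished "$HOME/job-output" "$job_id"; then
    echo "job $job_id has finished"
else
    echo "job $job_id has not produced its output files yet"
fi
```

On the cluster you would capture the message directly from qsub instead of hard-coding it.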
By hand
You could also make a shell script that contains
#!/bin/bash
module load R
R CMD BATCH /path/to/R/script
and submit this shell script using qsub or q.sh to achieve similar results.
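If you run many different R scripts this way, you can generate the wrapper instead of typing it each time. Here is a minimal sketch; write_job_script is a hypothetical helper name, and the demo writes into a scratch directory rather than submitting anything.

```shell
#!/bin/bash
# Sketch: generate a small job script that loads R and runs one R script.
# write_job_script is a hypothetical helper, not a Hoffman2 tool.
write_job_script() {
    local r_script=$1     # R script to run, e.g. /path/to/R/script
    local job_script=$2   # where to save the generated shell script
    cat > "$job_script" <<EOF
#!/bin/bash
module load R
R CMD BATCH "$r_script"
EOF
    chmod +x "$job_script"
}

demo_dir=$(mktemp -d)
write_job_script /path/to/R/script "$demo_dir/run_R.sh"
# Check the generated file for shell syntax errors, then show it.
bash -n "$demo_dir/run_R.sh" && cat "$demo_dir/run_R.sh"
# On the cluster you would then submit it, e.g.: qsub "$demo_dir/run_R.sh"
```

The generated file has the same three lines as the hand-written script above, with the R script path filled in.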
Different Versions
Different versions of R are maintained on Hoffman2. To see which versions are installed, use the command
$ module avail R
To load a specific version, use the command
$ module load R/<version>
where <version> is the version number. For example,
$ module load R/3.6.1
loads version 3.6.1.
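In a job script it can be worth confirming that the version you asked for is really the one on your path. The sketch below parses the first line of the R --version banner; r_version_from_banner is a made-up helper name, and the banner is hard-coded here so the example runs anywhere.

```shell
#!/bin/bash
# Hypothetical helper: extract the version number from the first line of
# `R --version`, which looks like: R version 3.6.1 (2019-07-05) -- "..."
r_version_from_banner() {
    printf '%s\n' "$1" | sed -n 's/^R version \([0-9][0-9.]*\).*/\1/p'
}

wanted="3.6.1"
# On the cluster the banner would come from: banner=$(R --version | head -n 1)
banner='R version 3.6.1 (2019-07-05)'
if [ "$(r_version_from_banner "$banner")" = "$wanted" ]; then
    echo "R $wanted is loaded"
else
    echo "expected R $wanted; try: module load R/$wanted"
fi
```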
RStudio
RStudio, an integrated development environment (IDE), is also available to users interested in working with additional software tools when running their analysis on the cluster.
To get started with the latest version of RStudio, execute the following:
$ module load anaconda3
$ source $CONDA_DIR/etc/profile.d/conda.sh
$ conda activate rstudio
$ rstudio
The RStudio GUI should then appear on the screen.
Shared Libraries
On Hoffman2, users and groups do not have permission to install packages directly into the R installation folder. Instead, R libraries may be managed using a strategy that combines common and user libraries.
Common libraries give all users access to the bare-bones software and packages without each user having to keep an individual copy. This makes the software easier for administrators to manage and saves space on the cluster.
So why not just allow anyone to download packages into the common library? That's a bit tricky to do. If anyone were allowed to download packages, then packages would be constantly changing and updating, and it would be difficult to maintain consistency across the lifespan of a project.
Creating a Group Library
But what if you're working in a group, or using different versions of libraries between projects?
Users can choose to create a group or project library by defining their library paths:
> .libPaths()
Issuing the statement above in the R command prompt will output a list of directories where R automatically searches for libraries.
Define a New Library Path
In order to create a shared library, first, determine the new location and create an R/<RVERSION> directory to store the libraries:
$ mkdir -p /u/project/<USERGROUP>/apps/R/3.6.0
Rprofile
In /u/project/<USERGROUP>/apps/R, create a file called Rprofile that contains the following statement, which puts the library directory for the running R version at the front of the search path:
.libPaths(c(paste("/u/project/<USERGROUP>/apps/R/",R.version$major,".",R.version$minor,sep=""), .libPaths()))
For each user in your group to begin using the new library, they should issue the following in their terminal:
$ cat /u/project/<USERGROUP>/apps/R/Rprofile >> $HOME/.Rprofile
The same can be done for an .Rprofile configuration file located within a project folder. This would be a good place to define any project-specific settings:
$ cat /u/project/<USERGROUP>/apps/R/Rprofile >> /u/project/<USERGROUP>/<PROJECT>/.Rprofile
Verify the new R library location by issuing the following in the R command prompt:
> .libPaths()
The new library path should appear first in the output:
[1] "/u/project/<USERGROUP>/apps/R/3.6.0"
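One thing to watch with the cat ... >> commands above: running them twice appends the statement twice. A minimal sketch of an append-only-once variant follows. The helper name append_rprofile_once is invented for this example, it assumes the shared Rprofile holds just the single .libPaths(...) line, and the demo uses a scratch directory with a placeholder library path instead of real group paths.

```shell
#!/bin/bash
# Hypothetical helper: append the shared Rprofile to a user's .Rprofile only
# if none of its lines are already present. Assumes the shared file holds
# just the single .libPaths(...) statement.
append_rprofile_once() {
    local shared=$1 target=$2
    # grep -F: fixed strings, -x: whole-line match, -f: read patterns from file
    if [ -f "$target" ] && grep -qxFf "$shared" "$target"; then
        return 0
    fi
    cat "$shared" >> "$target"
}

# Demonstration in a scratch directory, with a placeholder library path.
demo_dir=$(mktemp -d)
printf '%s\n' '.libPaths(c("/path/to/group/R/library", .libPaths()))' > "$demo_dir/Rprofile"
append_rprofile_once "$demo_dir/Rprofile" "$demo_dir/.Rprofile"
append_rprofile_once "$demo_dir/Rprofile" "$demo_dir/.Rprofile"   # second call is a no-op
cat "$demo_dir/.Rprofile"
```

On the cluster the two arguments would be /u/project/<USERGROUP>/apps/R/Rprofile and $HOME/.Rprofile (or the project's .Rprofile).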