Hoffman2:R

From Center for Cognitive Neuroscience
Revision as of 20:32, 24 November 2020 by Hwang (talk | contribs) (→‎Interactively)


R is a great statistics and graphics tool. Here's how to run it on the cluster. See the official info from IDRE here.


Interactively

  1. On the cluster, check out an interactive node.
  2. Execute the following so the node knows how to speak R
    $ module load R
  3. Execute
    $ R
  4. You'll now be in an interactive R session. If you have no idea what to do with R, we suggest looking here. To see all the installed packages, execute
    > library()
  5. To use a package, it must be installed first
    > install.packages("tidyverse", dependencies=TRUE)
  6. Then load the package
    > library(tidyverse)

Batch

  1. On the cluster, check out an interactive node.
  2. Execute the following so the node knows how to speak R
     $ module load R
  3. Execute an R script using the following commands
     $ R CMD BATCH /path/to/R/script /path/to/output/file
/path/to/R/script
This argument is required; it is the script being run.
/path/to/output/file
This argument is optional. If you don't specify it, the output is written to a file in the current working directory named after the script with "out" appended. (e.g. if you ran the script sampleRscript.R, the output file would be named sampleRscript.Rout)
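The default naming can be checked with a quick shell sketch (sampleRscript.R is just the example name from above):

```shell
# R CMD BATCH's default output name: "out" appended to the script filename
script="sampleRscript.R"   # hypothetical example script name
outfile="${script}out"
echo "$outfile"
```

For sampleRscript.R this prints sampleRscript.Rout, matching the example above.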


Job

R.q

Similar to job.q, there is an R.q for building command files for jobs that use R. It's a fairly simple step-by-step program that will guide you through making an SGE command file.

But for the less-than-patient, we'll run through an example case now.

Example

  1. Once on Hoffman2, you'll need to edit one file, so pull out your favorite text editor and open the file
    ~/.queuerc
  2. Add the line (if it isn't already there)
    set qqodir = ~/job-output
  3. You've just set the default directory where your job command files will be created. Save the configuration file and close your text editor.
  4. Make that directory using the command
    $ mkdir ~/job-output
  5. Now execute
    $ R.q
  6. Press enter to acknowledge the message that appears (READ IT FIRST THOUGH).
  7. Type Build <ENTER> to begin creating an SGE command file.
  8. The program now asks which script you'd like to run; enter the following text to use our example script
    /u/project/CCN/apps/examples/qsub/sampleR.R
  9. The program now asks how much memory the job will need (in megabytes). This script is really simple, so go ahead with the default value.
  10. The program now asks how long the job will take (in hours). Go with the minimum of 1 hour; it will complete in much less than this.
  11. The program now asks if your job should be limited to your resource group's cores. Answer n; you do not need to limit yourself here, and the job will not run for more than 24 hours.
  12. Soon, the program will tell you that the command file (R.cmd) has been built and saved.
  13. When it asks if you would like to submit your job, say no. Then type Quit <ENTER> to leave the program.
  14. Now you should be able to run
    ls ~/job-output
    and see R.cmd. This file will stay there until you delete it and can be run over and over again. Making a command file like this is especially useful if there is a task you'll be running repeatedly on Hoffman2. But if this is something you only need to run once, you should delete the file so you don't needlessly approach your quota.
  15. The time has come to actually run the program (thought we'd never get to that, didn't you?). Type
    $ qsub job-output/R.cmd
    and after hitting enter, a message similar to this will pop up:
    Your job 1882940 ("R.cmd") has been submitted
    where the number is your JobID, a unique numerical identifier for the computer job you have submitted to the queue.
  16. Now you can check if the job has finished running by doing
    $ ls ~/job-output
  17. When two files named R.out.[JOBID] and R.joblog.[JOBID] (where JOBID is your job's unique identifier) appear, your job has run.
    R.out.[JOBID]
    This file has all the standard output generated by your script.
    R.joblog.[JOBID]
    This file has all the details about when, where, and how your job was processed. Useful information if you are going to be running this job over and over and need to fine tune the resources it uses.
  18. Better ways of checking on your job can be found here.
  19. The script you ran is an example taken from [1] which we found by Googling "R example scripts."
  20. Finally, go check the inbox of the email address you used to sign up for your Hoffman2 account. There will be two emails from "root@mail.hoffman2.idre.ucla.edu" indicating when the job started and when it completed. This is one of the neat features of the queue: you can be alerted about your job's progress without staying logged into Hoffman2 and checking on it constantly.
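If you submit jobs from scripts, the JobID in the confirmation message from step 15 can be captured for later bookkeeping. A minimal sketch using the sample message shown above (that the ID is always the third field is an assumption about the message format):

```shell
# Pull the numeric JobID out of a qsub confirmation line
# (sample message from the text; field 3 holds the ID)
msg='Your job 1882940 ("R.cmd") has been submitted'
jobid=$(printf '%s\n' "$msg" | awk '{print $3}')
echo "$jobid"
```

In a real submission script you would pipe the output of qsub itself into awk instead of using a canned string.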

By hand

You could also make a shell script that contains

#!/bin/bash
module load R
R CMD BATCH /path/to/R/script

and submit this shell script using qsub or q.sh to achieve similar results.
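The same wrapper can carry its SGE resource requests embedded as #$ directives, so qsub needs no extra flags. A sketch; the resource values and email option are example assumptions to tune for your job:

```shell
#!/bin/bash
#$ -cwd                       # run from the submission directory
#$ -o joblog.$JOB_ID          # write the job log here
#$ -j y                       # merge stderr into the job log
#$ -l h_data=1G,h_rt=1:00:00  # example: 1 GB memory, 1 hour runtime
#$ -m bea                     # email at begin, end, and abort

module load R
R CMD BATCH /path/to/R/script
```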


Different Versions

Different versions of R are maintained on Hoffman2. To see which versions are installed, use the command

module avail R

To load a specific version, use the command

module load R/<version>

where you replace <version> with the numerical version name e.g.

module load R/3.6.1

will load version 3.6.1


RStudio

RStudio, an integrated development environment (IDE), is also available to users interested in working with additional software tools when running their analysis on the cluster.

To get started with the latest version of RStudio, execute the following:

$ module load anaconda3
$ source $CONDA_DIR/etc/profile.d/conda.sh
$ conda activate rstudio
$ rstudio

The RStudio GUI should then appear on the screen.


Shared Libraries

On Hoffman2, users and groups do not have permission to install packages directly into the system installation folder. R libraries are instead managed with a strategy that combines common and user libraries.

Common libraries give all users access to the base software and packages without each user keeping an individual copy. This makes the software easier for administrators to maintain while saving space on the cluster.

So why not just allow anyone to install packages into the common library? That's a bit tricky to do. If anyone were allowed to install packages, then packages would be constantly changing and updating, and it would be difficult to maintain consistency across the lifespan of a project.

Creating a Group Library

But what if you're working in a group, or using different versions of libraries between projects?

Users can create a group or project library by extending their library paths. To see which directories R currently searches for libraries, issue the following at the R prompt:

> .libPaths()

The statement above outputs the list of directories where R automatically searches for libraries.

Define a New Library Path

In order to create a shared library, first, determine the new location and create an R/<RVERSION> directory to store the libraries:

$ mkdir -p /u/project/<USERGROUP>/apps/R/3.6.0

Rprofile

In the directory above the new library path (i.e. /u/project/<USERGROUP>/apps/R/), create a file called Rprofile that contains the following statement:

.libPaths(c(paste("/u/project/<USERGROUP>/apps/R/",R.version$major,".",R.version$minor,sep=""), .libPaths()))
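To see what that paste() call assembles, here is a quick shell sketch with R 3.6.0's version components hard-coded (for that release, R.version$major is "3" and R.version$minor is "6.0"):

```shell
# Mimic the library path the Rprofile line builds:
# base directory + R major version + "." + R minor version
major="3"
minor="6.0"
libpath="/u/project/<USERGROUP>/apps/R/${major}.${minor}"
echo "$libpath"
```

The result should match the R/<RVERSION> directory created in the previous step.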

For each user in your group to begin using the new library, they should issue the following in their terminal:

$ cat /u/project/<USERGROUP>/apps/R/Rprofile >> $HOME/.Rprofile

The same can be done for an Rprofile configuration file located within a project folder; this is a good place to define any project-specific settings:

$ cat /u/project/<USERGROUP>/apps/R/Rprofile >> /u/project/<USERGROUP>/<PROJECT>/.Rprofile

Verify the new R library location by issuing the following in the R command prompt:

> .libPaths()

The new library path should appear in the output as such:

[1] "/u/project/<USERGROUP>/apps/R/3.6.0"

