Hoffman2:Software Tools: NDATools: Difference between revisions

From Center for Cognitive Neuroscience
Latest revision as of 22:03, 28 March 2024

More information about NDATools: https://github.com/NDAR/nda-tools

NDATools is used for downloading data from and uploading data to NDA. This page describes how to use it on Hoffman2 to download data.


Start an interactive session

The download needs more memory than the Hoffman2 login nodes can offer, so run it on a compute node:

qrsh -l h_rt=20:00:00,h_data=4G -pe shared 2
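
The resource request above can be parameterised when a different run time, memory request, or core count is needed. The following is a minimal sketch, not part of NDATools or Hoffman2: the helper name build_qrsh_cmd is hypothetical, and it only prints the command so that it can be reviewed before it is run.

```shell
# Hypothetical helper: print a qrsh command for an interactive session.
# $1 = run time in hours, $2 = h_data value (e.g. 4G), $3 = number of slots.
build_qrsh_cmd() {
    printf 'qrsh -l h_rt=%s:00:00,h_data=%s -pe shared %s\n' "$1" "$2" "$3"
}

# Reproduces the request shown above (20 hours, 4G, 2 slots).
build_qrsh_cmd 20 4G 2
```

Copy the printed line into the terminal on a login node to start the session.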

Activate the conda environment

module load anaconda3/2023.03
conda activate /u/project/CCN/apps/nda-tools/0.2.21
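
After activating the environment, it can help to confirm that downloadcmd is actually on the PATH before starting a long download. A minimal sketch using only standard shell features; the helper name check_tool is hypothetical.

```shell
# Hypothetical helper: succeed if the named command can be found on the PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

if check_tool downloadcmd; then
    echo "downloadcmd found: $(command -v downloadcmd)"
else
    echo "downloadcmd not found; check that the conda environment is active"
fi
```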


Download Data

Use the downloadcmd command to download data.

To check its usage:

downloadcmd -h
optional arguments:
 -h, --help            show this help message and exit
 -dp, --package        Flags to download all S3 files in package.
 -t, --txt             Flags that a text file has been entered from where to
                       download S3 files.
 -ds, --datastructure  Flags that a data structure text file has been entered
                       from where to download S3 files.
 -u <arg>, --username <arg>
                       NDA username
 -p <arg>, --password <arg>
                       NDA password
 -r <arg>, --resume <arg>
                       Flags to restart a download process. If you already
                       have some files downloaded, you must enter the
                       directory where they are saved.
 -d <arg>, --directory <arg>
                       Enter an alternate full directory path where you would
                       like your files to be saved.
 -wt <arg>, --workerThreads <arg>
                       Number of worker threads
 --file-regex <regular expression>
                       Option can be used to download only a subset of the files in a package.
                       This command line arg can be used with the -ds, -dp or -t flags. 
                       Examples - 
                       1) To download all files with a ".txt" extension,
                           downloadcmd -dp 12345 --file-regex .*.txt
                       2) To download all files that contain "NDARINVZLHFUAF0" in the name,  
                           downloadcmd -dp 12345 -ds image03 --file-regex NDARINVZLHFUAF0
                       3) Finally to download all files underneath a folder called "T1w"  
                           downloadcmd -dp 12345 -t s3-links.txt --file-regex .*/T1w/.*
 --verify              When this option is provided a download is not initiated. Instead, 
                       a csv file is produced that contains a record of the files in the download,
                       along with information about the file-size if the file could be found on the computer.
 -v, --verbose         Option to print out more detailed messages as the
                       program runs.
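
Before passing a pattern to --file-regex, it can be tried against a few sample paths locally. The sketch below is an approximation under stated assumptions: it uses grep -E, whose regular expression dialect differs in places from the one downloadcmd uses, it assumes the pattern may match anywhere in the path, and the sample paths and the helper name preview_regex are made up for illustration. It also shows that the unescaped dot in .*.txt matches any character, so an escaped and anchored pattern such as \.txt$ is stricter.

```shell
# Made-up sample paths, used only to illustrate the patterns.
sample_paths='sub-01/anat/T1w/scan.nii.gz
sub-01/notes.txt
sub-01/func/run1_txt.json'

# Hypothetical helper: print the sample paths that contain a match for $1.
preview_regex() {
    printf '%s\n' "$sample_paths" | grep -E -- "$1"
}

echo "Pattern .*/T1w/.* matches:"
preview_regex '.*/T1w/.*'

echo "Pattern .*.txt matches (the dot matches any character):"
preview_regex '.*.txt'

echo "Pattern \\.txt\$ matches (literal dot, anchored at the end):"
preview_regex '\.txt$'
```

For an authoritative list of what a pattern selects in a real package, the --verify option described above produces a csv record of the files without starting a download.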

To start downloading a package, you need its packageID. The packageID can be found on NDA's website after you log in and submit your request for data access.

downloadcmd <packageID> -dp -d /u/project/MYGROUP/MYNDADATA_FOLDER

If the download is interrupted, resume it by adding the -r option with the directory that holds the files already downloaded:

downloadcmd <packageID> -dp -d /u/project/MYGROUP/MYNDADATA_FOLDER -r /u/project/MYGROUP/MYNDADATA_FOLDER
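
The choice between the two commands above can be scripted. The following is a minimal sketch, assuming that a non-empty target directory means an earlier download was interrupted; the helper name build_download_cmd and the package ID 12345 are placeholders, and the helper only prints the command rather than running it.

```shell
# Hypothetical helper: print the downloadcmd invocation for a package,
# adding -r when the target directory already contains files.
# $1 = packageID, $2 = target directory.
build_download_cmd() {
    if [ -d "$2" ] && [ -n "$(ls -A "$2" 2>/dev/null)" ]; then
        # Directory has content: assume a partial download and resume it.
        printf 'downloadcmd %s -dp -d %s -r %s\n' "$1" "$2" "$2"
    else
        # Directory is missing or empty: start a fresh download.
        printf 'downloadcmd %s -dp -d %s\n' "$1" "$2"
    fi
}

# 12345 is a placeholder package ID; replace the path with your own folder.
build_download_cmd 12345 /u/project/MYGROUP/MYNDADATA_FOLDER
```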

After you start downloadcmd, you will be asked to enter the access key, secret key, and session token. Use the information generated in the previous step.

If there is no error message, the download starts right away. To see more detailed download logs, add the -v option.