HPCC – Cloud/External

Manuals: AWS

Manuals: CyVerse

CyVerse is an NSF-funded web platform that has many predefined apps. Many of the apps are for life science and can be very useful.

NOTE: Not suitable for jobs that require more than 12 CPU cores (e.g. MPI jobs).

Account

Go to CyVerse and click on Create Account.

There are a few steps; fill in the fields accordingly and complete the forms.

After you have completed and submitted the form, CyVerse will send you an email. The email will contain a link to set your password.

Data Management

There are a few ways to upload and download data; for example, you can browse your files from the Discovery Environment. Here, however, we will focus on the command-line method, since it is directly supported on the HPC cluster. Please refer to here for additional methods.

First you will need to load the icommands tools:

module load icommands/4.1.10

Then you will need to initialize the connection to CyVerse:

iinit

When you run the above command it will ask a few questions about your connection:

host name	port #	username	zone	password
data.cyverse.org	1247	CyVerse UserID	iplant	CyVerse Password

Once the iinit command has completed, you can list, push, and get files and folders on CyVerse directly from the HPCC.
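
For reference, iRODS 4.x clients normally keep these answers in ~/.irods/irods_environment.json, so the same settings can be written ahead of time. The following is a minimal sketch rather than an official CyVerse procedure: the function name write_irods_env is hypothetical, and the key names are assumed from the standard iRODS 4.x environment file. The password is not stored here; you would still run iinit afterward to enter it.

```shell
# Hypothetical helper: write the connection settings from the table above
# into an iRODS 4.x style environment file.
# Usage: write_irods_env <cyverse_user> [target_dir]
write_irods_env() {
    local user="$1"
    local dir="${2:-$HOME/.irods}"   # default location assumed for iRODS 4.x
    mkdir -p "$dir" || return 1
    cat > "$dir/irods_environment.json" <<EOF
{
    "irods_host": "data.cyverse.org",
    "irods_port": 1247,
    "irods_user_name": "$user",
    "irods_zone_name": "iplant"
}
EOF
}
```

A call such as write_irods_env YourCyVerseUserID followed by iinit should then only need your password.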

Upload

The basic format to push files to CyVerse is like so:

iput FileName CyVersePath

For example:

iput hg18.fasta .

Since you automatically start in your CyVerse home directory, the . places the fasta file directly in your home.

Once that command completes, you can double check that the file exists on CyVerse by listing the files, like so:

ils

The ils and iput commands work with both relative and absolute paths.
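
When several files need to go to the same CyVerse path, the iput call can be wrapped in a small loop. This is only a sketch: put_files and the DRY_RUN variable are hypothetical names, not part of icommands. It prints the commands by default so you can review them first, and it adds the iput -K option so the checksum is verified after each transfer.

```shell
# Hypothetical wrapper: upload each listed file to one CyVerse path.
# With DRY_RUN=1 (the default here) the iput commands are only printed.
# Usage: put_files <cyverse_path> <file>...
put_files() {
    local dest="$1"
    shift
    local f
    for f in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "iput -K $f $dest"
        else
            # -K verifies the checksum of the uploaded data
            iput -K "$f" "$dest" || return 1
        fi
    done
}

put_files . hg18.fasta    # prints: iput -K hg18.fasta .
```

Setting DRY_RUN=0 before calling put_files runs the real transfers instead of printing them.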

Download

The download method is identical to the upload method; just use iget instead of iput:

iget hg18.fasta .

The above command will download the hg18.fasta file to your current directory on the cluster.

Jobs

Jobs on CyVerse are deployed via apps that you launch through the GUI here. Here is a video explaining how to create a Docker image on the CyVerse system and how to configure a custom app to use it.

Please contact support for help creating a custom app, or any other questions.

Manuals: PRP

CAVEATS:

A fair amount of the resource requests must be calculated manually from what is currently available.

Also, it seems like a beta feature would replace what I did here:

https://kubernetes.io/blog/2021/04/19/introducing-indexed-jobs/

The PRP (Pacific Research Platform) is an NSF-funded Kubernetes infrastructure and therefore requires the use of a Kubernetes interface. From the HPC cluster, the command-line Kubernetes interface kubectl is used.

Account

Follow the guides on the PRP website posted here. There is a link called Get access; depending on your role, you can request an admin account or a user account.

Install

Once you have an account, and you have read all of the docs here, you can proceed to install. Submitting jobs, checking states, getting logs as well as interactive sessions can all be done from the HPCC.

However, running these operations directly from your laptop/workstation can be faster. In order to run these actions locally you will need to install kubectl on your local laptop/workstation.

You can install kubectl via conda, like so:

conda create -n kube -c anaconda-platform kubectl

If you do not yet have conda installed, follow these instructions.

Config

For kubectl to function, it requires your config file provided by PRP. To get the PRP Kubernetes config file, do the following:

Visit Nautilus Portal
Click on Login in the upper right corner.
Log in using CILogon credentials (a.k.a. UCR NetID).
Once authenticated, click on Get config in the upper right corner.
The file is generated dynamically, which takes a while; wait, and eventually your browser will present a download prompt.
Place this file in your ~/.kube directory.

Next set the namespace, or else you will have to append the -n ucr-hpcc flag to every Kubernetes command. You may be under the ucr-hpcc namespace if you are testing, otherwise you should have your own namespace:

kubectl config set-context nautilus --namespace=ucr-hpcc
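
The file placement step in the list above can be sketched as a small helper. This is an illustration under assumptions: install_kubeconfig is a hypothetical name, and the location of the downloaded file depends on your browser. kubectl reads ~/.kube/config by default, so the file is copied to that name.

```shell
# Hypothetical helper: copy a downloaded PRP config to where kubectl looks.
# Usage: install_kubeconfig <downloaded_file> [kube_dir]
install_kubeconfig() {
    local src="$1"
    local kube_dir="${2:-$HOME/.kube}"
    mkdir -p "$kube_dir" || return 1
    cp "$src" "$kube_dir/config" || return 1
    # The file holds credentials, so restrict it to your own user.
    chmod 600 "$kube_dir/config"
}
```

After something like install_kubeconfig ~/Downloads/config (path assumed), kubectl config get-contexts should list the nautilus context.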

Usage

Here is an example of an array-style job that uses Redis to track job numbers, with the Dockerfile stored in the PRP GitLab repository.

First copy scripts/* files from the repo to your code base. Make sure that your analysis workflow is started within the worker.py script.
Create redis service and deployment

kubectl create -f hpcc-redis.yml

Log into Redis pod and manually add items

# Get into pod
kubectl exec -it redis-master -- /bin/bash
# Add 10 items to list "job2"
echo lpush job2 {1..10} | redis-cli -h redis --pipe
# Print all items in "job2"
redis-cli -h redis lrange job2 0 -1
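
The {1..10} in the command above is shell brace expansion, which cannot take variables such as {1..$n}. When the number of work items is only known at run time, the same LPUSH line can be built with a loop. A minimal sketch, where build_lpush is a hypothetical helper name:

```shell
# Hypothetical helper: build "lpush <list> first ... last" for a numeric range.
# Usage: build_lpush <list_name> <first> <last>
build_lpush() {
    local list="$1"
    local i="$2"
    local last="$3"
    local cmd="lpush $list"
    while [ "$i" -le "$last" ]; do
        cmd="$cmd $i"
        i=$((i + 1))
    done
    echo "$cmd"
}

build_lpush job2 1 10    # prints: lpush job2 1 2 3 4 5 6 7 8 9 10
```

Inside the Redis pod, the output can be piped the same way as before, for example build_lpush job2 1 "$n" | redis-cli -h redis --pipe.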

Submit job

kubectl create -f hpcc-job.yml

Egress

The PRP has fail2ban blocking rapid SSH connections, so copying files within a loop would fail. It is best to try and copy all needed files with a single rsync command, like so:

nohup rsync -rvP --include='*/spades.log.gz' --include='*/scaffolds.fasta' --exclude='*/*' /output/ cluster.hpcc.ucr.edu:~/output &> rsync_spades.log

The above rsync command looks into the sub-directories within /output and will copy only the spades.log.gz and scaffolds.fasta files from each onto the HPCC cluster.
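
Before starting a long transfer, it can help to preview which files the include patterns are aimed at. The sketch below approximates that selection with find rather than rsync itself; preview_egress is a hypothetical name, and it lists only the two target file names found one level below the given directory.

```shell
# Hypothetical helper: list spades.log.gz and scaffolds.fasta files that sit
# directly inside each sub-directory of the given output directory.
# Usage: preview_egress <output_dir>
preview_egress() {
    find "$1" -mindepth 2 -maxdepth 2 -type f \
        \( -name 'spades.log.gz' -o -name 'scaffolds.fasta' \) | sort
}
```

Running preview_egress /output prints one path per line, which also gives a quick count of the files to expect on the HPCC side.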

Trouble

Check the log of a pod from the pod list:

kubectl logs hpcc-pod

Jobs and pods will expire after 1 week; however, you can alter this by setting the following field (in seconds) in the job spec:

ttlSecondsAfterFinished=604800
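
The value is given in seconds, and 604800 corresponds to the one week mentioned above. As a small worked example, a different retention period can be converted like this (days_to_seconds is a hypothetical helper name):

```shell
# Convert a number of days into seconds for ttlSecondsAfterFinished.
# Usage: days_to_seconds <days>
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7    # prints: 604800
```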

Check on the pod details:

kubectl describe pod hpcc-pod

Delete the job if you want to rerun it; this will also delete the associated pods:

kubectl delete -f hpcc-job.yml

To update your repo with the latest HPCC changes, you can sync like so:

# Add upstream
git remote add upstream https://github.com/ORIGINAL-DEV-USERNAME/REPO-YOU-FORKED-FROM.git
# Get branches
git fetch upstream
# Sync local files with master branch
git pull upstream master

Manuals: XSEDE

If you require large amounts of resources (e.g. 1000s of CPUs or 100s of GPUs), then the computing resources at the NSF-funded XSEDE might be a good option.

Account

To create an account, visit the XSEDE Portal.

Proposal

Writing a proposal and getting it approved is required to gain access to XSEDE. Instructions on how to do this are outlined here.

Data Management

There are several methods for transferring data to and from XSEDE resources; they are outlined here.

Jobs

For submitting jobs, XSEDE also supports Slurm, which is similar to what we already use on the HPC cluster.

Examples of how to submit Slurm-style jobs are described here.
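
As a rough illustration of what a Slurm-style submission looks like, the sketch below writes a minimal batch script to a file. It is not taken from the XSEDE documentation: write_batch_script is a hypothetical helper, the resource values are placeholders, and site-specific options such as the partition and allocation account are deliberately left out because they differ per XSEDE resource.

```shell
# Hypothetical helper: write a minimal Slurm batch script to the given path.
# Usage: write_batch_script <path>
write_batch_script() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

# Replace this line with your own analysis command.
echo "Running on $(hostname) with $SLURM_CPUS_PER_TASK CPUs"
EOF
}
```

After write_batch_script job.sh, the script would be submitted with sbatch job.sh, the same way as on the HPC cluster.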

UCR Campus Champion

You can contact Jordan Hayes for additional information regarding XSEDE and how to gain access.
