Using R on HPC Clusters Webinar
This OLCF-hosted webinar tutorial teaches a basic workflow for using R on an HPC cluster. The tutorial focuses on parallel computing as a means to speed up R scripts on a cluster computer. Many packages in R offer some form of parallel computing, yet they rely on a much smaller set of underlying approaches: multithreading in compiled code, the Unix fork, and MPI. The tutorial takes a narrow path, focusing on packages that directly engage the underlying approaches yet are easy to use at a high level. This workshop is targeted at current users of OLCF, CADES, ALCF, and NERSC. Users who do not already have accounts on those systems are welcome to attend the lectures but will not be able to participate in all of the hands-on activities.
Objectives
- Learn a workflow to edit R code on your laptop and run it on an HPC cluster
- Learn how to use multicore and distributed parallel concepts in R on an HPC cluster system
Topics covered:
Day 1: Wednesday, August 17, 1:00pm - 4:00pm EDT
Hardware and software overview and ways to use multiple cores on a single node: using the mclapply function in the parallel package, using multithreaded BLAS
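As a rough illustration of the single-node topic, the sketch below applies a function to several inputs with mclapply from the parallel package, which forks worker processes on Unix-like systems. The function name slow_square and the choice of two cores are hypothetical, made up for this example; on a cluster the core count would normally match what the batch job was allocated.

```r
library(parallel)

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# Assumption for this sketch: use at most 2 cores.
# mclapply relies on fork, which is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same interface as lapply, but the work is split across forked processes
results <- mclapply(1:8, slow_square, mc.cores = n_cores)

total <- sum(unlist(results))
print(total)
```

Because mclapply mirrors lapply, an existing serial lapply call can often be parallelized by changing the function name and adding mc.cores.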
Day 2: Friday, August 19, 1:00pm - 4:00pm EDT
Distributed computing: hardware review and using multiple nodes: MPI at a high level via the pbdMPI package, matrix methods via the kazaam and pbdDMAT packages
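To give a flavor of the distributed topic, here is a minimal sketch of the single-program, multiple-data style that pbdMPI supports: every MPI rank runs the same script, works on its own share of the task indices, and the partial results are combined with a reduction. It assumes pbdMPI and an MPI library are installed; the task count of 100 is arbitrary and chosen only for illustration.

```r
suppressMessages(library(pbdMPI))

init()

# Each rank receives its own subset of the 100 task indices
my_ids <- get.jid(100)

# Local work: here just a sum over this rank's indices
local_sum <- sum(my_ids)

# Combine the partial sums so every rank holds the global result
total <- allreduce(local_sum, op = "sum")

# Print once (from rank 0 by default) rather than from every rank
comm.print(total)

finalize()
```

A script like this is typically launched through the MPI launcher, for example `mpirun -np 4 Rscript script.R`, where the launcher command and the file name script.R depend on the cluster and are assumptions here.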
Hands-on exercises workflow:
Edit your code in RStudio on your laptop -> push the code to GitHub/GitLab -> pull the code to the cluster and submit as batch -> look at your output and circle back to Edit.
This has the advantage of editing code in a familiar environment and running it in a common teaching environment. Other workflows are possible if you already know the tools.
We start with each user forking a GitHub exercise repository to their own GitHub account and working with it as described above. See the prerequisites that follow.
Prerequisites:
This workshop is targeted for users of OLCF, CADES, ALCF and NERSC. Users who do not have accounts on systems at those centers will be able to participate in the lectures and laptop hands-on parts of the course but will not be able to do the hands-on parts on the HPC clusters during the course, though the course repo will be provided to all attendees. The workshop assumes that participants have done the following:
- Have R installed on laptop
- Have RStudio Desktop installed on laptop
- Have git installed on laptop
- Are able to ssh to a remote machine
  - For Mac use Terminal
  - For Windows use PuTTY
  - See: https://github.com/olcf/foundational_hpc_skills/raw/master/intro_to_ssh/Intro_to_ssh_clients.pdf
- Have worked with GitHub in RStudio. Have a GitHub account, know how to create or fork a repository and work with it from RStudio. Many tutorials are available on the web, for example: https://happygitwithr.com/index.html.
- Know a few basic Unix commands for listing files, creating a directory, removing files, etc. Many resources are available, for example Intro to Unix or Unix Shell Crash Course.
Slides and Recording:
- Slides Part 1
- Recording Part 1
- Slides Part 2
- Recording Part 2
- Exercise Git Repo