Jun 17 2013


JICS Auditorium, Building 5100
1 Bethel Valley Road


George Ostrouchov

Programming with Big Data in R: pbdR

In this tutorial, we will introduce the R statistical programming language and quickly progress to a high-performance extension of this language called Programming with Big Data in R, or pbdR for short.

First, we will introduce attendees to the R programming language, with emphasis on its use as an analytics engine. We will give a flavor of R’s breadth for data analysis but cover only the minimum required for the HPC portion of the tutorial. No prior background in R is assumed, yet even experienced R users stand to gain much from attending. This portion of the tutorial will conclude with in-depth examples.

The second, longer portion of the tutorial will cover the high-performance extension pbdR. Because the syntax of pbdR was developed to mimic serial R (often identically), the material from the first portion will not only be reinforced but also scaled to much larger problems. Discussions and examples will be tailored to data analysis. Only moderate familiarity with parallel computing is assumed. This session will close by revisiting the in-depth examples from the first session, with some new twists.

Introduction to R
MPI Programming in R
Distributed Matrices

