Integrating RStudio Server Pro with Spark and sparklyr
sparklyr is an R interface for Apache Spark. With it, you can install and connect to Spark, filter and aggregate Spark datasets using dplyr syntax, and then bring the results into R for analysis and visualization.
You can install RStudio Server Pro within a Spark/Hadoop cluster and use sparklyr from R sessions.
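As a rough illustration of the workflow described above, the following sketch connects to Spark from an R session, aggregates a dataset with dplyr verbs, and collects the result into R. It assumes a local Spark installation (`master = "local"`) purely for demonstration; on a cluster, the master value and any configuration depend on your deployment. The `mtcars` dataset, the `hp > 100` filter, and the table name are illustrative choices, not taken from the articles below.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark instance is used for illustration only.
# spark_install() downloads a local copy of Spark; it is only needed once
# and is not required when connecting to an existing cluster.
spark_install()

# Assumption: on a Spark/Hadoop cluster the master would differ
# (for example a YARN master), depending on the deployment.
sc <- spark_connect(master = "local")

# Copy a small built-in R data set into Spark (illustrative data only).
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Filter and aggregate inside Spark using dplyr syntax,
# then collect() the summarized result into an R data frame.
mpg_by_cyl <- mtcars_tbl %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  collect()

print(mpg_by_cyl)

# Close the connection when finished.
spark_disconnect(sc)
```

Only the small summarized table is brought into R here; the filtering and aggregation run in Spark, which is the pattern the description above suggests for larger datasets.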
The following articles describe how to integrate RStudio Server Pro with a Spark cluster in different configurations:
- Using sparklyr with Cloudera CDH
- Using sparklyr with Amazon EMR
- Deployment and configuration options
Visit spark.rstudio.com for more information.