11 Job Spawner
11.1 Overview
The RStudio Job Spawner provides the ability for various RStudio applications, such as RStudio Server Pro and RStudio Connect, to start processes within various batch processing systems (e.g. IBM Spectrum LSF) and container orchestration platforms (e.g. Kubernetes). RStudio Server Pro integrates with the Job Spawner to allow you to run your R Sessions within your compute cluster software of choice, and allows you to containerize your sessions for maximum process isolation and operations efficiency. Furthermore, users can submit standalone jobs to your compute cluster(s) to run computationally expensive R or Python scripts.
11.2 Configuration
11.2.1 Job Spawner Configuration
For information on how to configure the Job Spawner service, see the Job Spawner documentation.
Before the Job Spawner can be run, it must be properly configured via the config file /etc/rstudio/spawner.conf. The following table lists the supported configuration options.
11.2.2 RStudio Server Pro Integration
RStudio Server Pro must be configured in order to integrate with the Job Spawner. There are several files which house the configuration, and they are described within subsequent sections.
11.2.2.1 Server Configuration
The RStudio Server process rserver must be configured to communicate with the Job Spawner in order to enable session launching. The following table lists the configuration options that can be specified in the rserver.conf configuration file:
/etc/rstudio/rserver.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
spawner-sessions-enabled | Enables launching of rsession processes via the Job Spawner. This must be enabled to use the Job Spawner. | N | 0 |
spawner-address | TCP host/IP of the spawner host, or unix domain socket path (must match the /etc/rstudio/spawner.conf configuration value) | Y | |
spawner-port | Port that the spawner is listening on. Only required if not using unix domain sockets. | Y | |
spawner-default-cluster | Name of the cluster to use when launching sessions. Can be overridden by the launching user. | Y | |
spawner-sessions-callback-address | Address (http or https) of RStudio Server Pro that will be used by spawner sessions to communicate back for project sharing features. This is only required if using plugins that do not use containers, as project sharing is not yet supported for containerized sessions. | Y | |
spawner-use-ssl | Whether or not to connect to the spawner over HTTPS. Only supported for connections that do not use unix domain sockets. | N | 0 |
spawner-sessions-container-image | The default container image to use when creating sessions. Only required if using a plugin that requires containerization. | Y | |
spawner-sessions-container-run-as-root | Whether or not to run as root within the session container. | N | 1 |
spawner-sessions-create-container-user | Whether or not to create the session user within the container. Only applicable if using container sessions and not running containers as root. The created user will have the same UID and GID as the user that launched the session. It is recommended that this option be used, unless your containers connect to an LDAP service to manage users and groups. | N | 0 |
For example, your rserver.conf file might look like the following:
/etc/rstudio/rserver.conf
spawner-address=localhost
spawner-port=5559
spawner-sessions-enabled=1
spawner-default-cluster=Kubernetes
spawner-sessions-callback-address=http://localhost:8787
spawner-use-ssl=1
spawner-sessions-container-image=rstudio:R-3.5
spawner-sessions-container-run-as-root=0
spawner-sessions-create-container-user=1
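The table above can be turned into a quick sanity check before restarting the server. The following is a minimal sketch, not part of RStudio Server Pro: the function names `parse_conf` and `check_spawner_options` are hypothetical, the handling of `#` comments is an assumption, and treating an address that starts with `/` as a unix domain socket path is also an assumption.

```python
# Hypothetical helper for sanity-checking spawner options in rserver.conf.
# It only encodes rules stated in the table above; it is not an official tool.

# Options the table marks as unconditionally required.
REQUIRED = ("spawner-address", "spawner-default-cluster")


def parse_conf(text):
    """Parse key=value lines; blank lines and '#' comments are skipped (assumed syntax)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def check_spawner_options(options):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if options.get("spawner-sessions-enabled", "0") != "1":
        problems.append("spawner-sessions-enabled is not 1")
    for key in REQUIRED:
        if not options.get(key):
            problems.append("missing required option: " + key)
    address = options.get("spawner-address", "")
    # Assumption: a leading '/' means a unix domain socket path.
    uses_socket = address.startswith("/")
    if address and not uses_socket and not options.get("spawner-port"):
        problems.append("spawner-port is required for TCP connections")
    if uses_socket and options.get("spawner-use-ssl", "0") == "1":
        problems.append("spawner-use-ssl is not supported with unix domain sockets")
    return problems
```

Calling `check_spawner_options(parse_conf(open("/etc/rstudio/rserver.conf").read()))` would list any of these problems. The check deliberately leaves out the options that are only conditionally required (callback address, container image), since whether they apply depends on the plugin in use.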
11.2.2.2 Containerized sessions
In order to run your R sessions in containers, you will need a Docker image that has the necessary rsession binaries installed. RStudio provides an official image for this purpose, which you can get from Docker Hub.
For example, to get the RHEL6-compatible (CentOS 6) image, you would run:
docker pull rstudio/r-session:centos6-latest
After pulling the desired image, you will need to create your own Dockerfile that extends from the r-session base image and adds whatever versions of R you want to be available to your users, as well as adding any R packages that they will need. For example, your Dockerfile should look similar to the following:
FROM rstudio/r-session:centos6-latest
# install desired versions of R
RUN yum install -y R
# install R packages
...
11.2.2.2.1 Spawner Mounts
When creating containerized sessions via the Job Spawner, you will need to specify mount points as appropriate to mount the user’s home drive and any other desired paths.
To specify mount points, modify the /etc/rstudio/spawner-mounts file to consist of multiple mount entries separated by a blank line. The following table lists the fields that are available for each mount entry in the file.
Field | Description | Required (Y/N) | Default Value |
---|---|---|---|
Path | The source directory of the mount, i.e. where the mount data comes from. | Y | |
Host | The NFS host name for the NFS mount. Only used if the mount is NFS. | N | |
MountPath | The path within the container that the directory will be mounted to. | Y | |
ReadOnly | Whether or not the mount is read only. Can be true or false. | N | false |
Additionally, paths may contain the special variable {USER} to indicate that the user’s name should be substituted, enabling you to mount user-specific paths.
An example /etc/rstudio/spawner-mounts file is shown below.
/etc/rstudio/spawner-mounts
# User home mount
Host: nfs01
Path: /home/{USER}
MountPath: /home/{USER}
ReadOnly: false

# Shared code mount
Host: nfs01
Path: /dev64
MountPath: /code
ReadOnly: false
It is important that each entry consists of the fields as specified above. Each field must go on its own line. There should be no empty lines between field definitions. Each entry must be separated by one full blank line (two new-line \n characters).
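To make the format rules above concrete, here is a minimal sketch of how such a file could be read. It is not the Job Spawner's own parser: the function name `parse_mounts` is hypothetical, and skipping lines that start with `#` is an assumption based on the comment lines in the example.

```python
import re

# Field names and defaults taken from the table above.
KNOWN_FIELDS = {"Path", "Host", "MountPath", "ReadOnly"}
REQUIRED_FIELDS = {"Path", "MountPath"}


def parse_mounts(text, user):
    """Parse spawner-mounts content into a list of dicts, substituting {USER}."""
    entries = []
    # Entries are separated by at least one blank line.
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            line = line.strip()
            # Assumption: '#' lines are comments, as in the example file.
            if not line or line.startswith("#"):
                continue
            name, sep, value = line.partition(":")
            name = name.strip()
            if not sep or name not in KNOWN_FIELDS:
                raise ValueError("unrecognized line: %r" % line)
            fields[name] = value.strip().replace("{USER}", user)
        if not fields:
            continue  # block contained only comments
        missing = REQUIRED_FIELDS - fields.keys()
        if missing:
            raise ValueError("entry is missing: " + ", ".join(sorted(missing)))
        fields.setdefault("ReadOnly", "false")  # documented default
        entries.append(fields)
    return entries
```

For a user named `alice`, an entry with `Path: /home/{USER}` would come back with `Path` set to `/home/alice`. A field placed after a blank line ends up in a separate entry, which is why an accidental empty line between fields typically surfaces as a missing required field.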
11.2.2.2.2 Spawner Environment
You may optionally specify environment variables to set when creating containerized sessions.
To specify environment variables, modify the /etc/rstudio/spawner-env file to consist of KEY=VALUE pairs, one per line. Additionally, you can use the special {USER} variable to specify the value of the launching user’s username, similar to the mounts file above.
An example /etc/rstudio/spawner-env file is shown below.
/etc/rstudio/spawner-env
VAR1=VAL1
USER_HOME=/home/{USER}
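The {USER} substitution can be illustrated with a short sketch. This is an illustration of the described behavior only, not the Job Spawner's implementation; the function name `parse_env` is hypothetical, and ignoring blank lines is an assumption.

```python
def parse_env(text, user):
    """Parse KEY=VALUE lines into a dict, replacing {USER} with the launching user's name."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        key, sep, value = line.partition("=")
        if not sep or not key:
            raise ValueError("expected KEY=VALUE, got: %r" % line)
        env[key] = value.replace("{USER}", user)
    return env
```

With the example file above and a launching user named `alice`, `USER_HOME` would resolve to `/home/alice` while `VAR1` stays `VAL1`.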
11.2.2.2.3 Spawner Ports
You may optionally specify ports that should be exposed when creating containerized sessions. This will allow the ports to be exposed within the host running the container, allowing the ports to be reachable from external services. For example, for Shiny applications to be usable, you must expose the desired Shiny port, otherwise the browser window will not be able to connect to the Shiny application running within the container.
To specify ports, modify the /etc/rstudio/spawner-ports file to consist of port numbers, one per line.
An example /etc/rstudio/spawner-ports file is shown below.
/etc/rstudio/spawner-ports
5873
5874
64234
64235
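Because the file is just one port number per line, it is easy to validate before deploying it. The sketch below is a hypothetical helper (`parse_ports` is not an RStudio tool); the accepted range of 1 to 65535 is the standard TCP port range, and whether the Job Spawner itself enforces that range is not stated in this guide.

```python
def parse_ports(text):
    """Return the ports listed one per line, raising ValueError on an invalid entry."""
    ports = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Accept only plain ASCII digits within the valid TCP port range.
        if not (line.isascii() and line.isdigit()) or not 1 <= int(line) <= 65535:
            raise ValueError("line %d: invalid port %r" % (lineno, line))
        ports.append(int(line))
    return ports
```

Running this over the file contents before restarting the service catches typos such as a stray character or an out-of-range number.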
11.3 Running the Spawner
Once it is configured, you can run the Job Spawner by invoking the command sudo rstudio-spawner start, and stop it with sudo rstudio-spawner stop. The Job Spawner must be run with root privileges, but similar to rstudio-server, privileges are immediately lowered. Root privileges are used only to impersonate users as necessary.
11.4 Creating plugins
Plugins allow communication with specific batch cluster / container orchestration systems like Platform LSF and Kubernetes. However, you may be using a system that RStudio does not natively support. Fortunately, the Job Spawner provides a fairly simple means of creating custom plugins that can allow you to spawn jobs on any cluster software you desire.
Documentation for creating plugins can be found here.