1 Job Launcher

1.1 Overview

The RStudio Job Launcher provides the ability for RStudio applications, such as RStudio Server Pro and RStudio Connect, to start processes within batch processing systems (e.g. IBM Spectrum LSF) and container orchestration platforms (e.g. Kubernetes). RStudio products integrate with the Job Launcher to allow you to utilize your existing cluster hardware for maximum process isolation and operational efficiency.

1.2 Configuration Options

To configure the Job Launcher, create and modify the /etc/rstudio/launcher.conf file. Configuration options are listed below.

Server Options

There should be one [server] section in the configuration file (see sample config below).

| Config Option | Description | Required (Y/N) | Default Value |
|---------------|-------------|----------------|---------------|
| address | IPv4 or IPv6 address, or path to Unix domain socket | Y | |
| port | Port number (0-65535) | Y (when using an IP address) | |
| enable-ssl | Enables/disables SSL encryption for connections | N | 0 |
| certificate-file | Certificate chain file of public certificates to present to incoming connections | Y (only when SSL is enabled) | |
| certificate-key-file | Certificate private key file used for encryption | Y (only when SSL is enabled) | |
| server-user | User to run the executable as. The Launcher should be started as root and will lower its privilege to this user for normal execution. | N | rstudio-server |
| authorization-enabled | Enables/disables authorization; this is required for all but test systems. Can be 1 (enabled) or 0 (disabled). | N | 1 |
| admin-group | Group name of users that are able to see/control the jobs of all other users in the system. If used with RStudio Server Pro, this must match the group of the server-user configured in rserver.conf. | N | Empty |
| thread-pool-size | Size of the thread pools used by the Launcher | N | Number of CPUs * 2 |
| request-timeout-seconds | Number of seconds a plugin has to process a request before it is considered timed out | N | 120 |
| bootstrap-timeout-seconds | Number of seconds a plugin has to bootstrap before it is considered a failure | N | 120 |
| max-message-size | Maximum allowed size, in bytes, of messages sent by plugins. It is strongly recommended that you do not change this, but it may be raised if you run into the limit. | N | 5242880 |
| enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0 |
| scratch-path | Scratch directory where the Launcher and its plugins write temporary state | N | /var/lib/rstudio-launcher |
| secure-cookie-key-file | Location of the secure cookie key, which is used to perform authorization/authentication. It is strongly recommended that you do not change this. | N | /etc/rstudio/secure-cookie-key |
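
For example, to accept connections over SSL you could enable it and supply certificate files. This is a minimal sketch; the certificate paths are placeholders you must replace with your own:

/etc/rstudio/launcher.conf

[server]
address=0.0.0.0
port=5559
enable-ssl=1
certificate-file=/etc/rstudio/launcher.pem
certificate-key-file=/etc/rstudio/launcher.key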

Cluster Options

There should be one [cluster] section in the configuration file per cluster to connect to / plugin to load (see sample config below).

| Config Option | Description | Required (Y/N) | Default Value |
|---------------|-------------|----------------|---------------|
| name | Friendly name of the cluster | Y | |
| type | Type of plugin to load for this cluster. For RStudio plugins this is one of Local, Kubernetes, or Slurm; for custom plugins it is used for display purposes. | Y | |
| exe | Path to the plugin executable for this cluster | N | If using an RStudio plugin (Local, Kubernetes, or Slurm), this is inferred from the value of type. If using a custom plugin, you must provide its executable path in this option. |
| config-file | Path to the configuration file for the plugin | N | Each plugin has its own default config location |
| allowed-groups | Comma-separated list of user groups that may access this cluster | N | Empty (all groups may access) |
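
Because there is one [cluster] section per plugin, a Launcher that loads both the Local and Kubernetes plugins would simply list two sections. This is a sketch; the allowed-groups value is illustrative:

[cluster]
name=Local
type=Local

[cluster]
name=Kubernetes
type=Kubernetes
allowed-groups=k8s-users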

1.2.1 Sample Configuration

/etc/rstudio/launcher.conf

[server]
address=127.0.0.1
port=5559
server-user=rstudio-server
admin-group=devops
authorization-enabled=1
thread-pool-size=4
enable-debug-logging=1

[cluster]
name=Local
type=Local
exe=/usr/lib/rstudio-server/bin/rstudio-local-launcher
allowed-groups=devs,admins

1.2.2 Job Launcher Plugin Configuration

Each cluster plugin can be further configured via its own configuration file, and some plugins (such as the Kubernetes plugin) require additional configuration. Documentation for all plugins created by RStudio can be found in the following sections.

1.2.2.1 Local Plugin

The Local Job Launcher Plugin provides the capability to launch executables on the local machine (the same machine on which the Launcher is running). It also supports running jobs under arbitrary PAM profiles. All sandboxing capability is provided via rsandbox.

The Local plugin does not require configuration, and it is recommended that you do not change any of the defaults.

/etc/rstudio/launcher.local.conf

| Config Option | Description | Required (Y/N) | Default Value |
|---------------|-------------|----------------|---------------|
| server-user | User to run the executable as. The plugin should be started as root and will lower its privilege to this user for normal execution. | N | rstudio-server |
| thread-pool-size | Size of the thread pool used by the plugin | N | Number of CPUs * 2 |
| enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0 |
| scratch-path | Scratch directory where the plugin writes temporary state | N | /var/lib/rstudio-launcher |
| job-expiry-hours | Number of hours before completed jobs are removed from the system | N | 24 |
| save-unspecified-output | Enables/disables saving of stdout/stderr that was not specified in submitted jobs. This allows users to view their output even if they did not explicitly save it, at the cost of disk space. | N | 1 |
| rsandbox-path | Location of the rsandbox executable | N | /usr/lib/rstudio-server/bin/rsandbox |
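
If you do need to override a default, the file might look like the following. The values shown are illustrative only, not recommendations:

/etc/rstudio/launcher.local.conf

job-expiry-hours=48
enable-debug-logging=1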

1.2.2.2 Kubernetes Plugin

The Kubernetes Job Launcher Plugin provides the capability to launch executables on a Kubernetes cluster.

It is recommended that you keep the default values, which come from the Job Launcher itself, and configure only the required fields outlined below.

/etc/rstudio/launcher.kubernetes.conf

| Config Option | Description | Required (Y/N) | Default Value |
|---------------|-------------|----------------|---------------|
| server-user | User to run the executable as. The plugin should be started as root and will lower its privilege to this user for normal execution. | N | rstudio-server |
| thread-pool-size | Size of the thread pool used by the plugin | N | Number of CPUs * 2 |
| enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0 |
| scratch-path | Scratch directory where the plugin writes temporary state | N | /var/lib/rstudio-launcher |
| job-expiry-hours | Number of hours before completed jobs are removed from the system | N | 24 |
| profile-config | Path to the user and group profiles configuration file (explained in more detail below) | N | /etc/rstudio/launcher.kubernetes.profiles.conf |
| api-url | The Kubernetes API base URL (e.g. https://192.168.99.100:8443). This can be an HTTP or HTTPS URL. The URL should be up to, but not including, the /api endpoint. | Y | |
| auth-token | The auth token for the job-launcher service account, used to authenticate with the Kubernetes API. This should be base64-encoded. See below for more information. | Y | |
| kubernetes-namespace | The Kubernetes namespace to create jobs in. Note that the account specified by the auth-token setting must have full API privileges within this namespace. See Kubernetes Cluster Requirements below for more information. | N | rstudio |
| verify-ssl-certs | Whether or not to verify SSL certificates when connecting to api-url. Only applicable when connecting over HTTPS. For production use you should always leave this enabled, but it can be disabled for testing purposes. | N | 1 |
| certificate-authority | Certificate authority to use when connecting to Kubernetes over SSL and verifying SSL certificates. This must be a base64-encoded PEM certificate, which is what most Kubernetes systems report as the certificate authority in use. Leave this blank to use the system root CA store. | N | |
| watch-timeout-seconds | Number of seconds before the watch calls to Kubernetes stop. This helps prevent job status updates from hanging in some environments. It is recommended that you keep the default, but it may be raised if job status hangs are not occurring, or disabled by setting it to 0. | N | 300 |
| fetch-limit | The maximum number of objects to request per API call from the Kubernetes service for GET collection requests. It is recommended that you only change the default if you run into size issues with the returned payloads. | N | 500 |
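
A minimal working configuration therefore needs only the two required fields; for example (the URL is an example value, and the token placeholder should be replaced with the value retrieved by the commands below):

/etc/rstudio/launcher.kubernetes.conf

api-url=https://192.168.99.100:8443
auth-token=<token-from-commands-below>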

In order to retrieve the auth-token value, run the following commands. Note that the account must first be created and given appropriate permissions (see Kubernetes Cluster Requirements below).

KUBERNETES_AUTH_SECRET=$(kubectl get serviceaccount job-launcher --namespace=rstudio -o jsonpath='{.secrets[0].name}')
kubectl get secret $KUBERNETES_AUTH_SECRET --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d

1.2.2.2.1 User and Group Profiles

The Kubernetes plugin also allows you to specify user and group configuration profiles, similar to RStudio Server Pro’s profiles, in the configuration file /etc/rstudio/launcher.kubernetes.profiles.conf (or any arbitrary file as specified in profile-config within the main configuration file; see above). These are entirely optional.

Profiles are divided into sections of three different types:

Global ([*])

Per-group ([@groupname])

Per-user ([username])

Here’s an example profiles file that illustrates each of these types:

/etc/rstudio/launcher.kubernetes.profiles.conf

[*]
placement-constraints=node,region:us,region:eu
default-cpus=1
default-mem-mb=512
max-cpus=2
max-mem-mb=1024
container-images=r-session:3.4.2,r-session:3.5.0
allow-unknown-images=0

[@rstudio-power-users]
default-cpus=4
default-mem-mb=4096
max-cpus=20
max-mem-mb=20480
container-images=r-session:3.4.2,r-session:3.5.0,r-session:preview
allow-unknown-images=1

[jsmith]
max-cpus=3

This configuration specifies that by default, users may launch jobs with a maximum of 2 CPUs and 1024 MB of memory, and may use only two different R containers. It also specifies that members of the rstudio-power-users group are allowed far more resources, can see the r-session:preview image, and can run any image they specify.

Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared earlier). The settings available in the file are described in more depth in the table below.

/etc/rstudio/launcher.kubernetes.profiles.conf

| Config Option | Description | Required (Y/N) | Default Value |
|---------------|-------------|----------------|---------------|
| container-images | Comma-separated string of allowed images that users may see and run | N | |
| default-container-image | The default container image to use for the job if none is specified | N | |
| allow-unknown-images | Whether to allow users to run any image they want within their job containers, or only the images specified in container-images | N | 1 |
| placement-constraints | Comma-separated string of available placement constraints in the form key1:value1,key2:value2,..., where the :value part is optional to indicate free-form fields. See the next section for more details. | N | |
| default-cpus | Number of CPUs available to a job by default if not specified by the job | N | 0.0 (infinite - managed by Kubernetes) |
| default-mem-mb | Number of MB of RAM available to a job by default if not specified by the job | N | 0.0 (infinite - managed by Kubernetes) |
| max-cpus | Maximum number of CPUs available to a job | N | 0.0 (infinite - managed by Kubernetes) |
| max-mem-mb | Maximum number of MB of RAM available to a job | N | 0.0 (infinite - managed by Kubernetes) |

1.2.2.2.2 Kubernetes Cluster Requirements

In order for the Kubernetes plugin to run correctly, the following assumptions about the Kubernetes cluster must be true:

  • The Kubernetes API must be enabled and reachable from the machine running the Job Launcher
  • There must be a namespace to create jobs in, which can be specified via the kubernetes-namespace configuration mentioned above (this defaults to rstudio)
  • There must be a service account that has full API access for all endpoints and API groups underneath the aforementioned namespace, and the account’s auth token must be supplied to the plugin via the auth-token setting
  • The service account must have access to view the nodes list via the API (optional; without this access, the IP addresses returned for a job are restricted to the node’s internal IP, as /nodes is needed to fetch a node’s external IP address)
  • The cluster must have the metrics-server addon running and working properly to provide job resource utilization streaming

In order to use placement constraints, you must attach labels to the node that match the configured placement constraints. For example, if you have a node with the label az=us-east and have a placement constraint defined as az:us-east, incoming jobs that specify the az:us-east placement constraint will be routed to that node. For more information on placement constraints, see the Kubernetes documentation on assigning pods to nodes.
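
For example, to satisfy the az:us-east constraint described above, you could label a node as follows (the node name is a placeholder):

kubectl label nodes node-1 az=us-east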

The following sample script can be run to create a job-launcher service account and rstudio namespace, granting the service account (and thus, the launcher) full API access to manage RStudio jobs:

kubectl create namespace rstudio
kubectl create serviceaccount job-launcher --namespace rstudio
kubectl create rolebinding job-launcher-admin \
   --clusterrole=cluster-admin \
   --group=system:serviceaccounts:rstudio \
   --namespace=rstudio
kubectl create clusterrole job-launcher-clusters \
   --verb=get,watch,list \
   --resource=nodes
kubectl create clusterrolebinding job-launcher-list-clusters \
  --clusterrole=job-launcher-clusters \
  --group=system:serviceaccounts:rstudio
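
To verify that the resulting service account has the expected access, you can query Kubernetes directly. This is a quick check, assuming the namespace and service account created by the script above:

kubectl auth can-i list pods --namespace=rstudio \
  --as=system:serviceaccount:rstudio:job-launcher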

1.2.2.3 Slurm Plugin

The Slurm Job Launcher Plugin provides the capability to launch executables on a Slurm cluster. It is recommended that you keep the default values, which come from the Job Launcher itself, and configure only the required fields outlined below.

/etc/rstudio/launcher.slurm.conf

| Config Option | Description | Required (Y/N) | Default Value |
|---------------|-------------|----------------|---------------|
| server-user | User to run the executable as. The plugin should be started as root and will lower its privilege to this user for normal execution. | N | rstudio-server |
| thread-pool-size | Size of the thread pool used by the plugin | N | Number of CPUs * 2 |
| enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0 |
| scratch-path | Scratch directory where the plugin writes temporary state | N | /var/lib/rstudio-launcher |
| job-expiry-hours | Number of hours before completed jobs are removed from the system | N | 24 |
| profile-config | Path to the user and group profiles configuration file (explained in more detail below) | N | /etc/rstudio/launcher.slurm.profiles.conf |
| slurm-service-user | The user to run Slurm service commands as. This user must have privileges to query the Slurm cluster. If using SSH, this user must also have an identity file configured in the appropriate location. | Y | |
| user-storage-path | The default location to store Slurm job output. Can be templated with {HOME} or {USER}. Users must have write access to the configured location. Paths beginning with ~ are evaluated correctly. | N | ~/slurm-data |
| max-output-stream-seconds | The maximum amount of time, in seconds, to keep the job output stream open after the job completes. Because job output may be buffered, the output stream stays open until it sees an end-of-stream notifier or has waited the configured number of seconds. Setting this option too low may cause job output to appear truncated; reloading the job output window should resolve that. A value of 0 causes the output stream to close immediately when the job finishes. | N | 30 |

1.2.2.3.1 User and Group Profiles

The Slurm plugin also allows you to specify user and group configuration profiles, similar to RStudio Server Pro’s profiles, in the configuration file /etc/rstudio/launcher.slurm.profiles.conf (or an arbitrary file as specified in profile-config within the main configuration; see above). These are entirely optional.

Profiles are divided into sections of three different types:

Global ([*])

Per-group ([@groupname])

Per-user ([username])

Here’s an example profiles file that illustrates each of these types:

/etc/rstudio/launcher.slurm.profiles.conf

[*]
default-cpus=1
default-mem-mb=512
max-cpus=2
max-mem-mb=1024

[@rstudio-power-users]
default-cpus=4
default-mem-mb=4096
max-cpus=20
max-mem-mb=20480

[jsmith]
max-cpus=3

This configuration specifies that by default, users may launch jobs with a maximum of 2 CPUs and 1024 MB of memory. It also specifies that members of the rstudio-power-users group are allowed far more resources.

Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared earlier). The settings available in the file are described in more depth in the table below. Also note that if the Slurm cluster has been configured with a maximum and/or default memory value, those cluster values are returned whenever a maximum or default is not configured for a user.
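
To check whether the cluster itself defines such limits, you can inspect the live Slurm configuration; the grep pattern below simply filters the relevant memory parameters, such as DefMemPerNode and MaxMemPerCPU:

scontrol show config | grep -Ei 'defmem|maxmem'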

/etc/rstudio/launcher.slurm.profiles.conf

| Config Option | Description | Required (Y/N) | Default Value |
|---------------|-------------|----------------|---------------|
| default-cpus | Number of CPUs available to a job by default if not specified by the job | N | 0.0 (infinite - managed by Slurm) |
| default-mem-mb | Number of MB of RAM available to a job by default if not specified by the job | N | 0.0 (infinite - managed by Slurm) |
| max-cpus | Maximum number of CPUs available to a job | N | 0.0 (infinite - managed by Slurm) |
| max-mem-mb | Maximum number of MB of RAM available to a job | N | 0.0 (infinite - managed by Slurm) |

1.2.2.3.2 Slurm Cluster Requirements

In order for the Slurm plugin to run correctly, the following assumptions about the Slurm cluster must be true:

  • The Slurm service account (specified in the main configuration file) must have full cluster-admin privileges.
  • The Slurm control machine (the one running slurmctld), the RStudio Launcher host machine, and all Slurm nodes must have a shared home directory.
  • The RStudio Launcher host machine must have the following properties:
    • the Slurm executables installed (e.g. sinfo, scontrol, etc.)
    • the same slurm.conf file as the desired Slurm cluster
    • network connectivity to the machine running slurmctld (i.e. the RStudio Launcher host machine can resolve the IP or hostname of the Slurm control machine and connect via the slurmctld port configured in slurm.conf; see the quick check after this list)
    • properly configured and running Slurm plugins, as required (e.g. if using MUNGE as an authentication service, munged must be running under the same user on all machines connected to the Slurm cluster)
    • properly configured users and groups (i.e. all users with the same name have the same UID, group, group ID on all machines connected to the cluster)
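
A quick way to confirm that the Launcher host machine is correctly wired into the cluster is to run the Slurm client commands directly on it; scontrol ping reports whether slurmctld is reachable:

sinfo
scontrol ping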

For more information about configuring and running a Slurm cluster, please see the Slurm documentation. Information about available Slurm plugins can also be found in the relevant sections of that documentation; for example, the Slurm Accounting documentation also describes the available accounting plugins and how to use them.

Below is an example of a launcher configuration which might be used in this scenario:

/etc/rstudio/launcher.conf

[server]
address=127.0.0.1
port=5559
server-user=rstudio-server
admin-group=devops
authorization-enabled=1
thread-pool-size=4
enable-debug-logging=1

[cluster]
name=Slurm
type=Slurm

/etc/rstudio/launcher.slurm.conf

slurm-service-user=slurm
job-expiry-hours=48
user-storage-path=~/slurm-data
max-output-stream-seconds=15

1.2.2.3.3 Using the Slurm Launcher Plugin with RSP

To support launching RSP R Sessions via the Slurm Launcher plugin, the following must be true in addition to the requirements listed in the Slurm Cluster Requirements section:

  • The RSP host must have network access to every Slurm node that may run an R Session, via any TCP port
  • Slurm nodes must have network access to the RSP host via the launcher-sessions-callback-address in order to support launcher jobs started from within a session, as described in the RSP Launcher Configuration documentation
  • To incorporate R Session configurations, rsession.conf must be accessible by all Slurm nodes that may run R Sessions. The default expected location can be changed by adding rsession-config-file=<path/to/rsession.conf> to /etc/rstudio/rserver.conf (see the sketch after this list)
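
For example, if rsession.conf lives on shared storage visible to all Slurm nodes, RSP could point to it as follows (the shared path is a placeholder):

/etc/rstudio/rserver.conf

rsession-config-file=/shared/rstudio/rsession.conf
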
1.2.2.3.4 Additional Considerations

This section lists notable considerations related to the use of the Slurm Plugin.

  • Slurm does not provide a time zone for any time values it returns. All times related to Slurm jobs returned from the Launcher will be in the same time zone as the configured Slurm cluster.

1.3 Running the Service

Once configured, you can run the Job Launcher as a service by executing the command sudo rstudio-launcher start. The Launcher service needs root privilege to perform authentication and authorization, and to provide child plugin processes with root privilege as needed. After initial setup, the Job Launcher lowers its privilege to the server user (see Configuration Options for more information).

If the Job Launcher service fails to start and stay running, one of its plugins exited in failure and is likely not configured properly. When setting the Launcher up for the first time, it is often easier to run it directly in a terminal so you can more easily see any reported errors and more quickly test configuration changes. To run it from the console, execute the command sudo /usr/lib/rstudio-server/bin/rstudio-launcher. If you are still having trouble starting the service, see Logging and Troubleshooting.

The service is not automatically configured to start on system startup, and you must enable this manually if desired by using the following commands:

systemd

systemctl enable rstudio-launcher.service

System V

chkconfig --add rstudio-launcher

1.4 Logging and Troubleshooting

By default, the Job Launcher and its plugins write logs to the system logger. If the service fails to start, check the system log to see if there are any errors, which should help you determine what is going wrong. In general, errors are usually a result of misconfiguration of the Job Launcher or one of its plugins. When initially setting up the Launcher, it is sometimes helpful to run it directly from the command line, as opposed to running it via the service. See Running the Service for more information.

When running into issues that you are unable to resolve, make sure to enable debug logging for the Job Launcher by adding the line enable-debug-logging=1 to /etc/rstudio/launcher.conf. This will cause the Launcher and all of its plugins to emit debug output. This debug output can be seen on the console (if running the Job Launcher manually in the terminal), or in a debug log file located under the /var/lib/rstudio-launcher folder for the Job Launcher service, and under the plugin’s subdirectory for plugin-specific logging.
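
For example, when running as a service you can follow the debug output as it is written; note that the exact log file name below is an assumption and may vary by version:

# the file name under /var/lib/rstudio-launcher is an assumption
sudo tail -f /var/lib/rstudio-launcher/rstudio-launcher.log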

1.5 Load Balancing and Monitoring

The Job Launcher can be load balanced. It is recommended that you use an active/active setup for maximum throughput and scalability. This means that you should have multiple Job Launcher nodes pointed to your specific cluster back-ends, and have a load balancer configured to round-robin traffic between them.

The ability of the Job Launcher to be load balanced effectively depends on each plugin’s individual design and whether it supports load balancing. For example, the Local plugin does not provide load balancing capabilities; as such, it should only be used in specific deployment scenarios and not in most cases. The other RStudio plugins work properly in a load-balanced setup. RStudio cannot provide load balancing guarantees for third-party plugins, so verify that a third-party plugin supports load balancing before deploying it behind a load balancer.

The /status endpoint of the Job Launcher can be used to get the current health status and other connection information. Unlike other Job Launcher endpoints, this endpoint does not require authorization and may be queried by any monitoring or load balancing software to determine the health of a specific Job Launcher node. The status field indicates whether a node is experiencing no issues (“Green”), one or more plugins are restarting or unavailable (“Yellow”), or all plugins have failed and service shutdown is imminent (“Red”). It is recommended that you reroute traffic to another Launcher node if you receive a “Yellow” or “Red” status, or if the page fails to load.
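
For example, given the sample configuration above, a monitoring agent could poll a node’s health as follows (host and port taken from the sample config):

curl http://127.0.0.1:5559/status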