11 Process Management

RStudio Connect launches R to perform a variety of tasks. This includes:

  • Installation of R packages
  • Rendering of R Markdown documents
  • Running Shiny applications
  • Running a Shiny application to customize a parameterized R Markdown document
  • Running APIs using Plumber (Beta)

The location of R defaults to whatever is in the path. Customize the Server.RVersion setting to use a specific R installation. See Chapter 13 for details.

11.1 Sandboxing

The RStudio Connect process runs as the root user. It needs escalated privileges to allow binding to protected ports and to create “unshare” environments that contain the R processes.

RStudio Connect runs its R processes as an unprivileged user; both a system default and content-specific overrides are supported. See Section 11.3 for details.

The “unshare” environment created for R execution involves first establishing a number of bind mounts and then switching to the target unprivileged user. RStudio Connect uses unshare to alter the execution context available to R processes. Within this newly established environment, a number of mount calls are made in order to hide or isolate parts of the filesystem.

The unshare and mount man pages describe these calls in detail. Your local man pages will document their behavior specific to your system.

The following locations are masked during R execution:

  • The Server.DataDir directory containing all variable data used by RStudio Connect.
  • The SQLite.Dir directory, which can optionally be placed outside the data directory.
  • Configuration directories, including /etc/rstudio-connect.
  • The /tmp and /var/tmp directories.

The following information is exposed during R execution:

  • The packrat data directory (read-only except when installing packages).
  • The R data directory (only when installing packages).
  • The directory containing the unpackaged R code (Shiny, Plumber, and R Markdown).
  • The document rendering destination directory (only for R Markdown).
  • A per-process temporary directory (exposed over the original /tmp and /var/tmp).

When Applications.HomeMounting is enabled, the contents of /home are masked by an additional bind mount as follows:

  • The contents of /home are masked by the home directory of the RunAs user.
  • If the RunAs user does not have a home directory, an empty directory masks /home.

The path to the home directory is always available through the HOME environment variable. With Applications.HomeMounting, the mounted path to the HOME directory is subject to change. Avoid hard-coding paths to either /home or /home/username.
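As a minimal sketch of this advice, the shell fragment below derives a location from HOME instead of a literal /home/username path. The helper name home_path and the reports-cache directory are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: resolve a path relative to HOME instead of
# hard-coding /home or /home/username.
home_path() {
  if [ -z "${HOME:-}" ]; then
    echo "HOME is not set" >&2
    return 1
  fi
  # Strip one trailing slash from HOME so the result has no "//".
  printf '%s/%s\n' "${HOME%/}" "$1"
}

# "reports-cache" is an illustrative directory name, not a Connect convention.
cache_dir=$(home_path "reports-cache") && printf 'cache directory: %s\n' "$cache_dir"
```

Because the path is computed at runtime, the same code keeps working whether or not home mounting changes where HOME points.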

Running R applications, like Shiny apps and Plumber APIs, have write access to the directory containing the unpackaged R code. This application directory is the working directory when launching an application. Data written here is visible to all processes associated with that application but is not visible to other R processes. Application directory data remains available until that application is next deployed to RStudio Connect. A deployment creates a new application directory containing only the deployed content.

RStudio Connect may launch multiple processes to service requests for an application. There is no coordination between these processes. Applications that write to local files could experience problems when different processes attempt to write to a single file.

For example, two different processes writing to the same file may see output incorrectly interleaved or even overwritten.

We do not recommend using the file system for data persistence.
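If an application does write local files, one way to avoid the interleaving described above is for each process to write to its own uniquely named file rather than to a shared one. The sketch below illustrates the idea in shell with mktemp; the directory, the records prefix, and the helper name are hypothetical, and the approach is not a substitute for an external data store.

```shell
#!/bin/sh
# Stand-in for the application directory; a real application would use
# its working directory instead.
data_dir=$(mktemp -d)

# Hypothetical helper: create a file that no other process shares.
# mktemp creates the file atomically with a unique suffix.
new_process_file() {
  mktemp "$1/records.XXXXXX"
}

# Two "processes" each get their own file, so their writes cannot interleave.
file_a=$(new_process_file "$data_dir")
file_b=$(new_process_file "$data_dir")
echo "record from process A" >> "$file_a"
echo "record from process B" >> "$file_b"

# A reader combines the per-process files when it needs the full data set.
cat "$data_dir"/records.*
```

Keep in mind that these files live in the application directory and therefore disappear with the next deployment.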

R Markdown documents have write access to the rendering destination directory and read access to a directory containing the unpackaged R code. The source directory is the working directory when calling rmarkdown::render. The destination directory is passed as the output_dir while a temporary directory is passed as the intermediates_dir. The intermediate directory is transient and not available after rendering completes. A new output directory is created whenever the document is rendered. Data created during one rendering is not visible to another.

R Markdown multi-document sites have a slightly different rendering pipeline than standalone documents. RStudio Connect uses the rmarkdown::render_site function, which does its rendering in-place. The content from the source directory is copied into the rendering destination directory in preparation for rendering. Site rendering has write access to the destination directory. Access to the original source directory is not provided because the source content is duplicated in the destination directory.

The rmarkdown::render_site call usually places its output into a subdirectory (typically, ’_site’). The contents of this output subdirectory will be moved to the root of the rendering destination directory, replacing any other content. No post-rendering file movement occurs if rmarkdown::render_site is instructed to render into the current directory instead of a subdirectory. This means that both source and output files will be available for serving.

We recommend against configuring rmarkdown::render_site to write its output into the current directory. Rendering the site into a subdirectory (the default) allows RStudio Connect to remove source from the output directory.

RStudio Connect serves rendered content from the document output directory. This content remains available until a subsequent rendering is successful and activated (if requested). Neither incomplete nor unsuccessful document renderings affect the availability of previously rendered content.

11.2 Shiny Applications & Plumber APIs

Note: Plumber APIs are currently in Beta.

Most of the R processes started by RStudio Connect are batch-oriented tasks. R is invoked, does a narrow set of work, and then exits. Shiny applications and Plumber APIs are different and may see an R process handle many requests for many users over their lifetimes. Both Shiny Applications and Plumber APIs are live applications that react to user requests on-demand.

RStudio Connect launches an R process tied to a live application when the first request arrives for that application. That R process will continue to service requests until it becomes idle and is eventually terminated. If there is sufficient traffic against that application, RStudio Connect may launch additional processes to service those requests.

There are a number of configuration parameters which control the conditions under which processes for applications are launched and eventually reaped. The default values are appropriate for most applications but occasionally need customization in specialized environments. Section A.14 explains each of the options.

We recommend that adjustment to these runtime properties be done gradually.

11.3 User Account for R Processes

The RStudio Connect installation creates a local rstudio-connect user account. This account runs all the R processes; root does not invoke R. If you would like a different user to run R, customize the Applications.RunAs property.

Administrators can customize the RunAs user on a content-specific level. This means that different applications and R Markdown reports can be run using different Unix accounts. This setting can be found on the Access tab when editing content settings. Publishers and Viewers are prohibited from changing the RunAs user on a content-specific level.

If you choose to specify a custom RunAs user for content, that user must be a member of the Unix group that is the primary group of the Applications.RunAs user.

The rstudio-connect user, for example, has a primary group also named rstudio-connect. Any Unix account configured as a custom RunAs user for a Shiny application, Plumber API, or R Markdown report must be a member of the rstudio-connect group.
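An administrator might verify this group membership before assigning a custom RunAs user. The following is a sketch only: the helper names and the shiny-runner account are hypothetical, and it assumes the default rstudio-connect group described above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if GROUP appears in a space-separated
# list of group names, such as the output of `id -nG USER`.
group_list_contains() {
  for g in $1; do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

# Hypothetical helper: check a candidate RunAs account against a group.
user_in_group() {
  groups=$(id -nG "$1" 2>/dev/null) || return 1
  group_list_contains "$groups" "$2"
}

# "shiny-runner" is a made-up account name used only for illustration.
if user_in_group "shiny-runner" "rstudio-connect"; then
  echo "shiny-runner can be used as a RunAs user"
else
  echo "shiny-runner is missing or not in the rstudio-connect group"
fi
```

The comparison is on whole group names, so a group such as rstudio-connect-admins does not count as membership in rstudio-connect.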

Installation of R packages always happens as the Applications.RunAs user. An application or R Markdown report may override its RunAs setting; this alters how the deployed code is executed and does not impact package installation. See Section 11.1 for more information about process sandboxing.

11.4 Current user execution

RStudio Connect can use a local Unix account associated with the currently logged-in user when executing R. This feature requires that user authentication use PAM.

See Section 9.6 for information about using PAM for user authentication.

The Applications.RunAsCurrentUser property specifies that content can be configured to execute as the currently logged-in user.

[Applications]
RunAsCurrentUser = true

Administrators can now customize the RunAs settings to permit current-user execution on a content-specific level. The Access content setting tab offers the option of executing using “The Unix account of the current user”.

Content accessed anonymously will execute as the specified fallback RunAs user.

See Section 11.3 for more information about RunAs customization.

Content execution settings are not altered when RunAsCurrentUser is enabled. The RunAsCurrentUser setting permits current-user execution but by itself does not change how R processes are launched. Each application or R Markdown report must explicitly request current-user execution.

All Unix accounts used to execute R must be members of the Unix group that is the primary group of the Applications.RunAs user. Applications are not permitted to launch if the Unix account associated with the logged-in user does not have the proper group membership.

The Applications.RunAs setting uses the rstudio-connect user by default. This user has a primary group also named rstudio-connect. Any Unix account that may be used to execute applications or R Markdown reports must be a member of the rstudio-connect group.

11.5 PAM sessions

RStudio Connect can use PAM to establish the environment and resources available for R sessions.

See Section 9.6 for information about using PAM for user authentication.

PAM sessions are enabled with the PAM.UseSession setting.

[PAM]
UseSession = true

The default PAM service name used for PAM sessions is su. This gives RStudio Connect the ability to launch processes as the specified user without requiring a password.

You can customize the PAM service name used for PAM sessions by customizing the PAM.SessionService setting.

[PAM]
SessionService = rstudio-connect-session

Any custom PAM service must contain the PAM directive that enables authentication with root privilege.

# Allows root to su without passwords (required)
auth sufficient pam_rootok.so

11.6 Path Rewriting

The sandboxing used by RStudio Connect involves bind mounts which map physical locations on disk onto different directory structures at runtime. Paths used by your R code use these sandboxed locations. If you need to find the physical file on disk, you will need to undo the path transformation.

This section gives some examples of path rewriting and offers some ways of finding the file you need.

Let’s start with an app.R file that describes a Shiny application. This file will be in the apps/XX/YY/ directory underneath the Server.DataDir location. The XX and YY path components correspond to the application ID and bundle (or deployment) ID for this version of your application. This directory is available at runtime as /opt/rstudio-connect/mnt/app/.

The directory structure of /opt/rstudio-connect/mnt/ is just a number of empty directories. The “unshare” environment created during sandboxing allows RStudio Connect to associate different application directories with these mount directories.

Here are some common path transformations that may be helpful. All of the physical paths are beneath the Server.DataDir hierarchy that defaults to /var/lib/rstudio-connect. All of the sandbox paths are beneath the mount directory /opt/rstudio-connect/mnt/. This location is not customizable.

Physical path            Sandbox path
DataDir/apps/XX/YY/      MountDir/app/
DataDir/reports/XX.ZZ    MountDir/report/
DataDir/R                MountDir/R
DataDir/packrat          MountDir/packrat

Here are some actual path transformations using the default Server.DataDir location:

# A source Shiny application
/var/lib/rstudio-connect/apps/4/7/app.R
    => /opt/rstudio-connect/mnt/app/app.R

# A source Plumber API
/var/lib/rstudio-connect/apps/38/10/plumber.R
    => /opt/rstudio-connect/mnt/app/plumber.R

# A source R Markdown document
/var/lib/rstudio-connect/apps/8/12/index.Rmd
    => /opt/rstudio-connect/mnt/app/index.Rmd

# An HTML document rendered from that R Markdown document
/var/lib/rstudio-connect/reports/8.2/index.html
    => /opt/rstudio-connect/mnt/report/index.html

# A statically deployed document
/var/lib/rstudio-connect/apps/17/21/index.html
    => /opt/rstudio-connect/mnt/app/index.html

# The Shiny package inside the packrat cache
/var/lib/rstudio-connect/packrat/3.2.5/v2/library/shiny/
  28d6903a44dc53bd4823fa43ccdc08e5/shiny
    => /opt/rstudio-connect/mnt/packrat/3.2.5/v2/library/shiny/
         28d6903a44dc53bd4823fa43ccdc08e5/shiny
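The transformations above can be expressed as a small shell function. This is only an illustrative sketch: the name to_sandbox_path is hypothetical, the default Server.DataDir is assumed unless a second argument is given, and the patterns are inferred from the examples in this section rather than taken from RStudio Connect itself.

```shell
#!/bin/sh
# Hypothetical helper: map a physical path under Server.DataDir to the
# path seen inside the sandbox. Runs in a subshell so variables stay local.
to_sandbox_path() (
  path=$1
  data_dir=${2:-/var/lib/rstudio-connect}
  mount_dir=/opt/rstudio-connect/mnt
  case "$path" in
    "$data_dir"/apps/*/*/*)
      # Drop DataDir/apps/XX/YY/ and keep the remainder.
      rest=${path#"$data_dir"/apps/*/*/}
      printf '%s\n' "$mount_dir/app/$rest" ;;
    "$data_dir"/reports/*/*)
      # Drop DataDir/reports/XX.ZZ/ and keep the remainder.
      rest=${path#"$data_dir"/reports/*/}
      printf '%s\n' "$mount_dir/report/$rest" ;;
    "$data_dir"/R | "$data_dir"/R/* | "$data_dir"/packrat | "$data_dir"/packrat/*)
      # These directories keep their relative location.
      rel=${path#"$data_dir"/}
      printf '%s\n' "$mount_dir/$rel" ;;
    *)
      echo "no known sandbox mapping for: $path" >&2
      exit 1 ;;
  esac
)

to_sandbox_path /var/lib/rstudio-connect/apps/4/7/app.R
# prints /opt/rstudio-connect/mnt/app/app.R
```

The reverse direction is not a pure string rewrite: the sandbox path /opt/rstudio-connect/mnt/app/ does not carry the application and bundle IDs, so those must be known separately to locate the physical file.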

11.7 Program Supervisors

You may need to modify the environment or resources available to R processes prior to R being launched. This can be accomplished with a program supervisor, specified by the Applications.Supervisor configuration setting.

The supervisor command is provided the full R command-line, which MUST be invoked by the supervisor. The process exit code from R MUST be returned as the exit code of the supervisor. The file descriptors for standard input, output, and error MUST NOT be intercepted by the supervisor.
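One way to satisfy all three requirements in a shell supervisor is to finish with exec, as in the sketch below. The function name supervise and the EXAMPLE_SUPERVISED variable are hypothetical and exist only to illustrate the pattern.

```shell
#!/bin/sh
# Sketch of a supervisor that honors the requirements described above.
supervise() {
  # Adjust the environment first; this variable name is hypothetical.
  export EXAMPLE_SUPERVISED=1
  # exec replaces this shell with the given command line. The command
  # inherits stdin, stdout, and stderr untouched, and its exit code
  # becomes the exit code of the supervisor.
  exec "$@"
}

# Only hand off when a command line was actually supplied.
if [ "$#" -gt 0 ]; then
  supervise "$@"
fi
```

Because exec never returns on success, nothing placed after it in the script would run; any preparation has to happen before the hand-off.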

A supervisor is executed as the appropriate RunAs user. Package installation always uses the Applications.RunAs user. Other R processes will use the content-specific RunAs account, falling back to Applications.RunAs if no override was configured. See Section 11.3 for details.

Supervisors run within the sandbox established for any R process. See Section 11.1 for more information about process sandboxes.

RStudio Connect configures the TMPDIR, HOME, and RSTUDIO_PANDOC environment variables for launched R processes. RStudio Connect also manages package installation and references. Avoid altering any of this behavior in program supervisors.

11.7.1 Example Supervisors

Here is a configuration that uses the nice command to lower the priority of all R processes. See http://linux.die.net/man/1/nice for details about nice. Because process supervisors are run as a RunAs user and not as root or another super-user, you may not be permitted to assign a negative (higher priority) niceness value.

[Applications]
Supervisor = nice -n 2

Here is a configuration that uses a custom script to prepare a custom execution environment before finally running R.

[Applications]
Supervisor = /some/script/that/prepares/an/environment.sh

Here is an example supervisor that echoes its arguments, sets an environment variable, then invokes whatever arguments have been passed.

#!/bin/bash

echo arguments: "$@"
echo

export COMPANY_DATA_HOME="/data/resides/here"

exec "$@"

Your organization may use shell initialization scripts to establish a particular environment. This environment might not be completely compatible with how RStudio Connect attempts to launch R.

We recommend building supervisor scripts gradually and carefully. Changes to the environment can alter how your content executes or even prevent R from running correctly.

11.8 Using the config Package

The config package makes it easy to manage environment-specific configuration values in R code. For example, you might want to use one value for a variable locally, and another value when deployed on RStudio Connect. The package vignette contains more information.

The desired configuration is identified to the config package by the R_CONFIG_ACTIVE environment variable. By default, R processes launched by RStudio Connect set R_CONFIG_ACTIVE to rsconnect. The value can be changed by modifying the Applications.RConfigActive configuration setting. Note that the value of R_CONFIG_ACTIVE is not available during package installation.
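As a sketch of how this might be used, the shell fragment below writes a small config.yml with a default section and an rsconnect section, then reports which section name would be selected from R_CONFIG_ACTIVE. The data_home key and its values are hypothetical; the fallback to "default" mirrors what the config package does when the variable is unset.

```shell
#!/bin/sh
# Stand-in for a project directory that would hold config.yml.
config_dir=$(mktemp -d)

# Top-level keys name configurations; "data_home" is a hypothetical setting.
cat > "$config_dir/config.yml" <<'EOF'
default:
  data_home: "./data"

rsconnect:
  data_home: "/data/resides/here"
EOF

# Locally the variable is usually unset, which selects "default".
# On RStudio Connect it is set to rsconnect unless reconfigured.
active_config=${R_CONFIG_ACTIVE:-default}
printf 'active configuration: %s\n' "$active_config"
```

With a file like this, R code that reads data_home through the config package would see one value on a developer machine and another when running on RStudio Connect.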