8 Process Management

RStudio Connect launches R to perform a variety of tasks, including:

  • Installing R packages
  • Rendering R Markdown documents
  • Running Shiny applications
  • Running a Shiny application to customize a parameterized R Markdown document

By default, RStudio Connect uses the R installation found on the PATH. To use a specific R installation, customize the Server.RVersion setting. See Chapter 10 for details.

8.1 Package Installation

RStudio Connect installs the R package dependencies of Shiny applications and R Markdown documents when that content is deployed. The RStudio IDE uses the rsconnect and packrat packages to bundle the relevant source code and document its dependencies. RStudio Connect then uses packrat to duplicate those package dependencies on the server.

Packrat attempts to re-use R packages whenever possible. The shiny package, for example, is installed only when the first Shiny application is deployed. Subsequent Shiny applications can use that installed package and see faster deployments as a result. Packrat also allows multiple versions of a package to exist on a system. Two Shiny applications referencing different versions of shiny will each use the correct shiny installation, and the two versions will not conflict with each other.
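The re-use behavior described above can be pictured with a small toy model. This is only an illustrative sketch, not packrat's implementation: the PackageLibrary class, its ensure method, and the version numbers are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class PackageLibrary:
    """Toy model of a shared library keyed by (package name, version)."""

    installed: Set[Tuple[str, str]] = field(default_factory=set)

    def ensure(self, name: str, version: str) -> bool:
        """Return True if the package had to be installed, False if re-used."""
        key = (name, version)
        if key in self.installed:
            return False  # already present, so the deployment skips the install
        self.installed.add(key)  # stand-in for the real build and install step
        return True


library = PackageLibrary()
first = library.ensure("shiny", "0.13.2")   # first Shiny application: installed
second = library.ensure("shiny", "0.13.2")  # same version again: re-used
other = library.ensure("shiny", "0.14")     # different version: installed alongside
```

Because entries are keyed by both name and version, the second deployment finds its dependency already present, while the third adds a second copy of shiny without disturbing the first.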

RStudio Connect determines which packages need to be installed and which are already available each time you deploy content.

8.2 Private Packages

Packages available on CRAN or a public GitHub repository are automatically downloaded and built when an application is deployed. RStudio Connect cannot automatically obtain private packages, but a workaround is available.

The configuration option Server.SourcePackageDir can reference a directory containing additional packages that Connect would not otherwise be able to retrieve. This directory and its contents must be readable by the Applications.RunAs user. Connect will look in this directory for packages before attempting to obtain them from a remote location.

This feature has some limitations.

  • The package must be tracked in a git repository so that each distinct version has a unique commit hash associated with it.
  • The package must have been installed from the git repository using the devtools package so that the hash is contained in the DESCRIPTION file on the client machine.

If these conditions are met, you may place .tar.gz source packages into per-package subdirectories of SourcePackageDir. The proper layout of these files is <package-name>/<full-git-hash>.tar.gz.

For example, if Server.SourcePackageDir is defined as /opt/R-packages, source bundles for the MyPrivatePkg package are located at /opt/R-packages/MyPrivatePkg. A commit hash of 28547e90d17f44f3a2b0274a2aa1ca820fd35b80 needs its source bundle stored as /opt/R-packages/MyPrivatePkg/28547e90d17f44f3a2b0274a2aa1ca820fd35b80.tar.gz.
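As a rough sketch of the layout rule, the following helpers read a commit hash from the text of a DESCRIPTION file and compute where the matching source bundle would need to be stored. The function names are hypothetical, and the DESCRIPTION field names (RemoteSha, GithubSHA1) are an assumption about what devtools records at install time; check a DESCRIPTION file from your own client machine before relying on them.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumption: devtools stores the commit hash under one of these fields.
HASH_FIELDS = ("RemoteSha", "GithubSHA1")


def commit_hash_from_description(text: str) -> Optional[str]:
    """Return the first full 40-character commit hash found, or None."""
    for field_name in HASH_FIELDS:
        pattern = rf"^{field_name}:\s*([0-9a-f]{{40}})\s*$"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            return match.group(1)
    return None


def source_bundle_path(source_package_dir: str, package: str, commit_hash: str) -> PurePosixPath:
    """Build <SourcePackageDir>/<package-name>/<full-git-hash>.tar.gz."""
    return PurePosixPath(source_package_dir) / package / f"{commit_hash}.tar.gz"


# Hypothetical DESCRIPTION contents for the package used in the example above.
description = """Package: MyPrivatePkg
Version: 0.1.0
RemoteSha: 28547e90d17f44f3a2b0274a2aa1ca820fd35b80
"""

commit = commit_hash_from_description(description)
bundle = source_bundle_path("/opt/R-packages", "MyPrivatePkg", commit)
print(bundle)
```

The pattern deliberately requires a full 40-character hash, since the file name must be the full git hash rather than an abbreviated one.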

When private package source is arranged in this manner, users of RStudio Connect will be able to use those package versions in their deployed content.

Be aware that this mechanism is specific to the commit hash, so you will need to either make many git revisions of your package available in the SourcePackageDir directory hierarchy or standardize on a particular git commit of the package.

8.3 Sandboxing

The RStudio Connect process runs as the root user. It needs elevated privileges to bind to protected ports and to create the “unshare” environments that contain the R processes.

RStudio Connect runs its R processes as an unprivileged user; both a system default and content-specific overrides are supported. See Section 8.5 for details.

RStudio Connect uses unshare to alter the execution context available to R processes. Within this newly established “unshare” environment, it first makes a number of bind mount calls to hide or isolate parts of the filesystem and then switches to the target unprivileged user.

The unshare and mount calls are documented in your local man pages, which describe their behavior specific to your system.

The following locations are masked during R execution:

  • The Server.DataDir directory containing all variable data used by RStudio Connect.
  • The Database.Dir directory, which can optionally be placed outside the data directory.
  • Configuration directories, including /etc/rstudio-connect.
  • The /tmp and /var/tmp directories.
  • The /home hierarchy.

The following information is exposed during R execution:

  • The packrat data directory (read-only except when installing packages).
  • The R data directory (only when installing packages).
  • The directory containing the unpackaged R code (Shiny and R Markdown).
  • The document rendering destination directory (only for R Markdown).
  • A per-process temporary directory (exposed over the original /tmp and /var/tmp).
  • The home directory for the Applications.RunAs user, should it exist (exposed as /home).

Shiny applications have write access to the directory containing the unpackaged R code. This application directory is the working directory when launching Shiny. Data written here is visible to all processes associated with that Shiny application but is not visible to other R processes. Application directory data remains available until that application is next deployed to RStudio Connect. A deployment creates a new application directory containing only the deployed content.

RStudio Connect may launch multiple processes to service requests for an application. There is no coordination between these processes. Shiny applications that write to local files could experience problems when different processes attempt to write to a single file. We recommend against using the file system for data persistence.

R Markdown documents have write access to the rendering destination directory and read access to a directory containing the unpackaged R code. The source directory is the working directory when calling rmarkdown::render. The destination directory is passed as the output_dir while a temporary directory is passed as the intermediates_dir. The intermediate directory is transient and not available after rendering completes. A new output directory is created whenever the document is rendered. Data created during one rendering is not visible to another.

R Markdown multi-document sites have a slightly different rendering pipeline than standalone documents. RStudio Connect uses the rmarkdown::render_site function, which does its rendering in-place. The content from the source directory is copied into the rendering destination directory in preparation for rendering. Site rendering has write access to the destination directory. Access to the original source directory is not provided because the source content is duplicated in the destination directory.

The rmarkdown::render_site call usually places its output into a subdirectory (typically, '_site'). The contents of this output subdirectory will be moved to the root of the rendering destination directory, replacing any other content. No post-rendering file movement occurs if rmarkdown::render_site is instructed to render into the current directory instead of a subdirectory. This means that both source and output files will be available for serving.
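The effect of that post-rendering move can be illustrated with a small simulation. This is a sketch of the behavior described above under simplifying assumptions, not RStudio Connect's own code; promote_site_output and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def promote_site_output(destination: Path, output_subdir: str = "_site") -> None:
    """Replace the contents of `destination` with those of its output subdirectory."""
    output = destination / output_subdir
    if not output.is_dir():
        # Rendered into the current directory: nothing is moved, so source
        # and output files stay side by side.
        return
    # Set the rendered output aside, clear everything else, then move it up.
    staging = Path(tempfile.mkdtemp(dir=destination.parent))
    staged = staging / output_subdir
    shutil.move(str(output), str(staged))
    for entry in list(destination.iterdir()):
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    for entry in list(staged.iterdir()):
        shutil.move(str(entry), str(destination / entry.name))
    shutil.rmtree(staging)


with tempfile.TemporaryDirectory() as tmp:
    destination = Path(tmp) / "destination"
    (destination / "_site").mkdir(parents=True)
    (destination / "index.Rmd").write_text("source")
    (destination / "_site" / "index.html").write_text("<html></html>")

    promote_site_output(destination)
    remaining = sorted(p.name for p in destination.iterdir())
    print(remaining)
```

After the call, only the rendered file remains in the destination directory; the copied source file is gone, which is why rendering into a subdirectory keeps source out of the served content.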

We recommend against configuring rmarkdown::render_site to write its output into the current directory. Rendering the site into a subdirectory (the default) allows RStudio Connect to remove source from the output directory.

RStudio Connect serves rendered content from the document output directory. This content remains available until a subsequent rendering is successful and activated (if requested). Neither incomplete nor unsuccessful document renderings affect the availability of previously rendered content.

8.4 Shiny Applications

Most of the R processes started by RStudio Connect are batch-oriented tasks. R is invoked, does a narrow set of work, and then exits. Shiny applications are different and may see an R process handle many requests for many users over its lifetime.

RStudio Connect launches an R process tied to a Shiny application when the first request arrives for that application. That R process will continue to service requests until it becomes idle and is eventually terminated. If there is sufficient traffic against that Shiny application, RStudio Connect may launch additional processes to service those requests.

There are a number of configuration parameters which control the conditions under which processes for Shiny applications are launched and eventually reaped. The default values are appropriate for most applications but occasionally need customization in specialized environments. Section 12.13 explains each of the options.

We recommend adjusting these runtime properties gradually, changing them in small steps rather than all at once.

8.5 User Account for R Processes

The RStudio Connect installation creates a local rstudio-connect user account. This account runs all the R processes; root does not invoke R. If you would like a different user to run R, customize the Applications.RunAs property.

Administrators can customize the RunAs user on a content-specific level. This means that different Shiny applications and R Markdown reports can be run using different Unix accounts. This setting can be found on the Access tab when editing content settings. Publishers and Viewers are prohibited from changing the RunAs user on a content-specific level.

If you choose to specify a custom RunAs user for content, that user must be a member of the Unix group that is the primary group of the default RunAs user.

The rstudio-connect user, for example, has a primary group also named rstudio-connect. Any Unix account configured as a custom RunAs user for a Shiny application or R Markdown report must be a member of the rstudio-connect group.
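Before assigning a custom RunAs user, an administrator may want to verify the group requirement on the server. The following is a minimal sketch using the standard Unix account databases; the function name can_be_run_as is hypothetical, and the check reflects only the membership rule stated above, not any additional validation RStudio Connect may perform.

```python
import os
import pwd


def can_be_run_as(candidate: str, default_run_as: str = "rstudio-connect") -> bool:
    """Check that `candidate` belongs to the primary group of `default_run_as`.

    Raises KeyError if either account does not exist on this system.
    """
    required_gid = pwd.getpwnam(default_run_as).pw_gid
    candidate_gid = pwd.getpwnam(candidate).pw_gid
    # getgrouplist returns every group the candidate belongs to,
    # including the candidate's own primary group.
    return required_gid in os.getgrouplist(candidate, candidate_gid)
```

For example, can_be_run_as("report-runner") would return True only if the hypothetical report-runner account is a member of the rstudio-connect group.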

Installation of R packages (see Section 8.1) always happens as the Applications.RunAs user and not the override RunAs setting for a particular Shiny application or R Markdown report.