13 High Availability and Load Balancing

Multiple instances of RStudio Package Manager can share the same data in highly available (HA) and load-balanced configurations. In this document, we refer to these configurations as “HA” for brevity.

13.1 HA Checklist

Follow the checklist below to configure multiple RStudio Package Manager instances for HA:

  1. Ensure that all node clocks are synchronized (see 13.2).
  2. Ensure that all server configurations (i.e., the contents of the /etc/rstudio-pm directory) are identical.
  3. Install and configure the same version of RStudio Package Manager on each node (see section 2).
  4. HA requires using a PostgreSQL database. All nodes in the cluster must use the same PostgreSQL database.
  5. When using NFS for shared storage, configure each server’s Server.DataDir (7.6) to point to the same shared location. Be sure to read 13.3.3 for additional information on the recommended settings for the shared directory. For more granular control of data directories, see 7.7 for information on customizing the locations of each storage class.
  6. When using S3 for shared storage, each server’s Server.EncryptionKeyPath must point to a file that contains the same encryption key.
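
One way to check item 2 of the checklist is to compute a digest of the configuration directory on each node and compare the results. The following is a minimal sketch, not part of the product; the function name config_digest is hypothetical, and it assumes every file under the directory is readable by the user running it.

```python
import hashlib
from pathlib import Path


def config_digest(config_dir):
    """Return a SHA-256 digest covering every file under config_dir.

    Relative paths and file contents are both hashed, so two directories
    produce the same digest only if they hold the same files with the
    same contents.
    """
    root = Path(config_dir)
    digest = hashlib.sha256()
    # Sort so the digest does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so adjacent fields cannot run together.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Calling config_digest("/etc/rstudio-pm") on each node and comparing the returned strings shows whether the configurations match. A mismatch only tells you that something differs, not which file.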

13.2 HA Time Synchronization Requirements

The clocks on all nodes in an HA configuration must be synchronized. We recommend configuring NTP for clock synchronization.

13.3 HA Limitations

13.3.1 Node Management

RStudio Package Manager nodes in an HA configuration are not aware that they are part of an HA cluster. Load balancing is handled entirely by your load balancer, which is responsible for directing requests to specific nodes and for checking whether nodes are available to accept requests.
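
To illustrate the kind of availability check a load balancer performs, here is a minimal sketch that probes each node over HTTP and returns the nodes that respond. It is an illustration only: the function names are hypothetical, the URL to probe is whatever your deployment exposes, and in practice this check belongs in the load balancer's own health-check configuration.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if url answers with an HTTP 2xx status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


def available_nodes(node_urls, probe=http_probe):
    """Return the subset of node_urls that the probe reports as healthy."""
    return [url for url in node_urls if probe(url)]
```

The probe argument can be replaced with any callable that takes a URL and returns a boolean, which makes it possible to exercise the selection logic without network access.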

13.3.2 Database Requirements

RStudio Package Manager only supports HA when using a PostgreSQL database.

13.3.3 Shared Data Directory Requirements

RStudio Package Manager manages repository content within the server’s data and variable data directories. These directories must be at shared locations, and each node must be configured to point to the same shared locations. See section 7.6 for more information on the server’s data directories.

RStudio Package Manager supports using either NFS or AWS S3 storage for shared data directories. You can also use a combination of both NFS and AWS S3 for different variable data classes.

NFS

We recommend and support NFS version 3 or 4 for file sharing.

RStudio Package Manager relies on being able to efficiently detect new files inside of the NFS-shared DataDir. By default, NFS clients are configured to cache responses for up to 60 seconds, which means that it can take up to a minute before an RStudio Package Manager service is able to respond to certain requests. For most deployments, this is an unacceptably long delay.

Therefore, we strongly recommend that you modify your NFS client settings for the mount on which you’ll be hosting your DataDir. Typically, the best way to accomplish this is to set lookupcache=pos for your NFS mount, which will allow existing files to be cached but will contact the NFS server directly to check for the existence of new files. If this setting is not acceptable for your mount, you could alternatively consider shortening acdirmax or actimeo so that your client becomes aware of new files within, say, 5 seconds, instead of the default of 60.

S3

IMPORTANT: AWS S3 support is in beta. Please do not use S3 for production data at this time.

When using S3 for shared storage, each server’s Server.EncryptionKeyPath must point to a file that contains the same encryption key. See also A.1. The easiest way to ensure a consistent encryption key on all nodes is to start RStudio Package Manager on one of the nodes and then copy the key file created at /var/lib/rstudio-pm/rstudio-pm.key to the same location on the other nodes. Set each key’s file mode to 0600.
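
After copying the key, you may want to confirm that each node holds the same key with the expected file mode. The following is a minimal sketch under stated assumptions: the function name key_file_status is hypothetical, and it must run as a user that can read the key file.

```python
import hashlib
import os
import stat


def key_file_status(path):
    """Return (sha256_hex, mode_is_0600) for the key file at path."""
    with open(path, "rb") as handle:
        fingerprint = hashlib.sha256(handle.read()).hexdigest()
    # Compare only the permission bits, ignoring the file-type bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return fingerprint, mode == 0o600
```

Running key_file_status("/var/lib/rstudio-pm/rstudio-pm.key") on every node should return the same fingerprint and True for the mode check. Comparing fingerprints avoids printing the key itself.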

Please refer to Data Destinations for information on configuring RStudio Package Manager to store variable data on S3. For help configuring your server with the credentials and settings you need to interact with S3, see S3 Configuration.

13.4 Updating HA Nodes

When applying updates to the RStudio Package Manager nodes in your HA configuration, you should follow these steps to avoid errors due to an inconsistent database schema:

  1. Stop all RStudio Package Manager nodes in your cluster.
  2. Upgrade one RStudio Package Manager node. The first update will upgrade the database schema (if necessary) and start RStudio Package Manager on that instance (see 5.2).
  3. Upgrade the remaining nodes.

If you forget to stop any RStudio Package Manager nodes while upgrading another node, those nodes will be running a binary that expects an earlier schema version and may encounter unexpected and potentially serious errors. These nodes will detect an out-of-date database schema within 30 seconds and shut down automatically.
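
The update order described above can be sketched as a small function. This is an illustration, not a supported tool: upgrade_cluster, stop, and upgrade are hypothetical names, and the two callables stand in for whatever mechanism you use to stop the service and install the new package on a node. The sketch assumes upgrade(node) does not return until the node has finished upgrading and restarted.

```python
def upgrade_cluster(nodes, stop, upgrade):
    """Apply the HA update order to nodes using caller-supplied actions.

    stop(node) and upgrade(node) are hypothetical callables supplied by
    the caller; this function only enforces the ordering.
    """
    if not nodes:
        return
    # Step 1: stop every node before any upgrade begins.
    for node in nodes:
        stop(node)
    # Step 2: upgrade one node first; this is the upgrade that may
    # migrate the database schema.
    first, rest = nodes[0], nodes[1:]
    upgrade(first)
    # Step 3: upgrade the remaining nodes.
    for node in rest:
        upgrade(node)
```

Separating the ordering from the actions keeps the sequence easy to review and lets you dry-run it with callables that only log what they would do.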

13.5 Downgrading

If you wish to move from an HA environment to a single-node environment, please follow these steps:

  1. Stop all RStudio Package Manager services on all nodes.
  2. Reconfigure your network to route traffic directly to one of the nodes, unless you wish to continue using a load balancer.
  3. If you wish to move all shared file data to the node, configure the server’s Server.DataDir to point to a location on the node, and copy all the data from the NFS share to this location (see 7.6).
  4. If you wish to move the databases to this node, install PostgreSQL on the node and copy the data. Moving the PostgreSQL databases from one server to another is beyond the scope of this guide. Please note that we do not support migrating from PostgreSQL to SQLite.
  5. Start the RStudio Package Manager process (see 5.1).