High Availability & Load Balancing

Multiple instances of Posit Connect can share the same data in highly available (HA) and load-balanced configurations.

HA Checklist

Follow the checklist below to configure multiple Posit Connect instances for HA:

  1. Install and configure the same version of Posit Connect on each node.

  2. Migrate to a PostgreSQL database (if running SQLite). All nodes in the cluster must use the same PostgreSQL database.

  3. Configure each server’s Server.DataDir to point to the same shared location; see Variable Data and Shared Data Directory Requirements.

  4. If the Database.Dir setting has been customized, ensure that it points to a consistent, shared location on each server; see Variable Data and Shared Data Directory Requirements.

  5. Configure each server’s Server.LandingDir to point to the same shared location (if using a custom landing page); see Using a Custom Landing Page and Shared Data Directory Requirements.

  6. Configure each server’s Metrics.DataPath directory to point to a unique-per-server location; see the Metrics configuration appendix. Alternatively, consider using Graphite to write all metrics to a single location; see Metrics Requirements.

  7. Configure your load balancer to route traffic to your Posit Connect nodes with sticky sessions; see rsconnect Cookie Support.
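
For example, the relevant settings on one node might look like the following sketch. The paths, hostname, and database URL are placeholders; substitute the shared and per-node locations appropriate for your environment.

; /etc/rstudio-connect/rstudio-connect.gcfg (per-node sketch; paths are placeholders)
[Server]
; Shared across all nodes (e.g., an NFS mount).
DataDir = /mnt/connect/data

[Database]
Provider = Postgres

[Postgres]
; Every node must point at the same PostgreSQL database.
URL = postgres://connect@db.example.com/connect

[Metrics]
; Unique per node; do not share this path between nodes.
DataPath = /var/lib/rstudio-connect/metrics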

Note

Most users will want all nodes to use identical configurations in HA environments, but this is not strictly required.

HA Limitations

Clock Synchronization

All nodes in a Posit Connect HA configuration MUST have their clocks synchronized, preferably using NTP. Failure to synchronize system clocks between nodes can lead to undefined behavior, including loss of data.
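
On systemd-based distributions, one way to spot-check synchronization is timedatectl; the exact output and preferred tooling vary by distribution and time service.

# Run on every node; look for "System clock synchronized: yes".
timedatectl status
# chrony and ntpd users can also inspect synchronization status with
# "chronyc tracking" or "ntpq -p", respectively.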

Consistent Users and Groups

Posit Connect executes your content using one or more target accounts. The rstudio-connect user and its primary group (also named rstudio-connect) are created when Posit Connect is installed. The rstudio-connect user serves as the default for Applications.RunAs and is used when executing content. The Unix group Applications.SharedRunAsUnixGroup defaults to the primary group of the Applications.RunAs user, in this case, rstudio-connect.

You must ensure consistent UID/GID for all Unix user accounts and groups that may be used to execute your deployed content.

The id command is one way to check the user and group identifiers for a single Unix username. Your systems administrator can help you configure accounts uniformly across your cluster of hosts.

id rstudio-connect
# => uid=998(rstudio-connect) gid=998(rstudio-connect) groups=998(rstudio-connect)
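
If you create additional accounts for running content, one approach is to create them with explicit, matching identifiers on every node. The account name and numeric IDs below are placeholders for illustration only.

# Run the same commands on every node so the UID/GID match everywhere.
sudo groupadd --gid 1500 report-runner
sudo useradd --uid 1500 --gid 1500 --no-create-home report-runner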

Node Management

Posit Connect nodes in an HA configuration are not aware of one another. Load balancing is entirely the responsibility of your load balancer, which must direct requests to specific nodes and check whether nodes are available to accept requests.
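
A health check can be as simple as an HTTP probe against each node. The sketch below assumes Connect's default HTTP port of 3939 and uses placeholder hostnames; adapt it to whatever mechanism your load balancer provides.

# Probe each node; a timeout or HTTP error marks the node unhealthy.
curl --fail --silent --show-error --max-time 5 http://connect-node-1:3939/
curl --fail --silent --show-error --max-time 5 http://connect-node-2:3939/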

Database Requirements

Posit Connect only supports HA when using a PostgreSQL database. If you are using SQLite, please switch to PostgreSQL. See the Changing Database Provider section.
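
After migrating, it is worth confirming that every node can reach the same database. A minimal check, assuming the psql client is installed and using a placeholder connection URL:

# Run from each Connect node; all nodes must use the same database.
psql "postgres://connect@db.example.com:5432/connect" -c "SELECT 1;"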

Shared Data Directory Requirements

Posit Connect manages uploaded content within the server’s data directory. This data directory must be a shared location. The Server.DataDir configuration on each node must point to the same shared location. See the Variable Data section for more information on the server’s data directory.

We recommend and support NFS for this shared location.

Note

If you choose to configure Database.Dir (not required), it must also point to the same shared location.
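
For example, if the shared location is exported over NFS, every node might mount the same export at the same path before Connect starts. The server name, export path, and mount point below are placeholders.

# /etc/fstab entry on every node (placeholder server and paths):
#   storage.example.com:/export/connect  /mnt/connect/data  nfs  defaults  0 0
sudo mount /mnt/connect/data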

Metrics Requirements

By default, Posit Connect writes resource metrics to a set of RRD files and does not support aggregation, as each server must maintain a separate set of RRD files to avoid conflicts. The Connect dashboard for a specific node will only show metrics for that node. See the Metrics configuration appendix for information on configuring a unique Metrics.DataPath for each server.

If you wish to aggregate resource metrics, consider using Graphite or any monitoring tool compatible with the Carbon protocol. See the Operational Metrics page for more information.

Additional metrics, such as current active Shiny sessions and the duration of active jobs per queue and host, are exposed via Prometheus metrics. See the Prometheus section of the Operational Metrics documentation for more information.

Shiny Applications

Python and R Shiny applications depend on a persistent connection to a single server. Configure your load balancer to use cookie-based sticky sessions to ensure that Shiny applications function properly when using HA.

rsconnect Cookie Support

For cookie-based sticky session support, you will need to ensure that your R clients (including the RStudio IDE) use rsconnect version 0.8.3 or later. Versions of rsconnect prior to 0.8.3 did not include support for cookies.
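
You can check the installed version from a client machine; for example, from a shell with R on the PATH:

# Prints the installed rsconnect version; upgrade if it is older than 0.8.3.
Rscript -e 'packageVersion("rsconnect")'
# To upgrade:
Rscript -e 'install.packages("rsconnect")'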

Updating HA Nodes

When applying updates to the Posit Connect nodes in your HA configuration, you should follow these steps to avoid errors due to an inconsistent database schema:

  1. Stop all Posit Connect nodes in your cluster.

  2. Follow the Upgrade instructions to upgrade one Posit Connect node. This first upgrade migrates the database schema (if necessary) and starts Posit Connect on that instance.

  3. Upgrade the remaining nodes using the same Upgrade instructions.

Any nodes left running while another node is upgraded will be using a binary that expects an earlier schema version and are subject to unexpected, potentially serious errors. Such nodes detect an out-of-date database schema within 30 seconds and shut down automatically.
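
As a sketch of that sequence, assuming systemd-managed services and a Debian-style package (adjust the service and package commands for your platform):

# 1. Stop Connect on every node in the cluster.
sudo systemctl stop rstudio-connect        # run on each node

# 2. Upgrade one node and start it; this migrates the database schema.
sudo apt install -y ./rstudio-connect_<new-version>.deb
sudo systemctl start rstudio-connect

# 3. Repeat the upgrade and start on the remaining nodes.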

Downgrading

If you wish to move from an HA environment to a single-node environment, please follow these steps:

  1. Stop all Connect services on all nodes.

  2. Reconfigure your network to route traffic directly to one of the nodes, unless you wish to continue using a load balancer.

  3. If you wish to move all shared file data to the node, then:

    1. Configure the server’s Server.DataDir to point to a location on the node, and copy all the data from the NFS share to this location; see Variable Data.

    2. If using a custom landing page, configure Server.LandingDir to point to a location on the node, and copy the custom landing page data from the NFS share to this location; see Using a Custom Landing Page.

    3. Configure the server’s Metrics.DataPath directory to point to an appropriate location. If necessary, copy the data from the NFS share to this location; see Metrics Requirements.

  4. If you wish to move the database to this node, install PostgreSQL on the node and copy the data. Moving the PostgreSQL database from one server to another is beyond the scope of this guide. Migrating from PostgreSQL back to SQLite is not supported.

  5. Start the Connect process; see Stopping and Starting.
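
After the move, the relevant settings on the remaining node might look like this sketch, with every path local to that machine (the paths are placeholders).

; /etc/rstudio-connect/rstudio-connect.gcfg (single-node sketch; paths are placeholders)
[Server]
DataDir = /var/lib/rstudio-connect

[Metrics]
DataPath = /var/lib/rstudio-connect/metrics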

HA Details

Scheduled document rendering

Scheduled content is rendered across all Connect nodes. By default, each node can run up to two scheduled jobs at the same time. The Applications.ScheduleConcurrency setting adjusts the number of scheduled rendering operations that can run concurrently.

Hosts with substantial processing power can increase this setting. This can be helpful if your environment has many long-running reports.

A particular host can disable processing of scheduled jobs by setting Applications.ScheduleConcurrency to zero.

; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
ScheduleConcurrency = 0

Note

No scheduled job will execute if every host sets Applications.ScheduleConcurrency to zero.

The Applications.ScheduleConcurrency setting does not affect ad-hoc rendering requests, hosted APIs, or applications.

Ad-hoc document rendering

Ad-hoc rendering processes run on the server which received the request. Your load balancer should distribute incoming requests to an appropriate Connect node.

Content deployment

Content deployment processes run on the server which received the request. Your load balancer should distribute incoming requests to an appropriate Connect node.

Applications and APIs

Requests to access hosted applications and APIs are handled by the Connect server which received the request. Your load balancer should distribute incoming requests to an appropriate Connect node. Worker processes to service the target content are started and stopped based on the level of incoming traffic to that node.

Note

Minimum and maximum application process limits are enforced by each Connect node. For example, if an application is configured with a 10 process maximum and there are two Connect nodes, up to 20 processes may be used to service that single application.

See the Scheduler configuration appendix for more information.
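
For example, a sketch that makes ten processes the server-wide default cap per content item, assuming the Scheduler.MaxProcesses setting described in that appendix:

; /etc/rstudio-connect/rstudio-connect.gcfg
[Scheduler]
; Default per-content process cap, enforced independently by each node.
MaxProcesses = 10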

Polling

Posit Connect nodes poll the data directory for new scheduled jobs:

  • Every 5 seconds, and

  • After every completed scheduled job.

Abandoned Processes

While processing a scheduled job, the Posit Connect node periodically updates the job’s metadata in the database with a “heartbeat”. If the node goes offline and the “heartbeat” ceases, another node will eventually claim the abandoned job and run it again. Hence, if a server goes offline or the Connect process gets shut down while a scheduled report is running, it is possible that the scheduled job could run twice.

Abandoned Shiny Applications

A Shiny application depends on a persistent connection to a single server. If the server associated with a particular Shiny application session goes down, the Shiny application will fail. However, simply refreshing the application should result in a new session on an available server, assuming your load balancer detects the failed node and routes you to a working one.

Shiny applications that support client-side reconnects using the session$allowReconnect(TRUE) feature will automatically reconnect to a working node. See https://shiny.posit.co/r/articles/improve/reconnecting/ for details.
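
A minimal sketch of enabling this inside a Shiny server function:

# Allow the client to reconnect if its connection is interrupted,
# e.g., when the load balancer routes it to a different working node.
server <- function(input, output, session) {
  session$allowReconnect(TRUE)
  # ... application logic ...
}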