
Amazon Simple Storage Service (S3)#

RSPM can also use the Amazon Simple Storage Service (S3) as a storage provider. This integration requires AWS credentials and updates to the RSPM configuration file.

Credentials#

As a best practice, AWS recommends that you specify credentials in the following order:

  1. Use IAM roles for Amazon EC2 (if your application is running on an Amazon EC2 instance). IAM roles provide temporary security credentials to your instance to make AWS calls. IAM roles provide an easy way to distribute and manage credentials on multiple Amazon EC2 instances.
  2. Use a shared credentials file. This credentials file is the same one used by other SDKs and the AWS CLI. If you’re already using a shared credentials file, you can also use it for this purpose.
  3. Use environment variables. Setting environment variables is useful if you’re doing development work on a machine other than an Amazon EC2 instance.

If you select IAM roles for Amazon EC2 instances, RSPM will automatically use the instance’s credentials.

See the AWS CLI configuration documentation for details on configuring your environment to interact with AWS.

S3 Permissions#

The credentials RSPM uses for S3 storage must have the following permissions for the bucket:

  • s3:GetObject
  • s3:ListBucket
  • s3:PutObject
  • s3:DeleteObject
  • s3:AbortMultipartUpload
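
As an illustration, the permissions above can be written as an IAM policy document. The following is a minimal sketch, not an official RSPM policy: the function name `build_s3_policy` is hypothetical, the bucket name is the example bucket used on this page, and the ARNs assume the standard `aws` partition. In IAM, `s3:ListBucket` applies to the bucket itself, while the object-level actions apply to the keys inside it, so the sketch uses two statements.

```python
import json

# Actions copied from the permission list above.
BUCKET_ACTIONS = ["s3:ListBucket"]
OBJECT_ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
]


def build_s3_policy(bucket: str) -> dict:
    """Return an IAM policy document (hypothetical helper) for one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: the resource is the bucket ARN.
                "Effect": "Allow",
                "Action": BUCKET_ACTIONS,
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions: the resource is every key in the bucket.
                "Effect": "Allow",
                "Action": OBJECT_ACTIONS,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print JSON that could be pasted into the IAM console or a policy file.
    print(json.dumps(build_s3_policy("my-s3-bucket"), indent=2))
```

The resulting JSON can be attached to the IAM role or user whose credentials RSPM uses.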

Environment Variables#

For testing with environment variables, create and edit a new file at /etc/systemd/system/rstudio-pm.service.d/aws.conf:

; /etc/systemd/system/rstudio-pm.service.d/aws.conf

[Service]
Environment="AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE"
Environment="AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
Environment="AWS_DEFAULT_REGION=us-west-2"

Then, reload the systemd configuration and restart the RStudio Package Manager service with:

Terminal

sudo systemctl daemon-reload
sudo systemctl restart rstudio-pm
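
systemd ignores assignments that appear outside a section, so the `Environment=` lines only take effect when they sit under a `[Service]` header in the drop-in file. The following is a rough sketch of a sanity check for the drop-in file. The helper names `service_environment` and `missing_variables` are hypothetical, and the parser only handles the one-assignment-per-line, double-quoted form shown in the example above, not the full systemd syntax.

```python
# The three variables set in the example drop-in file above.
EXPECTED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def service_environment(dropin_text: str) -> dict:
    """Collect NAME=value pairs from Environment= lines under [Service]."""
    env = {}
    section = None
    for raw in dropin_text.splitlines():
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if section != "Service" or not sep or key.strip() != "Environment":
            continue  # not an Environment= line inside [Service]
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the surrounding double quotes
        name, sep, val = value.partition("=")
        if sep:
            env[name] = val
    return env


def missing_variables(dropin_text: str) -> list:
    """Return the expected variable names that are absent or empty."""
    env = service_environment(dropin_text)
    return [name for name in EXPECTED if not env.get(name)]


if __name__ == "__main__":
    path = "/etc/systemd/system/rstudio-pm.service.d/aws.conf"
    with open(path) as handle:
        print(missing_variables(handle.read()) or "all expected variables set")
```

An empty result means all three variables were found under `[Service]`. It does not confirm that the credentials themselves are valid.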

Configuration#

On the RSPM side, the Storage and S3Storage sections must be updated. Here is a simple example using the bucket my-s3-bucket, the region us-east-1, and a shared configuration:

; /etc/rstudio-pm/rstudio-pm.gcfg

[Storage]
; Sets all storage classes to use S3 instead of the `DataDir`
Default = s3

; Default S3 settings. This is the minimum-required setting for using S3.
[S3Storage]
Bucket = my-s3-bucket
Region = us-east-1
EnableSharedConfig = true

Users with advanced or specific needs can configure storage classes individually. For example, you could use this configuration if you only wanted to store internal R and CRAN packages in S3 and use local storage for everything else:

; /etc/rstudio-pm/rstudio-pm.gcfg

[Storage]
Packages = s3
CRAN = s3

[S3Storage]
Bucket = my-s3-bucket
Region = us-east-1
EnableSharedConfig = true

; Override default S3 settings for the "packages" class. This demonstrates
; all the available S3 configuration settings.
[S3Storage "packages"]
Bucket = another-s3-bucket
Prefix = rspm-packages
Profile = dev-rspm
Region = us-west-1
EnableSharedConfig = true

For more information on the storage classes, see the appendix.
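
To make the override behavior concrete, the sketch below computes the settings that would apply to a given storage class. It assumes that a class-specific section such as `[S3Storage "packages"]` overlays the default `[S3Storage]` section key by key, as the comment in the example suggests. This is an assumption about RSPM's behavior, not a confirmed description of its implementation. The function name `effective_s3_settings` and the class name `cran` are hypothetical, and Python's `configparser` is used only as a convenient stand-in for the gcfg parser.

```python
import configparser

# Trimmed copy of the example configuration above.
GCFG = """
[Storage]
Packages = s3
CRAN = s3

[S3Storage]
Bucket = my-s3-bucket
Region = us-east-1
EnableSharedConfig = true

[S3Storage "packages"]
Bucket = another-s3-bucket
Prefix = rspm-packages
Profile = dev-rspm
Region = us-west-1
EnableSharedConfig = true
"""


def effective_s3_settings(text: str, storage_class: str) -> dict:
    """Overlay a class-specific section on the default one (assumed behavior)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(text)
    settings = {}
    if parser.has_section("S3Storage"):
        settings.update(parser["S3Storage"])
    override = f'S3Storage "{storage_class}"'
    if parser.has_section(override):
        settings.update(parser[override])  # class values win over defaults
    return settings


if __name__ == "__main__":
    print(effective_s3_settings(GCFG, "packages"))
    print(effective_s3_settings(GCFG, "cran"))
```

Under that assumption, the `packages` class would use `another-s3-bucket` in `us-west-1`, while a class with no section of its own would fall back to `my-s3-bucket` in `us-east-1`.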

Store RStudio Packages#

As an optimization, the RetainFetchedPackages option for the CRAN, Bioconductor, and PyPI sources defaults to false for S3 configurations. This means that these packages are downloaded from RStudio and served by RSPM instead of being stored on S3.

Repeatedly downloading packages from RStudio can incur additional ingress fees and slow down delivery of frequently accessed packages. To store fetched packages in S3 instead, set the RetainFetchedPackages option to true for each source, for example:

; /etc/rstudio-pm/rstudio-pm.gcfg

[CRAN]
RetainFetchedPackages = true

[Bioconductor]
RetainFetchedPackages = true

[PyPI]
RetainFetchedPackages = true

Warning

We advise caution when changing these settings, as some packages can exceed 5 GB in size and result in very slow initial download times.

Client-side Encryption#

For users with strict security requirements, RSPM supports client-side encryption with S3 and KMS. This setup requires a symmetric KMS key and additional credential permissions:

  • kms:Encrypt
  • kms:Decrypt
  • kms:GenerateDataKey

It also requires including the KMS Key ID in the RSPM configuration file, for example:

; /etc/rstudio-pm/rstudio-pm.gcfg

[S3Storage]
Bucket = my-s3-bucket
Region = us-east-1
EnableSharedConfig = true
KMSKeyID = XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
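
The additional KMS permissions listed above can be expressed as an IAM policy statement scoped to the key. The following is a minimal sketch under stated assumptions: the helper names `kms_key_arn` and `build_kms_statement` are hypothetical, the account ID `111122223333` is a placeholder, and the ARN assumes the standard `aws` partition.

```python
import json

# Actions copied from the KMS permission list above.
KMS_ACTIONS = ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"]


def kms_key_arn(region: str, account_id: str, key_id: str) -> str:
    """Build the ARN of a KMS key from its region, account, and key ID."""
    return f"arn:aws:kms:{region}:{account_id}:key/{key_id}"


def build_kms_statement(key_arn: str) -> dict:
    """Return an IAM statement (hypothetical helper) allowing the listed actions."""
    return {
        "Effect": "Allow",
        "Action": KMS_ACTIONS,
        "Resource": key_arn,
    }


if __name__ == "__main__":
    # Placeholder values; substitute the real region, account ID, and key ID.
    arn = kms_key_arn(
        "us-east-1", "111122223333", "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    )
    print(json.dumps(build_kms_statement(arn), indent=2))
```

The statement would be added alongside the S3 permissions in the policy attached to the credentials RSPM uses.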

Warning

RSPM uses Transport Layer Security (TLS) for all communication with S3. For customers who want additional security, we instead recommend using server-side encryption for Amazon S3 buckets.

Client-side encryption uses the Go implementation of AES/GCM. As a result, each object is loaded fully into memory before it can be encrypted or decrypted. Users must allocate additional memory to avoid allocation failures. This also results in slower upload and download speeds for clients.