Amazon Simple Storage Service (S3)#
RSPM can also use Amazon Simple Storage Service (S3) as a storage provider. This integration requires AWS credentials and updates to the RSPM configuration file.
As a best practice, AWS recommends that you specify credentials in the following order:
- Use IAM roles for Amazon EC2 (if your application is running on an Amazon EC2 instance). IAM roles provide temporary security credentials to your instance to make AWS calls. IAM roles provide an easy way to distribute and manage credentials on multiple Amazon EC2 instances.
- Use a shared credentials file. This credentials file is the same one used by other SDKs and the AWS CLI. If you’re already using a shared credentials file, you can also use it for this purpose.
- Use environment variables. Setting environment variables is useful if you’re doing development work on a machine other than an Amazon EC2 instance.
If you select IAM roles for Amazon EC2 instances, RSPM will automatically use the instance’s credentials.
See the AWS CLI configuration documentation for details on configuring your environment to interact with AWS.
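When troubleshooting, it can help to check which of the non-role credential sources listed above are visible to a given user. The following is a minimal sketch, not part of RSPM: the function name `available_credential_sources` is hypothetical, it only checks that the standard AWS environment variables or the shared credentials file are present, and it neither validates the credentials nor detects an EC2 instance role.

```python
import os
from pathlib import Path


def available_credential_sources(env=None, home=None):
    """List the credential sources that appear to be configured.

    Hypothetical diagnostic helper. It checks only for presence, in the
    order used in the list above (shared credentials file, then
    environment variables). It cannot detect an EC2 instance role.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    found = []

    # AWS_SHARED_CREDENTIALS_FILE overrides the default ~/.aws/credentials path.
    cred_path = env.get("AWS_SHARED_CREDENTIALS_FILE") or (home / ".aws" / "credentials")
    if Path(cred_path).is_file():
        found.append("shared credentials file")

    # Both the key ID and the secret must be set for this source to be usable.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        found.append("environment variables")

    return found
```

Calling `available_credential_sources()` with no arguments inspects the current process. Run it as the same user, and with the same environment, as the RSPM service, since the shared credentials file is looked up relative to that user's home directory.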
The credentials RSPM uses for S3 storage must have the following permissions for the bucket:
For testing with environment variables, create and edit a new file at /etc/systemd/system/rstudio-pm.service.d/aws.conf. The Environment lines must sit under a [Service] section for systemd to apply them:

    ; /etc/systemd/system/rstudio-pm.service.d/aws.conf
    [Service]
    Environment="AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE"
    Environment="AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    Environment="AWS_DEFAULT_REGION=us-west-2"
Then, reload the systemd configuration and restart the RStudio Package Manager service with:

    sudo systemctl daemon-reload
    sudo systemctl restart rstudio-pm
On the RSPM side, the Storage and S3Storage sections must be updated. Here is a simple example using the bucket my-s3-bucket, the region us-east-1, and a shared configuration:

    ; /etc/rstudio-pm/rstudio-pm.gcfg
    [Storage]
    ; Sets all storage classes to use S3 instead of the `DataDir`
    Default = s3

    ; Default S3 settings. This is the minimum-required setting for using S3.
    [S3Storage]
    Bucket = my-s3-bucket
    Region = us-east-1
    EnableSharedConfig = true
Users with advanced or specific needs can configure storage classes individually. For example, you could use this configuration if you only wanted to store internal R and CRAN packages in S3 and use local storage for everything else:
    ; /etc/rstudio-pm/rstudio-pm.gcfg
    [Storage]
    Packages = s3
    CRAN = s3

    [S3Storage]
    Bucket = my-s3-bucket
    Region = us-east-1
    EnableSharedConfig = true

    ; Override default S3 settings for the "packages" class. This demonstrates
    ; all the available S3 configuration settings.
    [S3Storage "packages"]
    Bucket = another-s3-bucket
    Prefix = rspm-packages
    Profile = dev-rspm
    Region = us-west-1
    EnableSharedConfig = true
For more information on the storage classes, see the appendix.
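To make the override in the per-class example concrete, here is a rough sketch that reads that configuration and prints the S3 settings each storage class would end up with. It rests on two assumptions: that a class-specific `[S3Storage "name"]` section overrides the default `[S3Storage]` section key by key, and that Python's `configparser` is a close enough stand-in for the gcfg format for this simple file. The function name `effective_s3_settings` is hypothetical, and none of this reflects how RSPM itself parses its configuration.

```python
import configparser

EXAMPLE_CONFIG = """
[Storage]
Packages = s3
CRAN = s3

[S3Storage]
Bucket = my-s3-bucket
Region = us-east-1
EnableSharedConfig = true

; Override default S3 settings for the "packages" class.
[S3Storage "packages"]
Bucket = another-s3-bucket
Prefix = rspm-packages
Profile = dev-rspm
Region = us-west-1
EnableSharedConfig = true
"""


def effective_s3_settings(config_text, storage_class):
    """Merge the default [S3Storage] keys with a class-specific override.

    Assumption: override keys replace default keys one by one, and keys
    missing from the override fall back to the default section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names such as "Bucket" as written
    parser.read_string(config_text)

    settings = {}
    if parser.has_section("S3Storage"):
        settings.update(parser["S3Storage"])

    # configparser keeps the whole header text, quotes included, as the name.
    override_section = f'S3Storage "{storage_class}"'
    if parser.has_section(override_section):
        settings.update(parser[override_section])
    return settings


for name in ("packages", "cran"):
    print(name, effective_s3_settings(EXAMPLE_CONFIG, name))
```

Under these assumptions, the "packages" class resolves to another-s3-bucket in us-west-1 with the rspm-packages prefix, while the "cran" class keeps the default bucket and region.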
Store RStudio Packages#
As an optimization, the RetainFetchedPackages option for the PyPI sources is false for S3 configurations. This means that these packages will be downloaded from RStudio and served by RSPM instead of being stored on S3.

Downloading packages can incur additional ingress fees and slow down downloads of frequently accessed packages. Users can store packages in S3 by changing the RetainFetchedPackages options, for example:
    ; /etc/rstudio-pm/rstudio-pm.gcfg
    [CRAN]
    RetainFetchedPackages = true

    [Bioconductor]
    RetainFetchedPackages = true

    [PyPI]
    RetainFetchedPackages = true
We advise caution when changing these settings, as some packages can exceed 5 GB in size and result in very slow initial download times.
For users with strict security requirements, RSPM supports client-side encryption with S3 and KMS. This setup requires a symmetric KMS key and additional credential permissions:
It also requires including the KMS Key ID in the RSPM configuration file, for example:
    ; /etc/rstudio-pm/rstudio-pm.gcfg
    [S3Storage]
    Bucket = my-s3-bucket
    Region = us-east-1
    EnableSharedConfig = true
    KMSKeyID = XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
RSPM uses Transport Layer Security (TLS) for all communication with S3. For customers who want additional security, we instead recommend using server-side encryption for Amazon S3 buckets.
Client-side encryption uses the Go implementation for AES/GCM. Due to this, objects to be encrypted or decrypted will be fully loaded into memory before encryption or decryption can occur. Users must allocate additional memory to avoid allocation failures. This will also result in slower upload and download speeds for clients.
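As a rough aid for sizing that additional memory, the arithmetic can be sketched as below. The function `estimate_peak_memory_bytes` is hypothetical, and both of its defaults are illustrative assumptions rather than measured RSPM behavior: it assumes each in-flight object is held in memory twice (plaintext plus ciphertext) and that transfers run concurrently.

```python
def estimate_peak_memory_bytes(object_size_bytes, concurrent_transfers=1, copies_in_memory=2):
    """Back-of-the-envelope peak memory for client-side encryption.

    Assumes every in-flight object is fully resident `copies_in_memory`
    times. Real usage depends on the service's internals and on other
    memory consumers, so treat the result as a planning figure only.
    """
    if object_size_bytes < 0:
        raise ValueError("object_size_bytes must be non-negative")
    if concurrent_transfers < 1 or copies_in_memory < 1:
        raise ValueError("concurrent_transfers and copies_in_memory must be at least 1")
    return object_size_bytes * concurrent_transfers * copies_in_memory


GB = 10**9

# Four simultaneous transfers of a 5 GB package, two copies each.
print(estimate_peak_memory_bytes(5 * GB, concurrent_transfers=4) / GB)  # 40.0
```

Under these assumptions, a handful of simultaneous transfers of very large packages can require tens of gigabytes, which is why the server's memory should be planned around the largest packages it is expected to handle.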