Kubernetes Plugin

The Kubernetes Job Launcher Plugin provides the capability to launch executables on a Kubernetes cluster.

Note

Kubernetes provides a stable, backward-compatible API, and Posit Launcher relies on major features of the API that are unlikely to be removed. New versions of Posit products will remain compatible with previously released versions of Kubernetes that are still supported, and are very likely to be compatible with newer Kubernetes versions as they are released.

The most recent compatible version of Kubernetes tested by Posit is 1.27.

Configuration

It is recommended that you leave the default values unchanged and configure only the required fields, as outlined below.

/etc/rstudio/launcher.kubernetes.conf

Config Option	Description	Required (Y/N)	Default Value
server-user	Service user. The plugin should be started as root, and will lower its privilege to this user for normal execution. It is recommended not to change the default value, as this is populated by the Launcher service itself.	N	rstudio-server
thread-pool-size	Size of the thread pool used by the plugin. It is recommended not to change the default value, as this is populated by the Launcher service itself.	N	Number of CPUs * 2
enable-debug-logging	Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled).	N	0
scratch-path	Scratch directory where the plugin writes temporary state.	N	/var/lib/rstudio-launcher/{name of plugin}
logging-dir	Specifies the path where debug logs should be written.	N	/var/log/rstudio/launcher
job-expiry-hours	Number of hours before completed jobs are removed from the system.	N	24
profile-config	Path to the user and group profiles configuration file (explained in more detail below).	N	/etc/rstudio/launcher.kubernetes.profiles.conf
api-url	The Kubernetes API base URL. This can be an HTTP or HTTPS URL. The URL should include everything up to, but not including, the /api endpoint.	Y	Example: https://192.168.99.100:8443
auth-token-path	The path to a file that contains the auth token for the `job-launcher` service account. This is used to authenticate with the Kubernetes API. See below for more information. Required unless `auth-token` is set.	Y
auth-token	The auth token for the `job-launcher` service account. This is used to authenticate with the Kubernetes API. See below for more information. Required if `auth-token-path` is not set.	N
kubernetes-namespace	The Kubernetes namespace in which to create jobs. Note that the account specified by the `auth-token` setting must have full API privileges within this namespace. See Kubernetes cluster requirements below for more information.	N	rstudio
shared-process-namespace	Use a shared process namespace. This improves behavior when sending termination signals to processes.	N	true
verify-ssl-certs	Whether or not to verify SSL certificates when connecting to `api-url`. Only applicable if connecting over HTTPS. Do not disable this option in production use.	N	1
certificate-authority	Certificate authority to use when connecting to Kubernetes over SSL and when verifying SSL certificates. This must be the Base64-encoded PEM certificate reported by Kubernetes as the certificate authority in use. Leave this blank to use the system root CA store.	N
watch-timeout-seconds	Number of seconds before the watch calls to Kubernetes stop. It is strongly recommended to not change this value unless instructed by Posit support.	N	180
fetch-limit	The maximum number of objects to request per API call from the Kubernetes API for GET collection requests. It is recommended that you change the default only if you run into size issues with the returned payloads.	N	500
use-templating	Enables the new Kubernetes object templating feature (see Kubernetes object templating below). When enabled, any configured `job-json-overrides` are ignored.	N	0

In order to generate the contents for the file pointed to by auth-token-path (or the value for auth-token), run the following commands. Note that the account must first be created and given appropriate permissions (see Kubernetes cluster requirements below). The file pointed to by auth-token-path must be owned by the account configured as your server-user (usually rstudio-server).

KUBERNETES_AUTH_SECRET=$(kubectl get serviceaccount job-launcher --namespace=rstudio -o jsonpath='{.secrets[0].name}')

# Write token to file. This file must be owned by the `server-user` (usually `rstudio-server`).
kubectl get secret "$KUBERNETES_AUTH_SECRET" --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d | sudo tee /etc/rstudio/kubernetes.launcher.token > /dev/null
sudo chmod 0600 /etc/rstudio/kubernetes.launcher.token
sudo chown rstudio-server /etc/rstudio/kubernetes.launcher.token

# Print token for copy/paste.
kubectl get secret "$KUBERNETES_AUTH_SECRET" --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d
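Starting with Kubernetes 1.24, a token Secret is no longer created automatically for each service account, so on newer clusters the `.secrets[0].name` query above may return nothing. In that case you can create a long-lived token Secret yourself. The Python sketch below only builds and prints such a manifest as JSON. The Secret name `job-launcher-token` and the function name are illustrative assumptions; the `kubernetes.io/service-account-token` type and the `kubernetes.io/service-account.name` annotation are standard Kubernetes fields.

```python
import json


def service_account_token_secret(name: str, namespace: str, service_account: str) -> dict:
    """Build a manifest for a long-lived token Secret bound to a service account."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # The control plane fills in .data.token for the account named here.
            "annotations": {"kubernetes.io/service-account.name": service_account},
        },
        "type": "kubernetes.io/service-account-token",
    }


if __name__ == "__main__":
    # "job-launcher-token" is a hypothetical Secret name; any valid name works.
    manifest = service_account_token_secret("job-launcher-token", "rstudio", "job-launcher")
    print(json.dumps(manifest, indent=2))
```

For example, save it as a file (the name `make_token_secret.py` is hypothetical), run `python3 make_token_secret.py | kubectl apply -f -`, then set `KUBERNETES_AUTH_SECRET=job-launcher-token` and run the remaining commands above.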

Kubernetes container auto configuration

If you are running the Launcher within a Kubernetes container, a few configuration variables can be inferred automatically from environment variables and files that Kubernetes injects when it launches a container. Therefore, you do not need to configure these options when running the Launcher within Kubernetes.

Config Option	Obtained From
api-url	`https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}`
auth-token	`/var/run/secrets/kubernetes.io/serviceaccount/token`
certificate-authority	Base64-encoded value of `/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`
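To make the table above concrete, here is a small Python sketch of how these three values could be derived inside a pod. It is illustrative only: the Launcher performs this inference itself, and the function name `infer_in_cluster_config` is hypothetical. The environment variable names and file paths are the ones listed in the table.

```python
import base64
import os
from typing import Dict, Mapping

# Files that Kubernetes mounts into pods running with a service account token.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"


def infer_in_cluster_config(env: Mapping[str, str] = os.environ,
                            token_path: str = TOKEN_PATH,
                            ca_path: str = CA_PATH) -> Dict[str, str]:
    """Derive api-url, auth-token, and certificate-authority from the pod environment."""
    api_url = f"https://{env['KUBERNETES_SERVICE_HOST']}:{env['KUBERNETES_SERVICE_PORT']}"
    with open(token_path, encoding="utf-8") as f:
        token = f.read().strip()
    with open(ca_path, "rb") as f:
        # certificate-authority expects the Base64-encoded PEM, not the raw file.
        ca_b64 = base64.b64encode(f.read()).decode("ascii")
    return {"api-url": api_url, "auth-token": token, "certificate-authority": ca_b64}
```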

User and group profiles

The Kubernetes plugin also allows you to specify user and group configuration profiles, similar to Posit Workbench’s profiles, in the configuration file /etc/rstudio/launcher.kubernetes.profiles.conf (or any arbitrary file as specified in profile-config within the main configuration file; see above). These are entirely optional.

Profiles are divided into sections of three different types:

Global ([*])
Per-group ([@groupname])
Per-user ([username])

Here is an example profiles file that illustrates each of these types:

/etc/rstudio/launcher.kubernetes.profiles.conf

[*]
placement-constraints=node,region:us,region:eu
default-cpus=1
default-cpus-request=0.5
default-mem-mb=512
default-mem-mb-request=256
max-cpus=2
max-mem-mb=1024
container-images=r-session:3.4.2,r-session:3.5.0
allow-unknown-images=0

[@posit-power-users]
default-cpus=4
default-mem-mb=4096
default-nvidia-gpus=0
default-amd-gpus=0
max-nvidia-gpus=2
max-amd-gpus=3
max-cpus=20
max-mem-mb=20480
container-images=r-session:3.4.2,r-session:3.5.0,r-session:preview
allow-unknown-images=1

[jsmith]
max-cpus=3

This configuration specifies that, by default, users can launch jobs with a maximum of 2 CPUs and 1024 MB of memory, and can use only two different R containers. Members of the posit-power-users group can use far more resources, including GPUs, can see the r-session:preview image, and can run any image they specify. The user jsmith can use up to 3 CPUs.

Note that the profiles file is processed from top to bottom (i.e., settings that match the current user and appear later in the file always override matching settings that appeared earlier). The settings available in the file are described in more depth in the table below.
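The top-to-bottom rule can be sketched in a few lines of Python. This is a minimal model, not the Launcher's parser: it assumes the profiles file is INI-compatible enough for Python's `configparser`, and the function names are hypothetical. Every section that matches the user (`[*]`, a `[@group]` they belong to, or their own `[username]`) is applied in file order, so later matches overwrite earlier keys.

```python
import configparser
from typing import Dict, List

EXAMPLE = """
[*]
max-cpus=2
max-mem-mb=1024
allow-unknown-images=0

[@posit-power-users]
max-cpus=20
allow-unknown-images=1

[jsmith]
max-cpus=3
"""


def section_matches(section: str, user: str, groups: List[str]) -> bool:
    if section == "*":
        return True                      # global section
    if section.startswith("@"):
        return section[1:] in groups     # per-group section
    return section == user               # per-user section


def effective_profile(text: str, user: str, groups: List[str]) -> Dict[str, str]:
    """Apply every matching section in file order; later sections win."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    settings: Dict[str, str] = {}
    for section in parser.sections():    # sections() preserves file order
        if section_matches(section, user, groups):
            settings.update(parser.items(section))
    return settings
```

For example, if jsmith were also in posit-power-users, `effective_profile(EXAMPLE, "jsmith", ["posit-power-users"])` would yield `max-cpus` of `"3"` (the `[jsmith]` section comes last) together with `allow-unknown-images` of `"1"` from the group section.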

/etc/rstudio/launcher.kubernetes.profiles.conf

Config Option	Description	Required (Y/N)	Default Value
container-images	Comma-separated string of allowed images that users may see and run.	N
default-container-image	The default container image to use for the Job if none is specified.	N
allow-unknown-images	Whether to allow users to run any image they want within their job containers, or to restrict them to the images listed in `container-images`.	N	1
placement-constraints	Comma-separated string of available placement constraints in the form of `key1:value1,key2:value2,...` where the `:value` part is optional to indicate free-form fields. See next section for more details	N
default-cpus	Number of CPUs available to a job by default if not specified by the job.	N	0.0 (infinite - managed by Kubernetes)
default-cpus-request	Number of CPUs requested to be available to a job by default if not specified by the job. Corresponds to the Kubernetes CPU Request.	N	0.0 (not specified - managed by Kubernetes)
default-mem-mb	Number of MB of RAM available to a job by default if not specified by the job. Corresponds to the Kubernetes Memory Limit.	N	0.0 (infinite - managed by Kubernetes)
default-mem-mb-request	Number of MB of RAM requested to be available to a job by default if not specified by the job. Corresponds to the Kubernetes Memory Request.	N	0.0 (not specified - managed by Kubernetes)
max-cpus	Maximum number of CPUs available to a job. Corresponds to the Kubernetes CPU Limit.	N	0.0 (infinite - managed by Kubernetes)
max-cpus-request	Maximum number of CPUs that can be requested. Corresponds to the Kubernetes CPU Request.	N	0.0 (not specified - managed by Kubernetes)
max-mem-mb	Maximum number of MB of RAM available to a job. Corresponds to the Kubernetes Memory Limit.	N	0.0 (infinite - managed by Kubernetes)
max-mem-mb-request	Maximum number of MB of RAM that can be requested. Corresponds to the Kubernetes Memory Request.	N	0.0 (not specified - managed by Kubernetes)
job-json-overrides	JSON path overrides of the generated Kubernetes Job JSON. See Modifying jobs.	N
cpu-request-ratio	Ratio within the range (0.0, 1.0] used to set the Kubernetes container resource `request` for the CPU. If the job specifies no request and no default is determined via profile settings, the request is set to this ratio of the CPU `limit` specified by the user when creating the job.	N	1.0
memory-request-ratio	Ratio within the range (0.0, 1.0] used to set the Kubernetes container resource `request` for the memory. If the job specifies no request and no default is determined via profile settings, the request is set to this ratio of the memory `limit` specified by the user when creating the job.	N	1.0
default-nvidia-gpus	Number of NVIDIA GPUs available to a job by default if not specified by the job. See below for more information.	N	0
default-amd-gpus	Number of AMD GPUs available to a job by default if not specified by the job. See below for more information.	N	0
max-nvidia-gpus	Maximum number of NVIDIA GPUs available to a job. See below for more information.	N	0
max-amd-gpus	Maximum number of AMD GPUs available to a job. See below for more information.	N	0
resource-profiles	Available resource profiles. See Resource profiles.	N
allow-custom-resources	Whether jobs can use the `custom` resource profile. See Resource profiles.	N	1

Resource limits correspond to the Kubernetes container resource limits, which are hard caps on the resources a job can use. Kubernetes allows jobs to request fewer resources and occasionally burst up to the limit amount, and you can control this with the cpu-request-ratio and memory-request-ratio settings detailed above. Resource management in Kubernetes is a complex topic; in general, leave these settings at their default value of 1.0 unless you understand the implications of using both requests and limits. See the Kubernetes documentation on resource management for Pods and containers for more information.
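As a worked illustration of how the request ratios interact with explicit and default requests, here is a minimal Python sketch of the precedence described in the cpu-request-ratio and memory-request-ratio rows of the table. The function and parameter names are hypothetical, the sketch models one resource at a time, and it is not the Launcher's code.

```python
from typing import Optional


def effective_request(limit: float,
                      requested: Optional[float] = None,
                      profile_default: Optional[float] = None,
                      ratio: float = 1.0) -> float:
    """Pick the Kubernetes resource request for one resource (CPU or memory)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("request ratio must be in the range (0.0, 1.0]")
    if requested is not None:          # the job asked for a specific request
        return requested
    if profile_default is not None:    # e.g. default-cpus-request from the profiles
        return profile_default
    return limit * ratio               # fall back to a fraction of the limit
```

For example, a job with a 2-CPU limit, no explicit request, and cpu-request-ratio=0.5 gets a 1.0-CPU request, and a 4096 MB limit with memory-request-ratio=0.25 gets a 1024 MB request. With the default ratio of 1.0, request and limit are equal.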

In order to provide GPUs as a schedulable resource, you must first enable the feature in Kubernetes by installing the necessary GPU drivers and device plugins supplied by your desired vendor (AMD or NVIDIA). Once available in Kubernetes, simply set the desired default and max values for the GPU type you intend to use. If not using GPUs, no GPU configuration in the profiles is necessary. For information on adding support for GPUs in Kubernetes, see the Kubernetes documentation.
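The default and max GPU settings can be pictured with the following Python sketch. It assumes the vendor device plugins advertise their standard extended resource names, `nvidia.com/gpu` and `amd.com/gpu`, which is how GPUs appear as schedulable resources in Kubernetes. How the Launcher itself treats a request above the maximum is not described here, so rejecting it is only this sketch's choice; the function names are hypothetical.

```python
from typing import Dict, Optional

# Extended resource names advertised by the NVIDIA and AMD Kubernetes device plugins.
GPU_RESOURCE_NAMES = {"nvidia": "nvidia.com/gpu", "amd": "amd.com/gpu"}


def resolve_gpu_count(requested: Optional[int], default: int, maximum: int) -> int:
    """Apply a profile's default-*-gpus and max-*-gpus to a job's GPU request."""
    count = default if requested is None else requested
    if count < 0:
        raise ValueError("GPU count cannot be negative")
    if count > maximum:
        raise ValueError(f"{count} GPUs requested, but the profile allows at most {maximum}")
    return count


def gpu_limits(nvidia: int, amd: int) -> Dict[str, str]:
    """Container resource limits for the GPUs; Kubernetes uses the limit as the request."""
    counts = {"nvidia": nvidia, "amd": amd}
    return {GPU_RESOURCE_NAMES[vendor]: str(n) for vendor, n in counts.items() if n > 0}
```

With the posit-power-users profile shown earlier (default-nvidia-gpus=0, max-nvidia-gpus=2), a job that asks for one NVIDIA GPU would end up with a limit of `{"nvidia.com/gpu": "1"}`, while a job that asks for none gets no GPU limit at all.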

Resource profiles

Resource profiles greatly simplify the task of assigning CPU, memory, or GPU resources for a job (provided that GPUs are available). They are configured in the optional /etc/rstudio/launcher.kubernetes.resources.conf file. For example:

/etc/rstudio/launcher.kubernetes.resources.conf

[default]
name = "Default" # optional, derived from the section name when absent
cpus=1
mem-mb=4096

[small]
cpus=1
mem-mb=512

[default-gpu]
cpus=1
mem-mb=4096
nvidia-gpus=1
amd-gpus=0

[hugemem]
name = "Huge Memory"
cpus=8
mem-mb=262144

By default, all profiles are available to all users, and jobs can also use a special custom profile to specify CPU, memory, and GPU resources directly instead. However, users are still subject to the constraints in User and group profiles, and administrators may also limit access to individual resource profiles with that configuration file.

For example, suppose an admin wants to restrict the resource profiles above such that (1) GPUs and large memory jobs are only available to users in the bioinformatics group; and (2) only users in the posit-power-users group can use the custom resource profile to set their own resources directly. This might result in the following /etc/rstudio/launcher.kubernetes.profiles.conf file:

/etc/rstudio/launcher.kubernetes.profiles.conf

[*]
resource-profiles=default,small
allow-custom-resources=0

[@bioinformatics]
resource-profiles=default,small,default-gpu,hugemem

[@posit-power-users]
resource-profiles=default,small,default-gpu,hugemem
allow-custom-resources=1
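The access rules in this example can be modeled in a few lines of Python. This sketch assumes its input is the user's effective settings after the top-to-bottom merge of the profiles file, so a bioinformatics member still inherits allow-custom-resources=0 from `[*]`. The function name is hypothetical, and ignoring listed names that are not defined in the resources file is this sketch's choice rather than documented Launcher behavior.

```python
from typing import Dict, List

DEFINED = ["default", "small", "default-gpu", "hugemem"]


def available_resource_profiles(settings: Dict[str, str], defined: List[str]) -> List[str]:
    """List the resource profiles a user may pick, given their effective profile settings."""
    listed = settings.get("resource-profiles")
    if listed is None:
        allowed = list(defined)  # by default, every defined profile is available
    else:
        names = [name.strip() for name in listed.split(",") if name.strip()]
        allowed = [name for name in names if name in defined]  # ignore undefined names
    if settings.get("allow-custom-resources", "1") == "1":
        allowed.append("custom")  # the special profile for setting resources directly
    return allowed
```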

The settings available in each section of the /etc/rstudio/launcher.kubernetes.resources.conf file are described in more depth in the table below:

/etc/rstudio/launcher.kubernetes.resources.conf

Config Option	Description	Required (Y/N)	Default Value
name	A user-friendly name for the profile, e.g. `Default (1 CPU, 4G mem)` or `m4.xlarge`.	N	The section title
cpus	The CPU limit.	Y
cpus-request	The CPU request.	N
mem-mb	The memory limit, in megabytes.	Y
mem-mb-request	The memory request, in megabytes.	N
nvidia-gpus	Number of NVIDIA GPUs, if supported by the cluster.	N	0
amd-gpus	Number of AMD GPUs, if supported by the cluster.	N	0
placement-constraints	Any placement constraints, as a comma-separated list of the form `key1:value1,key2:value2`.	N
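To illustrate the table, the following Python sketch reads the example resources file above and applies its documented defaults: `name` falls back to the section title, `cpus` and `mem-mb` are required, and GPU counts default to 0. It is a sketch under assumptions, not the Launcher's parser: it relies on Python's `configparser` with inline `#` comments enabled and strips the double quotes used around `name` in the example; the function name is hypothetical, and placement-constraints is left out for brevity.

```python
import configparser
from typing import Dict, Union

EXAMPLE = """
[default]
name = "Default" # optional, derived from the section name when absent
cpus=1
mem-mb=4096

[small]
cpus=1
mem-mb=512

[default-gpu]
cpus=1
mem-mb=4096
nvidia-gpus=1
amd-gpus=0

[hugemem]
name = "Huge Memory"
cpus=8
mem-mb=262144
"""

REQUIRED = ("cpus", "mem-mb")


def load_resource_profiles(text: str) -> Dict[str, Dict[str, Union[str, float, int]]]:
    parser = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=("#",))
    parser.read_string(text)
    profiles = {}
    for section in parser.sections():
        opts = dict(parser.items(section))
        missing = [key for key in REQUIRED if key not in opts]
        if missing:
            raise ValueError(f"[{section}] is missing required option(s): {', '.join(missing)}")
        profiles[section] = {
            # Fall back to the section title; drop the quotes used in the example.
            "name": opts.get("name", section).strip('"'),
            "cpus": float(opts["cpus"]),
            "mem-mb": int(opts["mem-mb"]),
            "nvidia-gpus": int(opts.get("nvidia-gpus", "0")),
            "amd-gpus": int(opts.get("amd-gpus", "0")),
        }
    return profiles
```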

Kubernetes object templating

The preferred method for modifying objects (jobs and services) submitted to Kubernetes is the templating feature. It allows templating of the entire YAML payload that is submitted to the Kubernetes API, using a syntax similar to Helm charts. This method is easier to use than the older job-json-overrides functionality, and it supports conditional logic to make job modification dynamic, as opposed to the static transformations offered by job-json-overrides.

To use the templating feature, enable it in /etc/rstudio/launcher.kubernetes.conf:

/etc/rstudio/launcher.kubernetes.conf

use-templating=1

Launcher supports templating of the following Kubernetes resource types:

Resource	API Docs
Job	Job API
Service	Service API

Generating templates

After restarting the Launcher, job and service templates will automatically be written to the Kubernetes Launcher scratch-path (/var/lib/rstudio-launcher/Kubernetes by default), and these templates will be used to create the jobs and services that are submitted to Kubernetes when starting Launcher jobs. These templates will only be created if they do not already exist, to ensure that any changes made are not overwritten.

You can also generate the templates via a command instead of having to first run the Launcher to create them. This can be done with the --generate-templates command:

sudo /path/to/rstudio-kubernetes-launcher --generate-templates

Note that the templates will be created in the scratch-path as mentioned above. Only the templates found in the scratch-path will be used for templating. The scratch-path is determined by parsing the launcher.conf and launcher.kubernetes.conf files. If these files do not yet exist, you can specify the scratch-path by adding --scratch-path <path> to the --generate-templates command.

Modifying templates

Once the templates have been generated, you can modify them as necessary for your Kubernetes cluster. For example, you can add additional annotations to jobs, add Linux capabilities to the container, or even run a side-car container.

To modify the templates, edit the job.tpl and service.tpl files that were previously generated within the scratch-path. You can add additional fields to the job, but it is strongly recommended that you do not delete fields from the template. Doing so may cause your jobs to fail to launch properly, or cause unintended subtle issues with Job Launcher functionality.

Changes to the templates require either a restart of the Launcher or a SIGHUP signal. Sending SIGHUP to the process causes the templates to be reloaded at runtime. Note that if the new changes are not valid, the original templates will continue to be used until valid changes are reloaded.

sudo kill -s SIGHUP $(pidof rstudio-kubernetes-launcher)

The object templates are modeled after Helm charts and use identical syntax. Like Helm charts, they support conditional logic, loops, and various functions via the Sprig library. In contrast to Helm, there are no Values files and no directory structure for templates; all templates are simply stored in the scratch-path. The only values available to a template are those on the .Job object, which represents the job being submitted via the Launcher.

All templates must start with a comment indicating the version of the template in use, which must be compatible with the version required by the Launcher. Starting with Launcher version 2.6.0, this version follows Semantic Versioning. This is to ensure that your templates are up-to-date with what is needed by the Launcher (see Examples for details). Launcher will fail to start or load a template that does not satisfy the required version. The expected version comment is generated when invoking the --generate-templates command.
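The version check can be illustrated with a short Python sketch. The documentation does not spell out the exact compatibility rule, so this sketch assumes the usual Semantic Versioning convention: the major versions must match, and the template must not be older than the required version. The comment generated by --generate-templates remains the authoritative value; the function names here are hypothetical.

```python
import re
from typing import Tuple

# Matches a first line such as "# Version: 2.6.0".
VERSION_COMMENT = re.compile(r"^#\s*Version:\s*(\d+)\.(\d+)\.(\d+)\s*$")


def template_version(template_text: str) -> Tuple[int, int, int]:
    lines = template_text.splitlines()
    match = VERSION_COMMENT.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("template must start with a '# Version: X.Y.Z' comment")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_compatible(template: Tuple[int, int, int], required: Tuple[int, int, int]) -> bool:
    # Assumed Semantic Versioning rule: same major version, and not older than required.
    return template[0] == required[0] and template >= required
```

Under this assumed rule, a template marked 2.6.1 satisfies a required 2.6.0, while 2.2.0 (too old) and 3.0.0 (different major version) do not.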

The following fields are available on the .Job object:

Field	Type	Description
id	string	The unique ID for the job. Only set for service objects.
cluster	string	The name of the Launcher cluster.
generateName	string	Used for generating unique object names within Kubernetes to ensure object names do not collide.
name	string	The name of the Job.
user	string	The submitting user of the Job.
workingDirectory	string	The working directory for the job command being executed.
container	map	The container that will run the job command.
container.image	string	The Docker image of the container.
container.runAsUser	int	The UID of the container. May be nil if the default of 0 (root) is used.
container.runAsGroup	int	The GID of the container. May be nil if the default of 0 (root) is used.
container.supplementalGroupIds	int array	An array of supplemental group IDs (GIDs) for the container user. May be empty.
host	string	The desired host to run the Job on. Usually empty.
command	string	The Job command to execute. If empty, `exe` will be specified.
exe	string	The executable to execute. If empty, `command` will be specified.
stdin	string	The Job’s stdin.
args	string array	The arguments for the command/executable to be run.
placementConstraints	map array	The placement constraints to be used for deciding where the job should be run.
placementConstraint.name	string	The name of the Placement Constraint. Example: `availability-zone`, `region`, etc.
placementConstraint.value	string	The value of the Placement Constraint. Example: `us-east-2`, `m3xlarge`, etc.
exposedPorts	map array	The ports that should be exposed/opened for the job.
exposedPort.protocol	string	The protocol for the port (TCP or UDP).
exposedPort.targetPort	string	The port within the container to expose/open.
metadata	map	The arbitrary JSON object that was provided via the `metadata` field. Used to apply overrides to annotations, labels, init containers, etc.
mounts	map array	The requested mounts for the job. Because these are generally complicated to transform into Kubernetes objects, the ready-made `volumes` and `volumeMounts` fields are also provided.
mount.mountPath	string	The path at which the mount should be mounted.
mount.readOnly	bool	Whether or not the mount should be read-only.
mount.mountSource	map	The description of the mount itself.
mount.mountSource.type	string	The type of the mount (e.g. host, nfs, etc.).
mount.mountSource.source	map	The underlying description of the type of the mount. Varies by mount type.
config	map array	Job-specific config unique to the Kubernetes Launcher. Currently used to specify secret env vars.
config.name	string	The name of the Job config.
config.value	string	The value of the Job config.
resourceLimits	map array	The resource limits to be used for the Job.
resourceLimit.type	string	The type of the resource limit (memory, cpuCount, NVIDIA GPUs, or AMD GPUs).
resourceLimit.value	string	The value of the resource limit.
volumes	map array	The Kubernetes volumes that should be mounted. This is constructed from the requested Job `mounts`.
volume.name	string	The unique name of the volume.
volume.?	map	The sub-object describing the volume. This varies based on the type of the volume being mounted.
volumeMounts	map array	The Kubernetes volume mounts describing how volumes should be mounted. This is constructed from the requested Job `mounts`.
volumeMount.name	string	The name of the volume to be mounted. Must match a volume.name.
volumeMount.mountPath	string	The path at which the mount should be mounted.
volumeMount.readOnly	bool	Whether or not the mount should be read-only.
tags	string array	The tags for the job.
servicePortsJson	string	Used by the Launcher to ensure that services are created properly after a restart.
shareProcessNamespace	bool	Reflects the current `shared-process-namespace` configuration setting, which controls whether the job's containers use a shared process namespace.
memoryRequestRatio	decimal	Reflects the `memory-request-ratio` setting from the user and group profiles. Deprecated - do not use.
cpuRequestRatio	decimal	Reflects the `cpu-request-ratio` setting from the user and group profiles. Deprecated - do not use.
serviceAccountName	string	The Kubernetes Service Account specified for the Job Pods, if any.

The following template functions are available for use in addition to the previously mentioned Sprig template functions:

Name	Description	Example
include	Renders the specified template file (other than `job.tpl` and `service.tpl`) with the specified values and returns the result.	{{ include "custom.tpl" . }}
toYaml	Renders the specified object as YAML.	{{ toYaml .Job.volumes }}
exec	Executes the specified command or executable with the given arguments, returning an `ExecResult` type, which if rendered directly returns the stdout for the process. See Examples below.	{{ exec "echo" "Hello, world" }}
groups	Returns a list of groups for the specified user by shelling out to the `groups` Linux command.	{{ groups .Job.user }}

To see what has been modified in the template files, you can use the --diff-templates command. This will show the output of the diff Linux command comparing the changes that you’ve made to the original templates generated with the --generate-templates command. Note that the diff command must be present on the PATH in order to use this functionality.

sudo /path/to/rstudio-kubernetes-launcher --diff-templates

Examples

Adding host aliases to the job

Host Aliases are defined on spec.template.spec.

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      hostAliases:
        - ip: "10.2.141.12"
          hostnames: ["db01"]
        - ip: "10.2.141.13"
          hostnames: ["db02"]

Adding Linux capabilities to the job container

Capabilities are defined on the job container's securityContext (spec.template.spec.containers[].securityContext). The pod-level spec.template.spec.securityContext, which the template constructs conditionally, does not accept capabilities, so add them to a container-level securityContext instead (or extend that securityContext, if your template's container entry already defines one).

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      (...omitted for brevity...)
      containers:
        - (...omitted for brevity...)
          securityContext:
            capabilities:
              add: ["NET_ADMIN", "SYS_PTRACE"]

Adding an ImagePullSecret

imagePullSecrets are defined on spec.template.spec.

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      imagePullSecrets:
        - name: mysecret

Custom annotation with dynamic execution

It can be useful to annotate jobs with special annotations depending on certain business logic. This business logic can be encapsulated in a command or executable that determines the value of the annotation. The following example shows how you might fetch the organizational cost center for a user, given their user name, from a custom application maintained by the business.

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        my.org/costCenter: {{ exec "get-user-details" "--cost-center" .Job.user }}

The hypothetical get-user-details command writes the user’s cost center to stdout when invoked. The output is then stamped on the job's pods as the my.org/costCenter annotation, which various Kubernetes reporting tools can use.

Using the exec function, it is also possible to obtain more information about the result of the invoked process from the ExecResult return value:

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $res := exec "get-user-details" "--cost-center" .Job.user }}
        {{- if eq $res.ExitCode 1 }}
        my.org/costCenter: "UNKNOWN: {{ $res.Err }}"
        {{- else }}
        my.org/costCenter: {{ $res.Stdout }}
        {{- end }}

The following fields are available on an ExecResult:

Field	Type	Description
ExitCode	int	The exit code of the process. If the process encountered an error before running or exited due to a signal, this is set to -1.
Err	error	The error that occurred while running the process, if any. May be nil.
Stdout	string	The stdout of the process.
Stderr	string	The stderr of the process.
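The get-user-details command in these examples is hypothetical. A minimal Python sketch of such a command, assuming a static lookup table (the user names and cost center values are invented for illustration), shows the contract the templates rely on: the value on stdout with exit code 0 on success, and a non-zero exit code (1 for an unknown user) otherwise. The cost centers are deliberately non-numeric, because Kubernetes annotation values must be strings and an unquoted number rendered into the YAML would be parsed as a number.

```python
#!/usr/bin/env python3
import sys
from typing import List, Tuple

# Hypothetical lookup table; a real script might query an HR system, LDAP, or an API.
COST_CENTERS = {"jsmith": "CC-4410", "adoe": "CC-2075"}


def run(argv: List[str]) -> Tuple[int, str, str]:
    """Return (exit code, stdout, stderr) for: get-user-details --cost-center USER."""
    if len(argv) != 2 or argv[0] != "--cost-center":
        return 2, "", "usage: get-user-details --cost-center USER\n"
    user = argv[1]
    if user not in COST_CENTERS:
        # Exit code 1 selects the "UNKNOWN" branch of the second job.tpl example.
        return 1, "", f"no cost center found for {user}\n"
    # No trailing newline, so the rendered annotation is exactly the cost center.
    return 0, COST_CENTERS[user], ""


def main(argv: List[str]) -> int:
    code, out, err = run(argv)
    sys.stdout.write(out)
    sys.stderr.write(err)
    return code


# Act only when invoked with arguments, as the template's exec call always does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved without the `.py` suffix as an executable file named get-user-details, it would also need to be found by the Launcher when `exec` runs it (for example, on the Launcher's PATH); that lookup behavior is an assumption here.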

Add annotation for group members

In this example, we use the groups template function to determine if the job user belongs to the admin-grp group, adding an annotation if so.

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $grp := groups .Job.user }}
        {{- if has "admin-grp" $grp }}
        my.org/admin: "true"
        {{- end }}

IAM permissions with AWS roles for EKS service accounts

When running the Launcher in an EKS cluster within AWS, you can set up IAM roles to be assumed by running Launcher jobs. The setup within AWS requires the following:

Enabling the OIDC provider for your EKS cluster.
Various IAM roles which can be assumed. Each of these roles will be associated with a specific Kubernetes service account.
Various Kubernetes service accounts, one for each IAM role you want assumable by Launcher jobs.

For details on how to create these within AWS, see the AWS EKS Documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs, thus allowing EKS to automatically associate AWS credentials with the started pods to allow your users to access IAM-protected resources.

For example, let’s assume you have written a script called get-kube-svc-account that maintains a many-to-one mapping of users to Kubernetes service accounts (which implies a mapping of users to IAM roles). We can modify the job template to stamp the correct service account on the running pod by invoking the script like so.

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        (...omitted for brevity...)
      generateName: {{ toYaml .Job.generateName }}
    spec:
      {{- $res := exec "get-kube-svc-account" .Job.user }}
      {{- if eq $res.ExitCode 0 }}
      serviceAccountName: {{ $res.Stdout }}
      {{- end }}
      (...omitted for brevity...)
      {{- $res := exec "id" "-g" .Job.user }}
      {{- if eq $res.ExitCode 0 }}
      {{- $_ := set $securityContext "fsGroup" $res.Stdout }}
      {{- end }}
      {{- if $securityContext }}
      securityContext:
        {{- range $key, $val := $securityContext }}
        {{ $key }}: {{ $val }}
        {{- end }}
      {{- end }}

Important

The fsGroup parameter is required to ensure that the user running within the pod can read the secret IAM temporary credential files. Without it, only the root user would be able to read these files, preventing your user from accessing AWS resources. Take care to ensure that the running user *actually* belongs to the fsGroup, as this group ID is added to the running pod user. A mistake here could unintentionally give the user access to other files, such as any shared files that are mounted with a volume.

Though a hypothetical mapping script was used for this example, you could use a more robust approach, such as calling out to a service that maintains a mapping, or by managing the mappings in LDAP or some other user database.
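For reference, a get-kube-svc-account script like the one assumed in the job.tpl example might look like the following minimal Python sketch. The static table, user names, and service account names are all hypothetical; each service account would need to exist in the namespace where the Launcher creates jobs and, per the AWS EKS documentation, carry an `eks.amazonaws.com/role-arn` annotation naming its IAM role.

```python
#!/usr/bin/env python3
import sys
from typing import Dict, List, Optional

# Hypothetical many-to-one mapping of users to Kubernetes service accounts. Each
# account is assumed to carry an eks.amazonaws.com/role-arn annotation naming its IAM role.
USER_TO_SERVICE_ACCOUNT: Dict[str, str] = {
    "jsmith": "analytics-readonly",
    "adoe": "analytics-readonly",
    "bchen": "ml-training",
}


def service_account_for(user: str, mapping: Dict[str, str] = USER_TO_SERVICE_ACCOUNT) -> Optional[str]:
    return mapping.get(user)


def main(argv: List[str]) -> int:
    if len(argv) != 1:
        sys.stderr.write("usage: get-kube-svc-account USER\n")
        return 2
    account = service_account_for(argv[0])
    if account is None:
        # A non-zero exit makes the job.tpl example skip its serviceAccountName line.
        sys.stderr.write(f"no service account mapped for {argv[0]}\n")
        return 1
    sys.stdout.write(account)
    return 0


# Act only when invoked with arguments, as the template's exec call always does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```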

Azure AD tokens for AKS service accounts (Azure Workload Identity)

Similar to the previous example on AWS IAM roles, Azure provides Azure Workload Identity to allow access to Azure resources from within AKS pods. The setup within Azure requires the following:

Enabling the OIDC provider for your AKS cluster.
Installing the Mutating Admission Webhook within the cluster.
Creation of Azure resources that you will need to access within AKS. Each resource must specify the service principal of a specific Azure Active Directory application which will be tied to an AKS service account that will be used when running Launcher jobs.
Various AKS service accounts that have been associated with the aforementioned Azure Active Directory application.

For details on how to create these within Azure, see the Azure Workload Identity documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs (similar to the previous example), thus allowing your users access to your Azure resources within Launcher jobs. Use the example in the AWS IAM section to start your Launcher jobs with the desired AKS service account.

IAM permissions with GKE (GKE Workload Identity)

GKE also allows access to IAM-protected resources within Google Cloud. The setup within Google Cloud requires the following:

Enabling Workload Identity on your GKE cluster.
Enabling Workload Identity on your GKE node pool.
IAM service account(s) that have associated IAM roles providing access to the Google Cloud resources you want available in your Launcher jobs.
Various GKE service accounts, one per IAM service account, each mapped to that IAM service account so that it receives its IAM roles.

For details on how to create these within Google Cloud, see the GKE Workload Identity documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs (similar to the previous examples), thus allowing your users access to your Google Cloud resources within Launcher jobs. Use the example in the AWS IAM section to start your Launcher jobs with the desired GKE service account.

You will also need to add a nodeSelector to the job pod (via the job.tpl file) to ensure that the GKE metadata server is accessible from the node(s) that will be running your jobs.

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        (...omitted for brevity...)
      generateName: {{ toYaml .Job.generateName }}
    spec:
      nodeSelector:
        iam.gke.io/gke-metadata-server-enabled: "true"
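A minimal sketch of the service account mapping, following the pattern described in the GKE Workload Identity documentation. It assumes the IAM service account already exists and holds the roles you need; my-project, rs-gsa, and rs-gke-sa are placeholder names.

```shell
#!/usr/bin/env bash
# Placeholders (hypothetical): replace with your own values.
PROJECT_ID="my-project"
NAMESPACE="rstudio"
GSA_NAME="rs-gsa"        # existing IAM service account
KSA_NAME="rs-gke-sa"     # Kubernetes service account for Launcher jobs
GSA_EMAIL="${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the Kubernetes service account in the Launcher namespace.
kubectl create serviceaccount "$KSA_NAME" --namespace "$NAMESPACE"

# Allow the Kubernetes service account to act as the IAM service account.
gcloud iam service-accounts add-iam-policy-binding "$GSA_EMAIL" \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Record which IAM service account this Kubernetes service account maps to.
kubectl annotate serviceaccount "$KSA_NAME" --namespace "$NAMESPACE" \
  "iam.gke.io/gcp-service-account=${GSA_EMAIL}"
```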

IAM permissions with kiam (deprecated)

You can extend the above example to assign IAM roles per group using kiam. The following example assigns users in the admin-grp group an existing IAM role (rs-admin-role); all other users receive a different IAM role (rs-user-role).

job.tpl

# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $grp := groups .Job.user }}
        {{- if has "admin-grp" $grp }}
        my.org/admin: "true"
        iam.amazonaws.com/role: "rs-admin-role"
        {{- else }}
        iam.amazonaws.com/role: "rs-user-role"
        {{- end }}
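To check which role each job pod actually received, you could list the pods in the Launcher namespace alongside their iam.amazonaws.com/role annotation. This sketch assumes the default rstudio namespace; pods without the annotation show an empty second column.

```shell
#!/usr/bin/env bash
# Print each pod name and its kiam role annotation, tab-separated.
# Dots inside the annotation key are escaped for kubectl's JSONPath syntax.
kubectl get pods --namespace rstudio \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.iam\.amazonaws\.com/role}{"\n"}{end}'
```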

Change the Kubernetes Service type

Launcher supports configuring a Kubernetes Service with a type of ClusterIP, LoadBalancer, or NodePort. Launcher will create a Kubernetes Service only when a job definition includes one or more exposedPorts. Please reference the Launcher API documentation for further details.

To make the Service reachable only from within the cluster, set the type field to ClusterIP, as shown in the following example.

service.tpl

# Version: 2.2.0
apiVersion: v1
kind: Service
metadata:
  (...omitted for brevity...)
spec:
  (...omitted for brevity...)
  clusterIP: ''
  type: ClusterIP

On cloud providers that support external load balancers, setting the type field to LoadBalancer provisions a load balancer for your Service. Consult your cloud provider's documentation for details on support for additional parameters (e.g., loadBalancerIP or loadBalancerClass).

Important

Using the LoadBalancer service type may result in new infrastructure being provisioned by your cloud provider. Your cloud provider will charge you for any infrastructure that is provisioned.

service.tpl

# Version: 2.2.0
apiVersion: v1
kind: Service
metadata:
  (...omitted for brevity...)
spec:
  (...omitted for brevity...)
  clusterIP: ''
  type: LoadBalancer
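After a job with exposed ports starts, you can confirm which Service type Launcher created. This sketch assumes the default rstudio namespace. For a LoadBalancer Service, EXTERNAL-IP typically shows <pending> until the cloud provider finishes provisioning.

```shell
#!/usr/bin/env bash
# List Services in the Launcher namespace; the TYPE column shows ClusterIP,
# LoadBalancer, or NodePort, and EXTERNAL-IP shows any provisioned address.
kubectl get services --namespace rstudio -o wide

# Optionally watch until a LoadBalancer address is assigned (Ctrl+C to stop).
kubectl get services --namespace rstudio --watch
```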

AWS Fargate

Launcher supports running jobs on AWS Fargate. Review the AWS Fargate Considerations and Fargate pod configuration pages in the AWS documentation to understand the limits on scheduling pods there.

Note

Pods scheduled on AWS Fargate may take 10 minutes or longer to start. Using a larger instance and a smaller container image can result in a faster startup time.

Create one or more Fargate profiles.
Configure the Launcher Kubernetes Plugin for one or more Launcher Clusters.
Modify the Kubernetes templates for the relevant Launcher Cluster(s):
- The Kubernetes Service type must be ClusterIP or LoadBalancer. Fargate does not support NodePort.
- If your Fargate profile specifies Kubernetes labels to match, you can use placement constraints to conditionally add these labels to the job template.

  Configure a placement constraint in the appropriate profiles configuration file:

  launcher.kubernetes.profiles.conf
```
[*]
placement-constraints=eks.amazonaws.com/compute-type:fargate
```
  Then add the matching label(s) to the pod metadata (spec.template.metadata.labels). In this example, the Fargate profile is configured to match the label node-type=fargate.

  job.tpl
```
spec:
  (...omitted for brevity...)
  template:
    (...omitted for brevity...)
    metadata:
      (...omitted for brevity...)
      labels:
        (...potential additional labels...)
        {{- range .Job.placementConstraints -}}
          {{- if and (eq .name "eks.amazonaws.com/compute-type") (eq .value "fargate")}}
        node-type: fargate
          {{- end }}
        {{- end }}
```
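If you manage your EKS cluster with eksctl, a Fargate profile that matches the rstudio namespace and the node-type=fargate label used above might be created as follows. The cluster and profile names are placeholders, and the use of eksctl is an assumption; the AWS console or the AWS CLI can achieve the same result.

```shell
#!/usr/bin/env bash
# Placeholders (hypothetical): replace with your own values.
CLUSTER_NAME="my-eks-cluster"
PROFILE_NAME="rstudio-fargate"

# Schedule pods in the rstudio namespace that carry node-type=fargate on Fargate.
eksctl create fargateprofile \
  --cluster "$CLUSTER_NAME" \
  --name "$PROFILE_NAME" \
  --namespace rstudio \
  --labels node-type=fargate
```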

Important

AWS Fargate schedules each Pod within a Virtual Machine (VM). This VM will not be terminated until the Kubernetes Job is deleted.

AWS will charge you for these EC2 instances until the Jobs are deleted. To ensure these nodes are cleaned up promptly, lower the value of job-expiry-hours in the relevant plugin configuration file.

To clean up Launcher jobs immediately after completion, use the following configuration, which terminates the associated EC2 instance when the Kubernetes Job is removed.

job-expiry-hours=0
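To confirm that Fargate capacity is released as jobs expire, you can list the Fargate nodes in the cluster. This sketch relies on EKS labeling Fargate nodes with eks.amazonaws.com/compute-type=fargate and assumes the default rstudio namespace; each running Fargate pod appears as its own node.

```shell
#!/usr/bin/env bash
# Fargate nodes currently provisioned; this list should shrink as completed
# Jobs are removed after job-expiry-hours.
kubectl get nodes -l eks.amazonaws.com/compute-type=fargate

# Launcher Jobs still present in the namespace.
kubectl get jobs --namespace rstudio
```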

Validating templates

Once changes have been made to templates, it is recommended to run the --validate-templates command to ensure that the changes are valid YAML and that no important Job Launcher pieces have been inadvertently tampered with. This command generates a test Job payload and renders the templates in the scratch-path.

sudo /path/to/rstudio-kubernetes-launcher --validate-templates

If any errors are found during validation, they will be reported so that the templates can be modified to fix them. The validator tests for the following issues:

Is the template itself valid? Do all functions exist and is all syntax correct?
Is the rendered template YAML valid?
Is there any extra white space in the YAML object?
Are all pieces of the object needed by the Job Launcher available and of the correct type?
Are all required hard-coded values set correctly?

Note

Errors produced during validation are strongly indicative of issues that should be addressed, but the Launcher does not require validation to complete without errors before using the templates.

It is also sometimes desirable to inspect the actual YAML that is generated after templating. This can be done by adding --verbose to the --validate-templates command detailed above.
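For example, the validation command with verbose output enabled looks like this; /path/to/ stands for the plugin's install location, as in the command above.

```shell
#!/usr/bin/env bash
# Validate the templates and also print the rendered YAML for inspection.
sudo /path/to/rstudio-kubernetes-launcher --validate-templates --verbose
```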

JSON overrides

Whenever a job is submitted to the Kubernetes Launcher plugin, a JSON job object is generated and sent to Kubernetes. In some cases, it may be desirable to add or modify fields within this automatically generated JSON blob.

To do so, specify job-json-overrides in the profiles file. The value takes the form "{json path}":"{path to json value file}","{json path 2}":"{path to json value file 2}",..., with each pair of JSON path and value file separated by a comma.

Each JSON path should be a valid JSON Pointer, as specified in the JSON Pointer RFC (RFC 6901).

Each JSON value path must point to a file that is readable by the service user and contains valid JSON. For example, to add host aliases to all submitted jobs:

/etc/rstudio/launcher.kubernetes.profiles.conf

job-json-overrides="/spec/template/spec/hostAliases":"/etc/rstudio/kube-host-aliases"

/etc/rstudio/kube-host-aliases

[
  {
    "ip": "10.2.141.12",
    "hostnames": ["db01"]
  },
  {
    "ip": "10.2.141.13",
    "hostnames": ["db02"]
  }
]

Because the pod is nested within the Kubernetes Job object, its spec is located at the path /spec/template/spec. The example above sets the pod's hostAliases field to a JSON array of HostAlias objects, as defined by the Kubernetes API. See the Kubernetes API documentation for an exhaustive list of fields that can be set.

Fields specified via job-json-overrides overwrite any existing fields in the auto-generated job spec. Note that the Kubernetes Launcher plugin requires certain fields to be set in order to parse saved job data properly. It is strongly recommended that you use job-json-overrides sparingly, and only to add fields to the automatically generated job object when necessary.
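Before referencing a file in job-json-overrides, you might check that it meets both requirements above. The following sketch assumes python3 is available (its json.tool module serves as the JSON parser) and that the service user is the default rstudio-server; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Return success if the file parses as JSON (uses python3's json.tool module).
is_valid_json_file() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Check that an override value file is readable by the service user and
# contains valid JSON. The second argument defaults to rstudio-server.
check_override_file() {
  local file="$1"
  local user="${2:-rstudio-server}"
  if ! sudo -u "$user" test -r "$file"; then
    echo "not readable by ${user}: ${file}" >&2
    return 1
  fi
  if ! is_valid_json_file "$file"; then
    echo "invalid JSON: ${file}" >&2
    return 1
  fi
  echo "ok: ${file}"
}
```

Run the check as root or another user with sudo rights, for example check_override_file /etc/rstudio/kube-host-aliases.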

Kubernetes cluster requirements

In order for the Kubernetes plugin to run correctly, the following assumptions about the Kubernetes cluster must be true:

The Kubernetes API must be enabled and reachable from the machine running the Job Launcher
There must be a namespace to create jobs in, which can be specified via the kubernetes-namespace configuration mentioned above (this defaults to rstudio)
There must be a service account that has full API access for all endpoints and API groups underneath the aforementioned namespace, and the account’s auth token must be supplied to the plugin via the auth-token-path (or auth-token) setting
The service account should have access to view the nodes list via the API (optional; without it, the IP addresses returned for a job are limited to the node’s internal IP, because /nodes is needed to fetch a node’s external IP address)
The cluster must have the metrics-server add-on running and working properly to provide job resource utilization streaming
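A few kubectl checks, run from the machine hosting the Job Launcher, can help confirm some of these requirements. This sketch assumes kubectl is already configured for the target cluster and that the default rstudio namespace is used.

```shell
#!/usr/bin/env bash
# Is the API reachable, and does the Launcher namespace exist?
kubectl get namespace rstudio

# Is the metrics API registered (normally provided by metrics-server)?
kubectl get apiservice v1beta1.metrics.k8s.io

# Does the metrics pipeline return pod usage in the Launcher namespace?
kubectl top pods --namespace rstudio
```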

To use placement constraints, you must attach labels to the nodes that match the configured placement constraints. For example, if a node has the label az=us-east and a placement constraint az:us-east is defined, incoming jobs that specify the az:us-east placement constraint will be routed to that node. For more information, see the Kubernetes documentation on assigning Pods to Nodes.
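Continuing the az=us-east example, the label could be attached and verified with kubectl as follows; the node name is a placeholder.

```shell
#!/usr/bin/env bash
NODE_NAME="replace-with-node-name"  # placeholder

# Attach the label that the az:us-east placement constraint should match.
kubectl label nodes "$NODE_NAME" az=us-east

# List the nodes that currently carry the label.
kubectl get nodes -l az=us-east
```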

The following sample script can be run to create a job-launcher service account and rstudio namespace, granting the service account (and thus, the launcher) full API access to manage Workbench jobs:

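# Create the namespace and the service account the Launcher authenticates as
kubectl create namespace rstudio
kubectl create serviceaccount job-launcher --namespace rstudio
# Define a namespaced Role with the permissions the Launcher needs
cat > job-launcher-role.yaml <<EOF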
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-launcher-role
  namespace: rstudio
rules:
  - apiGroups:
      - ""
    resources:
      - "pods"
      - "pods/log"
      - "pods/attach"
      - "pods/exec"
    verbs:
      - "get"
      - "create"
      - "update"
      - "patch"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - ""
    resources:
      - "events"
    verbs:
      - "watch"
  - apiGroups:
      - ""
    resources:
      - "services"
    verbs:
      - "create"
      - "get"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - "batch"
    resources:
      - "jobs"
    verbs:
      - "create"
      - "update"
      - "patch"
      - "get"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - "metrics.k8s.io"
    resources:
      - "pods"
    verbs:
      - "get"
  - apiGroups:
      - ""
    resources:
      - "serviceaccounts"
    verbs:
      - "list"
EOF
# Create the Role, then bind it to the job-launcher service account
kubectl create -f job-launcher-role.yaml; rm job-launcher-role.yaml
kubectl create rolebinding job-launcher-role-binding --namespace rstudio \
  --role=job-launcher-role \
  --serviceaccount=rstudio:job-launcher
# Optional: let service accounts in the rstudio namespace read node
# information (used to look up nodes' external IP addresses)
kubectl create clusterrole job-launcher-clusters \
  --verb=get,watch,list \
  --resource=nodes
kubectl create clusterrolebinding job-launcher-list-clusters \
  --clusterrole=job-launcher-clusters \
  --group=system:serviceaccounts:rstudio
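Kubernetes 1.24 and later no longer create a long-lived token Secret for each service account automatically. One way to obtain a token for auth-token-path is to create a service account token Secret for job-launcher and write its decoded value to a file. This is a sketch under that assumption: the Secret name job-launcher-token and the output path /etc/rstudio/job-launcher-token are hypothetical, the commands must run with permission to write that path, and you should restrict access to the token file appropriately.

```shell
#!/usr/bin/env bash
# Ask Kubernetes to populate a long-lived token for the job-launcher account.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: job-launcher-token
  namespace: rstudio
  annotations:
    kubernetes.io/service-account.name: job-launcher
type: kubernetes.io/service-account-token
EOF

# Decode the token into the file that auth-token-path will point to
# (hypothetical path). If the file is empty, wait a moment and re-run.
kubectl get secret job-launcher-token --namespace rstudio \
  -o jsonpath='{.data.token}' | base64 --decode > /etc/rstudio/job-launcher-token

# Spot-check the Role: both commands should print "yes".
kubectl auth can-i create jobs.batch --namespace rstudio \
  --as=system:serviceaccount:rstudio:job-launcher
kubectl auth can-i delete pods --namespace rstudio \
  --as=system:serviceaccount:rstudio:job-launcher
```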

The ClusterRole created above is used only to get information about the cluster nodes that can run Launcher jobs. The Launcher sometimes needs this information to determine all of the IP addresses that belong to a node so that it can report them correctly to its upstream clients, which lets external clients connect to their Launcher jobs as required. If all clients connect to the Launcher from within the same network segment (meaning that they can reach the internal IP address of the Kubernetes node), you can forgo the ClusterRole and ClusterRoleBinding. If clients cannot connect to their Launcher jobs, they may need the external IP address of the node(s) in your cluster, which requires granting the ClusterRole to the Launcher service account.
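To judge whether you need the ClusterRole, compare the addresses Kubernetes reports for each node: kubectl get nodes -o wide includes INTERNAL-IP and EXTERNAL-IP columns. The permission check below assumes the service account and ClusterRoleBinding created by the script above.

```shell
#!/usr/bin/env bash
# Show each node's internal and external IP addresses.
kubectl get nodes -o wide

# Confirm the Launcher service account can list nodes (prints "yes" or "no").
# --as-group supplies the namespace group that the ClusterRoleBinding targets.
kubectl auth can-i list nodes \
  --as=system:serviceaccount:rstudio:job-launcher \
  --as-group=system:serviceaccounts:rstudio
```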