Kubernetes Plugin

Workbench | Advanced

The Kubernetes Job Launcher Plugin provides the capability to launch executables on a Kubernetes cluster.

Note

Kubernetes provides a stable, backward-compatible API, and Posit Launcher relies on major features of the API that are unlikely to be removed. New versions of Posit products will remain compatible with previously released versions of Kubernetes that are still supported, and are very likely to be compatible with newer versions as they are released.

The most recent compatible version of Kubernetes tested by Posit is 1.27.

Configuration

We recommend leaving the default values unchanged and configuring only the required fields outlined below.

/etc/rstudio/launcher.kubernetes.conf

Config Option | Description | Required (Y/N) | Default Value
server-user | Service user. The plugin should be started as root, and will lower its privilege to this user for normal execution. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | rstudio-server
thread-pool-size | Size of the thread pool used by the plugin. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | Number of CPUs * 2
enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0
scratch-path | Scratch directory where the plugin writes temporary state. | N | /var/lib/rstudio-launcher/{name of plugin}
logging-dir | Specifies the path where debug logs should be written. | N | /var/log/rstudio/launcher
job-expiry-hours | Number of hours before completed jobs are removed from the system. | N | 24
profile-config | Path to the user and group profiles configuration file (explained in more detail below). | N | /etc/rstudio/launcher.kubernetes.profiles.conf
api-url | The Kubernetes API base URL. This can be an HTTP or HTTPS URL. The URL should be up to, but not including, the /api endpoint. | Y | Example: https://192.168.99.100:8443
auth-token-path | The path to a file that contains the auth token for the job-launcher service account. This is used to authenticate with the Kubernetes API. See below for more information. Required unless auth-token is set. | Y |
auth-token | The auth token for the job-launcher service account. This is used to authenticate with the Kubernetes API. See below for more information. Required if auth-token-path is not set. | N |
kubernetes-namespace | The Kubernetes namespace in which to create jobs. Note that the account specified by the auth-token setting must have full API privileges within this namespace. See Kubernetes cluster requirements below for more information. | N | rstudio
shared-process-namespace | Use a shared process namespace. This improves behavior when sending termination signals to processes. | N | true
verify-ssl-certs | Whether or not to verify SSL certificates when connecting to api-url. Only applicable if connecting over HTTPS. Do not disable this option in production use. | N | 1
certificate-authority | Certificate authority to use when connecting to Kubernetes over SSL and when verifying SSL certificates. This must be the Base64-encoded PEM certificate reported by Kubernetes as the certificate authority in use. Leave this blank to use the system root CA store. | N |
watch-timeout-seconds | Number of seconds before the watch calls to Kubernetes stop. It is strongly recommended not to change this value unless instructed by Posit support. | N | 180
fetch-limit | The maximum number of objects to request per API call from the Kubernetes Service for GET collection requests. It is recommended that you change the default only if you run into size issues with the returned payloads. | N | 500
use-templating | Enables the new Kubernetes object templating feature (see Kubernetes object templating below). When enabled, any configured job-json-overrides are ignored. | N | 0

In order to generate the contents for the file pointed to by auth-token-path (or the value for auth-token), run the following commands. Note that the account must first be created and given appropriate permissions (see Kubernetes cluster requirements below). The file pointed to by auth-token-path must be owned by the account configured as your server-user (usually rstudio-server).

KUBERNETES_AUTH_SECRET=$(kubectl get serviceaccount job-launcher --namespace=rstudio -o jsonpath='{.secrets[0].name}')

# Write token to file. This file must be owned by the `server-user` (usually `rstudio-server`).
kubectl get secret $KUBERNETES_AUTH_SECRET --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d > /etc/rstudio/kubernetes.launcher.token
chmod 0600 /etc/rstudio/kubernetes.launcher.token
sudo chown rstudio-server /etc/rstudio/kubernetes.launcher.token

# Print token for copy/paste.
kubectl get secret $KUBERNETES_AUTH_SECRET --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d

Kubernetes container auto configuration

If you are running the Launcher within a Kubernetes container, a few configuration variables can be inferred automatically by using Kubernetes-injected environment variables and files. These values are automatically added by Kubernetes when a container is launched. Therefore, it is not required to configure these options when running the Launcher within Kubernetes.

Config Option | Obtained From
api-url | https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}
auth-token | /var/run/secrets/kubernetes.io/serviceaccount/token
certificate-authority | Base64-encoded value of /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
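
The mapping above can be illustrated with a short Python sketch. This is not the Launcher's implementation; the function name `infer_launcher_settings` and its `sa_dir` parameter are hypothetical, and stripping trailing whitespace from the token is an assumption made for the example.

```python
import base64
from pathlib import Path

# Directory where Kubernetes mounts the service account files (from the table above).
SERVICE_ACCOUNT_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def infer_launcher_settings(env, sa_dir=SERVICE_ACCOUNT_DIR):
    """Derive api-url, auth-token, and certificate-authority from pod-injected data.

    `env` is a mapping such as os.environ. `sa_dir` is a hypothetical override
    so the function can be exercised outside a pod.
    """
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]
    sa_dir = Path(sa_dir)
    # Assumption: trailing whitespace in the token file is not significant.
    token = (sa_dir / "token").read_text().strip()
    ca_pem = (sa_dir / "ca.crt").read_bytes()
    return {
        "api-url": f"https://{host}:{port}",
        "auth-token": token,
        # certificate-authority expects the Base64-encoded PEM certificate.
        "certificate-authority": base64.b64encode(ca_pem).decode("ascii"),
    }
```

Inside a pod, calling `infer_launcher_settings(os.environ)` would return the three values that would otherwise have to be written to launcher.kubernetes.conf by hand.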

User and group profiles

The Kubernetes plugin also allows you to specify user and group configuration profiles, similar to Posit Workbench’s profiles, in the configuration file /etc/rstudio/launcher.kubernetes.profiles.conf (or any arbitrary file as specified in profile-config within the main configuration file; see above). These are entirely optional.

Profiles are divided into sections of three different types:

  • Global ([*])

  • Per-group ([@groupname])

  • Per-user ([username])

Here is an example profiles file that illustrates each of these types:

/etc/rstudio/launcher.kubernetes.profiles.conf
[*]
placement-constraints=node,region:us,region:eu
default-cpus=1
default-cpus-request=0.5
default-mem-mb=512
default-mem-mb-request=256
max-cpus=2
max-mem-mb=1024
container-images=r-session:3.4.2,r-session:3.5.0
allow-unknown-images=0

[@posit-power-users]
default-cpus=4
default-mem-mb=4096
default-nvidia-gpus=0
default-amd-gpus=0
max-nvidia-gpus=2
max-amd-gpus=3
max-cpus=20
max-mem-mb=20480
container-images=r-session:3.4.2,r-session:3.5.0,r-session:preview
allow-unknown-images=1

[jsmith]
max-cpus=3

This configuration specifies that, by default, users may launch jobs with a maximum of 1024 MB of memory and may use only two different R containers. It also specifies that members of the posit-power-users group may use far more resources, including GPUs, can see the r-session:preview image, and can run any image they specify.

Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared prior). The settings available in the file are described in more depth in the table below.
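
The top-to-bottom override rule can be sketched in Python as follows. This is an illustration of the rule as described, not the plugin's parser; the function name `effective_settings` and the list-of-pairs representation of the file are hypothetical.

```python
def effective_settings(sections, user, groups):
    """Resolve a user's settings from profile sections given in file order.

    `sections` is a list of (header, settings) pairs, where header is "*",
    "@groupname", or "username". Later matching sections override earlier ones.
    """
    result = {}
    for header, settings in sections:
        applies = (
            header == "*"
            or (header.startswith("@") and header[1:] in groups)
            or header == user
        )
        if applies:
            result.update(settings)
    return result


# A trimmed-down version of the example profiles file above.
sections = [
    ("*", {"default-cpus": "1", "max-cpus": "2", "max-mem-mb": "1024"}),
    ("@posit-power-users", {"default-cpus": "4", "max-cpus": "20", "max-mem-mb": "20480"}),
    ("jsmith", {"max-cpus": "3"}),
]

# jsmith is a power user: the group section raises the limits, and the
# per-user section that follows lowers max-cpus again.
jsmith = effective_settings(sections, "jsmith", {"posit-power-users"})
```

Because the file is applied in order, placing the [jsmith] section above [@posit-power-users] would let the group's max-cpus value win instead.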

/etc/rstudio/launcher.kubernetes.profiles.conf

Config Option | Description | Required (Y/N) | Default Value
container-images | Comma-separated string of allowed images that users may see and run. | N |
default-container-image | The default container image to use for the Job if none is specified. | N |
allow-unknown-images | Whether or not to allow users to run any image they want within their job containers, or if they have to use the ones specified in container-images. | N | 1
placement-constraints | Comma-separated string of available placement constraints in the form of key1:value1,key2:value2,... where the :value part is optional to indicate free-form fields. See next section for more details. | N |
default-cpus | Number of CPUs available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Kubernetes)
default-cpus-request | Number of CPUs requested to be available to a job by default if not specified by the job. Corresponds to the Kubernetes CPU Request. | N | 0.0 (not specified - managed by Kubernetes)
default-mem-mb | Number of MB of RAM available to a job by default if not specified by the job. Corresponds to the Kubernetes Memory Limit. | N | 0.0 (infinite - managed by Kubernetes)
default-mem-mb-request | Number of MB of RAM requested to be available to a job by default if not specified by the job. Corresponds to the Kubernetes Memory Request. | N | 0.0 (not specified - managed by Kubernetes)
max-cpus | Maximum number of CPUs available to a job. Corresponds to the Kubernetes CPU Limit. | N | 0.0 (infinite - managed by Kubernetes)
max-cpus-request | Maximum number of CPUs that can be requested. Corresponds to the Kubernetes CPU Request. | N | 0.0 (not specified - managed by Kubernetes)
max-mem-mb | Maximum number of MB of RAM available to a job. Corresponds to the Kubernetes Memory Limit. | N | 0.0 (infinite - managed by Kubernetes)
max-mem-mb-request | Maximum number of MB of RAM that can be requested. Corresponds to the Kubernetes Memory Request. | N | 0.0 (not specified - managed by Kubernetes)
job-json-overrides | JSON path overrides of the generated Kubernetes Job JSON. See Modifying jobs. | N |
cpu-request-ratio | Ratio within the range (0.0, 1.0] representing the Kubernetes container resource request to set for the CPU. This will be the ratio of the limit amount specified by the user when creating the job if no request was specified and no default was determined via profile settings. | N | 1.0
memory-request-ratio | Ratio within the range (0.0, 1.0] representing the Kubernetes container resource request to set for the memory. This will be the ratio of the limit amount specified by the user when creating the job if no request was specified and no default was determined via profile settings. | N | 1.0
default-nvidia-gpus | Number of NVIDIA GPUs available to a job by default if not specified by the job. See below for more information. | N | 0
default-amd-gpus | Number of AMD GPUs available to a job by default if not specified by the job. See below for more information. | N | 0
max-nvidia-gpus | Maximum number of NVIDIA GPUs available to a job. See below for more information. | N | 0
max-amd-gpus | Maximum number of AMD GPUs available to a job. See below for more information. | N | 0
resource-profiles | Available resource profiles. See Resource profiles. | N |
allow-custom-resources | Whether jobs can use the custom resource profile. See Resource profiles. | N | 1
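
As an illustration of the placement-constraints syntax described in the table above, here is a minimal Python sketch that splits a value such as the one in the example profiles file into keys and their allowed values. The function name `parse_placement_constraints` is hypothetical, and representing a free-form key as an empty list is a choice made for this example only.

```python
def parse_placement_constraints(spec):
    """Parse "key1:value1,key2" into {key: [allowed values]}.

    A key that appears without a ":value" part is treated as free-form and
    maps to an empty list (a convention chosen for this sketch).
    """
    constraints = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        key, separator, value = item.partition(":")
        allowed = constraints.setdefault(key, [])
        if separator:
            allowed.append(value)
    return constraints


# The value used in the example profiles file.
example = parse_placement_constraints("node,region:us,region:eu")
```

Here `node` is a free-form field, while `region` is restricted to the two listed values.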

Note that resource limits correspond to the Kubernetes container resource limits, which represent hard caps for the resources a job can use. Kubernetes allows jobs to request fewer resources and occasionally burst up to the limit amount, and this can be controlled by setting the cpu-request-ratio and memory-request-ratio settings as detailed above. Resource management in Kubernetes is a complex topic, and in general you should leave these at the default value of 1.0 unless you understand the implications of using both requests and limits. See the Kubernetes documentation on resource management for more information.
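
The interaction between limits, explicit requests, and the request ratios can be sketched as follows. This follows the description of cpu-request-ratio and memory-request-ratio given above and is not the plugin's actual code; the function and parameter names are hypothetical, and `None` stands in for "not specified".

```python
def effective_request(limit, ratio=1.0, job_request=None, profile_default_request=None):
    """Pick the Kubernetes resource request for a job.

    The ratio is only applied to the limit when neither the job nor the
    profile defaults supplied a request.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be within the range (0.0, 1.0]")
    if job_request is not None:
        return job_request
    if profile_default_request is not None:
        return profile_default_request
    return limit * ratio
```

With the default ratio of 1.0 the request equals the limit. A call such as `effective_request(4096, ratio=0.25)` yields a request of 1024.0, which would let the job burst from that request up to the 4096 MB limit.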

In order to provide GPUs as a schedulable resource, you must first enable the feature in Kubernetes by installing the necessary GPU drivers and device plugins supplied by your desired vendor (AMD or NVIDIA). Once available in Kubernetes, simply set the desired default and max values for the GPU type you intend to use. If not using GPUs, no GPU configuration in the profiles is necessary. For information on adding support for GPUs in Kubernetes, see the Kubernetes documentation.

Resource profiles

Resource profiles greatly simplify the task of assigning CPU, memory, or GPU resources for a job (provided that GPUs are available). They are configured in the optional /etc/rstudio/launcher.kubernetes.resources.conf file. For example:

/etc/rstudio/launcher.kubernetes.resources.conf
[default]
name = "Default" # optional, derived from the section name when absent
cpus=1
mem-mb=4096

[small]
cpus=1
mem-mb=512

[default-gpu]
cpus=1
mem-mb=4096
nvidia-gpus=1
amd-gpus=0

[hugemem]
name = "Huge Memory"
cpus=8
mem-mb=262144

By default, all profiles are available to all users, and jobs can also use a special custom profile to specify CPU, memory, and GPU resources directly instead. However, users are still subject to the constraints in User and group profiles, and administrators may also limit access to individual resource profiles with that configuration file.

For example, suppose an admin wants to restrict the resource profiles above such that (1) GPUs and large memory jobs are only available to users in the bioinformatics group; and (2) only users in the posit-power-users group can use the custom resource profile to set their own resources directly. This might result in the following /etc/rstudio/launcher.kubernetes.profiles.conf file:

/etc/rstudio/launcher.kubernetes.profiles.conf
[*]
resource-profiles=default,small
allow-custom-resources=0

[@bioinformatics]
resource-profiles=default,small,default-gpu,hugemem

[@posit-power-users]
resource-profiles=default,small,default-gpu,hugemem
allow-custom-resources=1

The settings available in each section of the /etc/rstudio/launcher.kubernetes.resources.conf file are described in more depth in the table below:

/etc/rstudio/launcher.kubernetes.resources.conf

Config Option | Description | Required (Y/N) | Default Value
name | A user-friendly name for the profile, e.g. Default (1 CPU, 4G mem) or m4.xlarge. | N | The section title
cpus | The CPU limit. | Y |
cpus-request | The CPU request. | N |
mem-mb | The memory limit, in megabytes. | Y |
mem-mb-request | The memory request, in megabytes. | N |
nvidia-gpus | Number of NVIDIA GPUs, if supported by the cluster. | N | 0
amd-gpus | Number of AMD GPUs, if supported by the cluster. | N | 0
placement-constraints | Any placement constraints, as a comma-separated list of the form key1:value1,key2:value2. | N |
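
To check a resources file against the required options in the table above before deploying it, a small validation script can help. The sketch below uses Python's standard configparser, which is an assumption: the Launcher has its own parser and may treat quoting or comments differently. The function name `load_resource_profiles` is hypothetical.

```python
import configparser

# Options marked as required in the table above.
REQUIRED_OPTIONS = ("cpus", "mem-mb")


def load_resource_profiles(text):
    """Parse resource profile sections and verify the required options exist."""
    parser = configparser.ConfigParser(
        interpolation=None,
        inline_comment_prefixes=("#",),  # allow trailing "# ..." comments
    )
    parser.read_string(text)
    profiles = {}
    for section in parser.sections():
        options = dict(parser[section])
        missing = [key for key in REQUIRED_OPTIONS if key not in options]
        if missing:
            raise ValueError(f"[{section}] is missing: {', '.join(missing)}")
        # name defaults to the section title when absent.
        options["name"] = options.get("name", section).strip('"')
        profiles[section] = options
    return profiles
```

Passing the contents of /etc/rstudio/launcher.kubernetes.resources.conf to `load_resource_profiles` returns one dictionary per section, or raises a ValueError naming the first section that lacks cpus or mem-mb.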

Kubernetes object templating

The new preferred method for modifying objects (jobs and services) submitted to Kubernetes is to use the new templating feature. This allows templating of the entire YAML payload that is submitted to the Kubernetes API using a syntax similar to Helm charts. This new method is easier to use than the previous job-json-overrides functionality, and allows for the use of conditional logic to make job modification dynamic, as opposed to the static transformations offered by job-json-overrides.

To use the templating feature, enable it in /etc/rstudio/launcher.kubernetes.conf:

/etc/rstudio/launcher.kubernetes.conf
use-templating=1

Launcher supports templating of these Kubernetes Resource types:

Resource | API Docs
Job | Job API
Service | Service API

Generating templates

After restarting the Launcher, job and service templates will automatically be written to the Kubernetes Launcher scratch-path (/var/lib/rstudio-launcher/Kubernetes by default), and these templates will be used to create the jobs and services that are submitted to Kubernetes when starting Launcher jobs. These templates will only be created if they do not already exist, to ensure that any changes made are not overwritten.

You can also generate the templates via a command instead of having to first run the Launcher to create them. This can be done with the --generate-templates command:

sudo /path/to/rstudio-kubernetes-launcher --generate-templates

Note that the templates will be created in the scratch-path as mentioned above. Only the templates found in the scratch-path will be used for templating. The scratch-path is determined by parsing the launcher.conf and launcher.kubernetes.conf files. If these files do not yet exist, you can specify the scratch-path by adding --scratch-path <path> to the --generate-templates command.

Modifying templates

Once the templates have been generated, you can modify them as necessary for your Kubernetes cluster. For example, you can add additional annotations to jobs, add Linux capabilities to the container, or even run a side-car container.

To modify the templates, edit the job.tpl and service.tpl files that were previously generated within the scratch-path. You can add additional fields to the job, but it is strongly recommended that you do not delete fields from the template. Doing so may cause your jobs to fail to launch properly, or cause unintended subtle issues with Job Launcher functionality.

Changes to the template require either a restart of the Launcher or a SIGHUP signal. The SIGHUP signal can be sent to the process to cause the templates to be reloaded during run-time. Note that if the new changes are not valid, the original templates will continue to be used until valid changes are reloaded.

sudo kill -s SIGHUP $(pidof rstudio-kubernetes-launcher)

The object templates are modeled after Helm charts and have identical syntax. Like Helm charts, conditional logic, loops, and various functions are supported via the Sprig library. In contrast to Helm, there are no Values files and no directory structure for templates: all templates are simply stored in the scratch-path. The only values available in a template are those on the .Job object, which represents the job being submitted via the Launcher.

All templates must start with a comment indicating the version of the template in use, which must be compatible with the version required by the Launcher. Starting with Launcher version 2.6.0, this version follows Semantic Versioning. This is to ensure that your templates are up-to-date with what is needed by the Launcher (see Examples for details). Launcher will fail to start or load a template that does not satisfy the required version. The expected version comment is generated when invoking the --generate-templates command.

The following fields are available on the .Job object:

Field | Type | Description
id | string | The unique ID for the job. Only set for service objects.
cluster | string | The name of the Launcher cluster.
generateName | string | Used for generating unique object names within Kubernetes to ensure object names do not collide.
name | string | The name of the Job.
user | string | The submitting user of the Job.
workingDirectory | string | The working directory for the job command being executed.
container | map | The container that will run the job command.
container.image | string | The Docker image of the container.
container.runAsUser | int | The UID of the container. May be nil if the default of 0 (root) is used.
container.runAsGroup | int | The GID of the container. May be nil if the default of 0 (root) is used.
container.supplementalGroupIds | int array | An array of supplemental GIDs for the container user. May be empty.
host | string | The desired host to run the Job on. Usually empty.
command | string | The Job command to execute. If empty, exe will be specified.
exe | string | The executable to execute. If empty, command will be specified.
stdin | string | The Job’s stdin.
args | string array | The arguments for the command/executable to be run.
placementConstraints | map array | The placement constraints to be used for deciding where the job should be run.
placementConstraint.name | string | The name of the Placement Constraint. Example: availability-zone, region, etc.
placementConstraint.value | string | The value of the Placement Constraint. Example: us-east-2, m3xlarge, etc.
exposedPorts | map array | The ports that should be exposed/opened for the job.
exposedPort.protocol | string | The protocol for the port (TCP or UDP).
exposedPort.targetPort | string | The port within the container to expose/open.
metadata | map | The arbitrary JSON object that was provided via the metadata field. Used to apply overrides to annotations, labels, init containers, etc.
mounts | map array | The requested mounts for the job. These are generally complicated to transform into Kubernetes objects, so volumes and volumeMounts are provided.
mount.mountPath | string | The path where the mount should be mounted to.
mount.readOnly | bool | Whether or not the mount should be read-only.
mount.mountSource | map | The description of the mount itself.
mount.mountSource.type | string | The type of the mount (e.g. host, nfs, etc.).
mount.mountSource.source | map | The underlying description of the type of the mount. Varies by mount type.
config | map array | Job-specific config unique to the Kubernetes Launcher. Currently used to specify secret env vars.
config.name | string | The name of the Job config.
config.value | string | The value of the Job config.
resourceLimits | map array | The resource limits to be used for the Job.
resourceLimit.type | string | The type of the resource limit (memory, cpuCount, NVIDIA GPUs, or AMD GPUs).
resourceLimit.value | string | The value of the resource limit.
volumes | map array | The Kubernetes volumes that should be mounted. This is constructed from the requested Job mounts.
volume.name | string | The unique name of the volume.
volume.? | map | The sub-object describing the volume. This varies based on the type of the volume being mounted.
volumeMounts | map array | The Kubernetes volume mounts describing how volumes should be mounted. This is constructed from the requested Job mounts.
volumeMount.name | string | The name of the volume to be mounted. Must match a volume.name.
volumeMount.mountPath | string | The path where the mount should be mounted to.
volumeMount.readOnly | bool | Whether or not the mount should be read-only.
tags | string array | The tags for the job.
servicePortsJson | string | Used by the Launcher to ensure that services are created properly after a restart.
shareProcessNamespace | bool | Reflects the current shared-process-namespace configuration setting, which controls whether a container uses shared process namespacing.
memoryRequestRatio | decimal | Reflects the memory-request-ratio as specified in the User/Group profiles. Deprecated - do not use.
cpuRequestRatio | decimal | Reflects the cpu-request-ratio as specified in the User/Group profiles. Deprecated - do not use.
serviceAccountName | string | The Kubernetes Service Account specified for the Job Pods, if any.

The following template functions are available for use in addition to the previously mentioned Sprig template functions:

Name | Description | Example
include | Renders the specified template file (other than job.tpl and service.tpl) with the specified values and returns the result. | {{ include "custom.tpl" . }}
toYaml | Renders the specified object as YAML. | {{ toYaml .Job.volumes }}
exec | Executes the specified command or executable with the given arguments, returning an ExecResult type, which if rendered directly returns the stdout for the process. See Examples below. | {{ exec "echo" "Hello, world" }}
groups | Returns a list of groups for the specified user by shelling out to the groups Linux command. | {{ groups .Job.user }}

To see what has been modified in the template files, you can use the --diff-templates command. This will show the output of the diff Linux command comparing the changes that you’ve made to the original templates generated with the --generate-templates command. Note that the diff command must be present on the PATH in order to use this functionality.

sudo /path/to/rstudio-kubernetes-launcher --diff-templates

Examples

Adding host aliases to the job

Host Aliases are defined on spec.template.spec.

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      hostAliases:
        - ip: "10.2.141.12"
          hostnames: ["db01"]
        - ip: "10.2.141.13"
          hostnames: ["db02"]

Adding Linux capabilities to the job container

Capabilities are defined on spec.template.spec.securityContext. The securityContext is conditionally constructed and is not present for all jobs, so you need to modify the template so that it is always constructed with your desired capabilities.

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      (...omitted for brevity...)
      securityContext:
        {{- if $securityContext }}
        {{- range $key, $val := $securityContext }}
        {{ $key }}: {{ $val }}
        {{- end }}
        {{- end }}
        capabilities:
          add: ["NET_ADMIN", "SYS_PTRACE"]

Adding an ImagePullSecret

imagePullSecrets are defined on spec.template.spec.

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      imagePullSecrets:
        - name: mysecret

Custom annotation with dynamic execution

It can be useful to annotate jobs with special annotations depending on certain business logic. This business logic can be encapsulated in a command or executable that you run that decides the value of the annotation. The following example shows what it might look like to fetch the organizational cost center for a user given their user name and a custom application maintained by the business.

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        my.org/costCenter: {{ exec "get-user-details" "--cost-center" .Job.user }}

The hypothetical get-user-details command writes the user’s cost center to stdout when invoked. That value is then stamped on the job as the my.org/costCenter annotation, which can be used with various Kubernetes reporting tools.

Using the exec function, it is also possible to obtain more information about the result of the invoked process from the ExecResult return value:

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $res := exec "get-user-details" "--cost-center" .Job.user }}
        {{- if ne $res.ExitCode 0 }}
        my.org/costCenter: "UNKNOWN: {{ $res.Err }}"
        {{- else }}
        my.org/costCenter: {{ $res.Stdout }}
        {{- end }}

The following fields are available on an ExecResult:

Field | Type | Description
ExitCode | int | The exit code of the process. If the process encountered an error before running or exited due to signal, this is set to -1.
Err | error | The error that occurred while running the process, if any. May be nil.
Stdout | string | The stdout of the process.
Stderr | string | The stderr of the process.

Add annotation for group members

In this example, we use the groups template function to determine if the job user belongs to the admin-grp group, adding an annotation if so.

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $grp := groups .Job.user }}
        {{- if has "admin-grp" $grp }}
        my.org/admin: "true"
        {{- end }}

IAM permissions with AWS roles for EKS service accounts

When running the Launcher in an EKS cluster within AWS, you can set up IAM roles to be assumed by running Launcher jobs. The setup within AWS requires the following:

  • Enabling the OIDC provider for your EKS cluster.

  • Various IAM roles which can be assumed. Each of these roles will be associated with a specific Kubernetes service account.

  • Various Kubernetes service accounts, one for each IAM role you want assumable by Launcher jobs.

For details on how to create these within AWS, see the AWS EKS Documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs, thus allowing EKS to automatically associate AWS credentials with the started pods to allow your users to access IAM-protected resources.

For example, let’s assume you have written a script called get-kube-svc-account that maintains a many-to-one mapping of users to Kubernetes service accounts (which implies a mapping of users to IAM roles). We can modify the job template to stamp the correct service account on the running pod by invoking the script like so.

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        (...omitted for brevity...)
      generateName: {{ toYaml .Job.generateName }}
    spec:
      {{- $res := exec "get-kube-svc-account" .Job.user }}
      {{- if eq $res.ExitCode 0 }}
      serviceAccountName: {{ $res.Stdout }}
      {{- end }}
      (...omitted for brevity...)
      {{- $res := exec "id" "-g" .Job.user }}
      {{- if eq $res.ExitCode 0 }}
      {{- $_ := set $securityContext "fsGroup" $res.Stdout }}
      {{- end }}
      {{- if $securityContext }}
      securityContext:
        {{- range $key, $val := $securityContext }}
        {{ $key }}: {{ $val }}
        {{- end }}
      {{- end }}

Important

The fsGroup parameter is required to ensure that the secret IAM temporary credential files can be read by the user running within the pod. Without it, only the root user can read these files, which prevents your user from accessing AWS resources. Take care to ensure that the running user actually belongs to the fsGroup, because this group ID is added to the running pod user. A mistake here could unintentionally give the user access to other files, such as shared files that are mounted with a volume.

Though a hypothetical mapping script was used for this example, you could use a more robust approach, such as calling out to a service that maintains a mapping, or by managing the mappings in LDAP or some other user database.

Azure AD tokens for AKS service accounts (Azure Workload Identity)

Similar to the previous example on AWS IAM roles, Azure provides Azure Workload Identity to allow access of Azure resources within AKS pods. The setup within Azure requires the following:

  • Enabling the OIDC provider for your AKS cluster.

  • Installing the Mutating Admission Webhook within the cluster.

  • Creation of Azure resources that you will need to access within AKS. Each resource must specify the service principal of a specific Azure Active Directory application which will be tied to an AKS service account that will be used when running Launcher jobs.

  • Various AKS service accounts that have been associated with the aforementioned Azure Active Directory application.

For details on how to create these within Azure, see the Azure Workload Identity documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs (similar to the previous example), thus allowing your users access to your Azure resources within Launcher jobs. Use the example in the AWS IAM section to start your Launcher jobs with the desired AKS service account.

IAM permissions with GKE (GKE Workload Identity)

GKE also allows access to IAM-protected resources within Google Cloud. The setup within Google Cloud requires the following:

  • Enabling Workload Identity on your GKE cluster.

  • Enabling Workload Identity on your GKE node pool.

  • One or more IAM service accounts with associated IAM roles that provide access to the Google Cloud resources you want available in your Launcher jobs.

  • Various GKE service accounts, one for each IAM service account, so that each GKE service account maps to the IAM roles granted to its IAM service account.

For details on how to create these within Google Cloud, see the GKE Workload Identity documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs (similar to the previous examples), thus allowing your users access to your Google Cloud resources within Launcher jobs. Use the example in the AWS IAM section to start your Launcher jobs with the desired GKE service account.

You will also need to add a nodeSelector to the job pod (via the job.tpl file) to ensure that the GKE metadata server is accessible from the node(s) that will be running your jobs.

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        (...omitted for brevity...)
      generateName: {{ toYaml .Job.generateName }}
    spec:
      nodeSelector:
        iam.gke.io/gke-metadata-server-enabled: "true"
IAM permissions with kiam (deprecated)

You can extend the above example to include IAM roles per group using kiam. The following example assigns users in the group admin-grp an existing IAM role (rs-admin-role). All other users are given a different IAM role (rs-user-role).

job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $grp := groups .Job.user }}
        {{- if has "admin-grp" $grp }}
        my.org/admin: "true"
        iam.amazonaws.com/role: "rs-admin-role"
        {{- else }}
        iam.amazonaws.com/role: "rs-user-role"
        {{- end }}
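
The branch in the template above can be read as a simple mapping from a user's groups to pod annotations. The following is a minimal Python sketch of that logic only, not of how the Launcher renders templates; the function name kiam_annotations is hypothetical and exists purely for illustration.

```python
def kiam_annotations(user_groups):
    """Mirror the if/else branch of the job.tpl example (illustrative only)."""
    if "admin-grp" in user_groups:
        # Members of admin-grp get the admin marker and the admin IAM role.
        return {
            "my.org/admin": "true",
            "iam.amazonaws.com/role": "rs-admin-role",
        }
    # Everyone else falls through to the default user role.
    return {"iam.amazonaws.com/role": "rs-user-role"}


print(kiam_annotations(["admin-grp", "staff"]))
print(kiam_annotations(["staff"]))
```

A user who belongs to several groups still receives exactly one role annotation, because the template tests only for membership in admin-grp.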
Change the Kubernetes Service type

Launcher supports configuring a Kubernetes Service with a type of ClusterIP, LoadBalancer, or NodePort. Launcher will create a Kubernetes Service only when a job definition includes one or more exposedPorts. Please reference the Launcher API documentation for further details.

To make the Service reachable only from within the cluster, set the type field to ClusterIP, as shown in the following example.

service.tpl
# Version: 2.2.0
apiVersion: v1
kind: Service
metadata:
  (...omitted for brevity...)
spec:
  (...omitted for brevity...)
  clusterIP: ''
  type: ClusterIP

On cloud providers that support external load balancers, setting the type field to LoadBalancer provisions a load balancer for your Service. Consult the documentation from your cloud provider for details on support for additional parameters (e.g. loadBalancerIP or loadBalancerClass).

Important

Using the LoadBalancer service type may result in new infrastructure being provisioned by your cloud provider. Your cloud provider will charge you for any infrastructure that is provisioned.

service.tpl
# Version: 2.2.0
apiVersion: v1
kind: Service
metadata:
  (...omitted for brevity...)
spec:
  (...omitted for brevity...)
  clusterIP: ''
  type: LoadBalancer
AWS Fargate

Launcher supports running jobs on AWS Fargate. Please review the AWS Fargate Considerations and the Fargate pod configuration to understand the limits of scheduling pods.

Note

Pods scheduled on AWS Fargate may take 10 minutes or longer to start. Using a larger instance and a smaller container image can result in a faster startup time.

  • Create one or more Fargate Profiles.

  • Configure the Launcher Kubernetes Plugin for one or more Launcher Clusters.

  • Modify the Kubernetes Templates for the relevant Launcher Cluster(s):

    • The Kubernetes Service type must be one of ClusterIP or LoadBalancer. Fargate does not support NodePort.

    • If your Fargate Profile specifies Kubernetes labels to match, you can use placement constraints to conditionally add these labels to the job template.

      Configure a placement constraint in the appropriate profiles configuration file:

      launcher.kubernetes.profiles.conf

      [*]
      placement-constraints=eks.amazonaws.com/compute-type:fargate

      The matching label(s) need to be added to the pod metadata (spec.template.metadata.labels). In this example, the Fargate profile is configured to match the label node-type=fargate.

      job.tpl
      spec:
        (...omitted for brevity...)
        template:
          (...omitted for brevity...)
          metadata:
            (...omitted for brevity...)
            labels:
              (...potential additional labels...)
              {{- range .Job.placementConstraints -}}
                {{- if and (eq .name "eks.amazonaws.com/compute-type") (eq .value "fargate")}}
              node-type: fargate
                {{- end }}
              {{- end }}
Important

AWS Fargate schedules each Pod within a Virtual Machine (VM). This VM will not be terminated until the Kubernetes Job is deleted.

AWS will charge you for these EC2 instances until the Jobs are deleted. To ensure these nodes are cleaned up promptly, lower the value of job-expiry-hours in the relevant plugin configuration file.

To clean up Launcher jobs immediately after completion use the following configuration, which will terminate the associated EC2 instance when the Kubernetes Job is removed.

job-expiry-hours=0

Validating templates

Once changes have been made to templates, it is recommended to run the --validate-templates command to ensure that the changes are valid YAML and that no important Job Launcher pieces have been tampered with inadvertently. This command generates a test Job payload and renders the templates in the scratch-path.

sudo /path/to/rstudio-kubernetes-launcher --validate-templates

If any errors are found during validation, they will be reported so that the templates can be modified to fix them. The validator tests for the following issues:

  • Is the template itself valid? Do all functions exist and is all syntax correct?
  • Is the rendered template YAML valid?
  • Is there any extra white space on the YAML object?
  • Are all pieces of the object needed by the Job Launcher available and of the correct type?
  • Are all required hard-coded values set correctly?
Note

Errors produced during validation are strongly indicative of issues that should be addressed, but the Launcher does not require validation to complete without errors before using the templates.

It is also sometimes desirable to inspect the actual YAML that is generated after templating. This can be done by adding --verbose to the --validate-templates command detailed above.

JSON overrides

Whenever a job is submitted to the Kubernetes Launcher plugin, a JSON job object is generated and sent to Kubernetes. In some cases, it may be desirable to add or modify fields within this automatically generated JSON blob.

In order to do that, you may specify job-json-overrides within the profiles file. The form of the value should be "{json path}":"{path to json value file}","{json path 2}":"{path to json value file 2}",....

The JSON path should be a valid JSON Pointer, as specified in the JSON Pointer RFC (RFC 6901).

The JSON value path specified must be a file readable by the service user, and must contain valid JSON. For example, to add Host Aliases to all submitted jobs:

/etc/rstudio/launcher.kubernetes.profiles.conf
job-json-overrides="/spec/template/spec/hostAliases":"/etc/rstudio/kube-host-aliases"
/etc/rstudio/kube-host-aliases
[
  {
    "ip": "10.2.141.12",
    "hostnames": ["db01"]
  },
  {
    "ip": "10.2.141.13",
    "hostnames": ["db02"]
  }
]

Because the pod itself is nested within the Kubernetes Job object, it is located at the path /spec/template/spec. In the example above, we simply add a JSON object representing the HostAlias array as defined by the Kubernetes API. See the Kubernetes API Documentation for an exhaustive list of fields that can be set.
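
Before editing the profiles file, it can help to confirm that each value file parses as JSON and to assemble the job-json-overrides line in the documented form. The following Python sketch does both under stated assumptions: the helper name build_overrides_line is hypothetical, the check that each path begins with "/" is a simplification of the JSON Pointer rules, and the value file is written to a temporary directory only so that the example is self-contained.

```python
import json
import tempfile
from pathlib import Path


def build_overrides_line(overrides):
    """Build a job-json-overrides line from {json_pointer: value_file_path}."""
    for pointer, value_file in overrides.items():
        # Simplified sanity check (assumption): a non-empty pointer starts with "/".
        if not pointer.startswith("/"):
            raise ValueError(f"not a JSON Pointer: {pointer!r}")
        # json.loads raises json.JSONDecodeError if the file is not valid JSON.
        json.loads(Path(value_file).read_text())
    pairs = ",".join(f'"{p}":"{f}"' for p, f in overrides.items())
    return f"job-json-overrides={pairs}"


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /etc/rstudio/kube-host-aliases.
    value_file = Path(tmp) / "kube-host-aliases"
    value_file.write_text('[{"ip": "10.2.141.12", "hostnames": ["db01"]}]')
    print(build_overrides_line({"/spec/template/spec/hostAliases": str(value_file)}))
```

This sketch does not check that the file is readable by the service user; that still needs to be verified on the host running the Launcher.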

Any fields specified via job-json-overrides overwrite existing fields in the auto-generated job spec. Note that the Kubernetes Launcher plugin requires certain fields to be set in order to properly parse saved job data. It is strongly recommended that you use the job-json-overrides feature sparingly, and only to add fields to the automatically generated job object when necessary.
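
To illustrate how a JSON Pointer path selects a location in the job object and how an override adds or replaces the value there, here is a minimal Python sketch. It is not the plugin's implementation: the function apply_override is hypothetical, it handles only non-empty pointers that walk through JSON objects (not array indices), and the job dictionary is a heavily trimmed stand-in for a real Job.

```python
import copy


def apply_override(job, pointer, value):
    """Return a copy of job with value set at an RFC 6901 pointer (objects only)."""
    # Decode the RFC 6901 escapes: "~1" means "/", then "~0" means "~".
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    result = copy.deepcopy(job)
    node = result
    for token in tokens[:-1]:
        # Create intermediate objects if they are missing.
        node = node.setdefault(token, {})
    # Adds the field, or overwrites it if it already exists.
    node[tokens[-1]] = value
    return result


job = {"spec": {"template": {"spec": {"containers": []}}}}
host_aliases = [{"ip": "10.2.141.12", "hostnames": ["db01"]}]
patched = apply_override(job, "/spec/template/spec/hostAliases", host_aliases)
print(patched["spec"]["template"]["spec"]["hostAliases"])
```

Because an existing field at the target path is replaced wholesale, pointing an override at a field the plugin relies on would discard the generated value, which is why overrides are best limited to adding new fields.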

Kubernetes cluster requirements

In order for the Kubernetes plugin to run correctly, the following assumptions about the Kubernetes cluster must be true:

  • The Kubernetes API must be enabled and reachable from the machine running the Job Launcher
  • There must be a namespace to create jobs in, which can be specified via the kubernetes-namespace configuration mentioned above (this defaults to rstudio)
  • There must be a service account that has full API access for all endpoints and API groups underneath the aforementioned namespace, and the account’s auth token must be supplied to the plugin via the auth-token or auth-token-path setting
  • The service account should have access to view the nodes list via the API. This is optional, but without it the IP addresses returned for a job are restricted to the internal IP, because /nodes is needed to fetch a node’s external IP address
  • The cluster must have the metrics-server add-on running and working properly to provide job resource utilization streaming

In order to use placement constraints, you must attach labels to the node that match the configured placement constraints. For example, if you have a node with the label az=us-east and a placement constraint defined as az:us-east, incoming jobs specified with the az:us-east placement constraint will be routed to that node. For more information on Kubernetes’ placement constraints, see the Kubernetes documentation.
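
The relationship between a placement constraint and node labels can be sketched as a simple lookup. The Python example below is only an illustration of the matching rule described above; the actual scheduling is performed by Kubernetes, and the function nodes_matching and the node names are hypothetical.

```python
def nodes_matching(nodes, constraint):
    """Return names of nodes whose labels satisfy a 'name:value' constraint.

    nodes maps a node name to its label dictionary.
    """
    # Split on the first ":" only, so label names containing "/" or "." are kept intact.
    name, _, value = constraint.partition(":")
    return [node for node, labels in nodes.items() if labels.get(name) == value]


nodes = {
    "node-a": {"az": "us-east"},
    "node-b": {"az": "us-west"},
}
print(nodes_matching(nodes, "az:us-east"))
```

If no node carries a matching label, the list is empty, which corresponds to a job that cannot be placed until a node is labeled accordingly.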

The following sample script can be run to create a job-launcher service account and rstudio namespace, granting the service account (and thus, the launcher) full API access to manage Workbench jobs:

kubectl create namespace rstudio
kubectl create serviceaccount job-launcher --namespace rstudio
cat > job-launcher-role.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-launcher-role
  namespace: rstudio
rules:
  - apiGroups:
      - ""
    resources:
      - "pods"
      - "pods/log"
      - "pods/attach"
      - "pods/exec"
    verbs:
      - "get"
      - "create"
      - "update"
      - "patch"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - ""
    resources:
      - "events"
    verbs:
      - "watch"
  - apiGroups:
      - ""
    resources:
      - "services"
    verbs:
      - "create"
      - "get"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - "batch"
    resources:
      - "jobs"
    verbs:
      - "create"
      - "update"
      - "patch"
      - "get"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - "metrics.k8s.io"
    resources:
      - "pods"
    verbs:
      - "get"
  - apiGroups:
      - ""
    resources:
      - "serviceaccounts"
    verbs:
      - "list"
EOF
kubectl create -f job-launcher-role.yaml; rm job-launcher-role.yaml
kubectl create rolebinding job-launcher-role-binding --namespace rstudio \
   --role=job-launcher-role \
   --serviceaccount=rstudio:job-launcher
kubectl create clusterrole job-launcher-clusters \
   --verb=get,watch,list \
   --resource=nodes
kubectl create clusterrolebinding job-launcher-list-clusters \
  --clusterrole=job-launcher-clusters \
  --group=system:serviceaccounts:rstudio

The ClusterRole created above is only used to get information about the nodes in the cluster that can run Launcher jobs. This is sometimes necessary so that the Launcher can determine all of the IP addresses that belong to a node and report them properly to upstream clients of the Launcher, which allows external clients to connect to their Launcher jobs as required. If all clients connect to the Launcher internally within the same network segment (meaning that clients can connect to the internal IP address of the Kubernetes node), you can forgo the ClusterRole and ClusterRoleBinding. If clients cannot connect to their Launcher jobs, you may need the external IP address of the node(s) in your cluster, which requires granting the ClusterRole to the Launcher service account.