3 Kubernetes Plugin
The Kubernetes Job Launcher Plugin provides the capability to launch executables on a Kubernetes cluster.
3.1 Configuration
It is recommended not to change any of the default values and only configure required fields as outlined below.
/etc/rstudio/launcher.kubernetes.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
server-user | User to run the executable as. The plugin should be started as root, and will lower its privilege to this user for normal execution. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | rstudio-server |
thread-pool-size | Size of the thread pool used by the plugin. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | Number of CPUs * 2 |
enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0 |
scratch-path | Scratch directory where the plugin writes temporary state. | N | /var/lib/rstudio-launcher/{name of plugin} |
job-expiry-hours | Number of hours before completed jobs are removed from the system. | N | 24 |
profile-config | Path to the user and group profiles configuration file (explained in more detail below). | N | /etc/rstudio/launcher.kubernetes.profiles.conf |
api-url | The Kubernetes API base URL. This can be an HTTP or HTTPS URL. The URL should include everything up to, but not including, the /api endpoint. | Y | Example: https://192.168.99.100:8443 |
auth-token | The auth token for the job-launcher service account, used to authenticate with the Kubernetes API. This should be base-64 encoded. See below for more information. | Y | |
kubernetes-namespace | The Kubernetes namespace to create jobs in. Note that the account specified by the auth-token setting must have full API privileges within this namespace. See Kubernetes Cluster Requirements below for more information. | N | rstudio |
verify-ssl-certs | Whether or not to verify SSL certificates when connecting to api-url. Only applicable when connecting over HTTPS. For production use, you should always leave this enabled (the default), but it can be disabled for testing purposes. | N | 1 |
certificate-authority | Certificate authority to use when connecting to Kubernetes over SSL and when verifying SSL certificates. This must be a Base64-encoded PEM certificate, which is what most Kubernetes systems report as the certificate authority in use. Leave this blank to use the system root CA store. | N | |
watch-timeout-seconds | Number of seconds before watch calls to Kubernetes are stopped. This helps prevent job status updates from hanging in environments where network middleware silently drops idle connections. It is recommended to keep the default, but it can be raised if job status hangs are not observed, or disabled by setting it to 0. | N | 180 |
fetch-limit | The maximum number of objects to request per API call from the Kubernetes service for GET collection requests. It is recommended to change the default only if you run into size issues with the returned payloads. | N | 500 |
In order to retrieve the auth-token value, run the following commands. Note that the account must first be created and given appropriate permissions (see Kubernetes Cluster Requirements below).
KUBERNETES_AUTH_SECRET=$(kubectl get serviceaccount job-launcher --namespace=rstudio -o jsonpath='{.secrets[0].name}')
kubectl get secret $KUBERNETES_AUTH_SECRET --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d
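Putting the required settings together, a minimal configuration file might look like the following sketch. The URL and token are placeholders for illustration; the remaining lines simply restate the defaults for reference.

```ini
# /etc/rstudio/launcher.kubernetes.conf (illustrative sketch; values are placeholders)
api-url=https://192.168.99.100:8443
# Base64-encoded token for the job-launcher service account (output of the command above)
auth-token=ZXhhbXBsZS10b2tlbi1wbGFjZWhvbGRlcg==
# Optional settings, shown at their default values
kubernetes-namespace=rstudio
verify-ssl-certs=1
```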
3.1.1 User and Group Profiles
The Kubernetes plugin also allows you to specify user and group configuration profiles, similar to RStudio Workbench's profiles, in the configuration file /etc/rstudio/launcher.kubernetes.profiles.conf (or any arbitrary file as specified in profile-config within the main configuration file; see above). These are entirely optional.
Profiles are divided into sections of three different types:
- Global ([*])
- Per-group ([@groupname])
- Per-user ([username])
Here’s an example profiles file that illustrates each of these types:
/etc/rstudio/launcher.kubernetes.profiles.conf
[*]
placement-constraints=node,region:us,region:eu
default-cpus=1
default-mem-mb=512
max-cpus=2
max-mem-mb=1024
container-images=r-session:3.4.2,r-session:3.5.0
allow-unknown-images=0
[@rstudio-power-users]
default-cpus=4
default-mem-mb=4096
default-nvidia-gpus=0
default-amd-gpus=0
max-nvidia-gpus=2
max-amd-gpus=3
max-cpus=20
max-mem-mb=20480
container-images=r-session:3.4.2,r-session:3.5.0,r-session:preview
allow-unknown-images=1
[jsmith]
max-cpus=3
This configuration specifies that, by default, users may launch jobs with a maximum of 1024 MB of memory and may use only two different R containers. It also specifies that members of the rstudio-power-users group are allowed considerably more resources, including GPUs, can see the r-session:preview image, and may run any image they specify.
Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared earlier). The settings available in the file are described in more depth in the table below.
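The top-to-bottom precedence rule can be sketched in Python. This is an illustrative model, not the plugin's actual parser: it walks the sections in file order and applies each one that matches the user, so later matching sections win.

```python
import configparser

def effective_settings(profiles_text, user, groups):
    """Resolve the settings that apply to `user` (a member of `groups`)
    by applying matching sections in file order; later sections override
    earlier ones. Simplified sketch of the documented precedence rule."""
    parser = configparser.ConfigParser()
    parser.read_string(profiles_text)
    settings = {}
    for section in parser.sections():  # sections() preserves file order
        applies = (
            section == "*"                                           # global
            or (section.startswith("@") and section[1:] in groups)   # per-group
            or section == user                                       # per-user
        )
        if applies:
            settings.update(parser[section])
    return settings

# Abbreviated version of the example profiles file above
profiles = """
[*]
max-cpus=2
max-mem-mb=1024

[@rstudio-power-users]
max-cpus=20

[jsmith]
max-cpus=3
"""

# jsmith's own section overrides both the global and group values
print(effective_settings(profiles, "jsmith", ["rstudio-power-users"]))
```

With this model, jsmith ends up with max-cpus=3 (the per-user value, which overrides both the global 2 and the group's 20), while a user with no matching section receives only the global defaults.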
/etc/rstudio/launcher.kubernetes.profiles.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
container-images | Comma-separated string of allowed images that users may see and run. | N | |
default-container-image | The default container image to use for the Job if none is specified. | N | |
allow-unknown-images | Whether or not to allow users to run any image they want within their job containers, or whether they must use the images specified in container-images. | N | 1 |
placement-constraints | Comma-separated string of available placement constraints in the form key1:value1,key2:value2,..., where the :value part is optional to indicate free-form fields. See the next section for more details. | N | |
default-cpus | Number of CPUs available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Kubernetes) |
default-mem-mb | Number of MB of RAM available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Kubernetes) |
max-cpus | Maximum number of CPUs available to a job. | N | 0.0 (infinite - managed by Kubernetes) |
max-mem-mb | Maximum number of MB of RAM available to a job. | N | 0.0 (infinite - managed by Kubernetes) |
job-json-overrides | JSON path overrides of the generated Kubernetes Job JSON. See Modifying Jobs. | N | |
cpu-request-ratio | Ratio within the range (0.0, 1.0] representing the Kubernetes container resource request to set for the CPU, as a fraction of the limit amount specified by the user when creating the job. | N | 1.0 |
memory-request-ratio | Ratio within the range (0.0, 1.0] representing the Kubernetes container resource request to set for the memory, as a fraction of the limit amount specified by the user when creating the job. | N | 1.0 |
default-nvidia-gpus | Number of NVIDIA GPUs available to a job by default if not specified by the job. See below for more information. | N | 0 |
default-amd-gpus | Number of AMD GPUs available to a job by default if not specified by the job. See below for more information. | N | 0 |
max-nvidia-gpus | Maximum number of NVIDIA GPUs available to a job. See below for more information. | N | 0 |
max-amd-gpus | Maximum number of AMD GPUs available to a job. See below for more information. | N | 0 |
Note that resource limits correspond to Kubernetes container resource limits, which are hard caps on the resources a job can use. Kubernetes allows jobs to request fewer resources and occasionally burst up to the limit amount; this behavior can be controlled with the cpu-request-ratio and memory-request-ratio settings detailed above. Resource management in Kubernetes is a complex topic, and in general you should leave these at the default value of 1.0 unless you understand the implications of using both requests and limits. See here for more information.
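As a worked example (values hypothetical): if a job is created with a CPU limit of 2 and a memory limit of 1024 MB under cpu-request-ratio=0.5 and memory-request-ratio=0.5, the generated container resources would look roughly like this:

```yaml
# Illustrative sketch of the resulting container resources (not literal plugin output)
resources:
  requests:
    cpu: "1"        # 2 CPUs * cpu-request-ratio (0.5)
    memory: 512Mi   # 1024 MB * memory-request-ratio (0.5)
  limits:
    cpu: "2"        # limit specified by the user when creating the job
    memory: 1024Mi
```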
In order to provide GPUs as a schedulable resource, you must first enable the feature in Kubernetes by installing the necessary GPU drivers and device plugins supplied by your desired vendor (AMD or NVIDIA). Once available in Kubernetes, simply set the desired default and max values for the GPU type you intend to use. If not using GPUs, no GPU configuration in the profiles is necessary. For information on adding support for GPUs in Kubernetes, see the Kubernetes documentation.
3.1.2 Modifying Jobs
Whenever a job is submitted to the Kubernetes Launcher plugin, a JSON job object is generated and sent to Kubernetes. In some cases, it may be desirable to add or modify fields within this automatically generated JSON blob.
To do so, you may specify job-json-overrides within the profiles file. The value should take the form "{json path}":"{path to json value file}","{json path 2}":"{path to json value file 2}",....
The JSON path should be a valid JSON pointer as specified in the JSON Pointer RFC (RFC 6901). The JSON value path must point to a file that is readable by the service user and contains valid JSON. For example, to add host aliases to all submitted jobs:
/etc/rstudio/launcher.kubernetes.profiles.conf
job-json-overrides="/spec/template/spec/hostAliases":"/etc/rstudio/kube-host-aliases"
/etc/rstudio/kube-host-aliases
[
{
"ip": "10.2.141.12",
"hostnames": ["db01"]
},
{
"ip": "10.2.141.13",
"hostnames": ["db02"]
}
]
Because the pod itself is nested within the Kubernetes Job object, it is located at the path /spec/template/spec. In the example above, we add a JSON array of HostAlias objects as defined by the Kubernetes API. See the Kubernetes API Documentation for an exhaustive list of fields that can be set.
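To illustrate how such an override lands in the generated job object, here is a minimal sketch (not the plugin's actual implementation) that applies an RFC 6901-style pointer to a job spec, creating intermediate objects as needed:

```python
import json

def apply_override(job, pointer, value):
    """Set `value` at the RFC 6901 `pointer` inside dict `job`.
    Simplified sketch: handles object keys only, not array indices."""
    # Unescape per RFC 6901: ~1 -> '/', then ~0 -> '~'
    parts = [p.replace("~1", "/").replace("~0", "~")
             for p in pointer.lstrip("/").split("/")]
    node = job
    for part in parts[:-1]:
        node = node.setdefault(part, {})  # create missing intermediate objects
    node[parts[-1]] = value
    return job

# Skeleton of an auto-generated Kubernetes Job object (abbreviated)
job = {"spec": {"template": {"spec": {"containers": []}}}}

# Value loaded from the override file (/etc/rstudio/kube-host-aliases above)
host_aliases = json.loads('[{"ip": "10.2.141.12", "hostnames": ["db01"]}]')

apply_override(job, "/spec/template/spec/hostAliases", host_aliases)
print(json.dumps(job, indent=2))
```

After the call, the pod spec at /spec/template/spec carries a hostAliases field alongside the fields the plugin generated, mirroring how the override in the example is merged into the submitted job.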
Any fields specified via job-json-overrides will overwrite already existing fields in the auto-generated job spec. Note that the Kubernetes Launcher plugin requires certain fields to be set in order to properly parse saved job data. It is strongly recommended that you use the job-json-overrides feature sparingly, and only to add additional fields to the automatically generated job object when necessary.
3.2 Kubernetes Cluster Requirements
In order for the Kubernetes plugin to run correctly, the following assumptions about the Kubernetes cluster must be true:
- The Kubernetes API must be enabled and reachable from the machine running the Job Launcher.
- There must be a namespace to create jobs in, which can be specified via the kubernetes-namespace configuration mentioned above (this defaults to rstudio).
- There must be a service account that has full API access for all endpoints and API groups underneath the aforementioned namespace, and the account's auth token must be supplied to the plugin via the auth-token setting.
- The service account must have access to view the nodes list via the API (optional, but if not properly configured, the IP addresses returned for a job will be restricted to the internal IP, as /nodes is needed to fetch a node's external IP address).
- The cluster must have the metrics-server addon running and working properly to provide job resource utilization streaming.
In order to use placement constraints, you must attach labels to nodes that match the configured placement constraints. For example, if a node carries the label az=us-east and a placement constraint az:us-east is defined, incoming jobs that specify the az:us-east placement constraint will be routed to that node. For more information on Kubernetes placement constraints, see here.
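For example, assuming a node named node1 (the name is hypothetical), the label matching the az:us-east constraint above could be attached with:

```shell
# Attach the label that the az:us-east placement constraint matches against
kubectl label nodes node1 az=us-east
```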
The following sample script can be run to create a job-launcher service account and an rstudio namespace, granting the service account (and thus, the launcher) full API access to manage RStudio jobs:
kubectl create namespace rstudio
kubectl create serviceaccount job-launcher --namespace rstudio
kubectl create rolebinding job-launcher-admin \
--clusterrole=cluster-admin \
--group=system:serviceaccounts:rstudio \
--namespace=rstudio
kubectl create clusterrole job-launcher-clusters \
--verb=get,watch,list \
--resource=nodes
kubectl create clusterrolebinding job-launcher-list-clusters \
--clusterrole=job-launcher-clusters \
--group=system:serviceaccounts:rstudio