Environment Management
Immutable data science environments
The following cookbook example uses the Python and R environment management features described in the Posit Connect Admin Guide. We also use off-host execution mode with Connect for this example, to demonstrate how to create an immutable and reproducible data science environment.
This example demonstrates how to use the official rstudio/r-session-complete image, which is suitable for running content on Posit Workbench, when deploying content to Posit Connect. By reusing this image, we ensure that the exact same packages used when developing our content in Posit Workbench are used when executing it on Connect.
The source code for the content used in this example can be found in our Python examples.
Prerequisites
Completing this cookbook example requires the following:
- a Connect installation configured to use off-host execution
- a Connect API key with the administrator role
- push access to a container registry
Create the image
First, we define an image that is used to develop our content on Workbench and later to execute it on Connect. We use the r-session-complete image as the base and install the additional Python and R packages that our content requires.
Dockerfile
FROM ghcr.io/rstudio/r-session-complete:jammy-2023.06.1--cd1a0c5
ARG GIT_SHA="4e4be3f59f0fbcf3ccecc724a00b0da7a4ad6f07"
ARG CRAN_MIRROR="https://p3m.dev/cran/__linux__/jammy/latest"
ARG PYPI_MIRROR="https://p3m.dev/pypi/latest/simple"
# Install the Python packages
# This command installs the Python packages defined in requirements.txt,
# which pins the package versions and provides an immutable set of Python dependencies.
RUN pip install --upgrade pip && \
    curl -sSfL https://raw.githubusercontent.com/sol-eng/python-examples/${GIT_SHA}/reticulated-image-classifier/requirements.txt \
        -o /tmp/requirements.txt && \
    pip install --default-timeout=1000 --index-url=${PYPI_MIRROR} -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt
# Install the R packages
ENV RENV_PATHS_LIBRARY=renv/library
RUN R -e "install.packages('renv', repos = c(CRAN = '${CRAN_MIRROR}'))" && \
    curl -sSfL https://raw.githubusercontent.com/sol-eng/python-examples/${GIT_SHA}/reticulated-image-classifier/renv.lock \
        -o /tmp/renv.lock && \
    R -e "renv::restore(lockfile='/tmp/renv.lock', repos = c(CRAN = '${CRAN_MIRROR}'))" && \
    rm /tmp/renv.lock
Build the image with:
# use a container registry that you have push access to
CONTAINER_REGISTRY="myorg/myrepo"
# build the image
docker build . -t ${CONTAINER_REGISTRY}/image-classifier:jammy
# push it to your registry
docker push ${CONTAINER_REGISTRY}/image-classifier:jammy
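Before registering the image with Connect, it can help to confirm that the interpreters exist at the paths the environment definition declares. The following is a minimal sketch of one way to do that with docker run. interpreter_version is a hypothetical helper name, and the paths in the usage comment are the ones this example registers with Connect in the next step, so adjust them if your image differs.

```shell
# Print the version reported by an interpreter inside the image.
# $1: image name
# $2: absolute path of the interpreter inside the image
# --entrypoint is used so the check does not depend on the image's
# default entrypoint or command.
interpreter_version() {
  docker run --rm --entrypoint "$2" "$1" --version
}

# Usage (paths assumed from the environment definition in this example):
# IMAGE="${CONTAINER_REGISTRY}/image-classifier:jammy"
# interpreter_version "$IMAGE" /opt/R/4.2.3/bin/R
# interpreter_version "$IMAGE" /opt/python/3.9.14/bin/python
```

If either command fails, the path declared to Connect would not resolve inside the container, and the environment definition should be corrected before content is deployed.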
Add the execution environment
Next, we use the Connect Server API POST /v1/environments endpoint to create a new execution environment. This execution environment can then be used by content.
The value for matching in the environment created is exact. This indicates that the environment should only be used if it is explicitly requested by a piece of content. Connect never chooses this environment during automatic selection.
Creating an environment via the /v1/environments API endpoint requires the administrator role.
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/environments \
    --data '{
        "title": "Custom image classifier",
        "description": "My custom image classifier environment",
        "cluster_name": "Kubernetes",
        "name": "'${CONTAINER_REGISTRY}'/image-classifier:jammy",
        "matching": "exact",
        "r": {
            "installations": [
                {
                    "version": "4.2.3",
                    "path": "/opt/R/4.2.3/bin/R"
                }
            ]
        },
        "python": {
            "installations": [
                {
                    "version": "3.9.14",
                    "path": "/opt/python/3.9.14/bin/python"
                }
            ]
        }
    }'
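The request above splices ${CONTAINER_REGISTRY} into a single-quoted JSON string, which works but is easy to misquote. As an alternative sketch, the same payload can be generated by a small shell function and piped to curl. environment_payload is a hypothetical helper, not part of Connect, and the field values are copied from the request above.

```shell
# Print the JSON body for POST /v1/environments.
# $1: fully qualified image name, for example myorg/myrepo/image-classifier:jammy
environment_payload() {
  cat <<EOF
{
  "title": "Custom image classifier",
  "description": "My custom image classifier environment",
  "cluster_name": "Kubernetes",
  "name": "$1",
  "matching": "exact",
  "r": {"installations": [{"version": "4.2.3", "path": "/opt/R/4.2.3/bin/R"}]},
  "python": {"installations": [{"version": "3.9.14", "path": "/opt/python/3.9.14/bin/python"}]}
}
EOF
}

# Usage (same endpoint and header as the request above; --data @- reads the body from stdin):
# environment_payload "${CONTAINER_REGISTRY}/image-classifier:jammy" |
#   curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" \
#     --data @- "${CONNECT_SERVER}/__api__/v1/environments"
```

Keeping the payload in one place also makes it easier to review the declared R and Python paths against the image before sending the request.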
Deploy the content
First, create a new content item using the Posit Connect Server API. The request payload specifies initial values for default_image_name, default_r_environment_management, and default_py_environment_management. By setting default_image_name during the initial deployment, we ensure that Connect uses our custom image the first time the content builds. We specify false for both default_r_environment_management and default_py_environment_management so that Connect does not attempt to install any Python or R packages during the first build. When the content executes, it uses the packages installed on the image instead of looking for packages in the Python/R package cache.
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content \
    --data '{
        "name": "my-image-classifier-app",
        "default_image_name": "'${CONTAINER_REGISTRY}'/image-classifier:jammy",
        "default_r_environment_management": false,
        "default_py_environment_management": false
    }'
Make a note of the guid in the server response. We use this as our CONTENT_GUID later when we deploy our application.
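Rather than copying the value by hand, the guid can be pulled out of the response in the shell. The following is a rough sketch that assumes the response is JSON containing a string field named guid. json_string_field is a hypothetical helper that uses pattern matching rather than a real JSON parser, so a tool such as jq is a more robust choice where it is installed.

```shell
# Print the value of a string field found in JSON text on stdin.
# $1: field name
# This is a plain sed pattern match: it prints the value from the first
# input line that contains "<name>": "<value>" and ignores everything else.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (response.json is a hypothetical file holding the saved server response):
# CONTENT_GUID=$(json_string_field guid < response.json)
```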
Next, we need to clone the content to our workstation and create a content bundle so that we can publish it to the Connect server.
# clone the repo
git clone https://github.com/sol-eng/python-examples.git
cd python-examples
git checkout -b connect-custom-execution-env 4e4be3f59f0fbcf3ccecc724a00b0da7a4ad6f07
# create the content bundle
tar czvf bundle.tar.gz -C ./reticulated-image-classifier ./
# upload the content bundle to Posit Connect
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content/${CONTENT_GUID}/bundles \
    --data-binary @"bundle.tar.gz"
Make a note of the id in the server response. We use this as our BUNDLE_ID in the next step.
Now we can activate the bundle to complete the content deployment.
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content/${CONTENT_GUID}/deploy \
    --data '{
        "bundle_id": "'${BUNDLE_ID}'"
    }'
The server logs should indicate that the content requests our custom image and that there is no package installation required for this deployment:
time="2023-09-05T20:38:17.782Z" level=info msg="Bundle created with R version 4.2.3 and Python version 3.9.14 is compatible with environment Kubernetes::myorg/myrepo/image-classifier:jammy with R version 4.2.3 from /opt/R/4.2.3/bin/R and Python version 3.9.14 from /opt/python/3.9.14/bin/python " bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.782Z" level=info msg="Bundle requested no R environment restore; Connect will not perform any R package installation." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.782Z" level=info msg="Bundle requested no Python environment restore; Connect will not perform any Python package installation." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.785Z" level=info msg="Launching Shiny application..." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
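To check for these messages without scanning the log by eye, a small filter such as the following can be used. It is a sketch that assumes the message text matches the sample output above. check_no_restore is a hypothetical helper, and how you obtain the server log depends on your installation.

```shell
# Read Connect server log lines on stdin and report, for R and Python,
# whether the "no environment restore" message is present.
# Returns 0 only if both messages were found.
check_no_restore() {
  cnr_logs=$(cat)
  cnr_status=0
  for cnr_lang in R Python; do
    if printf '%s\n' "$cnr_logs" |
        grep -q "Bundle requested no ${cnr_lang} environment restore"; then
      echo "${cnr_lang}: no package installation"
    else
      echo "${cnr_lang}: restore message not found"
      cnr_status=1
    fi
  done
  return "$cnr_status"
}

# Usage (connect.log is a hypothetical file containing the server log):
# check_no_restore < connect.log
```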
The image classifier application should now be fully published and available through the Connect dashboard.