
Python and Mirroring PyPI

RSPM supports creating a Python repository with a source that mirrors the Python Package Index (PyPI).

Adding a PyPI repository to an RSPM installation will:

  • Provide a full mirror of all packages available on PyPI
  • Enable fully reproducible dependency management through historic PyPI snapshots
  • Locally cache all downloaded Python packages for quicker installs

System Requirements

In addition to the recommended system requirements, supporting Python packages will require additional disk storage depending on the number of packages being used.

Info

The entirety of PyPI currently requires about 10 TB of storage. Your actual storage needs will depend on your usage. Deep learning packages, such as TensorFlow and PyTorch, are notoriously large: each project's collection of files can take hundreds of gigabytes. If you do not anticipate using deep learning packages, a starting storage size of 50 GB is likely adequate. If you do intend to use deep learning packages, plan for 500 GB or more.
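Before enabling PyPI, you may want to confirm that the disk holding RSPM's data has room for the sizes above. The following is a minimal sketch, not an RSPM tool: the function name `has_free_gb` is hypothetical, and the default directory `/var/lib/rstudio-pm` is an assumption about where your installation stores its data.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the filesystem holding DIR has at least GB GiB free.
# The default DIR is an assumption about where RSPM keeps its data; pass the
# directory your installation actually uses. Returns 2 if df gives no answer.
has_free_gb() {
  local dir="${1:-/var/lib/rstudio-pm}"
  local want_gb="${2:-50}"   # 50 GB: the starting size suggested above
  local avail_kb
  # -P: one POSIX-format line per filesystem; -k: sizes in 1024-byte blocks.
  # Field 4 of the data line is the space available to unprivileged users.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || return 2
  # Compare in KiB so that partial gigabytes are not rounded away.
  [ "$avail_kb" -ge $(( want_gb * 1024 * 1024 )) ]
}
```

For example, `has_free_gb /var/lib/rstudio-pm 500 || echo "less than 500 GiB free"` checks the larger size recommended for deep learning packages.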

Quickstart

The quickest way to make PyPI packages available in your RSPM installation is to run these commands:

Terminal

$ rspm create repo --name=pypi --type=python --description='Access PyPI packages'
$ rspm subscribe --repo=pypi --source=pypi
$ rspm sync --type=pypi

For more information about these commands, see the Creating a Python PyPI Repository section below.
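If you provision RSPM with a script, the three Quickstart commands can be chained so that a failure in one step stops the rest. This is a sketch under stated assumptions: `setup_pypi_repo` and the `RSPM_CMD` override are hypothetical names for illustration, not RSPM settings; the `rspm` commands themselves are exactly those shown above.

```shell
#!/usr/bin/env bash
# Sketch: run the three Quickstart commands in order, stopping at the first
# one that fails. RSPM_CMD is a hypothetical override (not an RSPM setting):
# set RSPM_CMD=echo to print the commands instead of running them.
setup_pypi_repo() {
  local rspm="${RSPM_CMD:-rspm}"
  "$rspm" create repo --name=pypi --type=python --description='Access PyPI packages' &&
    "$rspm" subscribe --repo=pypi --source=pypi &&
    "$rspm" sync --type=pypi
}
```

Running `RSPM_CMD=echo setup_pypi_repo` first lets you review the commands before running `setup_pypi_repo` for real.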

Note

If RSPM is served from a subdirectory such as /rspm, you must set the Server.Address option in the configuration file so that RSPM generates correct URLs for the PyPI repository.
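To check whether Server.Address is already set, a rough reader for the configuration file can help. This is a simplified sketch, not a full gcfg parser: the function name `server_address` is hypothetical, it ignores quoted values, subsections and inline comments, and its default path is the `/etc/rstudio-pm/rstudio-pm.gcfg` file used elsewhere on this page.

```shell
#!/usr/bin/env bash
# Sketch: print Server.Address from an RSPM config file, or return 1 if unset.
# Simplified gcfg reading: no quoted values, subsections or inline comments.
server_address() {
  local conf="${1:-/etc/rstudio-pm/rstudio-pm.gcfg}"
  awk '
    # A "[Section]" header: remember its name in lowercase (gcfg names are
    # case-insensitive).
    /^[ \t]*\[/ {
      section = $0
      sub(/^[ \t]*\[[ \t]*/, "", section)
      sub(/[ \t]*\].*$/, "", section)
      section = tolower(section)
      next
    }
    # "Address = value" inside [Server]; ";" and "#" comment lines never match.
    section == "server" && tolower($0) ~ /^[ \t]*address[ \t]*=/ {
      value = $0
      sub(/^[^=]*=[ \t]*/, "", value)
      sub(/[ \t]+$/, "", value)
      print value
      found = 1
    }
    END { exit !found }
  ' "$conf"
}
```

For a subdirectory deployment, you would expect `server_address` to print a URL ending in that subdirectory, such as one ending in `/rspm`.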

User Configuration

Once a Python repository has been successfully created and synced with the RStudio PyPI service, users need to configure their local system and pip to install from RSPM.

To find instructions specific to your RSPM installation:

  1. Follow the Quickstart or Creating a Python PyPI Repository instructions.
  2. Navigate to the RSPM homepage.
  3. Select the relevant Python repository from the sidebar.
  4. Click the Setup button at the top of the page.

In general, users can either install from RSPM on a one-off basis:

Terminal

$ pip install --index-url http(s)://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL

or configure pip to use RSPM in a persistent manner:

Terminal

$ pip config set global.index-url http(s)://[HOST:PORT]/latest/simple
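`pip config set global.index-url` stores the setting in a pip configuration file. If you would rather manage that file yourself, a sketch like the following writes the equivalent `[global]` section to a path you choose. The function name `write_pip_conf` is hypothetical, and it overwrites any existing file at that path. pip also reads the file named by the `PIP_CONFIG_FILE` environment variable.

```shell
#!/usr/bin/env bash
# Sketch: write the setting that `pip config set global.index-url URL` would
# store, but to an explicit file. Overwrites FILE if it already exists.
write_pip_conf() {
  local conf="$1" index_url="$2"
  # Create the parent directory first, e.g. ~/.config/pip.
  mkdir -p "$(dirname "$conf")" || return
  printf '[global]\nindex-url = %s\n' "$index_url" > "$conf"
}
```

For example, after `write_pip_conf "$HOME/rspm-pip.conf" https://[HOST:PORT]/latest/simple`, running `PIP_CONFIG_FILE="$HOME/rspm-pip.conf" pip install PACKAGE-TO-INSTALL` installs from RSPM.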

Note

If you use HTTP, pip will ignore your repository by default. Using only the configuration above, pip will show a warning message like this:

WARNING: The repository located at [HOST] is not a trusted or secure host and is being ignored. If this repository is available via HTTPS we recommend you use HTTPS instead, otherwise you may silence this warning and allow it anyway with '--trusted-host [HOST]'.

To let pip use an unencrypted HTTP RSPM server, you must use the --trusted-host flag or configuration option:

Terminal

$ pip install --trusted-host [HOST] --index-url http://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL

or configure pip to use RSPM in a persistent manner:

Terminal

$ pip config set global.index-url http://[HOST:PORT]/latest/simple
$ pip config set global.trusted-host [HOST]
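The trusted-host value is the host name from the index URL. A small helper like the following derives it, so the same URL can feed both settings. This is a hypothetical function, not part of pip or RSPM. It strips the scheme, any credentials, the path and the port, and it does not handle bracketed IPv6 addresses.

```shell
#!/usr/bin/env bash
# Sketch: derive the --trusted-host / global.trusted-host value from an index
# URL. Does not handle bracketed IPv6 literals such as http://[::1]:4242/.
index_host() {
  local url="$1" rest
  rest="${url#*://}"            # drop "http://" or "https://"
  rest="${rest%%/*}"            # drop "/latest/simple" and anything after it
  rest="${rest##*@}"            # drop "user:password@" if present
  printf '%s\n' "${rest%%:*}"   # drop ":PORT"
}
```

For example, `pip config set global.trusted-host "$(index_host http://rspm.example.com:4242/latest/simple)"` sets the trusted host to `rspm.example.com`.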

Note

If you use HTTPS but do not provide your RSPM installation with a valid SSL certificate, pip will fail with SSL: CERTIFICATE_VERIFY_FAILED errors when installing packages, because it verifies the server's certificate by default. To configure pip to ignore these errors, use the --trusted-host flag or configuration option:

Terminal

$ pip install --trusted-host [HOST] --index-url https://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL

or configure pip to use RSPM in a persistent manner:

Terminal

$ pip config set global.index-url https://[HOST:PORT]/latest/simple
$ pip config set global.trusted-host [HOST]
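To use RSPM for a single shell session without changing any configuration file, you can rely on pip's environment variables. pip reads each long option from a variable named `PIP_` followed by the option name in uppercase with dashes replaced by underscores. The sketch below exports `PIP_INDEX_URL` and, optionally, `PIP_TRUSTED_HOST`. The function name is hypothetical, and it must be sourced or pasted into the current shell so that the exports persist.

```shell
#!/usr/bin/env bash
# Sketch: point pip at RSPM for the current shell session only. These
# variables mirror the --index-url and --trusted-host options.
use_rspm_index() {
  export PIP_INDEX_URL="$1"
  if [ -n "${2:-}" ]; then
    # Only needed for plain HTTP or an HTTPS server without a valid certificate.
    export PIP_TRUSTED_HOST="$2"
  else
    unset PIP_TRUSTED_HOST
  fi
}
```

For example, after `use_rspm_index https://[HOST:PORT]/latest/simple [HOST]`, a plain `pip install PACKAGE-TO-INSTALL` in that shell uses RSPM.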

Creating a Python PyPI Repository

The Quickstart commands above perform the following operations:

  • Create a Python repository with a description:

Terminal

$ rspm create repo --name=pypi --type=python --description='Access PyPI packages'
<< Repository: pypi - Python

  • Subscribe the repository to the preconfigured PyPI source:

Terminal

$ rspm subscribe --repo=pypi --source=pypi
<< Repository: pypi
<< Sources:
<< --pypi (Python)

  • Run the sync command so that RSPM has the appropriate metadata. RSPM pulls packages and metadata from the RStudio PyPI service.

Terminal

$ rspm sync --type=pypi
<< Initiated PyPI synchronization for pypi. Depending on how much data has been previously synchronized, this could take a while. Actions will appear in the Package Manager UI as they are completed.

<< Snapshots for pypi: 0 / 34 [----------------------------------------------------------------------------------------------------------------------------------]
<< Packages in pypi snapshot: 14127 / 231734 [======>-------------------------------------------------------------------------------------------------------] 5m9

Note

If you try to subscribe a repository that is not of type python to a Python source, you'll get the error: source type must be compatible with repository type.

The PyPI Source

After syncing with the RStudio PyPI service, the local RSPM installation has the metadata for every package on PyPI. A package's files are retrieved from the RStudio PyPI service only when the package is requested, for example by pip.
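pip requests packages through the simple repository API defined in PEP 503. It fetches `<index-url>/<normalized-name>/`, where the project name is lowercased and each run of `-`, `_` and `.` becomes a single `-`. The sketch below builds that URL so you can inspect what RSPM serves for a project, for example with `curl`. The function name is hypothetical, and the index URL is whatever you configured for pip.

```shell
#!/usr/bin/env bash
# Sketch: build the PEP 503 project URL pip requests for a package name.
# Normalization: lowercase, then collapse runs of "-", "_" and "." to "-".
simple_project_url() {
  local index_url="${1%/}" name="$2" normalized
  normalized=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | sed -E 's/[-_.]+/-/g')
  printf '%s/%s/\n' "$index_url" "$normalized"
}
```

For example, `curl -fsS "$(simple_project_url https://[HOST:PORT]/latest/simple Zope.Interface)"` requests the page for `zope-interface`.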

Scheduled Synchronization

By default, RSPM syncs with the RStudio PyPI service once a day. You can change this schedule with the SyncSchedule option in the [PyPI] section, which takes a cron expression. For example, the following runs the sync at 1:00 every day:

; /etc/rstudio-pm/rstudio-pm.gcfg
...

[PyPI]
SyncSchedule = 0 1 * * *

...
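The SyncSchedule value above uses the standard five cron fields: minute, hour, day of month, month and day of week. So `0 1 * * *` means minute 0 of hour 1 on every day. The hypothetical helper below labels the fields of an expression as a quick sanity check before you edit the configuration. It assumes the five-field form shown above, checks only that there are exactly five fields, and does not validate their values.

```shell
#!/usr/bin/env bash
# Sketch: label the fields of a five-field cron expression. Checks the field
# count only; ranges, steps and lists are passed through unvalidated.
describe_cron() {
  local minute hour dom month dow extra
  # read performs no glob expansion, so "*" stays literal.
  read -r minute hour dom month dow extra <<< "$1"
  if [ -z "$dow" ] || [ -n "$extra" ]; then
    echo "expected 5 fields, got: $1" >&2
    return 1
  fi
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
}
```

For example, `describe_cron '0 1 * * *'` prints `minute=0 hour=1 day-of-month=* month=* day-of-week=*`.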

Note

Although RSPM automatically syncs daily, the RStudio PyPI service may not update packages every day.