Future builder

This document is a continuation of Santos’ work about “Explicit Builders”. It adds some extra features on top of that document and makes some decisions about the final goal, proposing a clear direction to move forward, with intermediate steps that keep backward and forward compatibility.

Note

A lot of things have changed since this document was written. We have had multiple discussions in which we already made some decisions and discarded some of the ideas and details proposed here. The document was merged as-is without a cleanup, so there could be some inconsistencies. Note that build.jobs and build.commands are already implemented without a defined contract yet, and with small differences from the idea described here.

Please, refer to the following links to read more about all the discussions we already had:

Public discussions:

https://github.com/readthedocs/readthedocs.org/issues/9062

https://github.com/readthedocs/readthedocs.org/issues/1083

https://github.com/readthedocs/readthedocs.org/issues/9063

https://github.com/readthedocs/readthedocs.org/issues/9088

Private discussions:

https://github.com/readthedocs/meta/discussions/9

https://github.com/readthedocs/meta/discussions/14

https://github.com/readthedocs/meta/discussions/17

Goals 

Keep the current builder working as-is
Keep backward and forward (with intermediate steps) compatibility
Define clear support for newbie, intermediate and advanced users
Allow users to override a command, run pre/post hook commands or define all commands by themselves
Remove the Read the Docs requirement of having access to the build process
Translate our current magic at build time to a defined contract with the user
Provide a way to add a command argument without implementing it as a config file option (e.g. fail_on_warning)
Define a path forward towards supporting other tools
Re-write all readthedocs-sphinx-ext features as HTML post-processing features
Reduce complexity maintained by Read the Docs’ core team
Make Read the Docs responsible for Sphinx support and delegate other tools to the community
Eventually support uploading pre-built docs
Allow us to add a feature with a defined contract without worrying about breaking old builds
Introduce build.builder: 2 config (does not install pre-defined packages) for these new features
Educate users and motivate them to migrate to v2 so that we can finally deprecate this magic

Steps ran by the builder 

Read the Docs currently controls the entire build process. Users can only modify very limited behavior through a .readthedocs.yaml file. This drove us to implement features like sphinx.fail_on_warning and submodules, among others, at a high implementation and maintenance cost to the core team. Besides, this hasn’t been enough for more advanced users who require more control over these commands.

This document proposes to clearly define the steps the builder runs and to allow users to override them depending on their needs:

Newbie user / simple platform usage: Read the Docs controls all the commands (current builder)
Intermediate user: ability to override one or more commands plus running pre/post hooks
Advanced user: controls all the commands executed by the builder

The steps identified so far are:

Checkout
Expose project data via environment variables (*)
Create environment (virtualenv / conda)
Install dependencies
Build documentation
Generate defined contract (metadata.yaml)
Post-process HTML (*)
Upload to storage (*)

Steps marked with (*) are managed by Read the Docs and can’t be overridden.
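The split between user-controlled and managed steps can be sketched as a small data model. This is only an illustration: the step identifiers, the `Step` class and the `check_overrides` helper are hypothetical names chosen for this sketch, not part of any existing Read the Docs code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    overridable: bool


# Hypothetical identifiers for the steps listed above.
# Steps marked with (*) in the list are flagged as not overridable.
STEPS = [
    Step("checkout", True),
    Step("expose_environment_variables", False),
    Step("create_environment", True),
    Step("install", True),
    Step("build", True),
    Step("metadata", True),
    Step("post_process_html", False),
    Step("upload", False),
]


def overridable_steps(steps=STEPS):
    """Return the names of the steps a user may override, in order."""
    return [step.name for step in steps if step.overridable]


def check_overrides(requested):
    """Raise ValueError if a requested override targets a managed or unknown step."""
    allowed = set(overridable_steps())
    rejected = sorted(set(requested) - allowed)
    if rejected:
        raise ValueError(f"cannot override: {', '.join(rejected)}")
```

With this model, a request to override `install` passes the check, while a request to override `upload` is rejected.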

Defined contract 

Projects building on Read the Docs must provide a metadata.yaml file after running their last command. This file contains all the data Read the Docs requires to add its integrations. If this file is not provided or is malformed, Read the Docs will fail the build and stop the process, telling the user that there was a problem with metadata.yaml that they need to fix.

Note

There is no restriction on how this file is generated (e.g. generated with Python, Bash, statically uploaded to the repository, etc.). Read the Docs does not control it and is only responsible for generating it when building with Sphinx.

The following is an example of a metadata.yaml that is generated by Read the Docs when building Sphinx documentation:

# metadata.yaml
version: 1
tool:
  name: sphinx
  version: 3.5.1
  builder: html
readthedocs:
  html_output: ./_build/html/
  pdf_output: ./_build/pdf/myproject.pdf
  epub_output: ./_build/epub/myproject.epub
  search:
    enabled: true
    css_identifier: '#search-form > input[name="q"]'
  analytics: false
  flyout: false
  canonical: docs.myproject.com
  language: en

Warning

The metadata.yaml contract is not defined yet. This is just an example of what we could expect from it to be able to add our integrations.
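Since the contract is still undefined, the check that fails a build on a missing or malformed file can only be sketched. The example below assumes the file has already been parsed into a Python mapping, and it treats `version`, `tool.name` and `readthedocs.html_output` as required. That choice of required keys is an assumption made for illustration, as is the `validate_metadata` name.

```python
# Assumption: these top-level keys are required. The real contract is not defined yet.
REQUIRED_TOP_LEVEL = ("version", "tool", "readthedocs")


def validate_metadata(data):
    """Return a list of problems found in a parsed metadata.yaml mapping.

    An empty list means the (hypothetical) contract is satisfied.
    """
    if not isinstance(data, dict):
        return ["metadata must be a mapping"]

    problems = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in data]

    tool = data.get("tool")
    if "tool" in data and (not isinstance(tool, dict) or "name" not in tool):
        problems.append("tool must be a mapping with a name")

    rtd = data.get("readthedocs")
    if "readthedocs" in data and (not isinstance(rtd, dict) or "html_output" not in rtd):
        problems.append("readthedocs must be a mapping with html_output")

    return problems
```

A build would then fail whenever the returned list is not empty, and the messages could be shown to the user as the explanation.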

Config file 

As mentioned, we want all users to use the same config file and to have a clear way to override commands as they need. This will be done by adding two new keys to the existing .readthedocs.yaml file: build.jobs and build.commands.

If neither build.jobs nor build.commands is present in the config file, Read the Docs will execute the builder we currently support without modification, keeping compatibility with all projects already building successfully.

When users use the jobs: or commands: keys, we are not responsible for those commands if they fail. In these cases, we only check for a metadata.yaml file and run our code to add the integrations.
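The choice between the three behaviors can be sketched as a small function over the parsed config. The function name and the returned labels are hypothetical. The document does not say what happens when both keys are present, so rejecting that case is an assumption of this sketch.

```python
def select_build_mode(config):
    """Pick how the build runs, based on the parsed .readthedocs.yaml mapping.

    Returns "default" (current builder), "jobs" (defaults plus overrides and hooks)
    or "commands" (user controls every command).
    """
    build = config.get("build") or {}
    has_jobs = "jobs" in build
    has_commands = "commands" in build

    if has_jobs and has_commands:
        # Assumption: the two keys are mutually exclusive. The document does not specify this.
        raise ValueError("build.jobs and build.commands cannot be used together")
    if has_commands:
        return "commands"
    if has_jobs:
        return "jobs"
    return "default"
```

For example, a config with only `build.builder` set falls through to `"default"`, which corresponds to the current builder running unmodified.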

`build.jobs`

It allows users to execute one or more pre/post hooks and/or override one or more commands. These are some examples where this is useful:

User wants to pass an extra argument to sphinx-build
Project needs to execute a command before building
User has a personal/private PyPI URL
Install project with pip install -e (see https://github.com/readthedocs/readthedocs.org/issues/6243)
Disable git shallow clone (see https://github.com/readthedocs/readthedocs.org/issues/5989)
Call pip install with --constraint (see https://github.com/readthedocs/readthedocs.org/issues/7258)
Do something _before_ install (see https://github.com/readthedocs/readthedocs.org/issues/6662)
Use a conda lock file to create the environment (see https://github.com/readthedocs/readthedocs.org/issues/7772)
Run a check after the build is done (e.g. sphinx-build -W -b linkcheck . _build/html)
Create virtualenv with --system-site-packages
etc

# .readthedocs.yaml
build:
  builder: 2
  jobs:
    pre_checkout:
    checkout: git clone --branch main https://github.com/readthedocs/readthedocs.org
    post_checkout:
    pre_create_environment:
    create_environment: python -m virtualenv venv
    post_create_environment:
    pre_install:
    install: pip install -r requirements.txt
    post_install:
    pre_build:
    build:
      html: sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html
      pdf: latexmk -r latexmkrc -pdf -f -dvi- -ps- -jobname=test-builds -interaction=nonstopmode
      epub: sphinx-build -T -j auto -b epub -d _build/doctrees -D language=en . _build/epub
    post_build:
    pre_metadata:
    metadata: ./metadata_sphinx.py
    post_metadata:

Note

All these commands are executed passing all the exposed environment variables.
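As a rough illustration of what passing the exposed environment variables means, the sketch below runs a command with extra variables merged into the process environment. The variable name `EXAMPLE_PROJECT_LANGUAGE` and the helper `run_with_exposed_env` are hypothetical, since the set of exposed variables is not defined in this document. The command is given as an argument list here, whereas the config examples use shell strings.

```python
import os
import subprocess
import sys


def run_with_exposed_env(command, exposed):
    """Run a command (argument list) with the exposed variables added to the environment."""
    env = {**os.environ, **exposed}
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Hypothetical variable name, used only for this illustration.
exposed = {"EXAMPLE_PROJECT_LANGUAGE": "en"}

# The child process reads the variable back, showing that it was passed through.
result = run_with_exposed_env(
    [sys.executable, "-c", "import os; print(os.environ['EXAMPLE_PROJECT_LANGUAGE'])"],
    exposed,
)
```

The same helper could be called once per job, so that every user command sees the same project data.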

If the user only provides a subset of these jobs, we run our default commands for the jobs they did not provide (see Steps ran by the builder). For example, the following YAML is enough when the project requires running Doxygen as a pre-build step:

# .readthedocs.yaml
build:
  builder: 2
  jobs:
    # https://breathe.readthedocs.io/en/latest/readthedocs.html#generating-doxygen-xml-files
    pre_build: cd ../doxygen; doxygen
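How a partial build.jobs mapping could be combined with the default commands can be sketched as follows. The defaults below are copied from the full build.jobs example in this section and are placeholders, since the real defaults are not specified here. For brevity, the sketch treats `build` as a single HTML command instead of a mapping of formats, and `expand_jobs` is a hypothetical name.

```python
STEP_ORDER = ["checkout", "create_environment", "install", "build", "metadata"]

# Placeholder defaults taken from the full build.jobs example. Not the real defaults.
DEFAULT_COMMANDS = {
    "checkout": "git clone --branch main https://github.com/readthedocs/readthedocs.org",
    "create_environment": "python -m virtualenv venv",
    "install": "pip install -r requirements.txt",
    "build": "sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html",
    "metadata": "./metadata_sphinx.py",
}


def expand_jobs(user_jobs):
    """Return the ordered list of commands for a (possibly partial) build.jobs mapping.

    For each step, the pre hook, the step itself and the post hook are considered in order.
    A missing or empty step falls back to the default command. A missing hook is skipped.
    """
    commands = []
    for step in STEP_ORDER:
        for key in (f"pre_{step}", step, f"post_{step}"):
            command = user_jobs.get(key)
            if command is None and key == step:
                command = DEFAULT_COMMANDS[step]
            if command:
                commands.append(command)
    return commands


# The Doxygen example: only pre_build is provided, everything else uses the defaults.
plan = expand_jobs({"pre_build": "cd ../doxygen; doxygen"})
```

In `plan`, the Doxygen command sits between the default install command and the default build command.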

`build.commands`

It allows users to have full control over the commands executed in the build process. These are some examples where this is useful:

project with a custom build process that does not map to ours
specific requirements that we can’t or don’t want to cover as a general rule
build documentation with a different tool than Sphinx

# .readthedocs.yaml
build:
  builder: 2
  commands:
    - git clone --branch main https://github.com/readthedocs/readthedocs.org
    - pip install -r requirements.txt
    - sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html
    - ./metadata.py
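The contents of the ./metadata.py script in the example above are not specified in this document. A minimal sketch of such a script, assuming only a few of the keys from the example metadata.yaml are needed, could write the file as JSON, which YAML 1.2 parsers generally accept, so no YAML library is required. The function name, its parameters and the chosen keys are hypothetical.

```python
import json
from pathlib import Path


def write_metadata(path, html_output, tool_name, tool_version):
    """Write a minimal metadata file and return the mapping that was written.

    The keys mirror part of the example metadata.yaml. The real contract is not defined yet.
    """
    metadata = {
        "version": 1,
        "tool": {"name": tool_name, "version": tool_version},
        "readthedocs": {"html_output": html_output},
    }
    # JSON output is used here because it can generally be read as YAML 1.2.
    Path(path).write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return metadata
```

A project could call `write_metadata("metadata.yaml", "./_build/html/", "sphinx", "3.5.1")` as the last command of its build.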

Intermediate steps for rollout 

Remove all the exposed data in the conf.py.tmpl file and move it to metadata.yaml
Define structure required for metadata.yaml as contract
Define the environment variables required (e.g. some from html_context) and execute all commands with them
Build documentation using this contract
Leave readthedocs-sphinx-ext as the only package installed and the only extension installed in conf.py.tmpl
Add build.builder: 2 config without any magic
Build everything needed to support build.jobs and build.commands keys
Write guides about how to use the new keys
Re-write readthedocs-sphinx-ext features to post-process HTML features

Final notes 

The migration path from v1 to v2 will require users to explicitly specify their requirements (we don’t install pre-defined packages anymore)
We probably do not want to support build.jobs on v1, to reduce the time the core team spends maintaining that code without being able to update it because projects could break at random.
We would be able to start building documentation using new tools without having to integrate them.
Building on Read the Docs with a new tool will require:
- the user to execute a different set of commands by overriding the defaults.
- the project/build/user to expose a metadata.yaml with the contract that Read the Docs expects.
- none, some or all the integrations will be added to the HTML output (these have to be implemented at Read the Docs core)
We are not responsible for extra formats (e.g. PDF, ePub, etc) on other tools.
Focus on supporting Sphinx with nice integrations made in a tool-agnostic way that can be re-used.
Removing the manipulation of conf.py.tmpl means we do not have to implement the same manipulation for projects using the potential new sphinx.yaml file feature.