Read the Docs developer documentation

Documentation for running your own local version of Read the Docs for development, or taking the open source Read the Docs codebase for your own custom installation.

Contributing to Read the Docs

Are you here to help on Read the Docs? Awesome! ❤️

Read the Docs and all of its related projects are community-maintained, open source projects. We hope you feel welcome as you begin contributing to any of these projects. You’ll find that development is primarily supported by our core team members, who all work on Read the Docs full-time.

All members of our community are expected to follow our Code of Conduct. Please make sure you are welcoming and friendly in all of our spaces.

Get in touch

If you have a question or comment, we generally suggest the following communication channels:

  • Ask usage questions (“How do I?”) on StackOverflow.

  • Report bugs, suggest features, or view the source code on GitHub.

  • Discuss development topics on Gitter.

Contributing

There are plenty of places to contribute to Read the Docs, but if you are just starting with contributions, we suggest focusing on the following areas:

Contributing to development

If you want to deep dive and help out with development on Read the Docs, then first get the project installed locally according to the installation guide. After that is done we suggest you have a look at tickets in our issue tracker that are labelled Good First Issue. These are meant to be a great way to get a smooth start and won’t put you in front of the most complex parts of the system.

If you are up for more challenging tasks with a bigger scope, there is a set of tickets with a Feature or Improvement tag. These tickets have a general overview and description of the work required to finish them, and are a good place to start (make sure that the issue also has the Accepted label). That said, these aren’t necessarily the easiest tickets. They are simply things that are explained. If you still haven’t found something to work on, search for the Sprintable label. Those tickets are meant to be standalone and can be worked on ad-hoc.

You can read all of our Read the Docs developer documentation to understand more about the development of Read the Docs. When contributing code, please follow the standard Contribution Guidelines set forth at contribution-guide.org.

Contributing to documentation

Documentation for Read the Docs itself is hosted by Read the Docs at https://docs.readthedocs.io (likely the website you are currently reading).

There are guidelines around writing and formatting documentation for the project. For full details, including how to build it, see Building and contributing to documentation.

Contributing to translations

We use Transifex to manage localization for all of our projects that support localization. If you are interested in contributing, we suggest joining a team on one of our projects on Transifex. From there, you can suggest translations, and can even be added as a reviewer, so you can correct and approve suggestions.

If you don’t see your language in our list of approved languages for any of our projects, feel free to suggest the language on Transifex to start the process.

Triaging issues

Everyone is encouraged to help improve, refine, verify, and prioritize issues on GitHub. The Read the Docs core team uses the following guidelines for issue triage on all of our projects. These guidelines describe the issue lifecycle step-by-step.

Note

You will need Triage permission on the project in order to do this. You can ask one of the members of the Read the Docs team to give you access.

Tip

Triaging helps identify problems and solutions, and ultimately which issues are ready to be worked on. The core Read the Docs team maintains a separate Roadmap of prioritized issues - issues will only end up on that Roadmap after they have been triaged.

Initial triage

When sitting down to do some triaging work, we start with the list of untriaged tickets. We consider all tickets that do not have a label as untriaged. The first step is to categorize the ticket into one of the following categories and either close the ticket or assign an appropriate label. The reported issue …

… is not valid

If you think the ticket is invalid, comment why you think it is invalid, then close it. Tickets might be invalid if they were already fixed in the past, or if it was decided that the proposed feature will not be implemented because it does not conform to the overall goal of Read the Docs. Also, if you happen to know that the problem was already reported, reference the other ticket that is already addressing the problem and close the duplicate.

Examples:

  • Builds fail when using matplotlib: If the described issue was already fixed, then explain this and instruct the reporter to re-trigger the build.

  • Provide way to upload arbitrary HTML files: It was already decided that Read the Docs is not a general-purpose hosting platform for HTML. So explain this and close the ticket.

… does not provide enough information

Add the label Needed: more information if the reported issue does not contain enough information to decide whether it is valid, and ask on the ticket for the required information to go forward. We will re-triage all tickets that have the Needed: more information label assigned. If the original reporter left new information, we can try to re-categorize the ticket. If the reporter does not come back with the required information after roughly two weeks, we will close the ticket.

Examples:

  • My builds stopped working. Please help! Ask for a link to the build log and for which project is affected.

… is a valid feature proposal

If the ticket contains a feature that aligns with the goals of Read the Docs, then add the label Feature. If the proposal seems valid but requires further discussion between core contributors because there might be different possibilities on how to implement the feature, then also add the label Needed: design decision.

Examples:

  • Provide better integration with service XYZ

  • Achieve world domination (also needs the label Needed: design decision)

… is a small change to the source code

Tickets about code cleanup or small changes to existing features will likely have the Improvement label. The distinction for this label is that these issues have a lower priority than a Bug, and aren’t implementing new features.

Examples:

  • Refactor namedtuples to dataclasses

  • Change font size for the project’s title

… is a valid problem within the code base:

If it’s a valid bug, then add the label Bug. Try to reference related issues if you come across any.

Examples:

  • Builds fail if conf.py contains non-ascii letters

… is a currently valid problem with the infrastructure:

Users might report web server downtimes or that builds are not triggered. If the ticket needs investigation on the servers, then add the label Operations.

Examples:

  • Builds are not starting

… is a question and needs answering:

If the ticket contains a question about the Read the Docs platform or the code, then add the label Support.

Examples:

  • My account was set inactive. Why?

  • How to use C modules with Sphinx autodoc?

  • Why are my builds failing?

… requires a one-time action on the server:

Tasks that require a one-time action on the server should be assigned the two labels Support and Operations.

Examples:

  • Please change my username

  • Please set me as owner of this abandoned project

After the initial triage of new tickets is finished, no ticket should be left without a label.

Additional labels for categorization

In addition to the labels already involved in the section above, we have a few more at hand to further categorize issues.

High Priority

If the issue is urgent, assign this label. Ideally, also go ahead and resolve the ticket yourself as soon as possible.

Good First Issue

This label marks tickets that are easy to get started with. The ticket should be ideal for beginners to dive into the code base. Ideally, the fix for the issue only involves touching one part of the code.

Sprintable

Sprintable tickets have the right amount of scope to be handled during a sprint. They are very focused and encapsulated.

For a full list of available labels and their meanings, see Overview of issue labels.

Code of Conduct

Like the technical community as a whole, the Read the Docs team and community is made up of a mixture of professionals and volunteers from all over the world, working on every aspect of the mission - including mentorship, teaching, and connecting people.

Diversity is one of our huge strengths, but it can also lead to communication issues and unhappiness. To that end, we have a few ground rules that we ask people to adhere to. This code applies equally to founders, mentors and those seeking help and guidance.

This isn’t an exhaustive list of things that you can’t do. Rather, take it in the spirit in which it’s intended - a guide to make it easier to enrich all of us and the technical communities in which we participate.

This code of conduct applies to all spaces managed by the Read the Docs project. This includes live chat, mailing lists, the issue tracker, and any other forums created by the project team which the community uses for communication. In addition, violations of this code outside these spaces may affect a person’s ability to participate within them.

If you believe someone is violating the code of conduct, we ask that you report it by emailing dev@readthedocs.org.

  • Be friendly and patient.

  • Be welcoming. We strive to be a community that welcomes and supports people of all backgrounds and identities. This includes, but is not limited to members of any race, ethnicity, culture, national origin, colour, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability.

  • Be considerate. Your work will be used by other people, and you in turn will depend on the work of others. Any decision you take will affect users and colleagues, and you should take those consequences into account when making decisions. Remember that we’re a world-wide community, so you might not be communicating in someone else’s primary language.

  • Be respectful. Not all of us will agree all the time, but disagreement is no excuse for poor behavior and poor manners. We might all experience some frustration now and then, but we cannot allow that frustration to turn into a personal attack. It’s important to remember that a community where people feel uncomfortable or threatened is not a productive one. Members of the Read the Docs community should be respectful when dealing with other members as well as with people outside the Read the Docs community.

  • Be careful in the words that you choose. We are a community of professionals, and we conduct ourselves professionally. Be kind to others. Do not insult or put down other participants. Harassment and other exclusionary behavior aren’t acceptable. This includes, but is not limited to:

    • Violent threats or language directed against another person.

    • Discriminatory jokes and language.

    • Posting sexually explicit or violent material.

    • Posting (or threatening to post) other people’s personally identifying information (“doxing”).

    • Personal insults, especially those using racist or sexist terms.

    • Unwelcome sexual attention.

    • Advocating for, or encouraging, any of the above behavior.

    • Repeated harassment of others. In general, if someone asks you to stop, then stop.

  • When we disagree, try to understand why. Disagreements, both social and technical, happen all the time and Read the Docs is no exception. It is important that we resolve disagreements and differing views constructively. Remember that we’re different. The strength of Read the Docs comes from its varied community, people from a wide range of backgrounds. Different people have different perspectives on issues. Being unable to understand why someone holds a viewpoint doesn’t mean that they’re wrong. Don’t forget that it is human to err and blaming each other doesn’t get us anywhere. Instead, focus on helping to resolve issues and learning from mistakes.

Original text courtesy of the Speak Up! project. This version was adopted from the Django Code of Conduct.

Overview of issue labels

Here is a full list of labels that we use in the GitHub issue tracker and what they stand for.

Accepted

Issues with this label are issues that the core team has accepted onto the roadmap. The core team focuses on accepted bugs, features, and improvements that are on our immediate roadmap and will give priority to these issues. Pull requests could be delayed or closed if they don’t align with our current roadmap. An issue or pull request that has not been accepted should either eventually move to an accepted state, or should be closed. As issues are accepted, we will find room for them on our roadmap or roadmap backlog.

Bug

An issue describing unexpected or malicious behaviour of the readthedocs.org software. A Bug issue differs from an Improvement issue in that Bug issues are given priority on our roadmap. On release, these issues generally only warrant incrementing the patch level version.

Design

Issues related to the UI of the readthedocs.org website.

Feature

Issues that describe new features. Issues that do not describe new features, such as code cleanup or fixes that are not related to a bug, should probably be given the Improvement label instead. On release, issues with the Feature label warrant at least a minor version increase.

Good First Issue

This label marks issues that are easy to get started with. The issue should be ideal for beginners to dive into the code base.

Priority: high

Issues with this label should be resolved as quickly as possible.

Priority: low

Issues with this label won’t have the immediate focus of the core team.

Improvement

An issue with this label is not a Bug nor a Feature. Code cleanup or small changes to existing features would likely have this label. The distinction for this label is that these issues have a lower priority on our roadmap compared to issues labeled Bug, and aren’t implementing new features, such as a Feature issue might.

Needed: design decision

Issues that need a design decision are blocked for development until a project leader clarifies the way in which the issue should be approached.

Needed: documentation

If an issue involves creating or refining documentation, this label will be assigned.

Needed: more information

This label indicates that a reply with more information is required from the bug reporter. If no response is given by the reporter, the issue is considered invalid after 2 weeks and will be closed. See the documentation about our triage process for more information.

Needed: patch

This label indicates that a patch is required in order to resolve the issue. A fix should be proposed via a pull request on GitHub.

Needed: tests

This label indicates that better test coverage is required to resolve the issue. New tests should be proposed via a pull request on GitHub.

Needed: replication

This label indicates that a bug has been reported, but has not been successfully replicated by another user or contributor yet.

Operations

Issues that require changes in the server infrastructure.

PR: work in progress

Pull requests that are not complete yet. A final review is not possible yet, but every pull request is open for discussion.

PR: hotfix

Pull request was applied directly to production after a release. These pull requests still need review to be merged into the next release.

Sprintable

Sprintable issues have the right amount of scope to be handled during a sprint. They are very focused and encapsulated.

Status: blocked

The issue cannot be resolved until some other issue has been closed. See the issue’s log for which issue is blocking this issue.

Status: stale

An issue is stale if there has been no activity on it for 90 days. Once an issue is determined to be stale, it will be closed after 2 weeks unless there is activity on it.

Support

Questions that need answering but do not require code changes, or issues that only require a one-time action on the server, will have this label. See the documentation about our triage process for more information.

Roadmap

Process

We organize our product roadmap publicly on our GitHub Roadmap. There, you will find several views into our roadmap:

Current sprint

Work that the core team is currently responsible for.

Backlog

Work that we have planned for future sprints. Items with an assigned timeframe have generally been discussed already by the team. Items that do not yet have a timeframe assigned are not yet a priority of the core team.

The focus of the core team will be on roadmap and sprint items. These items are promoted from our backlog before each sprint begins.

Triaging issues for the Roadmap

Issues are triaged before they are worked on, involving a number of steps that are covered in Contributing to Read the Docs. Everyone can take part in helping to triage issues, read more in Triaging issues. Additionally, issues are considered for the Roadmap according to the following process:

  • New issues coming in will be triaged, but won’t yet be considered part of our roadmap.

  • If the issue is a valid bug, it will be assigned the Accepted label and will be prioritized, likely on an upcoming release.

  • If the issue is a feature or improvement, it might go through a design decision phase before being accepted and assigned to our roadmap. This is a good time to discuss how to address the problem technically. Skipping this phase might result in your PR being blocked, sent back to design decision, or perhaps even discarded. It’s best to be active here before submitting a PR for a feature or improvement.

  • The core team will only work on accepted issues, and will give PR review priority to accepted roadmap/sprint issues. Pull requests addressing issues that are not on our roadmap are welcome, but we cannot guarantee review response, even for small or easy to review pull requests.

Where to contribute

It’s best to pick off issues from our roadmap, and specifically from our backlog, to ensure your pull request gets attention. If you find an issue that is not currently on our roadmap, we suggest asking about the priority of the issue. In some cases, we might put the issue on our roadmap to give it priority.

Design documents

This is where we outline the design of major parts of our project. Generally this is only available for features that have been built in the recent past, but we hope to write more of them over time.

Warning

These documents may not match the final implementation, or may be out of date.

API v3 design document

This document describes the design, the decisions already made and implemented (the current Version 1 of APIv3), and an implementation plan for the next versions of APIv3.

APIv3 is designed with two main goals: to be easy to use, and to be useful for performing read and write operations.

It will be based on resources, as APIv2 is, but with the Project resource as the main one, with most of the other endpoints nested under it.

Goals

  • Easy to use for our users (access most resources by slug)

  • Useful to perform read and write operations

  • Authentication/Authorization

    • Authentication based on scoped-tokens

    • Handle Authorization nicely using an abstraction layer

  • Cover most useful cases:

    • Integration on CI (check build status, trigger new build, etc)

    • Usage from public Sphinx/MkDocs extensions

    • Allow creation of flyout menu client-side

    • Simplify migration from other services (import projects, create multiple redirects, etc)

Non-Goals

  • Filter by arbitrary and useless fields

    • “Builds with exit_code=1”

    • “Builds containing ERROR on their output”

    • “Projects created after X datetime”

    • “Versions with tag python”

  • Cover all the actions available from the WebUI

Problems with APIv2

There are several problems with our current APIv2:

  • No authentication

  • It’s read-only

  • Not designed for slugs

  • Useful APIs not exposed (only for internal usage currently)

  • Error reporting is a mess

  • Relationships between API resources are not obvious

  • Footer API endpoint returns HTML

Implementation stages

Version 1

The first implementation of APIv3 will cover the following aspects:

  • Authentication

    • all endpoints require authentication via the Authorization request header

    • detail endpoints are available for all authenticated users

    • only Project’s maintainers can access listing endpoints

    • personalized listing

  • Read and Write

    • edit attributes from Version (only active and privacy_level)

    • trigger Build for a specific Version

  • Accessible by slug

    • Projects are accessed by slug

    • Versions are accessed by slug

    • the /projects/ endpoint is the main one and all of the others are nested under it

    • Builds are accessed by id, as an exception to this rule

    • access all (active/non-active) Versions of a Project by slug

    • get latest Build for a Project (and Version) by slug

    • filter by relevant fields

  • Proper status codes to report errors

  • Browse-able endpoints

    • browsing is allowed by hitting /api/v3/projects/ as the starting point

    • ability to navigate by clicking on other resources under the _links attribute

  • Rate limited
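
For illustration, here is a rough sketch of how a client could exercise these Version 1 endpoints using the Python requests library. The project slug, token, and exact payloads are placeholders based on the design above, not a reference for the final API.

import requests

API = "https://readthedocs.org/api/v3"
# Hypothetical token; every request is authenticated via the Authorization header.
HEADERS = {"Authorization": "Token <your-token>"}

# Browse the main endpoint: Projects are accessed by slug and the other
# resources are nested under /projects/.
response = requests.get(f"{API}/projects/", headers=HEADERS)
response.raise_for_status()
for project in response.json()["results"]:
    print(project["slug"])

# Trigger a build for a specific version of a project (slug-based URLs).
response = requests.post(
    f"{API}/projects/myproject/versions/latest/builds/",
    headers=HEADERS,
)
print(response.status_code)  # proper status codes are used to report errors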

Version 2

Note

This is currently implemented and live.

The second iteration will polish issues found in the first step and, as a main goal, add new endpoints to allow importing a project and configuring it without needing to use the WebUI.

After Version 2 is deployed, we will invite users who reach out to us to become beta testers, to receive more feedback and continue improving it by supporting more use cases.

This iteration will include:

  • Minor changes to fields returned in the objects

  • Import Project endpoint

  • Edit Project attributes (“Settings” and “Advanced settings-Global settings” in the WebUI)

  • Trigger Build for default version

  • Allow CRUD for Redirect, Environment Variables and Notifications (WebHook and EmailHook)

  • Create/Delete a Project as subproject of another Project

  • Documentation

Version 3

The third iteration will implement granular permissions, keeping in mind how Sphinx extensions will use it:

  • sphinx-version-warning needs to get all active Versions of a Project

  • An extension that creates a landing page will need all the subprojects of a Project

To fulfill these requirements, this iteration will include:

  • Scope-based authorization token

Version 4
  • Specific endpoint for our flyout menu (returning JSON instead of HTML)

Out of roadmap

These are some features that we may want to build at some point. However, they are currently out of our near-term roadmap because they don’t affect many users, or are for internal usage only.

  • CRUD for Domain

  • Add User as maintainer

  • Give access to a documentation page (objects.inv, /design/core.html)

  • Internal Build process

Nice to have

Better handling of docs URLs

proxito is the component of our code base in charge of serving documentation to users and handling any other URLs from the user documentation domain.

The current implementation has some problems that are discussed in this document, and an alternative implementation is proposed to solve those problems.

Goals

  • Simplifying our parsing logic for URLs

  • Removing reserved paths and ambiguities from URLs

  • Allowing docs to be served from a different prefix and subproject prefix.

Non-goals

  • Allowing fully arbitrary URL generation for projects, like changing the order of the elements or removing them.

Current implementation

The current implementation is based on Django URLs trying to match a pattern that looks like a single project, a versioned project, or a subproject. This means that a couple of URLs are reserved and won’t resolve to the correct file even if it exists (https://github.com/readthedocs/readthedocs.org/issues/8399, https://github.com/readthedocs/readthedocs.org/issues/2292); this usually happens with single version projects.

To support custom URLs, we are hacking into Django’s urlconf to override it at runtime, which doesn’t allow us to implement custom URLs for subprojects easily (https://github.com/readthedocs/readthedocs.org/pull/8327).

Alternative implementation

Instead of trying to map a URL to a view, we first analyze the root project (given from the subdomain), and based on that we map each part of the URL to the current project and version.

This will allow us to re-use this code in our unresolver without needing to override Django’s urlconf at runtime or guess a project only by the structure of its URL.

Terminology:

Root project

The project from where the documentation is served (usually the parent project of a subproject or translation).

Current project

The project that owns the current file being served (a subproject, a translation, etc).

Requested file

The final path to the file that we need to serve from the current project.

Look up process

Proxito will process all documentation requests from a single docs serve view, excluding /_ URLs.

This view will then process the current URL, using the root project, as follows:

  • Check if the root project has translations (the project itself is a translation if it isn’t a single version project), and whether the first part of the URL is a language code and the second is a version.

    • If the lang code doesn’t match, we continue.

    • If the lang code matches, but the version doesn’t, we return 404.

  • Check if it has subprojects and the first part of the URL matches the subprojects prefix (projects), and if the second part of the URL matches a subproject alias.

    • If the subproject prefix or the alias don’t match, we continue.

    • If they match, we try to match the rest of the URL for translations/versions and single versions (i.e., we don’t search for subprojects again) and we use the subproject as the new root project.

  • Check if the project is a single version. Here we just try to serve the rest of the URL as the file.

  • Check if the first part of the URL is page; if so, this is a page redirect. Note that this is done after we have discarded the possibility of the project being a single version project, since it doesn’t make sense to use that redirect with single version projects, and it could collide with the project having a page/ directory.

  • 404 if none of the above rules match.

Custom URLs

We are using custom URLs mainly to serve the documentation from a different directory:

  • deeplearning/nemo/user-guide/docs/$language/$version/$filename

  • deeplearning/frameworks/nvtx-plugins/user-guide/docs/$language/$version/$filename

We always keep the lang/version/filename order. Do we need or want to support changing this order? It doesn’t seem useful to do so.

So, what we need is a way to specify a prefix only. We would have one prefix used for translations and another one used for subprojects. These prefixes will be set in the root project.

The look up order would be as follows:

  • If the root project has a custom prefix, and the current URL matches that prefix, remove the prefix and follow the translations and single version look up process. We exclude subprojects from it, i.e., we don’t check for {prefix}/projects.

  • If the root project has subprojects and a custom subprojects prefix (projects by default), and if the current URL matches that prefix, and the next part of the URL matches a subproject alias, continue with the subproject look up process.

Examples

The next examples are organized in the following way:

  • First there is a list of the projects involved, with their available versions.

  • The first project would be the root project.

  • The other projects will be related to the root project (their relationship is given by their name).

  • Next we will have a table of the requests, and their result.

Project with versions and translations

Projects:

  • project (latest, 1.0)

  • project-es (latest, 1.0)

Requests:

Request | Requested file | Current project | Note
/en/latest/manual/index.html | /latest/manual/index.html | project |
/en/1.0/manual/index.html | /1.0/manual/index.html | project |
/en/1.0/404 | 404 | project | The file doesn’t exist
/en/2.0/manual/index.html | 404 | project | The version doesn’t exist
/es/latest/manual/index.html | /latest/manual/index.html | project-es |
/es/1.0/manual/index.html | /1.0/manual/index.html | project-es |
/es/1.0/404 | 404 | project-es | The translation exists, but not the file
/es/2.0/manual/index.html | 404 | project-es | The translation exists, but not the version
/pt/latest/manual/index.html | 404 | project | The translation doesn’t exist

Project with subprojects and translations

Projects:

  • project (latest, 1.0)

  • project-es (latest, 1.0)

  • subproject (latest, 1.0)

  • subproject-es (latest, 1.0)

Request | Requested file | Current project | Note
/projects/subproject/en/latest/manual/index.html | /latest/manual/index.html | subproject |
/projects/subproject/en/latest/404 | 404 | subproject | The subproject exists, but not the file
/projects/subproject/en/2.x/manual/index.html | 404 | subproject | The subproject exists, but not the version
/projects/subproject/es/latest/manual/index.html | /latest/manual/index.html | subproject-es |
/projects/subproject/br/latest/manual/index.html | 404 | subproject | The subproject exists, but not the translation
/projects/nothing/en/latest/manual/index.html | 404 | project | The subproject doesn’t exist
/manual/index.html | 404 | project |

Single version project with subprojects

Projects:

  • project (latest)

  • subproject (latest, 1.0)

  • subproject-es (latest, 1.0)

Request | Requested file | Current project | Note
/projects/subproject/en/latest/manual/index.html | /latest/manual/index.html | subproject |
/projects/subproject/en/latest/404 | 404 | subproject | The subproject exists, but the file doesn’t
/projects/subproject/en/2.x/manual/index.html | 404 | subproject | The subproject exists, but the version doesn’t
/projects/subproject/es/latest/manual/index.html | /latest/manual/index.html | subproject-es |
/projects/subproject/br/latest/manual/index.html | 404 | subproject | The subproject exists, but the translation doesn’t
/projects/nothing/en/latest/manual/index.html | 404 | project | The subproject doesn’t exist
/manual/index.html | /latest/manual/index.html | project |
/404 | 404 | project | The file doesn’t exist
/projects/index.html | /latest/projects/index.html | project | The project has a projects directory!
/en/index.html | /latest/en/index.html | project | The project has an en directory!

Project with single version subprojects

Projects:

  • project (latest, 1.0)

  • project-es (latest, 1.0)

  • subproject (latest)

Request | Requested file | Current project | Note
/projects/subproject/manual/index.html | /latest/manual/index.html | subproject |
/projects/subproject/en/latest/manual/index.html | 404 | subproject | The subproject is single version
/projects/subproject/404 | 404 | subproject | The subproject exists, but the file doesn’t
/projects/subproject/br/latest/manual/index.html | /latest/br/latest/manual/index.html | subproject | The subproject has a br directory!
/projects/nothing/manual/index.html | 404 | project | The subproject doesn’t exist
/en/latest/manual/index.html | /latest/manual/index.html | project |
/404 | 404 | project |

Project with custom prefix

Projects:

  • project (latest, 1.0)

  • subproject (latest, 1.0)

project has prefix as its custom prefix, and sub as its custom subproject prefix.

Request | Requested file | Current project | Note
/en/latest/manual/index.html | 404 | project | The prefix doesn’t match
/prefix/en/latest/manual/index.html | /latest/manual/index.html | project |
/projects/subproject/en/latest/manual/index.html | 404 | project | The subproject prefix doesn’t match
/sub/subproject/en/latest/manual/index.html | /latest/manual/index.html | subproject |
/sub/nothing/en/latest/manual/index.html | 404 | project | The subproject doesn’t exist

Project with custom subproject prefix (empty)

Projects:

  • project (latest, 1.0)

  • subproject (latest, 1.0)

project has / as its subproject prefix, which allows us to serve subprojects without using a prefix.

Request | Requested file | Current project | Note
/en/latest/manual/index.html | /latest/manual/index.html | project |
/projects/subproject/en/latest/manual/index.html | 404 | project | The subproject prefix doesn’t match
/subproject/en/latest/manual/index.html | /latest/manual/index.html | subproject |
/nothing/en/latest/manual/index.html | /latest/manual/index.html | project | The subproject/file doesn’t exist

Implementation example

This is a simplified version of the implementation; some small optimizations and validations will be added in the final implementation.

In the final implementation we will be using regular expressions to extract the parts from the URL.

from readthedocs.projects.models import Project

LANGUAGES = {"es", "en"}


def pop_parts(path, n):
    if path[0] == "/":
        path = path[1:]
    parts = path.split("/", maxsplit=n)
    start, end = parts[:n], parts[n:]
    end = end[0] if end else ""
    return start, end


def resolve(canonical_project: Project, path: str, check_subprojects=True):
    prefix = "/"
    if canonical_project.prefix:
        prefix = canonical_project.prefix
    subproject_prefix = "/projects"
    if canonical_project.subproject_prefix:
        subproject_prefix = canonical_project.subproject_prefix

    # Multiversion project.
    if path.startswith(prefix):
        new_path = path.removeprefix(prefix)
        parts, new_path = pop_parts(new_path, 2)
        language, version_slug = parts
        if not canonical_project.single_version and language in LANGUAGES:
            # The requested language is either the canonical project's own
            # language or one of its translations.
            if canonical_project.language == language:
                project = canonical_project
            else:
                project = canonical_project.translations.filter(language=language).first()
            if project:
                version = project.versions.filter(slug=version_slug).first()
                if version:
                    return project, version, new_path
                return project, None, None

    # Subprojects.
    if check_subprojects and path.startswith(subproject_prefix):
        new_path = path.removeprefix(subproject_prefix)
        parts, new_path = pop_parts(new_path, 1)
        project_slug = parts[0]
        project = canonical_project.subprojects.filter(alias=project_slug).first()
        if project:
            return resolve(
                canonical_project=project,
                path=new_path,
                check_subprojects=False,
            )

    # Single project.
    if path.startswith(prefix):
        new_path = path.removeprefix(prefix)
        if canonical_project.single_version:
            version = canonical_project.versions.filter(
                slug=canonical_project.default_version
            ).first()
            if version:
                return canonical_project, version, new_path
            return canonical_project, None, None

    return None, None, None


def view(canonical_project, path):
    current_project, version, file = resolve(
        canonical_project=canonical_project,
        path=path,
    )
    if current_project and version:
        return serve(current_project, version, file)

    if current_project:
        return serve_404(current_project)

    return serve_404(canonical_project)


def serve_404(project, version=None):
    pass


def serve(project, version, file):
    pass
Performance

Performance is mainly driven by the number of database lookups. There is an additional, smaller impact from performing a regex lookup on the URL.

  • A single version project:

    • /index.html: 1, the version.

    • /projects/guides/index.html: 2, the version and one additional lookup for a path that looks like a subproject.

  • A multi version project:

    • /en/latest/index.html: 1, the version.

    • /es/latest/index.html: 2, the translation and the version.

    • /br/latest/index.html: 1, the translation (it doesn’t exist).

  • A project with single version subprojects:

    • /projects/subproject/index.html: 2, the subproject and its version.

  • A project with multi version subprojects:

    • /projects/subproject/en/latest/index.html: 2, the subproject and its version.

    • /projects/subproject/es/latest/index.html: 3, the subproject, the translation, and its version.

    • /projects/subproject/br/latest/index.html: 2, the subproject and the translation (it doesn’t exist).

As seen, the number of database lookups is the minimum required to get the current project and version: at least 1 and at most 3.

Questions

  • When using custom URLs, should we support changing the URLs that aren’t related to doc serving?

    These are:

    • Health check

    • Proxied APIs

    • robots and sitemap

    • The page redirect

    This can be useful for people that proxy us from another path.

  • Should we use the urlconf from the subproject when processing it? This is a URL like /projects/subproject/custom/prefix/en/latest/index.html.

    I don’t think that’s useful, but it should be easy to support if needed.

  • Should we support the page redirect when using a custom subproject prefix? This is /{prefix}/subproject/page/index.html.

Build images

This document describes how Read the Docs uses Docker images and how they are named. It also proposes a path forward: a new way to create, name, and use our Docker build images to reduce their complexity and support installing other languages (e.g. nodejs, rust, go) as extra requirements.

Introduction

We use Docker images to build users’ documentation. Each time a build is triggered, one of our VMs picks up the task and goes through the following steps:

  1. run some application code to spin up a Docker image into a container

  2. execute git inside the container to clone the repository

  3. analyze and parse files (.readthedocs.yaml) from the repository outside the container

  4. spin up a new Docker container based on the config file

  5. create the environment and install docs’ dependencies inside the container

  6. execute build commands inside the container

  7. push the output generated by build commands to the storage

All those steps depend on specific command versions: git, python, virtualenv, conda, etc. Currently, we pin only a few of them in our Docker images, and that has caused issues when re-deploying these images with bugfixes: the images are not reproducible over time.

Note

We have been improving the reproducibility of our images by adding some test cases. These are run inside the Docker image after it’s built and check that it contains the versions we expect.
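
A minimal sketch of what such a check can look like, assuming pytest and the Docker CLI are available on the host; the image name and pinned versions are placeholders, not the real test suite.

# test_image_versions.py -- illustrative only.
import subprocess

import pytest

IMAGE = "readthedocs/build:example"  # placeholder image name


@pytest.mark.parametrize(
    "command,expected",
    [
        (["git", "--version"], "git version 2."),
        (["python3", "--version"], "Python 3."),
    ],
)
def test_pinned_versions(command, expected):
    """Run a command inside the image and check it reports the version we expect."""
    output = subprocess.run(
        ["docker", "run", "--rm", IMAGE, *command],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    assert output.startswith(expected)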

To allow users to pin the image, we ended up exposing three images: stable, latest and testing. With that naming, we were able to fix bugs and add more features to each image without asking users to change the image selected in their config file.

Then, when a completely different image appeared, and after testing the testing image enough, we discarded stable: the old latest became the new stable and the old testing became the new latest. This caused problems for people pinning their image to any of these names, because after this change we effectively changed all the images for all the users, and many build issues arose!

Goals

  • release completely new Docker images without forcing users to change their pinned image (stable, latest, testing)

  • allow users to select language requirements instead of an image name

  • use a base image with the dependencies that don’t change frequently (OS and base requirements)

  • base image naming is tied to the OS version (e.g. Ubuntu LTS)

  • allow us to add/update a Python version without affecting the base image

  • reduce size on builder VM disks by sharing Docker image layers

  • allow users to specify extra languages (e.g. nodejs, rust, go)

  • de-motivate the usage of stable, latest and testing; and promote declaring language requirements instead

  • new images won’t contain old/deprecated OS (eg. Ubuntu 18) and Python versions (eg. 3.5, miniconda2)

  • install language requirements at build time using asdf and its plugins

  • create local mirrors for all languages supported

  • deleting a pre-built image won’t make builds fail; it will only make them slower

  • support only the latest Ubuntu LTS version and keep the previous one as long as it’s officially supported

Non goals

  • allow creation/usage of custom Docker images

  • allow executing arbitrary commands via hooks (e.g. pre_build)

  • automatically build & push all images on commit

  • pre-build multiple images for all the language combinations

Pre-built build image structure

The new pre-built images will depend only on the Ubuntu OS. They will contain all the requirements to add extra language support at build time via the asdf command.

  • ubuntu20-base

    • labels

    • environment variables

    • system dependencies

    • install requirements

    • LaTeX dependencies (for PDF generation)

    • languages version manager (asdf) and its plugins for each language

    • UID and GID

Instead of building one Docker image per combination of language versions, it will be easier to install all of the languages at build time using the same steps. Installing a language only adds a few seconds when binaries are provided. However, to reduce the time to install these languages as much as possible, a local mirror hosted on S3 will be created for each language.

It’s important to note that Python does not provide binaries and compiling a version takes around 2 minutes. However, the Python versions could be pre-compiled and their binaries exposed to builders via S3. Then, at build time, the builder only downloads the binary and copies it to the correct path.

Note

Depending on the demand, Read the Docs may pre-build the most common combinations of languages used by users. For example, ubuntu20+python39+node14 or ubuntu20+python39+node14+rust1. However, this is seen as an optimization for the future and it’s not required for this document.

Build steps

With this new approach, the steps followed by a builder will be:

  1. run some application code to spin up the -base Docker image into a container

  2. execute git inside the container to clone the repository

  3. analyze and parse files (.readthedocs.yaml) from the repository outside the container

  4. spin up a new Docker container based on the Ubuntu OS specified in the config file

  5. install all language dependencies from the cache

  6. create the environment and install docs’ dependencies inside the container

  7. execute build commands inside the container

  8. push the output generated by build commands to the storage

The main differences from the current approach are:

  • the image to spin up is selected depending on the OS version

  • all language dependencies are installed at build time

  • languages not offering binaries are pre-compiled by Read the Docs and stored in the cache

  • miniconda/mambaforge are now managed with the same management tool (e.g. asdf install python miniconda3-4.7.12)

Specifying extra languages requirements

Different users may have different requirements. People with specific language dependencies will be able to install them by using the .readthedocs.yaml config file. Example:

build:
  os: ubuntu20
  languages:
    python: "3.9"  # supports "pypy3", "miniconda3" and "mambaforge"
    nodejs: "14"
    rust: "1.54"
    golang: "1.17"

Important highlights:

  • do not treat the Python language differently from the others (this will help us support other non-Python doctools in the future)

  • specifying build.languages.python: “3” will use Python version 3.x.y, and may differ between builds

  • specifying build.languages.python: “3.9” will use Python version 3.9.y, and may differ between builds

  • specifying build.languages.nodejs: “14” will use nodejs version 14.x.y, and may differ between builds

  • if no full version is declared, we first try the latest version available in our cache, and then the latest on asdf (it has to match the first part of the declared version)

  • specifying patch language versions is not allowed (e.g. 3.7.11)

  • not specifying build.os will make the config file parsing fail

  • not specifying build.languages will make the config file parsing fail (at least one language is required)

  • specifying only build.languages.nodejs while using Sphinx to build the docs will make the build fail (e.g. “Command not found”)

  • build.image is incompatible with build.os or build.languages and will produce an error

  • python.version is incompatible with build.os or build.languages and will produce an error

  • Ubuntu 18 will still be available via stable and latest images, but not in new ones

  • only a subset (not defined yet) of python, nodejs, rust and go versions on asdf are available to select
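
As an illustration of these rules, a config validator could look roughly like the following sketch. This is hypothetical code, not the real Read the Docs config parser; the names and error messages are made up.

SUPPORTED_OS = {"ubuntu20"}
SUPPORTED_LANGUAGES = {"python", "nodejs", "rust", "golang"}


class ConfigError(Exception):
    """Hypothetical error raised when the build config is invalid."""


def validate_build(build: dict) -> None:
    # build.image is kept only for backward compatibility and cannot be
    # combined with the new settings.
    if "image" in build and ("os" in build or "languages" in build):
        raise ConfigError("build.image is incompatible with build.os/build.languages")

    # build.os and at least one language are required with the new approach.
    if build.get("os") not in SUPPORTED_OS:
        raise ConfigError("build.os is required and must be a supported OS")
    languages = build.get("languages") or {}
    if not languages:
        raise ConfigError("build.languages requires at least one language")

    for name, version in languages.items():
        if name not in SUPPORTED_LANGUAGES:
            raise ConfigError(f"unsupported language: {name}")
        # Only "3" or "3.9" style versions are allowed; patch versions
        # (e.g. "3.7.11") are rejected.
        if str(version).count(".") > 1:
            raise ConfigError(f"{name}: patch versions are not allowed ({version})")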

Note

We are moving away from users specifying a particular Docker image. With the new approach, users will specify the language requirements they need, and Read the Docs will decide whether to use a pre-built image or to spin up the base one and install these languages on the fly.

However, build.image will still be available for backward compatibility with stable, latest and testing, but it won’t support the new build.languages config.

Note that knowing exactly which languages users are installing could allow us to pre-build images for the most commonly used combinations, e.g. ubuntu20+py39+node14.

Time required to install languages at build time

Tests using the time command on ASG instances to install extra languages took these “real” times:

  • build-default

    • python 3.9.6: 2m21.331s

    • mambaforge 4.10.1: 0m26.291s

    • miniconda3 4.7.12: 0m9.955s

    • nodejs 14.17.5: 0m5.603s

    • rust 1.54.0: 0m13.587s

    • golang 1.17: 1m30.428s

  • build-large

    • python 3.9.6: 2m33.688s

    • mambaforge 4.10.1: 0m28.781s

    • miniconda3 4.7.12: 0m10.551s

    • nodejs 14.17.5: 0m6.136s

    • rust 1.54.0: 0m14.716s

    • golang 1.17: 1m36.470s

Note that the only one that required compilation was Python; all the others spent 100% of their time downloading the binary. These download times are even better from the EU on a home internet connection.

In the worst case scenario, where none of the specified language versions has a pre-built image, the build will require ~5 minutes to install all the language requirements. By providing pre-built images with only the Python version (the most time consuming one), builds will only require ~2 minutes to install the others. However, requiring one version of each language is not a common case.

Cache language binaries on S3

asdf scripts can be altered to download the .tar.gz dist files from a different mirror than the official one. Read the Docs can make use of this to create a mirror hosted on S3 to get faster download speeds. This will be a good improvement for languages that offer binaries: nodejs, rust and go.

However, Python currently does not offer binaries, so a different solution is needed. Python versions can be pre-compiled once and the output exposed on S3 for the builders to download and extract into the correct PATH.
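
A rough sketch of that pre-compile-and-upload step is shown below, assuming boto3 and a hypothetical languages bucket; the real workflow is the bash script referenced in the tip that follows.

import os
import subprocess

import boto3

VERSION = "3.9.6"     # Python version to pre-compile (placeholder)
BUCKET = "languages"  # hypothetical S3/MinIO bucket used as the cache
ARCHIVE = f"python-{VERSION}-ubuntu20.tar.gz"

# Compile the requested version with asdf (the slow, ~2 minute step).
subprocess.run(["asdf", "install", "python", VERSION], check=True)

# Pack the resulting installation so builders can download and extract it
# into the same path at build time.
installs = os.path.expanduser("~/.asdf/installs/python")
subprocess.run(["tar", "-C", installs, "-czf", ARCHIVE, VERSION], check=True)

# Upload the archive to the cache; builders fetch this instead of compiling.
boto3.client("s3").upload_file(ARCHIVE, BUCKET, f"python/{ARCHIVE}")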

Tip

Since we are building a special cache for pre-compiled Python versions, we could use the same method for all the other languages instead of creating a full mirror (many gigabytes). This simple bash script downloads the language sources, compiles them, and uploads the result to S3 without requiring a mirror. Note that it works in the same way for all the languages, not just for Python.

Questions

What Python versions will be pre-compiled and cached?

At the start, only a small subset of Python versions will be pre-compiled:

  • 2.7.x

  • 3.7.x

  • 3.8.x

  • 3.9.x

  • 3.10.x

  • pypy3.x

How do we upgrade a Python version?

Python patch versions can be upgraded by re-compiling the new patch version and making it available in our cache. For example, if version 3.9.6 is the one available and 3.9.7 is released, after updating our cache:

  • users specifying build.languages.python: "3.9" will get the 3.9.7 version

  • users specifying build.languages.python: "3" will get the 3.9.7 version

As we will have control over these versions, we can decide when to upgrade (if ever required), and we can roll back if the new pre-compiled version was built with a problem.
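
For example, resolving a partial version like “3.9” against the cache could work as in this small sketch (the cache contents are made up):

def resolve_python_version(requested, cached):
    """Return the newest cached version matching the requested prefix, or None."""
    prefix = requested.split(".")
    matches = [
        version for version in cached
        if version.split(".")[: len(prefix)] == prefix
    ]
    # Sort numerically so "3.9.10" would rank above "3.9.7".
    return max(matches, key=lambda v: [int(x) for x in v.split(".")], default=None)


cached = ["2.7.18", "3.8.12", "3.9.6", "3.9.7"]
print(resolve_python_version("3.9", cached))  # -> 3.9.7
print(resolve_python_version("3", cached))    # -> 3.9.7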

Note

Python versions may need to be re-compiled each time the -base image is re-built. This is because some underlying libraries that Python depends on may have changed.

Note

Always installing the latest version is harder to maintain: it would require building the newest version each time a new patch version is released. Because of that, Read the Docs will always be behind official releases. Besides, it would give projects different versions more often.

Exposing the patch version to the user would require caching many different versions ourselves, and if the user mistakenly selects a patch version that we don’t have cached, those builds will take extra build time.

How do we add a Python version?

Adding a new Python version requires:

  • pre-compile the desired version for each Ubuntu OS version supported

  • upload the compressed output to S3

  • add the supported version to the config file validator

How do we remove an old Python version?

At some point, an old version of Python will be deprecated (eg. 3.4) and will be removed. To achieve this, we can just remove the pre-compiled Python version from the cache.

However, unless it’s strictly needed for some specific reason, we shouldn’t need to remove support for a Python version as long as we support the Ubuntu OS version it was compiled for.

In any case, we will know which projects are using these versions because they are pinning these specific versions in the config file. We could show a message in the build output page and also send them an email with the EOL date for this image.

However, removing a pre-compiled Python version that is currently being used by some users won’t make their builds fail. Instead, that Python version will be compiled and installed at build time, adding a “penalization” time to those projects and motivating them to move forward to a newer version.

How do we upgrade system versions?

We usually don’t upgrade these dependencies unless we upgrade the Ubuntu version. So, they will be only upgraded when we go from Ubuntu 18.04 LTS to Ubuntu 20.04 LTS for example.

Examples of these versions are:

  • doxygen

  • git

  • subversion

  • pandoc

  • swig

  • latex

This case will introduce a new base image, for example ubuntu22-base in 2022. Note that these images will be completely isolated from the rest and don’t require the others to be rebuilt. This also allows us to start testing a newer Ubuntu version (e.g. 22.04 LTS) without breaking people’s builds, even before it’s officially released.

How do we add an extra requirement?

In case we need to add an extra requirement to the base image, we will need to rebuild all of them. The new image may have different package versions since there may be updates in the Ubuntu repositories. This carries some risk, but in general we shouldn’t need to add packages to the base images.

In case we need an extra requirement for all our images, I’d recommend adding it when creating a new base image.

If it’s strongly needed and we can’t wait for a new base image, we could install it at build time in a similar way as we do with build.apt_packages, as a temporary workaround.

How do we create a mirror for each language?

A mirror can be created with wget together with rclone:

  1. Download all the files from the official mirror:

    # https://stackoverflow.com/questions/29802579/create-private-mirror-of-http-nodejs-org-dist
    wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -e robots=off http://nodejs.org/dist
    
  2. Upload all the files to S3:

    # https://rclone.org/s3/
    rclone sync -i nodejs.org s3:languages
    

Note

Downloading a copy of the official mirror took 15 minutes and 52 GB of disk space.

How will local development work with the new approach?

Local development will require scripts to clone the official mirrors for each language and upload them to MinIO (S3), plus a script to define a set of Python versions, pre-compile them, and upload them to S3 as well.

This is already covered by this simple bash script and tested in this PR with a POC: https://github.com/readthedocs/readthedocs.org/pull/8453

Deprecation plan

After this design document gets implemented and tested, all our current images (stable, latest, testing) will be deprecated and their usage de-motivated. However, we could keep them on our builders to give users plenty of time to migrate their projects to the new ones.

We may want to keep only the latest Ubuntu LTS release available in production, with a special consideration for our current Ubuntu 18.04 LTS on stable, latest and testing, because 100% of the projects currently depend on them. Once Ubuntu 22.04 LTS is released, we should communicate that Ubuntu 20.04 LTS is deprecated, and keep it available on our servers for as long as it’s officially supported by Ubuntu under “Maintenance updates” (see “Long term support and interim releases” in https://ubuntu.com/about/release-cycle). As an example, Ubuntu 22.04 LTS will be officially released in April 2022 and we will offer support for it until 2027.

Warning

Deleting -base images from the build servers will make projects’ builds fail. We want to keep supporting them as long as we can, but having a well-defined deprecation policy is a win.

Work required and rollout plan

The following steps are required to support the full proposal of this document.

  1. allow users to install extra language requirements via the config file

    • update config file to support build.os and build.languages config

    • modify builder code to run asdf install for all supported languages

  2. build a new base Docker image with new structure (ubuntu20-base)

    • build new image with Ubuntu 20.04 LTS and pre-installed asdf with all its plugins

    • do not install any language version on base image

    • deploy builders with new base image

At this point, we will have a fully working setup. It will be opt-in by using the new configs build.os and build.languages. However, all languages will be installed at build time, which will “penalize” all projects because all of them will have to install Python.

After testing this for some time, we can continue with the following steps, which provide a cache to optimize installation times:

  1. create mirrors on S3 for all supported languages

  2. create a mirror with pre-compiled binaries for the latest 3 Python versions, Python 2.7, and PyPy3

Conclusion

There is no need to differentiate the images by their state (stable, latest, testing), but rather by their main base difference: the OS. The version of the OS changes many library versions, LaTeX dependencies, basic required commands like git, and more, so it doesn’t seem useful to have the same OS version with different states.

Allowing users to install extra languages by using the config file will cover most of the support requests we have had in the past. It will also allow us to learn more about how our users are using the platform, to make future decisions based on this data. Showing users how we want them to use our platform will allow us to maintain it for longer than giving them the option to select a specific Docker image by name that we can’t guarantee will be frozen.

Finally, having the ability to deprecate and remove pre-built images from our builders over time will reduce the maintenance work required from the core team. We can always support all language versions by installing them at build time; the only pre-built images required for this are the OS -base images. In fact, even after deciding to deprecate and remove a pre-built image from the builders, we can re-build it if we find that it’s affecting many projects and slowing down their builds too much, causing us problems.

Embed APIv3

The Embed API allows users to embed content from documentation pages in other sites. It has been treated as an experimental feature without public documentation or real applications, but recently it started to be used widely (mainly because we created the hoverxref Sphinx extension).

The main goal of this document is to design a new version of the Embed API to be more user friendly, make it more stable over time, support embedding content from pages not hosted at Read the Docs, and remove some quirkiness that makes it hard to maintain and difficult to use.

Note

This work is part of the CZI grant that Read the Docs received.

Current implementation

The current implementation of the API is partially documented in How to embed content from your documentation. It has some known problems:

  • There are different ways of querying the API: ?url= (generic) and ?doc= (relies on Sphinx’s specific concept)

  • Doesn’t support MkDocs

  • Lookups are slow (~500 ms)

  • IDs returned aren’t well formed (like empty IDs "headers": [{"title": "#"}])

  • The content is always an array of one element

  • It tries different variations of the original ID

  • It doesn’t return valid HTML for definition lists (dd tags without a dt tag)

Goals

We plan to add new features and define a contract that works the same for all HTML. This project has the following goals:

  • Support embedding content from pages hosted outside Read the Docs

  • Do not depend on Sphinx .fjson files

  • Query and parse the .html file directly (from our storage or from an external request)

  • Rewrite all links returned in the content to make them absolute

  • Require a valid HTML id selector

  • Accept only ?url= request GET argument to query the endpoint

  • Support ?nwords= and ?nparagraphs= to return chunked content

  • Handle special cases for particular doctools (e.g. Sphinx requires to return the .parent() element for dl)

  • Make explicit the client is asking to handle the special cases (e.g. send ?doctool=sphinx&version=4.0.1&writer=html4)

  • Delete HTML tags from the original document (for well-defined special cases)

  • Add HTTP cache headers to cache responses

  • Allow CORS from everywhere only for public projects

The contract

Return the HTML tag (and its children) with the id selector requested and replace all the relative links from its content making them absolute.

Note

Any other case outside this contract will be considered special and will be implemented only under ?doctool=, ?version= and ?writer= arguments.

If no id selector is sent in the request, the content of the first meaningful HTML tag found (<main>, <div role="main"> or other well-defined standard tags) is returned.
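A minimal sketch of that fallback, assuming BeautifulSoup is used for parsing and that the tags listed below are the “meaningful” candidates (the priority order is an assumption):

from bs4 import BeautifulSoup


def find_main_content(html):
    """Return the first 'meaningful' content tag when no id selector is given."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed priority order for well-defined standard containers.
    candidates = [
        soup.find("main"),
        soup.find("div", attrs={"role": "main"}),
        soup.find("article"),
        soup.body,
    ]
    for tag in candidates:
        if tag is not None:
            return tag
    return None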

Embed endpoints

This is the list of endpoints to be implemented in APIv3:

GET /api/v3/embed/

Returns the exact HTML content for a specific identifier (id). If no anchor identifier is specified, the content of the first one is returned.

Example request:

$ curl https://readthedocs.org/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment

Example response:

{
   "project": "docs",
   "version": "latest",
   "language": "en",
   "path": "development/install.html",
   "title": "Development Installation",
   "url": "https://docs.readthedocs.io/en/latest/install.html#set-up-your-environment",
   "id": "set-up-your-environment",
   "content": "<div class=\"section\" id=\"development-installation\">\n<h1>Development Installation<a class=\"headerlink\" href=\"https://docs.readthedocs.io/en/stable/development/install.html#development-installation\" title=\"Permalink to this headline\">¶</a></h1>\n ..."
}
Query Parameters:
  • url (string, required) – Full URL for the documentation page, with an optional anchor identifier.

GET /api/v3/embed/metadata/

Returns all the available metadata for a specific page.

Note

As it’s not trivial to get the title associated with a particular id, and it’s not easy to get a nested list of identifiers, we may not implement this endpoint in the initial version.

The endpoint, as-is, is mainly useful to explore/discover what identifiers are available for a particular page, which is handy in the development process of a new tool that consumes the API. Because of this, we don’t have too much traction to add it in the initial version.

Example request:

$ curl https://readthedocs.org/api/v3/embed/metadata/?url=https://docs.readthedocs.io/en/latest/development/install.html

Example response:

{
  "identifiers": [
    {
        "id": "set-up-your-environment",
        "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment",
        "_links": {
            "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment"
        }
    },
    {
        "id": "check-that-everything-works",
        "url": "https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works",
        "_links": {
            "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works"
        }
    }
  ]
}
Query Parameters:
  • url (string, required) – Full URL for the documentation page.

Handle specific Sphinx cases

We are currently handling some special cases for Sphinx due to how it writes the HTML output structure. In some cases, we look for the HTML tag with the identifier requested but we return the .next() HTML tag or the .parent() tag instead of the requested one.

Currently, we have identified that this happens for definition tags (dl, dt, dd), but there may be other cases we don’t know about yet. Sphinx adds the id= attribute to the dt tag, which contains only the title of the definition, but as a user, we expect the description of it.

In the following example, we will return the whole dl HTML tag instead of the HTML tag with the identifier id="term-name" as requested by the client, because otherwise the “Term definition for Term Name” content wouldn’t be included and the response would be useless.

<dl class="glossary docutils">
  <dt id="term-name">Term Name</dt>
  <dd>Term definition for Term Name</dd>
</dl>

If the definition list (dl) has more than one definition, it will return only the term requested. Consider the following example, with the request ?url=glossary.html#term-name:

<dl class="glossary docutils">
  ...

  <dt id="term-name">Term Name</dt>
  <dd>Term definition for Term Name</dd>

  <dt id="term-unknown">Term Unknown</dt>
  <dd>Term definition for Term Unknown </dd>

  ...
</dl>

It will return the whole dl with only the dt and dd for the id requested:

<dl class="glossary docutils">
  <dt id="term-name">Term Name</dt>
  <dd>Term definition for Term Name</dd>
</dl>

However, these assumptions may not apply to documentation pages built with a different doctool than Sphinx. For this reason, we need to communicate to the API that we want to handle these special cases in the backend. This will be done by appending request GET arguments to the Embed API endpoint: ?doctool=sphinx&version=4.0.1&writer=html4. In this case, the backend will know that it has to deal with these special cases.
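As an illustration of how the backend could implement the dl special case when ?doctool=sphinx is sent, here is a sketch using BeautifulSoup; the parser choice and the exact pruning logic are assumptions, not the actual implementation:

from bs4 import BeautifulSoup


def extract_identifier(html, identifier, doctool=None):
    """Return the HTML for ``identifier``, handling the Sphinx ``dl`` special case."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find(id=identifier)
    if node is None:
        return None

    if doctool == "sphinx" and node.name == "dt":
        dl = node.find_parent("dl")
        if dl is not None:
            dd = node.find_next_sibling("dd")
            # Keep only the requested <dt>/<dd> pair inside the definition list.
            for child in list(dl.find_all(["dt", "dd"], recursive=False)):
                if child is not node and child is not dd:
                    child.decompose()
            return str(dl)

    return str(node)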

Note

This leaves the door open to supporting more special cases (e.g. for other doctools) without breaking the current behavior.

Support for external documents

When the ?url= argument passed belongs to a documentation page not hosted on Read the Docs, the endpoint will make an external request to download the HTML file, parse it and return the content for the identifier requested.

The whole logic should be the same; the only difference is where the source HTML comes from.

Warning

We should be careful with the URLs received from the user because they may be internal URLs and we could be leaking some data. Example: ?url=http://localhost/some-weird-endpoint or ?url=http://169.254.169.254/latest/meta-data/ (see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html).

This is related to SSRF (https://en.wikipedia.org/wiki/Server-side_request_forgery). It doesn’t seem to be a huge problem, but something to consider.

Also, the endpoint may need to limit the requests per-external domain to avoid using our servers to take down another site.

Note

Due to the potential security issues mentioned, we will start with an allowed list of domains for common Sphinx docs projects, like Django and Python, from which sphinx-hoverxref users might commonly want to embed content. We aren’t planning to allow arbitrary HTML from any website.
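A minimal sketch of that validation, assuming a hypothetical ALLOWED_EXTERNAL_DOMAINS allowlist (the domains listed are examples only):

from urllib.parse import urlparse

# Hypothetical allowlist of external documentation domains.
ALLOWED_EXTERNAL_DOMAINS = {
    "docs.djangoproject.com",
    "docs.python.org",
}


def is_allowed_external_url(url):
    """Reject URLs that could trigger SSRF or point outside the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # Rejecting everything that isn't an explicitly allowed public domain
    # also excludes localhost, IP addresses, internal hostnames, etc.
    return parsed.hostname in ALLOWED_EXTERNAL_DOMAINS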

Handle project’s domain changes

The proposed Embed APIv3 implementation only accepts the ?url= argument to embed content from that page. That URL can be:

  • a URL for a project hosted under <project-slug>.readthedocs.io

  • a URL for a project with a custom domain

In the first case, we can easily get the project’s slug directly from the URL. However, in the second case we get the project’s slug by querying our database for a Domain object with the full domain from the URL.

Now, consider that all the links in a documentation page that uses Embed APIv3 point to docs.example.com and the author decides to change the domain to docs.newdomain.com. At this point there are different possible scenarios:

  • The user creates a new Domain object with docs.newdomain.com as the domain’s name. In this case, old links will keep working because we still have the old Domain object in our database and we can use it to get the project’s slug.

  • The user deletes the old Domain in addition to creating the new one. In this scenario, our database query for a Domain named docs.example.com will fail. We will need to make a request to docs.example.com, check for a 3xx response status code and, in that case, read the Location: HTTP header to find the new domain name for the documentation. Once we have the new domain from the redirect response, we can query our database again to find the project’s slug (see the sketch after this list).

    Note

    We will follow up to 5 redirects to find out the project’s domain.
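A rough sketch of that redirect-following lookup using requests; error handling and the final Domain query are omitted, and the real implementation may differ:

from urllib.parse import urljoin, urlparse

import requests

MAX_REDIRECTS = 5


def resolve_current_domain(url):
    """Follow up to 5 redirects to find the domain currently serving the docs."""
    for _ in range(MAX_REDIRECTS):
        response = requests.head(url, allow_redirects=False, timeout=5)
        if response.status_code not in (301, 302, 303, 307, 308):
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, response.headers["Location"])
    return urlparse(url).hostname

With the resulting domain we can query the Domain model again to find the project’s slug.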

Embed APIv2 deprecation

APIv2 is currently widely used by projects using the sphinx-hoverxref extension. Because of that, we need to keep supporting it as-is for a long time.

The next steps in this direction should be:

  • Add a note in the documentation mentioning this endpoint is deprecated

  • Promote the usage of the new Embed APIv3

  • Migrate the sphinx-hoverxref extension to use the new endpoint

Once we have done that, we can check our NGINX logs to find out whether people are still using APIv2, contact them and let them know that they have some months to migrate, since the endpoint is deprecated and will be removed.

Unanswered questions

  • How do we distinguish between our APIv3 for resources (models in the database) from these “feature API endpoints”?

Future builder

This document is a continuation of Santos’ work on “Explicit Builders”. It builds some extra features on top of that document and makes some decisions about the final goal, proposing a clear direction to move forward with intermediate steps, keeping backward and forward compatibility.

Note

A lot of things have changed since this document was written. We have had multiple discussions where we already took some decisions and discarded some of the ideas/details proposed here. The document was merged as-is without a cleanup and there could be some inconsistencies. Note that build.jobs and build.commands are already implemented without defining a contract yet, and with small differences from the idea described here.

Please refer to the following links to read more about all the discussions we already had:

Goals

  • Keep the current builder working as-is

  • Keep backward and forward (with intermediate steps) compatibility

  • Define a clear support for newbie, intermediate and advanced users

  • Allow users to override a command, run pre/post hook commands or define all commands by themselves

  • Remove the Read the Docs requirement of having access to the build process

  • Translate our current magic at build time to a defined contract with the user

  • Provide a way to add a command argument without implementing it as a config file option (e.g. fail_on_warning)

  • Define a path forward towards supporting other tools

  • Re-write all readthedocs-sphinx-ext features as post-processing HTML features

  • Reduce complexity maintained by Read the Docs’ core team

  • Make Read the Docs responsible for Sphinx support and delegate other tools to the community

  • Eventually support uploading pre-built docs

  • Allow us to add a feature with a defined contract without worrying about breaking old builds

  • Introduce build.builder: 2 config (does not install pre-defined packages) for these new features

  • Motivate users to migrate to v2 by educating them, so we can finally deprecate this magic

Steps run by the builder

Read the Docs currently controls the whole build process. Users are only allowed to modify very limited behavior by using a .readthedocs.yaml file. This drove us to implement features like sphinx.fail_on_warning, submodules, and others, at a high implementation and maintenance cost to the core team. Besides, this hasn’t been enough for more advanced users that require more control over these commands.

This document proposes to clearly define the steps the builder runs and allow users to override them depending on their needs:

  • Newbie user / simple platform usage: Read the Docs controls all the commands (current builder)

  • Intermediate user: ability to override one or more commands plus running pre/post hooks

  • Advanced user: controls all the commands executed by the builder

The steps identified so far are:

  1. Checkout

  2. Expose project data via environment variables (*)

  3. Create environment (virtualenv / conda)

  4. Install dependencies

  5. Build documentation

  6. Generate defined contract (metadata.yaml)

  7. Post-process HTML (*)

  8. Upload to storage (*)

Steps marked with (*) are managed by Read the Docs and can’t be overridden.

Defined contract

Projects building on Read the Docs must provide a metadata.yaml file after running their last command. This file contains all the data required by Read the Docs to be able to add its integrations. If this file is not provided or is malformed, Read the Docs will fail the build and stop the process, communicating to the user that there was a problem with metadata.yaml and that they need to fix it.

Note

There is no restriction on how this file is generated (e.g. generated with Python, Bash, statically committed to the repository, etc). Read the Docs does not have control over it and is only responsible for generating it when building with Sphinx.

The following is an example of a metadata.yaml that is generated by Read the Docs when building Sphinx documentation:

# metadata.yaml
version: 1
tool:
  name: sphinx
  version: 3.5.1
  builder: html
readthedocs:
  html_output: ./_build/html/
  pdf_output: ./_build/pdf/myproject.pdf
  epub_output: ./_build/pdf/myproject.epub
  search:
    enabled: true
    css_identifier: '#search-form > input[name="q"]'
  analytics: false
  flyout: false
  canonical: docs.myproject.com
  language: en

Warning

The metadata.yaml contract is not defined yet. This is just an example of what we could expect from it to be able to add our integrations.
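Purely as an illustration of how Read the Docs could consume such a file once the contract exists, here is a sketch; the required keys and the error class are assumptions based on the example above:

import yaml


class BuildUserError(Exception):
    """Stand-in for the user-facing build error Read the Docs would raise."""


# Assumed top-level keys, based on the example above; not a final contract.
REQUIRED_KEYS = ("version", "tool", "readthedocs")


def load_metadata(path="metadata.yaml"):
    """Load the contract file and fail the build if it is missing or malformed."""
    try:
        with open(path) as fh:
            metadata = yaml.safe_load(fh)
    except (OSError, yaml.YAMLError) as exc:
        raise BuildUserError(f"Invalid or missing metadata.yaml: {exc}")

    missing = [key for key in REQUIRED_KEYS if key not in (metadata or {})]
    if missing:
        raise BuildUserError(f"metadata.yaml is missing required keys: {missing}")
    return metadata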

Config file

As we mentioned, we want all users to use the same config file and have a clear way to override commands as they need. This will be done by using the current .readthedocs.yaml file that we already have by adding two new keys: build.jobs and build.commands.

If neither build.jobs nor build.commands is present in the config file, Read the Docs will execute the builder we currently support without modification, keeping compatibility with all projects already building successfully.

When users make use of the jobs: or commands: keys, we are not responsible for their commands in case they fail. In these cases, we only check for a metadata.yaml file and run our code to add the integrations.

build.jobs

It allows users to execute one or multiple pre/post hooks and/or overwrite one or multiple commands. These are some examples where this is useful:

# .readthedocs.yaml
build:
  builder: 2
  jobs:
    pre_checkout:
    checkout: git clone --branch main https://github.com/readthedocs/readthedocs.org
    post_checkout:
    pre_create_environment:
    create_environment: python -m virtualenv venv
    post_create_environment:
    pre_install:
    install: pip install -r requirements.txt
    post_install:
    pre_build:
    build:
      html: sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html
      pdf: latexmk -r latexmkrc -pdf -f -dvi- -ps- -jobname=test-builds -interaction=nonstopmode
      epub: sphinx-build -T -j auto -b epub -d _build/doctrees -D language=en . _build/epub
    post_build:
    pre_metadata:
    metadata: ./metadata_sphinx.py
    post_metadata:

Note

All these commands are executed passing all the exposed environment variables.

If the user only provides a subset of these jobs, we run our default commands for the ones not provided (see Steps run by the builder). For example, the following YAML is enough when the project requires running Doxygen as a pre-build step:

# .readthedocs.yaml
build:
  builder: 2
  jobs:
    # https://breathe.readthedocs.io/en/latest/readthedocs.html#generating-doxygen-xml-files
    pre_build: cd ../doxygen; doxygen
build.commands

It allows users to have full control over the commands executed in the build process. These are some examples where this is useful:

  • project with a custom build process that doesn’t map to ours

  • specific requirements that we can’t/don’t want to cover as a general rule

  • build documentation with a different tool than Sphinx

# .readthedocs.yaml
build:
  builder: 2
  commands:
    - git clone --branch main https://github.com/readthedocs/readthedocs.org
    - pip install -r requirements.txt
    - sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html
    - ./metadata.py

Intermediate steps for rollout

  1. Remove all the exposed data in the conf.py.tmpl file and move it to metadata.yaml

  2. Define structure required for metadata.yaml as contract

  3. Define the environment variables required (e.g. some from html_context) and execute all commands with them

  4. Build documentation using this contract

  5. Leave readthedocs-sphinx-ext as the only package installed and the only extension installed in conf.py.tmpl

  6. Add build.builder: 2 config without any magic

  7. Build everything needed to support build.jobs and build.commands keys

  8. Write guides about how to use the new keys

  9. Re-write readthedocs-sphinx-ext features as HTML post-processing features

Final notes

  • The migration path from v1 to v2 will require users to explicitly specify their requirements (we don’t install pre-defined packages anymore)

  • We probably do not want to support build.jobs on v1, to reduce the core team’s time maintaining that code without the ability to update it due to projects randomly breaking.

  • We would be able to start building documentation using new tools without having to integrate them.

  • Building on Read the Docs with a new tool will require:

    • the user to execute a different set of commands by overriding the defaults.

    • the project/build/user to expose a metadata.yaml with the contract that Read the Docs expects.

    • none, some or all of the integrations to be added to the HTML output (these have to be implemented in Read the Docs core).

  • We are not responsible for extra formats (e.g. PDF, ePub, etc) on other tools.

  • Focus on supporting Sphinx with nice integrations made in a tool-agnostic way that can be re-used.

  • Removing the manipulation of conf.py.tmpl does not require us to implement the same manipulation for projects using the potential new sphinx.yaml file feature.

In-doc search UI

Giving readers the ability to easily search for the information that they are looking for is important to us. We have already upgraded to the latest version of Elasticsearch and we plan to implement a search-as-you-type feature for all the documentation hosted by us. It will be designed to provide instant results as soon as the user starts typing in the search bar, with a clean and minimal frontend. This design document aims to provide the details of it. This is a GSoC’19 project.

Warning

This design document details future features that are not yet implemented. To discuss this document, please get in touch in the issue tracker.

The final result may look something like this:

Short demo (animated GIF: _images/in-doc-search-ui-demo.gif)

Goals and non-goals

Project goals
  • Support a search-as-you-type/autocomplete interface.

  • Support across all (or virtually all) Sphinx themes.

  • Support for the JavaScript user experience down to IE11 or graceful degradation where we can’t support it.

  • Project maintainers should have a way to opt-in/opt-out of this feature.

  • (Optional) Project maintainers should have the flexibility to change some of the styles using custom CSS and JS files.

Non-goals
  • For the initial release, we are targeting only Sphinx documentation, as we don’t index MkDocs documentation in our Elasticsearch index.

Existing search implementation

We have detailed documentation explaining the underlying architecture of our search backend and how we index documents to our Elasticsearch index. You can read about it here.

Proposed architecture for in-doc search UI

Frontend
Technologies

The frontend is to be designed in a theme-agnostic way. For that, we explored various libraries which might be of use, but none of them fit our needs, so we will likely use vanilla JavaScript for this purpose. This will provide us some advantages over using any third-party library:

  • Better control over the DOM.

  • Performance benefits.

Proposed architecture

We plan to select the search bar, which is present in every theme, using JavaScript’s querySelector() method. Then we add an event listener to it to listen for changes and fire a search query to our backend as soon as there is any change. Our backend will then return the suggestions, which will be shown to the user in a clean and minimal UI. We will be using the document.createElement() and node.removeChild() methods provided by JavaScript, as we don’t want empty <div>s hanging around in the DOM.

We have a few ways to include the required JavaScript and CSS files in all the projects:

  • Add CSS into readthedocs-doc-embed.css and JS into readthedocs-doc-embed.js and it will get included.

  • Package the in-doc search into its own self-contained CSS and JS files and include them in a similar manner to readthedocs-doc-embed.*.

  • It might be possible to package up the in-doc CSS/JS as a Sphinx extension. This might be nice because then it’s easy to enable it on a per-project basis. When we are ready to roll it out to a wider audience, we can decide to just turn it on for everybody (put it in here) or we could enable it as an opt-in feature like the 404 extension.

UI/UX

We have two ways to show suggestions to the user:

  • Show suggestions below the search bar.

  • Open a full-page search interface when the user clicks on the search field.

Backend

We have a few options to support the search-as-you-type feature, but we need to decide which option would be best for our use-case. A configuration sketch for the first option follows the comparison below.

Edge NGram Tokenizer
  • Pros

    • More effective than Completion Suggester when it comes to autocompleting words that can appear in any order.

    • It is considerably fast because most of the work is done at index time, hence the time taken for autocompletion is reduced.

    • Supports highlighting of the matching terms.

  • Cons

    • Requires greater disk space.

Completion suggester
  • Pros

    • Really fast as it is optimized for speed.

    • Does not require large disk space.

  • Cons

    • Matching always starts at the beginning of the text. So, for example, “Hel” will match “Hello, World” but not “World Hello”.

    • Highlighting of the matching words is not supported.

    • According to the official docs for Completion Suggester, fast lookups are costly to build and are stored in-memory.
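For reference, this is a sketch of what the edge n-gram option could look like as Elasticsearch index settings; the field name and gram sizes are placeholders, not a final mapping:

# Sketch of Elasticsearch index settings for the edge n-gram approach.
AUTOCOMPLETE_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "token_chars": ["letter", "digit"],
                },
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": ["lowercase"],
                },
            },
        },
    },
    "mappings": {
        "properties": {
            # Index titles with the edge n-gram analyzer, but search with the
            # standard analyzer so queries themselves are not n-grammed.
            "title": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "standard",
            },
        },
    },
}

The completion suggester alternative would instead map the field with type completion, trading disk usage for in-memory lookup structures.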

Milestones

  • A local implementation of the project: 12th June, 2019

  • In-doc search on a test project hosted on Read the Docs using the RTD Search API: 20th June, 2019

  • In-doc search on docs.readthedocs.io: 20th June, 2019

  • Friendly user trial where users can add this on their own docs: 5th July, 2019

  • Additional UX testing on the top-10 Sphinx themes: 15th July, 2019

  • Finalize the UI: 25th July, 2019

  • Improve the search backend for efficient and fast search results: 10th August, 2019

Open questions
  • Should we rely on jQuery, any third party library or pure vanilla JavaScript?

  • Should subprojects be searched too?

  • Is our existing Search API sufficient?

  • Should we go for edge ngrams or completion suggester?

Notification system: a new approach after a lot of discussions

Notifications have been a recurrent topic in the last years. We have talked about different problems and approaches to solving them during these years. However, due to the complexity of the change, and without a clear path forward, it has been hard to prioritize.

We’ve written a lot about the problems and potential solutions for the current notification system. This is an incomplete list of them, just for reference:

At the offsite in Portland, Anthony and I were able to talk deeply about this and wrote a bunch of thoughts in a Google Doc. We had pretty similar ideas and we thought we were already solving most of the problems we identified.

I read all of these issues and all the discussions I found, and wrote this document summarizing my proposal: create a new notification system that we can customize and expand as we need in the future:

  • A Django model to store the notifications’ data

  • API endpoints to retrieve the notifications for a particular resource (User, Build, Project, Organization)

  • Frontend code to display them (outside the scope of this document)

Goals

  • Keep raising exceptions for errors from the build process

  • Ability to add non-error notifications from the build process

  • Add extra metadata associated to the notification: icon, header, body, etc

  • Support different types of notifications (e.g. error, warning, note, tip)

  • Re-use the new notification system for product updates (e.g. new features, deprecated config keys)

  • Message content lives on Python classes that can be translated and formatted with objects (e.g. Build, Project)

  • Message could have richer content (e.g. HTML code) to generate links and emphasis

  • Notifications have trackable state (e.g. unread (default)=never shown, read=shown, dismissed=don’t show again, cancelled=auto-removed after user action)

  • An object (e.g. Build, Organization) can have more than 1 notification attached

  • Remove hardcoded notifications from the templates

  • Notifications can be attached to Project, Organization, Build and User models

  • Specific notifications can be shown under the user’s bell icon

  • Easy way to clean up notifications on status changes (e.g. subscription failure notification is auto-deleted after the CC is updated)

  • Notifications attached to an Organization/Project disappear for all the users once they are dismissed by anyone

Non-goals

  • Create new Build “state” or “status” option for these fields

  • Implement the new notification in the old dashboard

  • Define front-end code implementation

  • Replace email or webhook notifications

Small notes and other considerations

  • Django message system is not enough for this purpose.

  • Use a new model to store all the required data (expandable in the future)

  • How do we handle translations? We should use _("This is the message shown to the user") in Python code and return the proper translation when they are read.

  • Reduce complexity on Build object (remove Build.status and Build.error fields among others).

  • Since the Build object could have more than one notification, when showing them, we will sort them by importance: errors, warnings, notes, tips.

  • In case we need a pretty specific order, we can add an extra field for that, but it adds unnecessary complexity at this point.

  • For those notifications that are attached to the Project or Organization, should they be shown to all the members even if they don’t have admin permissions? If yes, this is good because all of them will be notified, but only some of them will be able to take action. If no, non-admin users won’t see the notification and won’t be able to communicate it to the admins.

  • A notification could be attached to a BuildCommand in case we want to display a specific message on a command itself. We don’t know how useful this will be, but it’s something we can consider in the future.

  • Notification preferences: what kind of notifications do I want to see under my own bell icon?

    • Build errors

    • Build tips

    • Product updates

    • Blog post news

    • Organization updates

    • Project updates

Implementation ideas

This section shows all the classes and models involved for the notification system as well as some already known use-cases.

Note

Accessing the database from the build process

Builders don’t have access to the database for security reasons. We solved this limitation by creating an API endpoint that the builder hits when it needs to interact with the database to get Project, Version and Build resources, create a BuildCommand resource, etc.

Besides, the build process is capable of triggering Celery tasks that are useful for managing more complex logic that also requires reading from and writing to the database.

Currently, readthedocs.doc_builder.director.Director and readthedocs.doc_builder.environments.DockerBuildEnvironment have access to the API client and can use it to create the Notification resources.

I plan to use the same pattern to create Notification resources by hitting the API from the director or the build environment. In case we require hitting the API from other places, we will need to pass the API client instance to those other classes as well.

Message class definition

This class encapsulates the content of the notification (e.g. header, body, icon, etc), that is, the message shown to the user, and some helper logic used to return it in the API response.

class Message:
    def __init__(self, header, body, type, icon=None, icon_style=SOLID):
        self.header = header
        self.body = body
        self.icon = icon
        self.icon_style = icon_style  # one of: SOLID, DUOTONE
        self.type = type  # one of: ERROR, WARNING, NOTE, TIP

    def get_display_icon(self):
        if self.icon:
            return self.icon

        if self.type == ERROR:
            return "fa-exclamation"
        if self.type == WARNING:
            return "fa-triangle-exclamation"
Definition of notifications to display to users

This constant defines all the possible notifications to be displayed to the user. Each notification has to be defined here using the Message class previously defined.

NOTIFICATION_MESSAGES = {
    "generic-with-build-id": Message(
        header=_("Unknown problem"),
        # Note the message receives the instance it's attached to
        # and could use it to inject related data
        body=_(
            """
      There was a problem with Read the Docs while building your documentation.
      Please try again later.
      If this problem persists,
      report this error to us with your build id ({instance[pk]}).
    """
        ),
        type=ERROR,
    ),
    "build-os-required": Message(
        header=_("Invalid configuration"),
        body=_(
            """
      The configuration key "build.os" is required to build your documentation.
      <a href='https://docs.readthedocs.io/en/stable/config-file/v2.html#build-os'>Read more.</a>
    """
        ),
        type=ERROR,
    ),
    "cancelled-by-user": Message(
        header=_("User action"),
        body=_(
            """
      Build cancelled by the user.
    """
        ),
        type=ERROR,
    ),
    "os-ubuntu-18.04-deprecated": Message(
        header=_("Deprecated OS selected"),
        body=_(
            """
      Ubuntu 18.04 is deprecated and will be removed soon.
      Update your <code>.readthedocs.yaml</code> to use a newer image.
    """
        ),
        type=TIP,
    ),
}
Notification model definition

This class is the representation of a notification attached to a resource (e.g. User, Build, etc) in the database. It contains an identifier (message_id) pointing to one of the messages defined in the previous section (a key in the NOTIFICATION_MESSAGES constant).

import textwrap

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models
from django.utils.translation import gettext_noop as _
from django_extensions.db.models import TimeStampedModel


class Notification(TimeStampedModel):
    # Message identifier
    message_id = models.CharField(max_length=128)

    # UNREAD: the notification was not shown to the user
    # READ: the notification was shown
    # DISMISSED: the notification was shown and the user dismissed it
    # CANCELLED: removed automatically because the user has done the action required (e.g. paid the subscription)
    state = models.CharField(
        max_length=32,
        choices=[UNREAD, READ, DISMISSED, CANCELLED],
        default=UNREAD,
        db_index=True,
    )

    # Whether the user is allowed to dismiss the notification
    # (False is useful for Build notifications)
    dismissable = models.BooleanField(default=False)

    # Show the notification under the bell icon for the user
    news = models.BooleanField(default=False, help_text="Show under bell icon")

    # Resource the notification is attached to.
    #
    # Uses ContentType for this.
    # https://docs.djangoproject.com/en/4.2/ref/contrib/contenttypes/#generic-relations
    #
    attached_to_content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    attached_to_id = models.PositiveIntegerField()
    attached_to = GenericForeignKey("attached_to_content_type", "attached_to_id")

    # If we don't want to use ContentType, we could define all the potential models
    # the notification could be attached to
    #
    # organization = models.ForeignKey(Organization, null=True, blank=True, default=None)
    # project = models.ForeignKey(Project, null=True, blank=True, default=None)
    # build = models.ForeignKey(Build, null=True, blank=True, default=None)
    # user = models.ForeignKey(User, null=True, blank=True, default=None)

    def get_display_message(self):
        # Look up the Message and render its body with the attached object.
        message = NOTIFICATION_MESSAGES.get(self.message_id)
        return textwrap.dedent(
            message.body.format(
                instance=self.attached_to,  # Build, Project, Organization, User
            )
        )
Attach error Notification during the build process

During the build, we will keep raising exceptions for two purposes:

  • stop the build process immediately

  • communicate back to the doc_builder.director.Director class that the build failed.

The director is the one in charge of creating the error Notification, in a similar way to how it currently works. The only difference is that instead of saving the error under Build.error, it will create a Notification object and attach it to the particular Build. Note that the director does not have access to the DB, so it will need to create/associate the object via an API endpoint/Celery task.

Example of how the exception BuildCancelled creates an error Notification:

class UpdateDocsTask(...):
    def on_failure(self):
        self.data.api_client.build(self.data.build["id"]).notifications.post(
            {
                "message_id": "cancelled-by-user",
                # Override default fields if required
                "type": WARNING,
            }
        )
Attach non-error Notification during the build process

During the build, we will be able to attach non-error notifications with the following pattern:

  • check something in particular (e.g. using a deprecated key in readthedocs.yaml)

  • create a non-error Notification and attach it to the particular Build object

class DockerBuildEnvironment(...):
    def check_deprecated_os_image(self):
        if self.config.build.os == "ubuntu-18.04":
            self.api_client.build(self.data.build["id"]).notifications.post(
                {
                    "message_id": "os-ubuntu-18.04-deprecated",
                }
            )
Show a Notification under the user’s bell icon

If we want to show a notification on a user’s profile, we can create the notification as follows, maybe from a simple script run in the Django shell after publishing a blog post:

users_to_show_notification = User.objects.filter(...)

for user in users_to_show_notification:
    Notification.objects.create(
        message_id="blog-post-beta-addons",
        dismissable=True,
        news=True,
        # The generic foreign key takes the instance the notification is attached to.
        attached_to=user,
    )
Remove notification on status change

When we show a notification for an unpaid subscription, we want to remove it once the user has updated and paid the subscription. We can do this with the following code:

@handler("customer.subscription.updated", "customer.subscription.deleted")
def subscription_updated_event(event):
    if subscription.status == ACTIVE:
        organization = Organization.objects.get(slug="read-the-docs")

        Notification.objects.filter(
            message_id="subscription-update-your-cc-details",
            state__in=[UNREAD, READ],
            # Generic foreign keys can't be used in filters directly,
            # so filter by content type and object id instead.
            attached_to_content_type=ContentType.objects.get_for_model(Organization),
            attached_to_id=organization.id,
        ).update(state=CANCELLED)

API definition

I will follow the same pattern we have on APIv3 that uses nested endpoints. This means that we will add a /notifications/ suffix to most of the resource endpoints where we want to be able to attach/list notifications.

Notifications list
GET /api/v3/users/(str: user_username)/notifications/

Retrieve a list of all the notifications for this user.

GET /api/v3/projects/(str: project_slug)/notifications/

Retrieve a list of all the notifications for this project.

GET /api/v3/organizations/(str: organization_slug)/notifications/

Retrieve a list of all the notifications for this organization.

GET /api/v3/projects/(str: project_slug)/builds/(int: build_id)/notifications/

Retrieve a list of all the notifications for this build.

Example response:

{
    "count": 25,
    "next": "/api/v3/projects/pip/builds/12345/notifications/?unread=true&sort=type&limit=10&offset=10",
    "previous": null,
    "results": [
        {
            "message_id": "cancelled-by-user",
            "state": "unread",
            "dismissable": false,
            "news": false,
            "attached_to": "build",
            "message": {
                "header": "User action",
                "body": "Build cancelled by the user.",
                "type": "error",
                "icon": "fa-exclamation",
                "icon_style": "duotone",
            }
        }
    ]
}
Query Parameters:
  • unread (boolean) – return only unread notifications

  • type (string) – filter notifications by type (error, note, tip)

  • sort (string) – sort the notifications (type, date (default))
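Assuming the endpoint is implemented as proposed, a client (e.g. the new dashboard or a script) could fetch unread notifications like this; the project slug, build id and token are placeholders:

import requests

API_URL = "https://readthedocs.org/api/v3/projects/pip/builds/12345/notifications/"

response = requests.get(
    API_URL,
    params={"unread": "true", "sort": "type"},
    headers={"Authorization": "Token <token>"},
)
# Print the type and header of each notification returned by the endpoint.
for notification in response.json()["results"]:
    print(notification["message"]["type"], notification["message"]["header"])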

Notification create
POST /api/v3/projects/(str: project_slug)/builds/(int: build_id)/notifications/

Create a notification for the resource. In this example, for a Build resource.

Example request:

{
    "message_id": "cancelled-by-user",
    "type": "error",
    "state": "unread",
    "dismissable": false,
    "news": false,
}

Note

Similar API endpoints will be created for each of the resources we want to attach a Notification to (e.g. User, Organization, etc).

Notification update
PATCH /api/v3/projects/(str: project_slug)/builds/(int: build_id)/notifications/(int: notification_id)/

Update an existing notification. Mainly used to change the state from the front-end.

Example request:

{
    "state": "read",
}

Note

Similar API endpoints will be created for each of the resources we want to attach a Notification to (e.g. User, Organization, etc).

Backward compatibility

It’s not strictly required, but if we want, we could extract the current notification logic from:

  • Django templates

    • “Don’t want setup.py called?”

    • build.image config key is deprecated

    • Configuration file is required

    • build.commands is a beta feature

  • Build.error fields

    • Build cancelled by user

    • Unknown exception

    • build.os is not found

    • No config file

    • No checkout revision

    • Failed when cloning the repository

    • etc

and iterate over all the Build objects to create a Notification object for each of them.

I’m not planning to implement the “new notification system” in the old templates. It doesn’t make sense to spend time on them since we are deprecating them.

Old builds will keep using the current notification approach based on build.error field. New builds won’t have build.error anymore and they will use the new notification system on ext-theme.

New search API

Goals

  • Allow configuring search at the API level, instead of having the options in the database.

  • Allow searching a group of projects/versions at the same time.

  • Bring the same syntax to the dashboard search.

Syntax

The parameters will be given in the query using the key:value syntax, inspired by GitHub and other services.

Currently the values from all parameters don’t include spaces, so surrounding the value with quotes won’t be supported (key:"value").

To avoid interpreting a query as a parameter, an escape character can be put in place, for example project\:docs won’t be interpreted as a parameter, but as the search term project:docs. This is only necessary if the query includes a valid parameter, unknown parameters (foo:bar) don’t require escaping.

All other tokens that don’t match a valid parameter will be joined to form the final search term.
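A minimal sketch of how a query could be split into parameters and a search term under these rules; the set of known keys comes from the parameters described below, and the real tokenization may differ:

KNOWN_PARAMETERS = {"project", "subprojects", "user"}


def parse_query(query):
    """Split a search query into (parameters, search term) using key:value syntax."""
    parameters = []
    terms = []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key in KNOWN_PARAMETERS and not key.endswith("\\"):
            parameters.append((key, value))
        else:
            # Unknown parameters and escaped colons stay in the search term.
            terms.append(token.replace("\\:", ":"))
    return parameters, " ".join(terms)


# parse_query(r"project:docs project\:docs test")
# -> ([("project", "docs")], "project:docs test")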

Parameters

project:

Indicates the project and version to include results from (this doesn’t include subprojects). If the version isn’t provided, the default version is used.

Examples:

  • project:docs/latest

  • project:docs

There can be one or more project parameters; at least one is required.

If the user doesn’t have permission over a version, or if the version doesn’t exist, we don’t include results from that version. We don’t fail the search; this is so users can use one endpoint for all their users, without worrying about what permissions each user has or updating the query after a version or project has been deleted.

The / is used as a separator, but it could be any other character that isn’t present in the slug of a version or project. : was considered (project:docs:latest), but it could be hard to read since : is already used to separate the key from the value.

subprojects:

This allows specifying exactly which project we are going to return subprojects from, and optionally the version we are going to try to match. This includes the parent project in the results.

As with the project parameter, the version is optional and defaults to the default version of the parent project.

user:

Include results from projects the given user has access to. The only supported value is @me, which is an alias for the current user.

Including subprojects

Now that we are returning results only from the given projects, we need an easy way to include results from subprojects. Some ideas for implementing this feature are:

include-subprojects:true

This doesn’t make it clear which projects we are going to include subprojects from. We could make it so it returns subprojects for all projects. Users will probably use this with one project only.

subprojects:project/version (inclusive)

This allows specifying exactly which project we are going to return subprojects from, and optionally the version we are going to try to match. This includes the parent project in the results.

As with the project parameter, the version is optional and defaults to the default version of the parent project.

subprojects:project/version (exclusive)

This is the same as the above, but it doesn’t include the parent project in the results. If we want to include results from the parent project, the query would be project:project/latest subprojects:project/latest. Is this useful?

The second option was chosen, since that’s the current behavior of our search when searching a project with subprojects, and it avoids having to repeat the project if the user wants to include it in the search too.

Cache

Since the request could be attached to more than one project, we will return the full list of projects as cache tags, that is: project1, project1:version, project2, project2:version.

CORS

Since the request could be attached to more than one project, we can’t easily decide in the middleware whether to enable CORS for a given request, so we won’t allow cross-site requests when using the new API for now. We would need to refactor our CORS code so every view can decide whether CORS should be allowed; in this case, cross-site requests would be allowed only if all versions in the final search are public. Another alternative would be to always allow cross-site requests but, when a request is cross-site, only return results from public versions.

Analytics

We will record the same query for each project that was used in the final search.

Response

The response will be similar to the old one, but will include extra information about the search, like the projects, versions, and the query that were used in the final search.

And the version, project, and project_alias attributes will now be objects.

We could also just re-use the old response, since the only breaking change would be the attributes now being objects, and we aren’t adding any new information to those objects (yet). Re-using the current serializers shouldn’t be a problem either.

{
  "count": 1,
  "next": null,
  "previous": null,
  "projects": [
    {
      "slug": "docs",
      "versions": [
        {
          "slug": "latest"
        }
      ]
    }
  ],
  "query": "The final query used in the search",
  "results": [
    {
      "type": "page",
      "project": {
        "slug": "docs",
        "alias": null
      },
      "version": {
        "slug": "latest"
      },
      "title": "Main Features",
      "path": "/en/latest/features.html",
      "domain": "https://docs.readthedocs.io",
      "highlights": {
        "title": []
      },
      "blocks": [
        {
          "type": "section",
          "id": "full-text-search",
          "title": "Full-Text Search",
          "content": "We provide search across all the projects that we host. This actually comes in two different search experiences: dashboard search on the Read the Docs dashboard and in-doc search on documentation sites, using your own theme and our search results. We offer a number of search features: Search across subprojects Search results land on the exact content you were looking for Search across projects you have access to (available on Read the Docs for Business) A full range of search operators including exact matching and excluding phrases. Learn more about Server Side Search.",
          "highlights": {
            "title": [
              "Full-<span>Text</span> Search"
            ],
            "content": []
          }
        },
        {
          "type": "domain",
          "role": "http:post",
          "name": "/api/v3/projects/",
          "id": "post--api-v3-projects-",
          "content": "Import a project under authenticated user. Example request: BashPython$ curl \\ -X POST \\ -H \"Authorization: Token <token>\" https://readthedocs.org/api/v3/projects/ \\ -H \"Content-Type: application/json\" \\ -d @body.json import requests import json URL = 'https://readthedocs.org/api/v3/projects/' TOKEN = '<token>' HEADERS = {'Authorization': f'token {TOKEN}'} data = json.load(open('body.json', 'rb')) response = requests.post( URL, json=data, headers=HEADERS, ) print(response.json()) The content of body.json is like, { \"name\": \"Test Project\", \"repository\": { \"url\": \"https://github.com/readthedocs/template\", \"type\": \"git\" }, \"homepage\": \"http://template.readthedocs.io/\", \"programming_language\": \"py\", \"language\": \"es\" } Example response: See Project details Note Read the Docs for Business, also accepts",
          "highlights": {
            "name": [],
            "content": [
              ", json=data, headers=HEADERS, ) print(response.json()) The content of body.json is like,  &quot;name&quot;: &quot;<span>Test</span>"
            ]
          }
        }
      ]
    }
  ]
}

Examples

  • project:docs project:dev/latest test: search for test in the default version of the docs project, and in the latest version of the dev project.

  • a project:docs/stable search term: search for a search term in the stable version of the docs project.

  • project:docs project\:project/version: search for project:project/version in the default version of the docs project.

  • search: invalid, at least one project is required.

Future features

  • Allow searching on several versions of the same project (the API response is prepared to support this).

  • Allow searching on all versions of a project easily, with a syntax like project:docs/* or project:docs/@all.

  • Allow specifying the type of search:

    • Multi match (query as is)

    • Simple query string (allows using the ES query syntax)

    • Fuzzy search (same as multi match, but with fuzziness)

  • Add the org filter, so users can search by all projects that belong to an organization. We would show results of the default versions of each project.

Proposed contents for new Sphinx guides

Note

This work is in progress, see discussion on this Sphinx issue and the pull requests linked at the end.

The two main objectives are:

  • Contributing a good Sphinx tutorial for beginners. This should introduce the readers to all the various Sphinx major features in a pedagogical way, and be mostly focused on Markdown using MyST. We would try to find a place for it in the official Sphinx documentation.

  • Write a new narrative tutorial for Read the Docs that complements the existing guides and offers a cohesive story of how to use the service.

Sphinx tutorial

Appendixes are optional, i.e. not required to follow the tutorial, but highly recommended.

  1. The Sphinx way

    • Preliminary section giving an overview of what Sphinx is, how it works, how reStructuredText and Markdown/MyST are related to it, some terminology (toctree, builders), what can be done with it.

  2. About this tutorial

    • A section explaining the approach of the tutorial, as well as how to download the result of each section for closer inspection or for skipping parts of it.

  3. Getting started

    1. Creating our project

      • Present a fictitious goal for a documentation project

      • Create a blank README.md to introduce the most basic elements of Markdown (headings and paragraph text)

    2. Installing Sphinx and cookiecutter in a new development environment

      • Install Python (or miniforge)

      • Create a virtual environment (and/or conda environment)

      • Activate our virtual environment (it will always be the first step)

      • Install Sphinx inside the virtual environment

      • Check that sphinx-build --help works (yay!)

    3. Creating the documentation layout

      • Apply our cookiecutter to create a minimal docs/ directory (similar to what sphinx-quickstart does, but with source and build separation by default, project release 0.1, English language, and a MyST index, if at all) [1]

      • Check that the correct files are created (yay!)

    4. Appendix: Using version control

      • Install git (we will not use it during the tutorial)

      • Add a proper .gitignore file (copied from gitignore.io)

      • Create the first commit for the project (yay!)

  4. First steps to document our project using Sphinx

    1. Converting our documentation to local HTML

      • Create (or minimally tweak) index.md

      • Build the HTML output using sphinx-build -W -b html doc doc/_build/html [2]

      • Navigate to doc/_build/html and launch an HTTP server (python -m http.server)

      • Open http://localhost:8000 in a web browser, and see the HTML documentation (yay!)

    2. Converting our documentation to other formats

      • Build PseudoXML using make pseudoxml

      • Build Text using make text

      • See how the various formats change the output (yay!)

    3. Appendix: Simplify documentation building by using Make [3]

      • Install Make (nothing is needed on Windows, make.bat is standalone)

      • Add more content to index.md

      • Build HTML doing cd doc && make html

      • Observe that the HTML docs have changed (yay!)

    4. Appendix: PDF without LaTeX using rinoh (beta)

  5. Customizing Sphinx configuration

    1. Changing the HTML theme

      • Install https://pypi.org/project/furo/

      • Change the html_theme in conf.py

      • Rebuild the HTML documentation and observe that the theme has changed (yay!)

    2. Changing the PDF appearance

      • Add a latex_theme and set it to howto

      • Rebuild make latexpdf

      • Check that the appearance changed (yay!)

    3. Enable an extension

      • Add a string to the extensions list in conf.py for sphinx.ext.duration

      • Rebuild the HTML docs make html and notice that now the times are printed (yay!)

  6. Writing narrative documentation with Sphinx

    • First focus on index.md, diving more into Markdown and mentioning Semantic Line Breaks.

    • Then add another .md file to teach how toctree works.

    • Then continue introducing elements of the syntax to add pictures, cross-references, and the like.

  7. Describing code in Sphinx

    • Explain the Python domain as part of narrative documentation to interleave code with text, include doctests, and justify the usefulness of the next section.

  8. Autogenerating documentation from code in Sphinx

  9. Deploying a Sphinx project online

    • A bit of background on the options: GitHub/GitLab Pages, custom server, Netlify, Read the Docs

    • Make reference to Read the Docs tutorial

  10. Appendix: Using Jupyter notebooks inside Sphinx

  11. Appendix: Understanding the docutils document tree

  12. Appendix: Where to go from here

    • Refer the user to the Sphinx, reST and MyST references, prominent projects already using Sphinx, compilations of themes and extensions, the development documentation.

Read the Docs tutorial

  1. The Read the Docs way

  2. Getting started

    1. Preparing our project on GitHub

      • Fork a starter GitHub repository (something like our demo template, as a starting point that helps mimicking the sphinx-quickstart or cookiecutter step without having to checkout the code locally)

    2. Importing our project to Read the Docs

      • Sign up with GitHub on RTD

      • Import the project (don’t “Edit advanced project options”, we will do this later)

      • The project is created on RTD

      • Browse “builds”, open the build live logs, wait a couple of minutes, open the docs (yay!)

    3. Basic configuration changes

      • Add a description, homepage, and tags

      • Configure your email for build failure notification (until we turn them on by default)

      • Enable “build pull requests for this project” in the advanced settings

      • Edit a file from the GitHub UI as part of a new branch, and open a pull request

      • See the RTD check on the GitHub PR UI, wait a few minutes, open result (yay!)

  3. Customizing the build process

    • Use readthedocs.yaml (rather than the web UI) to customize build formats, change build requirements and Python version, enable fail-on-warnings

  4. Versioning documentation

    • Explain how to manage versions on RTD: create release branches, activate the corresponding version, browse them in the version selector, selectively build versions

    • Intermediate topics: hide versions, create Automation Rules

  5. Getting insights from your projects

    • Move around the project, explore results in Traffic Analytics

    • Play around with server-side search, explore results in Search Analytics

  6. Managing translations

  7. Where to go from here

    • Reference our existing guides, prominent projects already using RTD, domain configuration, our support form, our contributing documentation

Possible new how-to Guides

Some ideas for extra guides on specific topics, still for beginners but more problem-oriented documents, covering a wide range of use cases:

  • How to turn a bunch of Markdown files into a Sphinx project

  • How to turn a bunch of Jupyter notebooks into a Sphinx project

  • How to localize an existing Sphinx project

  • How to customize the appearance of the HTML output of a Sphinx project

  • How to convert existing reStructuredText documentation to Markdown

  • How to use Doxygen autogenerated documentation inside a Sphinx project

  • How to keep a changelog of your project

Reference

All the references should be external: the Sphinx reference, the MyST and reST syntax specs, and so forth.

Organizations

Currently we don’t support organizations (a way to group different projects) in the community site; we only support individual accounts.

Several integrations that we support like GitHub and Bitbucket have organizations, where users group their repositories and manage them in groups rather than individually.

Why move organizations in the community site?

We already support organizations in the commercial site, and having no organizations in the community site makes code maintenance difficult for Read the Docs developers. Having organizations in the community site will make the differences between the two easier to manage.

Users from the community site can have organizations on the external sites we import their projects from (like GitHub and GitLab). Currently, all projects from different organizations end up in the user’s account, with no clear way to group or separate them.

We are going to first move the code, and after that enable the feature on the community site.

How are we going to support organizations?

Currently, only users can own projects in the community site. With organizations this is going to change to: users and organizations can own projects.

With this, the migration process would be straightforward for the community site.

For the commercial site we are only going to allow organizations to own projects for now (since we only have subscriptions per organization).

What features of organizations are we going to support?

We have the following features in the commercial site that we don’t have on the community site:

  • Owners

  • Teams

  • Permissions

  • Subscriptions

Owners should be included to represent owners of the current organization.

Teams are also handy for managing access to different projects under the same organization.

Permissions: currently we have two types of permissions for teams, admin and read only. Read-only permissions don’t make sense in the community site since we only support public projects/versions (we do support private versions now, but we are planning to remove those). So, we should only support admin permissions for teams.

Subscriptions are only valid for the corporate site, since we don’t charge for use in the community site.

How to migrate current projects

Since we are not replacing the current implementation, we don’t need to migrate current projects from the community site nor from the corporate site.

How to migrate the organizations app

The migration can be split in:

  1. Remove/simplify code from the organizations app on the corporate site.

  2. Isolate/separate models and code that isn’t going to be moved.

  3. Start by moving the models, managers, and figure out how to handle migrations.

  4. Move the rest of the code as needed.

  5. Activate organizations app on the community site.

  6. Integrate the code from the community site to the new code.

  7. UI changes

We should start by removing unused features and dead code from the organizations in the corporate site, and simplify existing code if possible (some of this was already done).

Isolate/separate the models to be moved from the ones that aren’t going to be moved. We should move the models that aren’t going to be moved to another app.

  • Plan

  • PlanFeature

  • Subscription

This app can be named subscriptions. We can get around the table names and migrations by explicitly setting the table name to organizations_<model> and doing a fake migration, following the suggestions in https://stackoverflow.com/questions/48860227/moving-multiple-models-from-one-django-app-to-another. That way we avoid any downtime during the migration and any inconvenience caused by renaming the tables manually.
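
For illustration, a moved model could keep its original table by declaring an explicit Meta.db_table while the schema migration is applied with --fake (a minimal sketch; the model, field, and module names are placeholders, not the real implementation):

# readthedocs/subscriptions/models.py (hypothetical location)
from django.db import models


class Plan(models.Model):
    """Plan model moved out of the organizations app."""

    name = models.CharField(max_length=255)

    class Meta:
        # Keep pointing at the existing table so no data has to move;
        # the accompanying migration is then applied with --fake.
        db_table = "organizations_plan"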

Code related to subscriptions should be moved out from the organizations app.

After that, it should be easier to move the organizations app (or part of it) to the community site (and no changes to table names would be required).

We start by moving the models.

  • Organization

  • OrganizationOwner

  • Team

  • TeamInvite

  • TeamMember

Migrations aren’t moved, since all current migrations depend on other models that aren’t going to be moved. In the community site we run an initial migration, for the corporate site we run a fake migration. The migrations left from the commercial site can be removed after that.

For managers and querysets that depend on subscriptions, we can use our pattern to make overridable classes (inheriting from SettingsOverrideObject).

Templates, urls, views, forms, notifications, signals, tasks can be moved later (we just need to make use of the models from the readthedocs.organizations module).

If we decide to integrate organizations in the community site, we can add/move the UI elements and enable the app.

After the app is moved, we can move more code that depends on organizations to the community site.

Namespace

Currently we use the project’s slug as the namespace. In the commercial site we use the combination of organization.slug + project.slug, since there we don’t care so much about a unique namespace across all users, but rather a unique namespace per organization.

This approach probably isn’t the best for the community site, since we always serve docs publicly from slug.readthedocs.io and most users don’t have a custom domain.

The corporate site will use organization.slug + project.slug as the slug, and the community site will always use project.slug, even if the project belongs to an organization.

We need to refactor the way we get the namespace so it is easier to manage in both sites.

Future Changes

Changes that aren’t needed immediately after the migration, but that should be done:

  • UI for organizations in the community site.

  • Add new endpoints to the API (v3 only).

  • Make the relationship between the models Organization and Project one to many (currently many to many).

Design of pull request builder

Background

This document focuses on automatically building documentation for pull requests on Read the Docs projects, one of the most requested features of Read the Docs. It will serve as a design document for discussing how to implement this feature.

Scope

  • Making pull requests work like temporary versions

  • Excluding PR versions from Elasticsearch indexing

  • Adding a PR builds tab in the project dashboard

  • Updating the footer API

  • Adding a warning banner to docs

  • Serving PR docs

  • Excluding PR versions from search engines

  • Receiving the pull_request webhook event from GitHub

  • Fetching data from pull requests

  • Storing PR version build data

  • Creating PR versions when a pull request is opened and triggering a build

  • Triggering builds on new commits on a PR

  • Status reporting to GitHub

Fetching data from pull requests

We already get pull request events from GitHub webhooks, and we can use them to fetch data from pull requests. When a pull_request event is triggered, we can fetch the data of that pull request. We can fetch the pull request itself by doing something similar to Travis CI, i.e.: git fetch origin +refs/pull/<pr_number>/merge:

Modeling pull requests as a type of version

Pull requests can be treated as a type of temporary version. We might consider adding VERSION_TYPES to the Version model.

  • If we go with VERSION_TYPES, we can add something like pull_request alongside Tag and Branch.

We should add Version and Build model managers for PR and regular versions and builds. The proposed names for the PR and regular Version and Build managers are external and internal.

We can then use Version.internal.all() to get all regular versions, Version.external.all() to get all PR versions.

We can then use Build.internal.all() to get all regular version builds, Build.external.all() to get all PR version builds.
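
A rough sketch of how these managers could look, assuming the new VERSION_TYPES choice stores an "external" value for pull request versions (names and the constant are illustrative, not the final implementation):

from django.db import models

EXTERNAL = "external"  # assumed type value for pull request versions


class InternalVersionManager(models.Manager):
    """Only regular (tag/branch) versions."""

    def get_queryset(self):
        return super().get_queryset().exclude(type=EXTERNAL)


class ExternalVersionManager(models.Manager):
    """Only pull request versions."""

    def get_queryset(self):
        return super().get_queryset().filter(type=EXTERNAL)


class Version(models.Model):
    type = models.CharField(max_length=20)

    objects = models.Manager()
    internal = InternalVersionManager()
    external = ExternalVersionManager()

The same pair of managers would be added to the Build model.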

Excluding PR versions from Elasticsearch indexing

We should exclude PR versions from being indexed in Elasticsearch. We need to update the queryset to exclude PR versions.

Adding a PR builds tab in the project dashboard

We can add a tab in the project dashboard that will list the PR builds of that project. We can name it PR Builds.

Creating versions for pull requests

If the GitHub webhook event is pull_request and the action is opened, this means a pull request was opened in the project’s repository. We can create a Version from the payload data and trigger an initial build for that version. A version will be created whenever RTD receives an event like this.

Triggering build for new commits in a pull request

We might want to trigger a new build for the PR version if there is a new commit on the PR. If the GitHub webhook event is pull_request and the action is synchronize, this means a new commit was added to the pull request.

Status reporting to GitHub

We could send build status reports to GitHub, indicating whether the build succeeded or failed, along with the build URL. This way we could show whether the build passed or failed on GitHub, similar to what Travis CI does.

As we already have the repo:status scope on our OAuth App, we can send the status report to GitHub using the GitHub Status API.

Sending the status report would be something like this:

POST /repos/:owner/:repo/statuses/:sha
{
    "state": "success",
    "target_url": "<pr_build_url>",
    "description": "The build succeeded!",
    "context": "continuous-documentation/read-the-docs"
}
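
As a sketch, sending that payload from Python with requests could look like the following (the owner/repo/sha values and the token handling are placeholders):

import requests

GITHUB_API = "https://api.github.com"


def send_build_status(owner, repo, sha, state, target_url, token):
    """Report a PR build result to GitHub using the Status API."""
    response = requests.post(
        f"{GITHUB_API}/repos/{owner}/{repo}/statuses/{sha}",
        json={
            "state": state,  # "pending", "success", "failure" or "error"
            "target_url": target_url,
            "description": "The build succeeded!" if state == "success" else "The build failed!",
            "context": "continuous-documentation/read-the-docs",
        },
        headers={"Authorization": f"token {token}"},
    )
    response.raise_for_status()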

Storing pull request docs

We need to think about how and where to store data after a PR version build is finished. We can store the data in blob storage.

Excluding PR versions from search engines

We should exclude PR versions from search engines, because they might cause problems for RTD users: users might land on a pull request’s docs instead of the original project docs, which would be confusing.

Serving PR docs

We need to think about how we want to serve the PR docs.

  • We could serve the PR docs from another domain.

  • We could serve the PR docs using a <pr_number> namespace on the same domain.

    • Using pr-<pr_number> as the version slug: https://<project_slug>.readthedocs.io/<language_code>/pr-<pr_number>/

    • Using a pr subdomain: https://pr.<project_slug>.readthedocs.io/<pr_number>/

Adding warning banner to Docs

We need to add a warning banner to the PR version docs to let users know that this is a draft/PR version. We can use a Sphinx extension, force-installed on PR versions, to add the warning banner.
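
A minimal sketch of such an extension, assuming we inject the banner through Sphinx’s html-page-context event (the extension itself and the banner markup are illustrative):

# Hypothetical extension installed only on PR version builds.
BANNER = (
    '<div class="admonition warning">'
    "This page was built from a pull request and may not reflect the "
    "latest version of the documentation."
    "</div>"
)


def add_banner(app, pagename, templatename, context, doctree):
    # Prepend the warning to the rendered body of every HTML page.
    if "body" in context:
        context["body"] = BANNER + context["body"]


def setup(app):
    app.connect("html-page-context", add_banner)
    return {"parallel_read_safe": True}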

Privacy levels

This document describes how to handle and unify privacy levels on the community and commercial version of Read the Docs.

Current state

Currently, we have three privacy levels for projects and versions:

  1. Public

  2. Private

  3. Protected (currently hidden)

These levels of privacy aren’t clear and bring confusion to our users. Also, the private level doesn’t make sense on the community site, since we only support public projects.

Places where we use the privacy levels are:

  • On serving docs

  • Footer

  • Dashboard

Project level privacy

Project level privacy was meant to control the dashboard visibility.

This privacy level causes confusion when users want to make a version public. We should remove all the project privacy levels.

For the community site the dashboard would always be visible, and for the commercial site the dashboard would always be hidden.

The project privacy level is also used to serve the 404.html page, show robots.txt, and show sitemap.xml. The privacy level from versions should be used instead.

Another idea was to keep the project privacy level only to dictate the default privacy level of new versions, removing all other logic related to it. This can be (or is going to be) possible with automation rules, so we can just remove the field.

Version level privacy

Version level privacy is mainly used to restrict access to documentation. For the public level, everyone can access the documentation. For the private level, only users that are maintainers or that belong to a team with access (for the commercial site) can access the documentation.

The protected privacy level was meant to hide versions from listings and search. For the community site these versions are treated like public versions, and on the commercial site they are treated like private.

The protected privacy level is currently hidden. To keep the behavior of hiding versions from listings and search, a new field should be added to the Version model and forms: hidden (#5321). The privacy level (public or private) would be respected to determine access to the documentation.
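
As a sketch, the new field would sit next to the privacy level on the Version model, with the footer and search querysets filtering on it (field defaults here are assumptions; see #5321 for the actual implementation):

from django.db import models


class Version(models.Model):
    privacy_level = models.CharField(max_length=20, default="public")
    # Hidden versions keep their privacy level for access control,
    # but are left out of the footer listing and search results.
    hidden = models.BooleanField(default=False)

A footer or search queryset would then add .filter(hidden=False) on top of the existing privacy-level filtering.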

For the community site, the privacy level would be public and can’t be changed.

The default privacy level of new versions for the commercial site would be private (this is the DEFAULT_PRIVACY_LEVEL setting).

Overview

For the community site:

  • The project’s dashboard is visible to all users.

  • All versions are always public.

  • The footer shows links to the project’s dashboard (build, downloads, home) to all users.

  • Only versions with hidden = False are listed on the footer and appear on search results.

  • If a project has a 404.html file on the default version, it’s served.

  • If a project has a robots.txt file on the default version, it’s served.

  • A sitemap.xml file is always served.

For the commercial site:

  • The project’s dashboard is visible to only users that have read permission over the project.

  • The footer shows links to the project’s dashboard (build, downloads, home) to only admin users.

  • Only versions with hidden = False are listed on the footer and appear on search results.

  • If a project has a 404.html file on the default version, it’s served if the user has permission over that version.

  • If a project has a robots.txt file on the default version, it’s served if the user has permission over that version.

  • A sitemap.xml file is served if the user has at least one public version. And it will only list public versions.

Migration

To differentiate between allowing privacy levels or not, we need to add a setting RTD_ALLOW_PRIVACY_LEVELS (False by default).

For the community and commercial site, we need to:

  • Remove/change code that depends on the project’s privacy level. Use the global setting RTD_ALLOW_PRIVACY_LEVELS and default version’s privacy level instead.

    • Display robots.txt

    • Serve 404.html page

    • Display sitemap.xml

    • Querysets

  • Remove Project.privacy_level field

  • Migrate all protected versions to have the attribute hidden = True (data migration), and set their privacy level to public for the community site and private for the commercial site.

  • Change all querysets used to list versions on the footer and on search to use the hidden attribute.

  • Update docs

For the community site:

  • Hide all privacy level related settings from the version form.

  • Don’t expose privacy levels on API v3.

  • Mark all versions as public.

For the commercial site:

  • Always hide the dashboard

  • Show links to the dashboard (downloads, builds, project home) on the footer only to admin users.

Upgrade path overview

Community site

The default privacy level for the community site is public for versions and the dashboard is always public.

Public project (community)
  • Public version: Normal use case, no changes required.

  • Protected version: Users didn’t want to list this version on the footer, but didn’t want to deactivate it either. We can do a data migration of those versions to the new hidden setting and make them public.

  • Private version: Users didn’t want to show this version to their users yet or they were testing something. This can be solved with the pull request builder feature and the hidden setting. We migrate those to public with the hidden setting. If we are worried about leaking anything from the version, we can email users before doing the change.

Protected project (community)

Protected projects are not listed publicly. Probably users were hosting a WIP project, or a personal public project. A public project should work for them, as we are removing the public listing of all projects (except for search).

The migration path for versions of protected projects is the same as a public project.

Private project (community)

Probably these users want to use our enterprise solution instead. Or they were hosting a personal project.

The migration path for versions of private projects is the same as a public project.

If we are worried about leaking anything from the dashboard or build page, we can email users before doing the change.

Commercial site

The default privacy level for the commercial site is private for versions, and the dashboard is shown only to admin users.

Private project (commercial)
  • Private version: Normal use case, no changes required.

  • Protected version: Users didn’t want to list this version on the footer, but didn’t want to deactivate it either. This can be solved by using the new hidden setting. We can do a data migration of those versions to the new hidden setting and make them private.

  • Public version: Users have private code but want to make their docs public. No changes required.

Protected project (commercial)

I can’t think of a use case for protected projects, since they aren’t listed publicly on the commercial site.

The migration path for versions of protected projects is the same as a private project.

Public project (commercial)

Currently we show links back to the project dashboard if the project is public, which users probably shouldn’t see. With the implementation of this design doc, public versions don’t have links to the project dashboard (except for admin users) and the dashboard is always behind login.

  • Private versions: Users under the organization can see links to the dashboard. No changes required.

  • Protected versions: Users under the organization can see links to the dashboard. We can do a data migration of those versions to the new hidden setting and make them private.

  • Public versions: All users can see links to the dashboard. Probably they have an open source project, but they still want to manage access using the same teams of the organization. No changes are required.

A breaking change here is: users outside the organization would not be able to see the dashboard of the project.

Improving redirects

Redirects are a core feature of Read the Docs; they allow users to keep old URLs working when they rename or move a page.

The current implementation lacks some features and has some undefined/undocumented behaviors.

Goals

  • Improve the user experience when creating redirects.

  • Improve the current implementation without big breaking changes.

Non-goals

  • Replicate every feature of other services without having a clear use case for them.

  • Improve the performance of redirects. This can be discussed in an issue or pull request. Performance should be considered when implementing new improvements.

  • Allow importing redirects. We should push users to use our API instead.

  • Allow specifying redirects in the RTD config file. We have had several discussions around this, but we haven’t reached a consensus.

Current implementation

We have five types of redirects:

Prefix redirect:

Allows redirecting all URLs that start with a prefix to a new URL using the default version and language of the project. For example, a prefix redirect with the value /prefix/ will redirect /prefix/foo/bar to /en/latest/foo/bar.

They are basically the same as an exact redirect with a wildcard at the end. They are a shortcut for a redirect like:

From:

/prefix/$rest

To:

/en/latest/

Or maybe we could use a prefix redirect to replace the exact redirect with a wildcard?

Page redirect:

Allows redirecting a single page to a new URL using the current version and language. For example, a page redirect with the value /old/page.html will redirect /en/latest/old/page.html to /en/latest/new/page.html.

Cross-domain redirects are not allowed in page redirects. They apply to all versions; if you want a redirect to apply only to a specific version, you can use an exact redirect.

A whole directory can’t be redirected with a page redirect; an exact redirect with a wildcard at the end needs to be used instead.

A page redirect on a single version project is the same as an exact redirect.

Exact redirect:

Allows redirecting an exact URL to a new URL, with an optional wildcard at the end. For example, an exact redirect with the value /en/latest/page.html will redirect /en/latest/page.html to the new URL.

If an exact redirect with the value /en/latest/dir/$rest is created, it will redirect all paths that start with /en/latest/dir/, the rest of the path will be added to the new URL automatically.

  • Cross domain redirects are allowed in exact redirects.

  • They apply to all versions.

  • A wildcard is allowed at the end of the URL.

  • If a wildcard is used, the rest of the path will be added to the new URL automatically.

Sphinx HTMLDir to HTML:

Allows redirecting clean URLs to HTML URLs. Useful in case a project changed the style of its URLs.

They apply to all projects, not just Sphinx projects.

Sphinx HTML to HTMLDir:

Allows redirecting HTML URLs to clean URLs. Useful in case a project changed the style of its URLs.

They apply to all projects, not just Sphinx projects.

How other services implement redirects

  • Gitbook’s implementation is very basic; they only allow page redirects.

    https://docs.gitbook.com/integrations/git-sync/content-configuration#redirects

  • Cloudflare Pages allows capturing placeholders and one wildcard (in any part of the URL). It also allows you to set the status code of the redirect, and redirects can be specified in a _redirects file.

    https://developers.cloudflare.com/pages/platform/redirects/

    They have a limit of 2100 redirects. In case of multiple matches, the topmost redirect will be used.

  • Netlify allows capturing placeholders and a wildcard (only allowed at the end). It also allows you to set the status code of the redirect, and redirects can be specified in a _redirects file.

    • Forced redirects

    • Match query arguments

    • Match by country/language and cookies

    • Per-domain and protocol redirects

    • In case of multiple matches, the topmost redirect will be used.

    • Rewrites, serve a different file without redirecting.

    https://docs.netlify.com/routing/redirects/

  • GitLab Pages supports the same syntax as Netlify, and a subset of its features:

    • _redirects config file

    • Status codes

    • Rewrites

    • Wildcards (splats)

    • Placeholders

    https://docs.gitlab.com/ee/user/project/pages/redirects.html

Improvements

General improvements

The following improvements will be applied to all types of redirects.

  • Allow choosing the status code of the redirect. We already have a field for this, but it’s not exposed to users.

  • Allow explicitly defining the order of redirects. This will be similar to the automation rules feature, where users can reorder the rules so the most specific ones come first. We currently rely on the implicit order of the redirects (updated_at).

  • Allow disabling redirects. This is useful when testing redirects or debugging a problem. Instead of having to re-create the redirect, we can just disable it and re-enable it later.

  • Allow adding a short description. This is useful to document why the redirect was created.

Don’t run redirects on domains from pull request previews

We currently run redirects on domains from pull request previews; this is a problem when moving a whole project to a new domain.

We don’t need to run redirects on external domains; they should be treated as temporary domains.

Normalize paths with trailing slashes

Currently, if users want to redirect a path with a trailing slash and without it, they need to create two separate redirects (/page/ and /page).

We can simplify this by normalizing the path before matching it, or before saving it.

For example:

From:

/page/

To:

/new/page

The from path will be normalized to /page, and the filename to match will also be normalized before matching it. This is similar to what Netlify does: https://docs.netlify.com/routing/redirects/redirect-options/#trailing-slash.

Page and exact redirects without a wildcard at the end will be normalized; all other redirects need to be matched as-is.

This makes it impossible to match a path with a trailing slash.
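
A possible normalization helper, assuming we only normalize redirects without a trailing wildcard (a sketch, not the final implementation):

def normalize_path(path):
    """Strip trailing slashes so /page/ and /page match the same redirect."""
    if path.endswith("/*"):
        # Wildcard redirects are matched as-is.
        return path
    return path.rstrip("/") or "/"

For example, normalize_path("/page/") returns /page, so a single stored redirect matches both forms of the URL.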

Use * and :splat for wildcards

Currently we are using $rest at the end of the From URL to indicate that the rest of the path should be added to the target URL.

A similar feature is implemented in other services using * and :splat.

Instead of using $rest in the URL for the suffix wildcard, we will now use *, and :splat as a placeholder in the target URL, to be more consistent with other services. Existing redirects can be migrated automatically.

Explicit :splat placeholder

Explicitly place the :splat placeholder in the target URL, instead of adding it automatically.

Sometimes users want to redirect to a different path; we have been adding a query parameter in the target URL to prevent the old path from being appended to the final path. For example /new/path/?_=.

Instead of adding the path automatically, users have to add the :splat placeholder in the target URL. For example:

From:

/old/path/*

To:

/new/path/:splat

From:

/old/path/*

To:

/new/path/?page=:splat&foo=bar
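
Resolving such a redirect could work roughly like this (a sketch, assuming * is only allowed at the end of the From path):

def resolve_redirect(from_url, to_url, path):
    """Match a redirect with a trailing * and expand :splat in the target."""
    if from_url.endswith("*"):
        prefix = from_url[:-1]
        if path.startswith(prefix):
            splat = path[len(prefix):]
            return to_url.replace(":splat", splat)
        return None
    return to_url if path == from_url else None


# resolve_redirect("/old/path/*", "/new/path/:splat", "/old/path/guide.html")
# returns "/new/path/guide.html"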

Improving page redirects
  • Allow redirecting to external domains. This can be useful to redirect a well-known path in all versions to another domain.

    For example, /security/ to their security policy page on another domain.

    This new feature isn’t strictly needed, but it will be useful to simplify the explanation of the feature (one less restriction to explain).

    Example:

    From:

    /security/

    To:

    https://example.com/security/

  • Allow a wildcard at the end of the from path. This will allow users to migrate a whole directory to a new path without having to create an exact redirect for each version.

    Similar to exact redirects, users need to add the :splat placeholder explicitly. This means that page redirects are the same as exact redirects, with the only difference being that they apply to all versions.

    Example:

    From:

    /old/path/*

    To:

    /new/path/:splat

Merge prefix redirects with exact redirects

Prefix redirects are the same as exact redirects with a wildcard at the end. We will migrate all prefix redirects to exact redirects with a wildcard at the end.

For example:

From:

/prefix/

Will be migrated to:

From:

/prefix/*

To:

/en/latest/:splat

Where /en/latest is the default version and language of the project. For single version projects, the redirect will be:

From:

/prefix/*

To:

/:splat

Improving Sphinx redirects

These redirects are useful, but we should rename them to something more general, since they apply to all types of projects, not just Sphinx projects.

Proposed names:

  • HTML URL to clean URL redirect (file.html to file/)

  • Clean URL to HTML URL redirect (file/ to file.html)

Other ideas to improve redirects

The following improvements will not be implemented in the first iteration.

  • Run forced redirects before built-in redirects. We currently run built-in redirects before forced redirects, which is a problem when moving a whole project to a new domain. For example, a forced redirect like /$rest won’t work for the root URL of the project, since / will first redirect to /en/latest/.

    But this shouldn’t be a real problem, since users will still need to handle the /en/latest/file/ paths.

  • Run redirects on the edge. Cloudflare allows us to create redirects on the edge, but it has some limitations around the number of redirect rules that can be created.

    They would be useful for forced exact redirects only, since we can’t match a redirect based on the response of the origin server.

  • Merge all redirects into a single type. This may simplify the implementation, but it will make it harder to explain the feature to users. And to replace some redirects we need to implement some new features.

  • Placeholders. I haven’t seen users requesting this feature. We can consider adding it in the future. Maybe we can expose the current language and version as placeholders.

  • Per-protocol redirects. We should push users to always use HTTPS.

  • Allow a prefix wildcard. We currently only allow a suffix wildcard, adding support for a prefix wildcard should be easy. But do users need this feature?

  • Per-domain redirects. The main problem that originated this request was that we were applying redirects on external domains; if we stop doing that, there is no need for this feature. We can also try to improve how our built-in redirects work (especially our canonical domain redirect).

Allow matching query arguments

We can do this in three ways:

  • At the DB level with some restrictions. If done at the DB level, we would need to have a different field with just the path, and another with the query arguments normalized and sorted.

    For example, if we have a redirect with the value /foo?blue=1&yellow=2&red=3, it would be normalized in the DB as /foo and blue=1&red=3&yellow=2. This implies that the URL to be matched must have exactly the same query arguments; it can’t have more or fewer.

    I believe the implementation described here is the same being used by Netlify, since they have that same restriction.

    If the URL contains other parameters in addition to or instead of id, the request doesn’t match that rule.

    https://docs.netlify.com/routing/redirects/redirect-options/#query-parameters

  • At the DB level using a JSONField. All query arguments will be saved normalized as a dictionary. When matching the URL, we will need to normalize the query arguments, and use a combination of has_keys and contained_by to match the exact set of query arguments (see the sketch after this list).

  • At the Python level. We would still need to have a different field with just the path, and another with the query arguments.

    The matching of the path would be done at the DB level, and the matching of the query arguments would be done at the Python level. Here we can be more flexible, allowing any query arguments in the matched URL.

    We had some performance problems in the past, but I believe that was mainly due to using regex instead of string operations, and matching the path would still be done at the DB level. We could limit the number of redirects that can be created with query arguments, or the number of redirects in general.
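
A sketch of the JSONField option mentioned above, assuming the normalized query arguments are stored in a hypothetical query_args field on the Redirect model (the import path is also an assumption):

from readthedocs.redirects.models import Redirect  # import path assumed


def match_redirects(path, query_params):
    """Match redirects whose stored query arguments equal the request's."""
    return Redirect.objects.filter(
        from_url=path,
        # Every stored argument must appear in the request with the same value...
        query_args__contained_by=query_params,
        # ...and the stored arguments must cover every request key,
        # so the request can't carry extra arguments.
        query_args__has_keys=list(query_params.keys()),
    )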

We have had only one user requesting this feature, so this is not a priority.

Migration

Most of the proposed improvements are backwards compatible, and just need a data migration to normalize existing redirects.

The exception is having to add the :splat placeholder in the target URL explicitly, which requires users to re-learn how this feature works; i.e., they may be expecting the path to be added automatically to the target URL.

We can create a small blog post explaining the changes.

Refactor RemoteRepository object

This document describes the current usage of RemoteRepository objects and proposes a new normalized modeling.

Goals

  • De-duplicate data stored in our database.

  • Save only one RemoteRepository per GitHub repository.

  • Use an intermediate table between RemoteRepository and User to store associated remote data for the specific user.

  • Make this model usable from our SSO implementation (adding remote_id field in Remote objects).

  • Use a Postgres JSONField to store the associated remote JSON data.

  • Make Project connect directly to RemoteRepository without being linked to a specific User.

  • Do not disconnect Project and RemoteRepository when a user deletes/disconnects their account.

Non-goals

  • Keep RemoteRepository in sync with GitHub repositories.

  • Delete RemoteRepository objects deleted from GitHub.

  • Listen to GitHub events to detect full_name changes and update our objects.

Note

We may need/want some of these non-goals in the future. They are just outside the scope of this document.

Current implementation

When a user connects their account to a social account, we create:

  • allauth.socialaccount.models.SocialAccount

    • basic information (provider, last login, etc.)

    • provider-specific data saved as JSON under extra_data

  • allauth.socialaccount.models.SocialToken

    • token to hit the API on behalf of the user

We don’t create any RemoteRepository at this point. They are created when the user jumps into the “Import Project” page and hits the circled arrows. That triggers the sync_remote_repositories task in the background, which updates or creates RemoteRepository objects, but does not delete them (after #7183 and #7310 get merged, they will be deleted). One RemoteRepository is created per repository the User has access to.

Note

In corporate, we are automatically syncing RemoteRepository and RemoteOrganization at signup (foreground) and login (background) via a signal. We should eventually move these to community.

Where is RemoteRepository used?

  • List of available repositories to import under “Import Project”

  • Show a “+”, “External Arrow” or a “Lock” sign next to the element in the list

    • +: it’s available to be imported

    • External Arrow: the repository is already imported (see the RemoteRepository.matches method)

    • Lock: the user doesn’t have (admin) permissions to import this repository (uses RemoteRepository.private and RemoteRepository.admin)

  • Avatar URL in the list of projects available to import

  • Update webhook when user clicks “Resync webhook” from the Admin > Integrations tab

  • Send build status when building Pull Requests

New normalized implementation

The ManyToMany relation RemoteRepository.users will be changed to ManyToMany(through='RemoteRelation') to add extra fields in the relation that are specific to the User. This allows us to have only one RemoteRepository per GitHub repository, with multiple relationships to User.

With this modeling, we can avoid disconnecting Project and RemoteRepository by only removing the RemoteRelation.

Note

All the points mentioned in the previous section may need to be adapted to use the new normalized modeling. However, it may only require field renaming or small query changes over the new fields.

Use this modeling for SSO

We can get the list of Projects where a user has (admin) access:

admin_remote_repositories = RemoteRepository.objects.filter(
    # Both conditions traverse the proposed RemoteRelation through table,
    # so they apply to the same relation for the requesting user.
    remoterelation__user=request.user,
    remoterelation__admin=True,  # False for read-only access
)
Project.objects.filter(remote_repository__in=admin_remote_repositories)

Rollout plan

Due to the constraints we have on the RemoteRepository table and its size, we can’t just do the data migration at the same time as the deploy. Because of this we need to be more creative and find a way to re-sync the data from VCS providers while the site continues working.

To achieve this, we plan to follow these steps:

  1. Modify all the Python code to use the new modeling in .org and .com (this will help us find bugs locally more easily).

  2. QA this locally with test data.

  3. Enable the Django signal to re-sync RemoteRepository on login asynchronously (we already have this in .com). New active users will have updated data immediately.

  4. Spin up a new instance with the new refactored code.

  5. Run migrations to create a new table for RemoteRepository.

  6. Re-sync everything from VCS providers into the new table for a week or so.

  7. Dump-n-load the Project - RemoteRepository relations.

  8. Create a migration to use the new table with synced data.

  9. Deploy the new code once the sync is finished.

See these issues for more context:

  • https://github.com/readthedocs/readthedocs.org/pull/7536#issuecomment-724102640

  • https://github.com/readthedocs/readthedocs.org/pull/7675#issuecomment-732756118

Secure API access from builders

Goals

  • Provide a secure way for builders to access the API.

  • Limit the access of the tokens to the minimum required.

Non-goals

  • Migrate builds to use API V3

  • Implement this mechanism in API V3

  • Expose it to users

All these changes can be made in the future, if needed.

Current state

Currently, we access API V2 from the builders using the credentials of the “builder” user. This user is a superuser: it has access to all projects, write access to the API, and access to restricted endpoints and restricted fields.

The credentials are hardcoded in our settings file, so if there is a vulnerability that allows users to have access to the settings file, the attacker will have access to the credentials of the “builder” user, giving them full access to the API and all projects.

Proposed solution

Instead of using the credentials of a superuser to access the API, we will create a temporary token attached to a project and to one of the owners of the project. This way the token will have access to the given project only, and only for a limited period of time.

This token will be generated from the webs, and passed to the builders via the celery task, where it can be used to access the API. Once the build has finished, this token will be revoked.

Technical implementation

We will use the rest-knox package, which is recommended by the DRF documentation since the default token implementation of DRF is very basic. Some relevant features of knox are:

  • Support for several tokens per user.

  • Tokens are stored in a hashed format in the database. We don’t have access to the tokens after they are created.

  • Tokens can have an expiration date.

  • Tokens can be created with a prefix (rtd_xxx) (unreleased)

  • Support for custom token model (unreleased)

We won’t expose the token creation view directly, since we can create the tokens from the webs, and this isn’t exposed to users.

The view to revoke the token will be exposed, since we need it to revoke the token once the build has finished.

From the API, we just need to add the proper permission and authentication classes to the views we want to support.

To differentiate a normal user from a token-authenticated user, we will have access to the token via the request.auth attribute in the API views; this will also be used to get the attached projects and filter the querysets.

The knox package allows us to provide our own token model, which will be useful for adding our own fields to it: fields like the projects attached to the token, whether it grants access to all projects the user has access to, etc.
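
For example, a build API view could use request.auth to restrict its queryset to the projects attached to the token, while keeping the old behavior for regular authentication (a sketch; the view and the projects relation on the custom token model are assumptions):

from rest_framework import viewsets

from readthedocs.projects.models import Project


class BuildProjectViewSet(viewsets.ReadOnlyModelViewSet):
    """Hypothetical API view used by the builders."""

    def get_queryset(self):
        token = self.request.auth
        if token is not None:
            # Request authenticated with a build token: only the projects
            # attached to the token are visible.
            return token.projects.all()
        # Regular authentication (e.g. the current "builder" user) keeps the
        # previous behavior while both implementations coexist.
        return Project.objects.filter(users=self.request.user)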

Flow

The flow of creation and usage of the token will be:

  • Create a token from the webs when a build is triggered. The triggered project will be attached to the token. If the build was triggered by a user, that user will be attached to the token; otherwise the token will be attached to one of the owners of the project.

  • The token will be created with an expiration date of 3 hours, which should be enough for the build to finish. We could also make this dynamic depending on the project.

  • Pass the token to the builder via the celery task.

  • Pass the token to all places where the API is used.

  • Revoke the token when the build has finished. This is done by hitting the revoke endpoint.

  • In case the revoke endpoint fails, the token will expire in 3 hours.

Why attach tokens to users?

Attaching tokens to users will ease the implementation, since we can re-use the code from knox package.

Attaching tokens to projects only is possible, but it would require managing the authentication manually, since Knox requires a user to be attached to the token and this user is used in its TokenAuthentication class. An alternative is to use the DRF API key package, which doesn’t require a user; but then, if we wanted to extend this functionality to our normal APIs, we would have to implement the authentication manually.

Keeping backwards compatibility

Write access to API V2 is restricted to superusers and was used only from the builders, so we don’t need to keep backwards compatibility for authenticated requests, but we do need to keep the old implementation working while we deploy the new one.

Possible issues

Some of the features that we may need are not released yet; we especially need the custom token model feature.

There is a race condition when using the token if the user attached to it is removed from the project. That is, if the user is removed while the build is running, the builders won’t be able to access the API. We could avoid this by not relying on the user attached to the token, only on the projects attached to it (this would be for our build APIs only).

Alternative implementation with Django REST Framework API Key

Instead of using knox, we can use DRF API key. It has the same features as knox, with the following differences:

  • It is only used for authorization; it can’t be used for authentication (at least not out of the box).

  • It doesn’t expose views to revoke the tokens (but this should be easy to manually implement)

  • Changing the behaviour of some things requires sub-classing instead of defining settings.

  • It supports several token models (not just one like knox).

  • All features that we need are already released.

The implementation will be very similar to the one described for knox, with the exception that tokens won’t be attached to users, just to a project. We also won’t need to handle authentication, since the token itself will grant access to the projects.

To avoid breaking the builders, we need to make the old and the new implementations work together, that is, allow authentication and handle tokens at the same time. This means passing valid user credentials together with the token; this “feature” can be removed in the next deploy (with knox we would also need to handle both implementations, but it wouldn’t require passing credentials with the token, since it also handles authentication).

Decision

Because the required features from knox are not released yet, we have decided to use DRF API key instead.

Future work

This work can be extended to API V3 and exposed to users in the future. We only need to take into consideration that, if using knox, the token model will be shared by both API V2 and API V3; if we use API key, we can have different token models for each use case.

sphinxcontrib-jquery

jQuery will be removed from Sphinx 6.0.0. We can expect 6.0.0 to ship in late 2022.

This is a “request for comments” for a community-owned Sphinx extension that bundles jQuery.

Overview

Comment deadline:

November 1st, 2022

Package-name:

sphinxcontrib-jquery

Python package:

sphinxcontrib.jquery

Dependencies:

Python 3+, Sphinx 1.8+ (or perhaps no lower bound?)

Ownership:

Read the Docs core team will implement the initial releases of an otherwise community-owned package that lives in https://github.com/sphinx-contrib/jquery

Functionality:

sphinxcontrib-jquery is a Sphinx extension that provides a simple mechanism for other Sphinx extensions and themes to ensure that jQuery is included in the HTML build outputs and loaded in the HTML DOM itself. More specifically, the extension ensures that jQuery is loaded exactly once, no matter how many themes and extensions request to include jQuery, nor which version of Sphinx is used.

Scope:

This extension assumes that it’s enough to provide a single version of jQuery for all of its dependent extensions and themes. As the name implies, this extension is built to handle jQuery only. It’s not a general asset manager and it’s not looking to do dependency resolution of jQuery versions.

Usage

The primary users of this package are theme and extension developers and documentation project owners.

Theme and extension developers

The following 2 steps need to be completed:

  1. A Sphinx theme or extension should depend on the python package sphinxcontrib-jquery.

  2. In your extension’s or theme’s setup(app), call app.setup_extension("sphinxcontrib.jquery").

In addition to this, we recommend that extension and theme developers log to the browser’s console.error in case jQuery isn’t found. The log message could for instance say:

if (typeof $ == "undefined") {
    console.error(
        "<package-name> depends on sphinxcontrib-jquery. Please ensure that " +
        "<package-name>.setup(app) is called or add 'sphinxcontrib-jquery' to " +
        "your conf.py extensions setting."
    );
}

Documentation project owners

If you are depending on a theme or extension that did not itself address the removal of jQuery from Sphinx 6, you can patch up your project like this:

  1. Add sphinxcontrib-jquery to your installed dependencies.

  2. Add sphinxcontrib.jquery to your extensions setting in conf.py.

Calling app.setup_extension("sphinxcontrib.jquery")

When a Sphinx theme or extension calls setup_extension(), a call to sphinxcontrib.jquery.setup(app) will happen. Adding sphinxcontrib.jquery to a documentation project’s conf.extensions will also call sphinxcontrib.jquery.setup(app) (at most once).

In sphinxcontrib.jquery.setup(app), jQuery is added. The default behaviour is to detect the Sphinx version and include jQuery via app.add_js_file when Sphinx is from version 6 and up. jQuery is added at most once.

Config value: jquery_force_enable

When setting jquery_force_enable=True, jQuery is added no matter the Sphinx version, but at most once. This is useful if you want to handle alternative conditions for adding jQuery.
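
A rough sketch of how setup() could implement this behavior (the bundled filename and wiring are placeholders, not the actual implementation):

import sphinx


def add_jquery(app, config):
    # Add jQuery at most once: on Sphinx 6+ (which no longer bundles it),
    # or whenever the project forces it via jquery_force_enable.
    if sphinx.version_info >= (6, 0, 0) or config.jquery_force_enable:
        app.add_js_file("jquery.js", priority=200)  # bundled file, name is a placeholder


def setup(app):
    app.add_config_value("jquery_force_enable", False, "html")
    app.connect("config-inited", add_jquery)
    return {"parallel_read_safe": True, "parallel_write_safe": True}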

Warning

If you set jquery_force_enable=True, you most likely should also add Sphinx>=6 to your theme’s/extension’s dependencies, since versions before this already bundle jQuery!

jQuery version and inclusion

jQuery should be shipped together with the Python package and not be referenced from a CDN.

Sphinx has kept relatively up to date with jQuery, and this package intends to follow. The most recently bundled jQuery version was v3.5.1 and only two releases have happened since: 3.6.0 and 3.6.1. The 3.6.0 release had a very small backwards incompatibility which illustrates how harmless these upgrades are for the general purpose Sphinx package.

Therefore, we propose to start the release of sphinxcontrib-jquery at 3.5.1 (the currently shipped version) and subsequently release 3.6.1 in an update. This will give users that need 3.5.1 a choice of a lower version.

The bundled jQuery version will be NPM pre-minified and distributed together with the PyPI package.

The minified jQuery JS file is ultimately included by calling app.add_js_file, which is passed the following arguments:

app.add_js_file(
    get_jquery_url_path(),
    loading_method="defer",
    priority=200,
    integrity="sha256-{}".format(get_jquery_sha256_checksum()),
)

Note

It’s possible to include jQuery in other ways, but this ultimately doesn’t require this extension and is therefore not supported.

Allow installation of system packages

Currently we don’t allow executing arbitrary commands in the build process. The most common use case is installing extra dependencies.

Current status

There is a workaround to run arbitrary commands when using Sphinx: executing the commands inside the conf.py file. There isn’t a workaround for MkDocs, but this problem is more common with Sphinx, since users need to install some extra dependencies in order to use autodoc or build Jupyter Notebooks.

However, some dependencies require root access to install, or are easier to install using apt. Most CI services allow using apt or executing any command with sudo, so users are more familiar with that workflow.

Some users use Conda instead of pip to install dependencies in order to avoid these problems, but not all pip users are familiar with Conda, or want to migrate to Conda just to use Read the Docs.

Security concerns

Builds are run in a Docker container, but the app controlling that container lives on the same server. Allowing execution of arbitrary commands with superuser privileges may introduce some security issues.

Exposing apt install

For the previous reasons we won’t allow executing arbitrary commands as root (yet), but will instead only allow installing extra packages using apt.

We would expose this through the config file. Users will provide a list of packages to install, and under the hood we would run:

  • apt update -y

  • apt install -y {packages}

These commands will be run before the Python setup step and after the clone step.

Note

All package names must be validated to avoid injection of extra options (like -v).
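
A possible validation sketch, assuming we follow Debian package naming rules (lowercase alphanumerics plus ., +, -, starting with an alphanumeric character):

import re

# Debian package names: at least two characters, starting with an alphanumeric
# character, followed by lowercase letters, digits, '.', '+' or '-'.
PACKAGE_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.+-]+$")


def validate_apt_packages(packages):
    """Reject names that could smuggle extra apt options or shell syntax."""
    for name in packages:
        if not PACKAGE_NAME_RE.match(name):
            raise ValueError(f"Invalid APT package name: {name!r}")
    return packages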

Using docker exec

Currently we use docker exec to execute commands in a running container. This command also allows passing a user which is used to run the commands (#8058). We can run the apt commands in our current containers using a superuser momentarily.

Config file

The config file can add an additional mapping build.apt_packages to a list of packages to install.

version: 2

build:
  apt_packages:
     - cmatrix
     - mysql-server

Note

Other names that were considered were:

  • build.packages

  • build.extra_packages

  • build.system_packages

These were rejected to avoid confusion with existing keys, and to be explicit about the type of package.

Possible problems

  • Some users may need to pass additional flags or install from a PPA.

  • Some packages may require some additional setup after installation.

Other possible solutions

  • We could allow running the containers as root by doing something similar to what Travis does: they have one tool to convert the config file to a shell script (travis-build), and another that spins up a Docker container, executes that shell script, and streams the logs back (travis-worker).

  • A similar solution could be implemented using AWS Lambda.

This of course would require a large amount of work, but may be useful for the future.

Collect data about builds

We may want to take some decisions in the future about deprecations and supported versions. Right now we don’t have data about the usage of packages and their versions on Read the Docs to be able to make an informed decision.

Tools

Kibana:
  • https://www.elastic.co/kibana

  • Works only with Elasticsearch as the data source.

Superset:
  • https://superset.apache.org/

  • We can import data from several DBs (including postgres and ES).

  • Easy to set up locally, but it doesn’t look like there is a cloud provider for it.

Metabase:
  • https://www.metabase.com/

  • Already used by Read the Docs for other reporting.

Summary: we have several tools that can inspect data from a postgres DB, and we also have Kibana, which works only with Elasticsearch. The data to be collected can be saved in a postgres or ES database. Currently we are making use of Metabase to get other information, so it’s probably the right choice for this task.

Data to be collected

The following data can be collected after installing all dependencies.

Configuration file

We are saving the config file in our database, but to save some space we are saving it only if it’s different than the one from a previous build (if it’s the same we save a reference to it).

The config file being saved isn’t the original one used by the user, but the result of merging it with its default values.

We may also want to have the original config file, so we know which settings users are using.

PIP packages

We can get JSON with all dependencies, or only the root dependencies, using pip list. This will give us the names and versions of the packages used in the build.

$ pip list --pre --local --format json | jq
# and
$ pip list --pre --not-required --local --format json | jq
[
   {
      "name": "requests-mock",
      "version": "1.8.0"
   },
   {
      "name": "requests-toolbelt",
      "version": "0.9.1"
   },
   {
      "name": "rstcheck",
      "version": "3.3.1"
   },
   {
      "name": "selectolax",
      "version": "0.2.10"
   },
   {
      "name": "slumber",
      "version": "0.7.1"
   },
   {
      "name": "sphinx-autobuild",
      "version": "2020.9.1"
   },
   {
      "name": "sphinx-hoverxref",
      "version": "0.5b1"
   }
]

With the --not-required option, pip will list only the root dependencies.

Conda packages

We can get a json with all dependencies with conda list --json. That command gets all the root dependencies and their dependencies (there is no way to list only the root dependencies), so we may be collecting some noise, but we can use pip list as a secondary source.

$ conda list --json --name conda-env

[
   {
      "base_url": "https://conda.anaconda.org/conda-forge",
      "build_number": 0,
      "build_string": "py_0",
      "channel": "conda-forge",
      "dist_name": "alabaster-0.7.12-py_0",
      "name": "alabaster",
      "platform": "noarch",
      "version": "0.7.12"
   },
   {
      "base_url": "https://conda.anaconda.org/conda-forge",
      "build_number": 0,
      "build_string": "pyh9f0ad1d_0",
      "channel": "conda-forge",
      "dist_name": "asn1crypto-1.4.0-pyh9f0ad1d_0",
      "name": "asn1crypto",
      "platform": "noarch",
      "version": "1.4.0"
   },
   {
      "base_url": "https://conda.anaconda.org/conda-forge",
      "build_number": 3,
      "build_string": "3",
      "channel": "conda-forge",
      "dist_name": "python-3.5.4-3",
      "name": "python",
      "platform": "linux-64",
      "version": "3.5.4"
   }
]
APT packages

We can get the list from the config file, or we can list the packages installed with dpkg --get-selections. That command would list all pre-installed packages as well, so we may be getting some noise.

$ dpkg --get-selections

adduser                                         install
apt                                             install
base-files                                      install
base-passwd                                     install
bash                                            install
binutils                                        install
binutils-common:amd64                           install
binutils-x86-64-linux-gnu                       install
bsdutils                                        install
build-essential                                 install

We can get the installed version with:

$ dpkg --status python3

Package: python3
Status: install ok installed
Priority: optional
Section: python
Installed-Size: 189
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: allowed
Source: python3-defaults
Version: 3.8.2-0ubuntu2
Replaces: python3-minimal (<< 3.1.2-2)
Provides: python3-profiler
Depends: python3.8 (>= 3.8.2-1~), libpython3-stdlib (= 3.8.2-0ubuntu2)
Pre-Depends: python3-minimal (= 3.8.2-0ubuntu2)
Suggests: python3-doc (>= 3.8.2-0ubuntu2), python3-tk (>= 3.8.2-1~), python3-venv (>= 3.8.2-0ubuntu2)
Description: interactive high-level object-oriented language (default python3 version)
Python, the high-level, interactive object oriented language,
includes an extensive class library with lots of goodies for
network programming, system administration, sounds and graphics.
.
This package is a dependency package, which depends on Debian's default
Python 3 version (currently v3.8).
Homepage: https://www.python.org/
Original-Maintainer: Matthias Klose <doko@debian.org>

Or with

$ apt-cache policy python3

Installed: 3.8.2-0ubuntu2
Candidate: 3.8.2-0ubuntu2
Version table:
*** 3.8.2-0ubuntu2 500
      500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
      100 /var/lib/dpkg/status
Python

We can get the Python version from the config file when using a Python environment, and from the conda list output when using a Conda environment.

OS

We can infer the OS version from the build image used in the config file, but since it changes with time, we can get it from the OS itself:

$ lsb_release --description
Description:    Ubuntu 18.04.5 LTS
# or
$ cat /etc/issue
Ubuntu 18.04.5 LTS \n \l
Format

The final information to be saved would consist of:

  • organization: the organization id/slug

  • project: the project id/slug

  • version: the version id/slug

  • build: the build id, date, length, status.

  • user_config: Original user config file

  • final_config: Final configuration used (merged with defaults)

  • packages.pip: List of pip packages with name and version

  • packages.conda: List of conda packages with name, channel, and version

  • packages.apt: List of apt packages

  • python: Python version used

  • os: Operating system used

{
  "organization": {
    "id": 1,
    "slug": "org"
  },
  "project": {
    "id": 2,
    "slug": "docs"
  },
  "version": {
    "id": 1,
    "slug": "latest"
  },
  "build": {
    "id": 3,
    "date/start": "2021-04-20-...",
    "length": "00:06:34",
    "status": "normal",
    "success": true,
    "commit": "abcd1234"
  },
  "config": {
    "user": {},
    "final": {}
  },
  "packages": {
     "pip": [{
        "name": "sphinx",
        "version": "3.4.5"
     }],
     "pip_all": [
       {
          "name": "sphinx",
          "version": "3.4.5"
       },
       {
          "name": "docutils",
          "version": "0.16.0"
       }
     ],
     "conda": [{
        "name": "sphinx",
        "channel": "conda-forge",
        "version": "0.1"
     }],
     "apt": [{
        "name": "python3-dev",
        "version": "3.8.2-0ubuntu2"
     }]
  },
  "python": "3.7",
  "os": "ubuntu-18.04.5"
}

Storage

All this information can be collected after the build has finished, and we can store it in a dedicated database (telemetry), using Django’s models.

Since this information isn’t sensitive, we should be fine saving this data even if the project/version is deleted. As we don’t care about historical data, we can save the information per version and from its latest build only, deleting old data if it grows too much.

Should we make heavy use of JSON fields, or try to avoid nesting structures as much as possible (e.g. config.user/config.final vs user_config/final_config)? Should we have several fields in our model instead of just one big JSON field?
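
For illustration, here is a minimal sketch of what such a model could look like if we lean on one big JSON field; the model and field names are hypothetical, not an actual implementation:

from django.db import models


class BuildData(models.Model):
    # Plain identifiers instead of foreign keys, so rows can outlive the
    # project/version they describe.
    organization_id = models.IntegerField(null=True, blank=True)
    project_id = models.IntegerField()
    version_id = models.IntegerField()
    build_id = models.IntegerField()
    created = models.DateTimeField(auto_now_add=True)

    # One JSON blob holding config, packages, python and os, following the
    # format described above.
    data = models.JSONField(default=dict)

    class Meta:
        # We only keep data from the latest build of each version.
        constraints = [
            models.UniqueConstraint(
                fields=["project_id", "version_id"],
                name="unique_build_data_per_version",
            ),
        ]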

Read the Docs data passed to Sphinx build context

Before calling sphinx-build to render your docs, Read the Docs injects some extra context in the templates by using the html_context Sphinx setting in the conf.py file. This extra context can be used to build some awesome features in your own theme.

Warning

This design document details future features that are not yet implemented. To discuss this document, please get in touch in the issue tracker.

Note

The Read the Docs Sphinx Theme uses this context to add additional features to the built documentation.

Context injected

Here is the full list of values injected by Read the Docs as a Python dictionary. Note that this dictionary is injected under the main key readthedocs:

{
    "readthedocs": {
        "v1": {
            "version": {
                "id": int,
                "slug": str,
                "verbose_name": str,
                "identifier": str,
                "type": str,
                "build_date": str,
                "downloads": {"pdf": str, "htmlzip": str, "epub": str},
                "links": [
                    {
                        "href": "https://readthedocs.org/api/v2/version/{id}/",
                        "rel": "self",
                    }
                ],
            },
            "project": {
                "id": int,
                "name": str,
                "slug": str,
                "description": str,
                "language": str,
                "canonical_url": str,
                "subprojects": [
                    {
                        "id": int,
                        "name": str,
                        "slug": str,
                        "description": str,
                        "language": str,
                        "canonical_url": str,
                        "links": [
                            {
                                "href": "https://readthedocs.org/api/v2/project/{id}/",
                                "rel": "self",
                            }
                        ],
                    }
                ],
                "links": [
                    {
                        "href": "https://readthedocs.org/api/v2/project/{id}/",
                        "rel": "self",
                    }
                ],
            },
            "sphinx": {"html_theme": str, "source_suffix": str},
            "analytics": {"user_analytics_code": str, "global_analytics_code": str},
            "vcs": {
                "type": str,  # 'bitbucket', 'github', 'gitlab' or 'svn'
                "user": str,
                "repo": str,
                "commit": str,
                "version": str,
                "display": bool,
                "conf_py_path": str,
            },
            "meta": {
                "API_HOST": str,
                "MEDIA_URL": str,
                "PRODUCTION_DOMAIN": str,
                "READTHEDOCS": True,
            },
        }
    }
}

Warning

Read the Docs passes information to sphinx-build that may change over time (for example, at the moment version 0.6 was built it was the latest version, but later 0.7 and 0.8 were added to the project and also built on Read the Docs), so it’s your responsibility to use this context properly.

If you want fresh data at the moment your documentation is read, consider using the Read the Docs Public API via JavaScript.

Using Read the Docs context in your theme

If you want to access this data from your theme, you can use it like this:

{% if readthedocs.v1.vcs.type == 'github' %}
    <a href="https://github.com/{{ readthedocs.v1.vcs.user }}/{{ readthedocs.v1.vcs.repo }}/blob/{{ readthedocs.v1.vcs.version }}{{ readthedocs.v1.vcs.conf_py_path }}{{ pagename }}.rst">
    Show on GitHub</a>
{% endif %}

Note

In this example, we are using pagename, which is a Sphinx variable representing the name of the page you are on. More information about Sphinx variables can be found in the Sphinx documentation.

Customizing the context

If you want to add some extra context, you will have to declare your own html_context in your conf.py like this:

import datetime

html_context = {
    "author": "My Name",
    "date": datetime.date.today().strftime("%d/%m/%y"),
}

and use it inside your theme as:

<p>This documentation was written by {{ author }} on {{ date }}.</p>

Warning

Take into account that the Read the Docs context is injected after your definition of html_context, so it’s not possible to override Read the Docs context values.

YAML configuration file

Background

The current YAML configuration file is in beta state. There are many options and features that it doesn’t support yet. This document will serve as a design document for discussing how to implement the missing features.

Scope

  • Finish the spec to include all the missing options

  • Have consistency around the spec

  • Proper documentation for the end user

  • Allow specifying the spec version used in the YAML file

  • Collect/show metadata about the YAML file and build configuration

  • Promote the adoption of the configuration file

RTD settings

Not all RTD settings are applicable to the YAML file; some apply to each build (or version), and others to the global project.

Not applicable settings

These settings can’t be in the YAML file because they may depend on the initial project setup, are planned to be removed, or involve security and privacy concerns.

  • Project Name

  • Repo URL

  • Repo type

  • Privacy level (this feature is planned to be removed [1])

  • Project description (this feature is planned to be removed [2])

  • Single version

  • Default branch

  • Default version

  • Domains

  • Active versions

  • Translations

  • Subprojects

  • Integrations

  • Notifications

  • Language

  • Programming Language

  • Project homepage

  • Tags

  • Analytics code

  • Global redirects

Global settings

To keep consistency with the per-version settings and avoid confusion, these settings will not be stored in the YAML file; they will be stored in the database only.

Local settings

These settings will be read from the YAML file of the version that is being built.

Several settings are already implemented and documented at https://docs.readthedocs.io/en/latest/yaml-config.html, so they aren’t covered in much detail here.

  • Documentation type

  • Project installation (virtual env, requirements file, sphinx configuration file, etc)

  • Additional builds (pdf, epub)

  • Python interpreter

  • Per-version redirects

Configuration file

Format

The file format is based on the YAML 1.2 spec [3] (the latest version at the time of this writing).

The file must be in the root directory of the repository and must use one of the following names (a sketch of locating the file follows the list):

  • readthedocs.yml

  • readthedocs.yaml

  • .readthedocs.yml

  • .readthedocs.yaml
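
A minimal, illustrative sketch of locating the configuration file in a checked-out repository, trying the accepted names in the order listed above:

import os

CONFIG_FILENAMES = (
    "readthedocs.yml",
    "readthedocs.yaml",
    ".readthedocs.yml",
    ".readthedocs.yaml",
)


def find_config_file(checkout_path):
    """Return the path of the first configuration file found, or None."""
    for filename in CONFIG_FILENAMES:
        candidate = os.path.join(checkout_path, filename)
        if os.path.isfile(candidate):
            return candidate
    return None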

Conventions

The spec of the configuration file must follow these conventions.

  • Use [] to indicate an empty list

  • Use null to indicate a null value

  • Use all (internal string keyword) to indicate that all options are included in a list with predetermined choices.

  • Use true and false as the only options for boolean fields

Spec

The current spec is documented at https://docs.readthedocs.io/en/latest/yaml-config.html. It will be used as the base for the future spec. The spec will be written using a validation schema such as https://json-schema-everywhere.github.io/yaml.

Versioning the spec

The version of the spec that the user wants to use will be specified in the YAML file. The spec will only have major versions (1.0, not 1.2) [4]. To keep compatibility with older projects using a configuration file without a version, the latest compatible version (1.0) will be used.
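
A small, illustrative sketch of how the version could be resolved, assuming the YAML file has already been parsed into a dictionary:

SUPPORTED_VERSIONS = {"1.0"}
# Latest version compatible with configuration files that don't declare a version.
DEFAULT_VERSION = "1.0"


def resolve_spec_version(config):
    version = str(config.get("version", DEFAULT_VERSION))
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("Unsupported configuration version: {}".format(version))
    return version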

Adoption of the configuration file

When a user creates a new project or visits the settings page, we could suggest an example of a functional configuration file with a minimal setup, making clear where to put global configurations.

For users that already have a project, we can suggest a configuration file on each build, based on the current settings.

Configuration file and database

The settings used in the build from the configuration file (and other metadata) need to be stored in the database. This is for later usage only, not to populate existing fields.

The build process

  • The repository is updated

  • Check out the current version

  • Retrieve the settings from the database

  • Try to parse the YAML file (the build fails if there is an error)

  • Merge both settings (YAML file and database); see the sketch after this list

  • The version is built according to the settings

  • The settings used to build the documentation can be seen by the user
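
A minimal sketch of the merge step, where values from the YAML file take precedence over the values stored in the database (illustrative only; the setting names are made up):

def merge_settings(database_settings, yaml_settings):
    """Return the final settings, with YAML values overriding database values."""
    merged = dict(database_settings)
    merged.update(yaml_settings)
    return merged


final_settings = merge_settings(
    {"requirements_file": "requirements.txt", "python_version": "3"},
    {"python_version": "3.6"},  # parsed from the YAML file
)
# final_settings == {"requirements_file": "requirements.txt", "python_version": "3.6"}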

Dependencies

The repository that currently contains the code related to the configuration file: https://github.com/readthedocs/readthedocs-build

Footnotes

Development installation

These are the development setup and standards followed by the core development team. If you are a contributor to Read the Docs, it might be a good idea to follow these guidelines as well.

Requirements

A development setup can be hosted by your laptop, in a VM, on a separate server etc. Any such scenario should work fine, as long as it can satisfy the following:

  • Is a Unix-like system (Linux, BSD, macOS) that supports Docker. Windows systems should have WSL+Docker or Docker Desktop.

  • Has 10 GB or more of free disk space on the drive where Docker’s cache and volumes are stored. If you want to experiment with customizing Docker containers, you’ll likely need more.

  • Can spare 2 GB of system memory for running Read the Docs; this typically means that a development laptop should have 8 GB or more of memory in total.

  • Your system should ideally match the production system, which uses the latest official+stable Docker distribution for Ubuntu (the docker-ce package). If you are on Windows or Mac, you may also want to try Docker Desktop.

Note

Take into account that this setup is intended for development purposes. We do not recommend following this guide to deploy a production instance of Read the Docs.

Install external dependencies (Docker, Docker Compose, gVisor)

  1. Install Docker by following the official guide.

  2. Install Docker Compose with the official instructions.

  3. Install and set up gVisor following gVisor installation.

Set up your environment

  1. Clone the readthedocs.org repository:

    git clone --recurse-submodules https://github.com/readthedocs/readthedocs.org/
    
  2. Install or clone additional repositories:

    Note

    This step is only required for Read the Docs core team members.

    Core team members should, at the very least, have all required packages installed in their development image. To install these packages you must define a GitHub token before building your image:

    export GITHUB_TOKEN="..."
    export GITHUB_USER="..."
    

    In order to make development changes on any of our private repositories, such as readthedocs-ext or ext-theme, you will also need to check these repositories out:

    git clone --recurse-submodules https://github.com/readthedocs/readthedocs-ext/
    
  3. Install the requirements from common submodule:

    pip install -r common/dockerfiles/requirements.txt
    
  4. Build the Docker image for the servers:

    Warning

    This command could take a while to finish since it will download several Docker images.

    inv docker.build
    
  5. Pull down Docker images for the builders:

    inv docker.pull
    
  6. Start all the containers:

    inv docker.up  --init  # --init is only needed the first time
    
  7. Go to http://devthedocs.org to access your local instance of Read the Docs.

Check that everything works

  1. Visit http://devthedocs.org

  2. Login as admin / admin and verify that the project list appears.

  3. Go to the “Read the Docs” project and, under the Build a version section, click the Build version button with “latest” selected, then wait until the build finishes (this can take several minutes).

Warning

Read the Docs will compile the Python/Node.js/Rust/Go versions on the fly each time documentation is built. To speed things up, you can pre-compile and cache these versions using the inv docker.compilebuildtool command. We strongly recommend pre-compiling them if you want to build documentation on your development instance.

  4. Click on the “View docs” button to browse the documentation, and verify that it shows the Read the Docs documentation page.

Working with Docker Compose

We wrote an invoke wrapper around docker-compose to provide some shortcuts and save some typing of docker compose commands. This section explains these invoke commands:

inv docker.build

Builds the generic Docker image used by our servers (web, celery, build and proxito).

inv docker.up

Starts all the containers needed to run Read the Docs completely.

  • --no-search can be passed to disable search

  • --init is used the first time this command is run, to run initial migrations, create an admin user, etc.

  • --no-reload makes all Celery processes and the Django runserver run without auto-reload, so they do not watch for file changes

  • --no-django-debug runs all containers with DEBUG=False

  • --http-domain configures an external domain for the environment (useful for Ngrok or another HTTP proxy). Note that HTTPS proxies aren’t supported. There will also be issues with “suspicious domain” failures on Proxito.

  • --ext-theme to use the new dashboard templates

  • --webpack to start the Webpack dev server for the new dashboard templates

inv docker.shell

Opens a shell in a container (web by default).

  • --no-running spins up a new container and opens a shell

  • --container specifies in which container the shell is opened

inv docker.manage {command}

Executes a Django management command in a container.

Tip

Useful when modifying models to run makemigrations.

inv docker.down

Stops and removes all running containers.

  • --volumes will remove the volumes as well (database data will be lost)

inv docker.restart {containers}

Restarts the containers specified (automatically restarts NGINX when needed).

inv docker.attach {container}

Grabs STDIN/STDOUT control of a running container.

Tip

Useful for debugging with pdb. Once the program has stopped at your pdb breakpoint, you can run inv docker.attach web and jump into a pdb session (it also works with ipdb and pdb++).

Tip

You can hit CTRL-p CTRL-p to detach it without stopping the running process.

inv docker.test

Runs all the test suites inside the container.

  • --arguments will pass arguments to the Tox command (e.g. --arguments "-e py310 -- -k test_api")

inv docker.pull

Downloads and tags all the Docker images required for builders.

  • --only-required pulls only the image ubuntu-20.04.

inv docker.buildassets

Builds all the assets and “deploys” them to the storage.

inv docker.compilebuildtool

Pre-compiles and caches the tools that can be specified in build.tools, to speed up builds. It requires inv docker.up to be running in another terminal so the pre-compiled versions can be uploaded to the cache.

Adding a new Python dependency

The Docker image for the servers is built with the requirements defined in the currently checked out branch. In case you need to add a new Python dependency while developing, you can use the common/dockerfiles/entrypoints/common.sh script as a shortcut.

This script is run at startup on all the servers (web, celery, builder, proxito), which allows you to test your dependency without rebuilding the whole image. To do this, add the pip command required for your dependency to the common.sh file:

# common.sh
pip install my-dependency==1.2.3

Once the PR that adds this dependency has been merged, you can rebuild the image so the dependency is added to the Docker image itself and doesn’t need to be installed each time the container spins up.

Debugging Celery

In order to step into the worker process, you can’t use pdb or ipdb, but you can use celery.contrib.rdb:

from celery.contrib import rdb

rdb.set_trace()

When the breakpoint is hit, the Celery worker will pause on the breakpoint and will alert you on STDOUT of a port to connect to. You can open a shell into the container with inv docker.shell celery (or build) and then use telnet or netcat to connect to the debug process port:

nc 127.0.0.1 6900

The rdb debugger is similar to pdb; there is currently no ipdb equivalent for remote debugging.

Configuring connected accounts

These are optional steps to set up the connected accounts (GitHub, Bitbucket, and GitLab) in your development environment. This will allow you to log in to your local development instance using your GitHub, Bitbucket, or GitLab credentials, and makes the process of importing repositories easier.

However, because these services will not be able to connect back to your local development instance, incoming webhooks will not function correctly. For some services, the webhooks will fail to be added when the repository is imported. For others, the webhook will simply fail to connect when there are new commits to the repository.

[Image: Configuring an OAuth consumer for local development on Bitbucket]

  • Configure the applications on GitHub, Bitbucket, and GitLab. For each of these, the callback URI is http://devthedocs.org/accounts/<provider>/login/callback/ where <provider> is one of github, gitlab, or bitbucket_oauth2. Once set up, you will be given a “Client ID” (also called an “Application ID” or just “Key”) and a “Secret”.

  • Take the “Client ID” and “Secret” for each service and enter them in your local Django admin at: http://devthedocs.org/admin/socialaccount/socialapp/. Make sure to apply it to the “Site”.

Troubleshooting

Warning

The environment is developed and mainly tested on Docker Compose v1.x. If you are running Docker Compose 2.x, please make sure you have COMPOSE_COMPATIBILITY=true set. This is automatically loaded via the .env file. If you want to ensure that the file is loaded, run:

source .env

Builds fail with a generic error

There are projects that do not use the default Docker image downloaded when setting up the development environment. These extra images are not downloaded by default because they are big and they are not required in all cases. However, if you are seeing the following error

[Image: Build failing with a generic error]

and in the console where the logs are shown you see something like BuildAppError: No such image: readthedocs/build:ubuntu-22.04, that means the application wasn’t able to find the Docker image required to build that project and it failed.

In this case, you can run a command to download all the optional Docker images:

inv docker.pull

However, if you prefer to download only the specific image required for that project and save some space on disk, you have to follow these steps:

  1. go to https://hub.docker.com/r/readthedocs/build/tags

  2. find the latest tag for the image shown in the logs (in this example it is readthedocs/build:ubuntu-22.04; at the time of writing, the latest tag on that page is ubuntu-22.04-2022.03.15)

  3. run the Docker command to pull it:

    docker pull readthedocs/build:ubuntu-22.04-2022.03.15
    
  4. tag the downloaded Docker image so the app can find it:

    docker tag readthedocs/build:ubuntu-22.04-2022.03.15 readthedocs/build:ubuntu-22.04
    

Once this is done, you should be able to trigger a new build on that project and it should succeed.

Core team standards

Core team members expect to have a development environment that closely approximates our production environment, in order to spot bugs and logical inconsistencies before they make their way to production.

This solution gives us many features that allow us to have an environment closer to production:

Celery runs as a separate process

Avoids masking bugs that could be introduced by Celery tasks in race conditions.

Celery runs multiple processes

We run Celery with multiple worker processes to discover race conditions between tasks.

Docker for builds

Docker is used as the build backend instead of the local host build backend. There are a number of differences between the two execution methods in how processes are executed, what is installed, and what can potentially leak through and mask bugs, for example a local SSH agent allowing code checkout not normally possible.

Serve documentation under a subdomain

There are a number of resolution bugs and cross-domain behavior that can only be caught by using a PUBLIC_DOMAIN setting different from the PRODUCTION_DOMAIN setting.

PostgreSQL as a database

It is recommended that Postgres be used as the default database whenever possible, as SQLite has issues with our Django version and we use Postgres in production. Differences between Postgres and SQLite should mostly be masked, however, as Django abstracts database procedures and we don’t do any Postgres-specific operations yet.

Celery is isolated from database

Celery workers on our build servers do not have database access and need to be written to use API access instead.

Use NGINX as web server

The whole site is served via NGINX, with the ability to change some configuration locally.

MinIO as Django storage backend

All static and media files are served using MinIO, an S3 emulator; S3 is what is used in production.

Serve documentation via El Proxito

El Proxito is a small application put in front of the documentation to serve files from the Django Storage Backend.

Use Cloudflare Wrangler

Documentation pages are proxied by NGINX to Wrangler, which executes a JavaScript worker that fetches the response from El Proxito and injects HTML tags (for addons) based on HTTP headers.

Search enabled by default

Elasticsearch is properly configured and enabled by default. All the documentation indexes are updated after a build is finished.

Development guides

These are guides to aid local development and common development procedures.

gVisor installation

You can mostly get by just following installation instructions in the gVisor Docker Quick Start guide.

There are a few caveats to installation, which likely depend on your local environment. For a systemd-based OS, you do need to configure the Docker daemon to avoid systemd cgroups.

Follow the installation and quick start directions like normal:

% yay -S gvisor-bin
...
% sudo runsc install

You do need to instruct Docker to avoid systemd cgroups. You will need to make further changes to /etc/docker/daemon.json and restart the Docker service:

{
    "runtimes": {
        "runsc": {
            "path": "/usr/bin/runsc"
        }
    },
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
}

Docker is correctly configured when you can run this command from the quick start guide:

% docker run --rm -ti --runtime=runsc readthedocs/build dmesg
[    0.000000] Starting gVisor...
...

Testing gVisor

You can enable the gVisor feature flag on a project, and you should then see the container created with runtime=runsc.

Designing Read the Docs

So you’re thinking of contributing some of your time and design skills to Read the Docs? That’s awesome. This document will lead you through a few features available to ease the process of working with Read the Docs’ CSS and static assets.

To start, you should follow the Development installation instructions to get a working copy of the Read the Docs repository locally.

Style catalog

Once you have RTD running locally, you can open http://localhost:8000/style-catalog/ for a quick overview of the currently available styles.

[Image: headers.png]

This way you can quickly get started writing HTML, or if you’re modifying existing styles, you can get a quick idea of how things will change site-wide.

Readthedocs.org changes

Styles for the primary RTD site are located in the media/css directory.

These styles only affect the primary site, not any of the generated documentation using the default RTD style.

Contributing

Contributions should follow the Contributing to Read the Docs guidelines where applicable. Ideally you’ll create a pull request against the Read the Docs GitHub project from your forked repo and include a brief description of what you added / removed / changed, as well as an attached image (you can just take a screenshot and drop it into the PR creation form) of the effects of your changes.

There’s not a hard browser range, but your design changes should work reasonably well across all major browsers, IE8+. That’s not to say it needs to be pixel-perfect in older browsers! Just avoid making changes that render older browsers utterly unusable (or provide a sane fallback).

Brand guidelines

Find our branding guidelines in our guidelines documentation: https://read-the-docs-guidelines.readthedocs-hosted.com.

Building and contributing to documentation

As one might expect, the documentation for Read the Docs is built using Sphinx and hosted on Read the Docs. The docs are kept in the docs/ directory at the top of the source tree, and are divided into developer and user-facing documentation.

Contributing through the GitHub UI

If you’re making small changes to the documentation, you can verify them in the documentation preview that is generated when you open a PR, which can be accessed through the GitHub UI.

  1. click the checkmark next to your commit and it will expand to have multiple options

  2. click the “details” link next to the “docs/readthedocs.org:docs” item

    [Image: details_link.png]
  3. navigate to the section of the documentation you worked on to verify your changes

Contributing from your local machine

If you’re making large changes to the documentation, you may want to verify those changes locally before pushing upstream.

  1. clone the readthedocs.org repository:

    $ git clone --recurse-submodules https://github.com/readthedocs/readthedocs.org/
    
  2. create a virtual environment with Python 3.8 (preferably the latest release, 3.8.12 at the time of writing), activate it, and upgrade pip:

    $ cd readthedocs.org
    $ python3.8 -m venv .venv
    $ source .venv/bin/activate
    (.venv) $ python -m pip install -U pip
    
  3. install documentation requirements

    (.venv) $ pip install -r requirements/testing.txt
    (.venv) $ pip install -r requirements/docs.txt
    
  4. build the documents

    To build the user-facing documentation:

    (.venv) $ cd docs
    (.venv) $ make livehtml
    

    To build the developer documentation:

    (.venv) $ cd docs
    (.venv) $ RTD_DOCSET=dev make livehtml
    
  5. the documents will be available at http://127.0.0.1:4444/ and will rebuild each time you edit and save a file.

Documentation style guide

This document will serve as the canonical place to define how we write documentation at Read the Docs. The goal is to have a shared understanding of how things are done, and document the conventions that we follow.

Let us know if you have any questions or something isn’t clear.

The brand

We are called Read the Docs. The the is not capitalized.

We do however use the acronym RTD.

Titles

For page titles we use sentence case. This means only proper nouns and the first word are capitalized:

# Good ✅
How we handle support on Read the Docs.

# Bad 🔴
How we Handle Support on Read the Docs

If the page includes multiple sub-headings (H2, H3), we use sentence case there as well.

Content

  • Use :menuselection: when referring to an item or sequence of items in navigation.

  • Use :guilabel: when referring to a visual element on the screen - such as a button, drop down or input field.

  • Use **bold text** when referring to a non-interactive text element, such as a header.

  • Do not break the content across multiple lines at 80 characters, but rather break them on semantic meaning (e.g. periods or commas). Read more about this here.

  • If you are cross-referencing to a different page within our website, use the doc role and not a hyperlink.

  • If you are cross-referencing to a section within our website, use the ref role with the label from the autosectionlabel extension.

  • Use <abstract concept> and <variable> as placeholders in code and URLs. For instance:

    • https://<slug>.readthedocs.io

    • :guilabel:`<your username>` dropdown

  • Make sure that all bullet list items end with a period, and don’t mix periods with no periods.

Word list

We have a specific way that we write common words:

  • build command is the name of each step in the file. We try to avoid confusion with pipelines, jobs and steps from other CIs, as we do not have a multi-dimensional build sequence.

  • build job is the name of the general and pre-defined steps that can be overridden. They are similar to “steps” in pipelines, but on Read the Docs they are pre-defined. So it’s important to have a unique name.

  • Git should be upper case. Except when referring to the git command, then it should be written as :program:`git`.

  • Git repository for the place that stores Git repos. We used to use VCS, but this is deprecated.

  • Git provider for generic references to GitHub/Bitbucket/GitLab/Gitea etc. We avoid “host” and “platform” because they are slightly more ambiguous.

  • how to do the thing is explained in a how-to guide (notice hyphen and spaces).

  • lifecycle is spelled without a hyphen or a space.

  • open source should be lower case, unless you are definitely referring to OSI's Open Source Definition.

  • .readthedocs.yaml is the general name of the build configuration file. Even though we allow custom paths to the config file, we only validate .readthedocs.yaml as the file name. Older variations of the name are considered legacy. We do not refer to it in general terms.

Substitutions

The following substitutions are used in our documentation to guarantee consistency and make it easy to apply future changes.

  • |org_brand| is used when mentioning .org. Example: Read the Docs Community

  • |com_brand| is used when mentioning .com. Example: Read the Docs for Business

  • |git_providers_and| is used to mention the currently supported Git providers with “and”. Example: GitHub, Bitbucket, and GitLab

  • |git_providers_or| is used to mention the currently supported Git providers with “or”. Example: GitHub, Bitbucket, or GitLab

Glossary

Since the above Word List is for internal reference, we also maintain a Glossary with terms that have canonical definitions in our docs. Terms that can otherwise have multiple definitions or have a particular meaning in Read the Docs context should always be added to the Glossary and referenced using the :term: role.

Using a glossary helps us (authors) to have consistent definitions but even more importantly, it helps and includes readers by giving them quick and easy access to terms that they may be unfamiliar with.

Use an external link or Intersphinx reference when a term is clearly defined elsewhere.

Cross-references

Cross-references are great to have as inline links. Because of sphinx-hoverxref, inline links also have a nice tooltip displayed.

We like to cross-reference other articles with a definition list inside a seealso:: admonition box. It looks like this:

.. seealso::

   :doc:`/other/documentation/article`
     You can learn more about <concept> in this (how-to/description/section/article)

Differentiating .org and .com

When there are differences between .org and .com, you can use a note:: admonition box with a definition list. Notice the use of substitutions in the example:

.. note::

   |org_brand|
      You need to be *maintainer* of a subproject in order to choose it from your main project.

   |com_brand|
      You need to have *admin access* to the subproject in order to choose it from your main project.

If the contents aren’t suitable for a note::, you can also use tabs::. We are using sphinx-tabs; since sphinx-design also provides tabs, note that we don’t use that feature of sphinx-design.

Headlines

Sphinx is very relaxed about how headlines are applied and will digest different notations. We try to stick to the following:

Header 1
========

Header 2
--------

Header 3
~~~~~~~~

Header 4
^^^^^^^^

In the above, Header 1 is the title of the article.

Diátaxis Framework

We apply the methodology and concepts of the Diátaxis Framework. This means that both content and navigation path for all sections should fit a single category of the 4 Diátaxis categories:

  • Tutorial

  • Explanation

  • How-to

  • Reference

See also

https://diataxis.fr/

The official website of Diátaxis is the main resource. It’s best to check this out before guessing what the 4 categories mean.

Warning

Avoid minimal changes

If your change has a high coherence with another proposed or planned change, propose the changes in the same PR.

By multi-tasking on several articles about the same topic, such as an explanation and a how-to, you can easily design your content to end up in the right place Diátaxis-wise. This is great for the author and the reviewers and it saves coordination work.

Minimal or isolated changes generally raise more questions and concerns than changes that seek to address a larger perspective.

Explanation

  • Title convention: Use words indicating explanation in the title. Like Understanding <subject>, Dive into <subject>, Introduction to <subject> etc.

  • Introduce the scope in the first paragraph: “This article introduces …”. Write this as the very first thing, then re-read it and potentially shorten it later in your writing process.

  • Cross-reference the related How-to Guide. Put a seealso:: somewhere visible. It should likely be placed right after the introduction, and if the article is very short, maybe at the bottom.

  • Consider adding an Examples section.

  • Can you add screenshots or diagrams?

How-to guides

  • Title should begin with “How to …”. If the how-to guide is specific for a tool, make sure to note it in the title.

  • Navigation titles should not contain the “How to” part. Navigation title for “How to create a thing” is Creating a thing.

  • Introduce the scope: “In this guide, we will…”

    • Introduction paragraph suggestions:

      • “This guide shows <something>. <motivation>”

      • “<motivation>. This guide shows you how.”

  • Cross-reference related explanation. Put a seealso:: somewhere visible. It should likely be placed right after the introduction, and if the article is very short, maybe at the bottom.

  • Try to avoid a “trivial” how-to, i.e. a step-by-step guide that just states what is on a page without further information. You can ask questions like:

    • Can this how-to contain recommendations and practical advice without breaking the how-to format?

    • Can this how-to be expanded with relevant troubleshooting?

    • Worst-case: Is this how-to describing a task that’s so trivial and self-evident that we might as well remove it?

  • Consider if an animation can be embedded: Here is an article about ‘gif-to-video’

Reference

We have not started organizing the Reference section yet, guidelines pending.

Tutorial

Note

We don’t really have tutorials targeted in the systematic refactor, so this checklist isn’t very important right now.

  • “Getting started with <subject>” is likely a good start!

  • Cross-reference related explanation and how-to.

  • Try not to explain things too much, and instead link to the explanation content.

  • Refactor other resources so you can use references instead of disturbing the flow of the tutorial.

Front-end development

Background

Note

This information applies to the current dashboard templates and JavaScript source files, which will soon be replaced by the new dashboard templates; at that point it will be mostly out of date.

Our modern front end development stack includes the following tools:

We use the following libraries:

Previously, JavaScript development has been done in monolithic files or inside templates. jQuery was added as a global object via an include in the base template to an external source. There are currently no standards for our JavaScript libraries; this aims to solve that.

The requirements for modernizing our front end code are:

  • Code should be modular and testable. One-off chunks of JavaScript in templates or in large monolithic files are not easily testable. We currently have no JavaScript tests.

  • Reduce code duplication.

  • Easy JavaScript dependency management.

Modularizing code with Browserify is a good first step. In this development workflow, major dependencies commonly used across JavaScript includes are installed with Bower for testing, and vendorized as standalone libraries via Gulp and Browserify. This way, we can easily test our JavaScript libraries against jQuery/etc, and have the flexibility of modularizing our code. See JavaScript Bundles for more information on what and how we are bundling.

To ease deployment and contributions, bundled JavaScript is checked into the repository for now. This ensures new contributors don’t need an additional front end stack just for making changes to our Python code base. In the future, this may change, so that assets are compiled before deployment, however as our front end assets are in a state of flux, it’s easier to keep absolute sources checked in.

Getting started

You will need to follow our guide to install a development Read the Docs instance first.

The sources for our bundles are found in the per-application path static-src, which has the same directory structure as static. Files in static-src are compiled to static for static file collection in Django. Don’t edit files in static directly, unless you are sure there isn’t a source file that will compile over your changes.

To compile your changes and make them available in the application you need to run:

inv docker.buildassets

Once you are happy with your changes, make sure to check in both files under static and static-src, and commit those.

Making changes

If you are creating a new library, or a new library entry point, make sure to define the application source file in gulpfile.js; this is not handled automatically right now.

If you are bringing in a new vendor library, make sure to define the bundles you are going to create in gulpfile.js as well.

Tests should be included per-application, in a path called tests, under the static-src/js path you are working in. Currently, we still need a test runner that accumulates these files.

Deployment

If merging several branches with JavaScript changes, it’s important to do a final post-merge bundle. Follow the steps above to rebundle the libraries, and check in any changed libraries.

JavaScript bundles

There are several components to our bundling scheme:

Vendor library

We repackage these using Browserify, Bower, and Debowerify to make these libraries available via a require statement. Vendor libraries are packaged separately from our JavaScript libraries, because we use the vendor libraries in multiple locations. Libraries bundled this way with Browserify are available to our libraries via require and will fall back to finding the object on the global window scope.

Vendor libraries should only include libraries we are commonly reusing. This currently includes jQuery and Knockout. These modules will be excluded from libraries by special includes in our gulpfile.js.

Minor third party libraries

These libraries are maybe used in one or two locations. They are installed via Bower and included in the output library file. Because we aren’t reusing them commonly, they don’t require a separate bundle or separate include. Examples here would include jQuery plugins used on one-off forms, such as jQuery Payments.

Our libraries

These libraries are bundled up excluding vendor libraries ignored by rules in our gulpfile.js. These files should be organized by function and can be split up into multiple files per application.

Entry points to libraries must be defined in gulpfile.js for now. We don’t have a defined directory structure that would make it easy to imply the entry point to an application library.

Internationalization

This document covers the details regarding internationalization and localization that are applied in Read the Docs. The guidelines described are mostly based on Kitsune’s localization documentation.

As with most of the Django applications out there, Read the Docs’ i18n/l10n framework is based on GNU gettext. Crowd-sourced localization is optionally available at Transifex.

For more information about the general ideas, look at this document: http://www.gnu.org/software/gettext/manual/html_node/Concepts.html

Making strings localizable

Making strings in templates localizable is exceptionally easy. Making strings in Python localizable is a little more complicated. The short answer, though, is to just wrap the string in _().

Interpolation

A string is often a combination of a fixed string and something changing, for example, Welcome, James is a combination of the fixed part Welcome,, and the changing part James. The naive solution is to localize the first part and then follow it with the name:

_('Welcome, ') + username

This is wrong!

In some locales, the word order may be different. Use Python string formatting to interpolate the changing part into the string:

_('Welcome, {name}').format(name=username)

Python gives you a lot of ways to interpolate strings. The best way is to use Py3k formatting and kwargs. That’s the clearest for localizers.

Localization comments

Sometimes, it can help localizers to describe where a string comes from, particularly if it can be difficult to find in the interface, or is not very self-descriptive (e.g. very short strings). If you immediately precede the string with a comment that starts with Translators:, the comment will be added to the PO file, and visible to localizers.

Example:

DEFAULT_THEME_CHOICES = (
    # Translators: This is a name of a Sphinx theme.
    (THEME_DEFAULT, _('Default')),
    # Translators: This is a name of a Sphinx theme.
    (THEME_SPHINX, _('Sphinx Docs')),
    # Translators: This is a name of a Sphinx theme.
    (THEME_TRADITIONAL, _('Traditional')),
    # Translators: This is a name of a Sphinx theme.
    (THEME_NATURE, _('Nature')),
    # Translators: This is a name of a Sphinx theme.
    (THEME_HAIKU, _('Haiku')),
)

Adding context with msgctxt

Strings may be the same in English, but different in other languages. English, for example, has no grammatical gender, and sometimes the noun and verb forms of a word are identical.

To make it possible to localize these correctly, we can add “context” (known in gettext as msgctxt) to differentiate two otherwise identical strings. Django provides a pgettext() function for this.

For example, the string Search may be a noun or a verb in English. In a heading, it may be considered a noun, but on a button, it may be a verb. It’s appropriate to add a context (like button) to one of them.

Generally, we should only add context if we are sure the strings aren’t used in the same way, or if localizers ask us to.

Example:

from django.utils.translation import pgettext

search = pgettext("text for the search button on the form", "Search")

Plurals

You have 1 new messages grates on discerning ears. Fortunately, gettext gives us a way to fix that in English and other locales, the ngettext() function:

ngettext('singular sentence', 'plural sentence', count)

A more realistic example might be:

ngettext('Found {count} result.',
         'Found {count} results',
         len(results)).format(count=len(results))

This method takes three arguments because English only needs three, i.e., zero is considered “plural” for English. Other languages may have different plural rules, and require different phrases for, say 0, 1, 2-3, 4-10, >10. That’s absolutely fine, and gettext makes it possible.

Strings in templates

When putting new text into a template, all you need to do is wrap it in a {% trans %} template tag:

<h1>{% trans "Heading" %}</h1>

Context can be added, too:

<h1>{% trans "Heading" context "section name" %}</h1>

Comments for translators need to precede the internationalized text and must start with the Translators: keyword:

{# Translators: This heading is displayed in the user's profile page #}
<h1>{% trans "Heading" %}</h1>

To interpolate, you need to use the alternative and more verbose {% blocktrans %} template tag — it’s actually a block:

{% blocktrans %}Welcome, {{ name }}!{% endblocktrans %}

Note that the {{ name }} variable needs to exist in the template context.

In some situations, it’s desirable to evaluate template expressions such as filters or accessing object attributes. You can’t do that within the {% blocktrans %} block, so you need to bind the expression to a local variable first:

{% blocktrans trimmed with revision.created_date|timesince as timesince %}
{{ revision }} {{ timesince }} ago
{% endblocktrans %}

{% blocktrans with project.name as name %}Delete {{ name }}?{% endblocktrans %}

{% blocktrans %} also provides pluralization. For that you need to bind a counter with the name count and provide a plural translation after the {% plural %} tag:

{% blocktrans trimmed with amount=article.price count years=i.length %}
That will cost $ {{ amount }} per year.
{% plural %}
That will cost $ {{ amount }} per {{ years }} years.
{% endblocktrans %}

Note

The previous multi-line examples also use the trimmed option. This removes newline characters and replaces any whitespace at the beginning and end of a line, helping translators when translating these strings.

Strings in Python

Note

Whenever you are adding a string in Python, ask yourself if it really needs to be there, or if it should be in the template. Keep logic and presentation separate!

Strings in Python are more complex for two reasons:

  1. We need to make sure we’re always using Unicode strings and the Unicode-friendly versions of the functions.

  2. If you use the gettext() function in the wrong place, the string may end up in the wrong locale!

Here’s how you might localize a string in a view:

from django.utils.translation import gettext as _

def my_view(request):
    if request.user.is_superuser:
        msg = _(u'Oh hi, staff!')
    else:
        msg = _(u'You are not staff!')

Interpolation is done through normal Python string formatting:

msg = _(u'Oh, hi, {user}').format(user=request.user.username)

Context information can be supplied by using the pgettext() function:

msg = pgettext('the context', 'Search')

Translator comments are normal one-line Python comments:

# Translators: A message to users.
msg = _(u'Oh, hi there!')

If you need to use plurals, import the ungettext() function:

from django.utils.translation import ungettext

n = len(results)
msg = ungettext('Found {0} result', 'Found {0} results', n).format(n)

Lazily translated strings

You can use gettext() or ungettext() only in views or functions called from views. If the function will be evaluated when the module is loaded, then the string may end up in English or the locale of the last request!

Examples include strings in module-level code, arguments to functions in class definitions, strings in functions called from outside the context of a view. To internationalize these strings, you need to use the _lazy versions of the above methods, gettext_lazy() and ungettext_lazy(). The result doesn’t get translated until it is evaluated as a string, for example by being output or passed to unicode():

from django import forms
from django.utils.translation import gettext_lazy as _


class UserProfileForm(forms.ModelForm):
    first_name = forms.CharField(label=_('First name'), required=False)
    last_name = forms.CharField(label=_('Last name'), required=False)

In case you want to provide context to a lazily-evaluated gettext string, you will need to use pgettext_lazy().

Administrative tasks

Updating localization files

To update the translation source files (e.g. if you changed or added translatable strings in the templates or Python code) you should run python manage.py makemessages -l <language> in the project’s root directory (substituting <language> with a valid language code).

The updated files can now be localized in a PO editor or crowd-sourced online translation tool.

Compiling to MO

Gettext doesn’t parse any text files; it reads a binary format for faster performance. To compile the latest PO files in the repository, Django provides the compilemessages management command. For example, to compile all the available localizations, just run:

python manage.py compilemessages -a

You will need to do this every time you want to push updated translations to the live site.

Also, note that it’s not a good idea to track MO files in version control, since they would need to be updated at the same pace as the PO files, so it’s not worth it. They are ignored by .gitignore, but please make sure you don’t forcibly add them to the repository.

Transifex integration

To push updated translation source files to Transifex, run tx push -s (for English) or tx push -t <language> (for non-English).

To pull changes from Transifex, run tx pull -a. Note that Transifex does not compile the translation files, so you have to do this after the pull (see the Compiling to MO section).

For more information about the tx command, read the Transifex client’s help pages.

Note

For the Read the Docs community site, we use Invoke with a tasks.py file to follow this process:

  1. Update files and push sources (English) to Transifex:

    invoke l10n.push
    
  2. Pull the updated translations from Transifex:

    invoke l10n.pull
    

Database migrations

We use Django migrations to manage database schema changes, and the django-safemigrate package to ensure that migrations are run in a given order to avoid downtime.

To make sure that migrations don’t cause downtime, the following rules should be followed for each case.

Adding a new field

When adding a new field to a model, it should be nullable. This way, the database can be migrated without downtime, and the field can be populated later. Don’t forget to make the field non-nullable in a separate migration after the data has been populated. You can achieve this by following these steps:

  1. Set the new field as null=True and blank=True in the model.

    class MyModel(models.Model):
        new_field = models.CharField(
            max_length=100, null=True, blank=True, default="default"
        )
    
  2. Make sure that the field is always populated with a proper value in the new code, and the code handles the case where the field is null.

    if my_model.new_field in [None, "default"]:
        pass
    
    
    # If it's a boolean field, make sure that the null option is removed from the form.
    class MyModelForm(forms.ModelForm):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.fields["new_field"].widget = forms.CheckboxInput()
            self.fields["new_field"].empty_value = False
    
  3. Create the migration file (let’s call this migration app 0001), and mark it as Safe.before_deploy.

    from django.db import migrations, models
    from django_safemigrate import Safe
    
    
    class Migration(migrations.Migration):
        safe = Safe.before_deploy
    
  4. Create a data migration to populate all null values of the new field with a proper value (let’s call this migration app 0002), and mark it as Safe.after_deploy.

    from django.db import migrations
    
    
    def migrate(apps, schema_editor):
        MyModel = apps.get_model("app", "MyModel")
        MyModel.objects.filter(new_field=None).update(new_field="default")
    
    
    class Migration(migrations.Migration):
        safe = Safe.after_deploy
    
        operations = [
            migrations.RunPython(migrate),
        ]
    
  5. After the deploy has been completed, create a new migration to set the field as non-nullable (let’s call this migration app 0003); a sketch follows this list. Run this migration on a new deploy; you can mark it as Safe.before_deploy or Safe.always.

  6. Remove any handling of the null case from the code.
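
A sketch of what that third migration (app 0003) could look like, reusing the illustrative model and field names from the steps above:

from django.db import migrations, models
from django_safemigrate import Safe


class Migration(migrations.Migration):
    safe = Safe.before_deploy

    operations = [
        # The field has been populated by the app 0002 data migration,
        # so it can now be made non-nullable.
        migrations.AlterField(
            model_name="mymodel",
            name="new_field",
            field=models.CharField(max_length=100, default="default"),
        ),
    ]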

At the end, the deploy should look like this:

  • Deploy web-extra.

  • Run django-admin safemigrate to run the migration app 0001.

  • Deploy the webs

  • Run django-admin migrate to run the migration app 0002.

  • Create a new migration to set the field as non-nullable, and apply it on the next deploy.

Removing a field

When removing a field from a model, all usages of the field should be removed from the code before the field is removed from the model, and the field should be nullable. You can achieve this by following these steps:

  1. Remove all usages of the field from the code.

  2. Set the field as null=True and blank=True in the model.

    class MyModel(models.Model):
        field_to_delete = models.CharField(max_length=100, null=True, blank=True)
    
  3. Create the migration file (let’s call this migration app 0001), and mark it as Safe.before_deploy.

    from django.db import migrations, models
    from django_safemigrate import Safe
    
    
    class Migration(migrations.Migration):
        safe = Safe.before_deploy
    
  4. Create a migration to remove the field from the database (let’s call this migration app 0002), and mark it as Safe.after_deploy.

    from django.db import migrations, models
    from django_safemigrate import Safe
    
    
    class Migration(migrations.Migration):
        safe = Safe.after_deploy
    

At the end, the deploy should look like this:

  • Deploy web-extra.

  • Run django-admin safemigrate to run the migration app 0001.

  • Deploy the webs

  • Run django-admin migrate to run the migration app 0002.

Server side search integration

Read the Docs provides server side search (SSS) in place of the default search engine of your site. To accomplish this, Read the Docs parses the content directly from your HTML pages [*].

If you are the author of a theme or a static site generator, you can read this document and follow some conventions in order to improve the integration of SSS with your theme/site.

Indexing

The content of the page is parsed into sections. In general, the indexing process happens in three steps:

  1. Identify the main content node.

  2. Remove any irrelevant content from the main node.

  3. Parse all sections inside the main node.

Read the Docs makes use of ARIA roles and other heuristics in order to process the content.

Tip

Following the ARIA conventions will also improve the accessibility of your site. See also https://webaim.org/techniques/semanticstructure/.

Main content node

The main content should be inside a <main> tag or an element with role=main, and there should only be one per page. This node is the one that contains all the page content to be indexed. Example:

<html>
   <head>
      ...
   </head>
   <body>
      <div>
         This content isn't processed
      </div>

      <div role="main">
         All content inside the main node is processed
      </div>

      <footer>
         This content isn't processed
      </footer>
   </body>
</html>

If a main node isn’t found, we try to infer the main node from the parent of the first section with an h1 tag. Example:

<html>
   <head>
      ...
   </head>
   <body>
      <div>
         This content isn't processed
      </div>

      <div id="parent">
         <h1>First title</h1>
         <p>
            The parent of the h1 title will
            be taken as the main node,
            this is the div tag.
         </p>

         <h2>Second title</h2>
         <p>More content</p>
      </div>
   </body>
</html>

If a section title isn’t found, we default to the body tag. Example:

<html>
   <head>
      ...
   </head>
   <body>
      <p>Content</p>
   </body>
</html>

Irrelevant content

If you have content inside the main node that isn’t relevant to the page (like navigation items, menus, or a search box), make sure to use the correct role or tag for it.

Roles to be ignored:

  • navigation

  • search

Tags to be ignored:

  • nav

Special rules, derived from specific documentation tools, that are applied in the generic parser:

  • .linenos, .lineno (line numbers in code blocks, used by both MkDocs and Sphinx)

  • .headerlink (added by Sphinx to links in headers)

  • .toctree-wrapper (added by Sphinx to the table of contents generated from the toctree directive)

Example:

<div role="main">
   ...
   <nav role="navigation">
      ...
   </nav>
   ...
</div>

Sections

Each section is stored as a dictionary composed of an id, title, and content key.

Sections are defined as:

  • h1-h7 elements: all content between one heading and the next heading at the same level is used as the content for that section.

  • dt elements with an id attribute: the title is taken from the dt element and the content from the dd element.

All sections have to be identified by a DOM container’s id attribute, which will be used to link to the section. How the id is detected varies with the type of element:

  • h1-h7 elements use the id attribute of the header itself if present, or of its section parent (if it exists).

  • dt elements use the id attribute of the dt element.

To avoid duplication and ambiguous section references, all indexed dl elements are removed from the DOM before the indexing of other sections happens.

Here is an example of how all content below the title, until a new section is found, will be indexed as part of the section content:

<div role="main">
   <h1 id="section-title">
      Section title
   </h1>
   <p>
      Content to be indexed
   </p>
   <ul>
      <li>This is also part of the section and will be indexed as well</li>
   </ul>

   <h2 id="2">
      This is the start of a new section
   </h2>
   <p>
      ...
   </p>

   ...

   <header>
      <h1 id="3">This is also a valid section title</h1>
   </header>
   <p>
      This is the content of the third section.
   </p>
</div>
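
For the example above, the indexed sections would look roughly like the sketch below. The exact content strings are illustrative; whitespace handling and truncation depend on the parser.

# Roughly what the indexed sections for the example above would contain.
sections = [
    {
        "id": "section-title",
        "title": "Section title",
        "content": "Content to be indexed This is also part of the section and will be indexed as well",
    },
    {
        "id": "2",
        "title": "This is the start of a new section",
        "content": "...",
    },
    {
        "id": "3",
        "title": "This is also a valid section title",
        "content": "This is the content of the third section.",
    },
]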

Sections can be contained in up to two nested tags, and can contain other sections (nested sections). Note that the section content still needs to be below the section title. Example:

<div role="main">
   <div class="section">
      <h1 id="section-title">
         Section title
      </h1>
      <p>
         Content to be indexed
      </p>
      <ul>
         <li>This is also part of the section</li>
      </ul>

      <div class="section">
         <div id="nested-section">
            <h2>
               This is the start of a sub-section
            </h2>
            <p>
               With the h tag within two levels
            </p>
         </div>
      </div>
   </div>
</div>

Note

The title of the first section will be the title of the page, falling back to the title tag.

Other special nodes

  • Anchors: If the title of your section contains an anchor, give it the headerlink class, so it won’t be indexed as part of the title.

<h2>
   Section title
   <a class="headerlink" title="Permalink to this headline"></a>
</h2>

  • Code blocks: If a code block contains line numbers, wrap them in an element with the linenos or lineno class, so they won’t be indexed as part of the code.

<table class="highlighttable">
   <tr>
      <td class="linenos">
         <div class="linenodiv">
            <pre>1 2 3</pre>
         </div>
      </td>

      <td class="code">
         <div class="highlight">
            <pre>First line
Second line
Third line</pre>
         </div>
      </td>
   </tr>
</table>

Supporting more themes and static site generators

All themes that follow these conventions should work as expected. If you think other generators or conventions should be supported, if some content should be ignored or treated specially, or if you find an error with our indexing, let us know in our issue tracker.

Subscriptions

Subscriptions are available on Read the Docs for Business. We use Stripe to handle payments and subscriptions, and dj-stripe to handle the integration with Stripe.

Local testing

To test subscriptions locally, you need to have access to the Stripe account, and define the Stripe secrets environment variables (listed under Docker pass-through settings below) with the keys from Stripe test mode.

To test the webhook locally, you need to run your local instance with ngrok, for example:

ngrok http 80
inv docker.up --http-domain xxx.ngrok.io

If this is your first time setting up subscriptions, you will need to re-sync djstripe with Stripe:

inv docker.manage djstripe_sync_models

The subscription settings (RTD_PRODUCTS) are already mapped to match the Stripe prices from test mode. To subscribe to any plan, you can use any test card from Stripe, for example: 4242 4242 4242 4242 (use any future date and any value for the other fields).

Modeling

Subscriptions are attached to an organization (customer), and a subscription can have multiple products attached to it. A product can have multiple prices, usually monthly and yearly.

When a user subscribes to a plan (product), they are subscribing to a price of a product, for example, the monthly price of the “Basic plan” product.

A subscription has a “main” product (RTDProduct(extra=False)), and can have several “extra” products (RTDProduct(extra=True)). For example, an organization can have a subscription with a “Basic Plan” product, and an “Extra builder” product.

Each product is mapped to a set of features (RTD_PRODUCTS) that the user will have access to (different prices of the same product have the same features). If a subscription has multiple products, the features are multiplied by the quantity of each product and added together. For example, if a subscription has a “Basic Plan” product with two concurrent builders, and an “Extra builder” product with quantity three, the total number of concurrent builders the organization has will be five.
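
The arithmetic can be sketched as below. The data structure is hypothetical and only meant to illustrate how quantities multiply the per-product features; it is not the real RTD_PRODUCTS format.

# Hypothetical structure -- not the real RTD_PRODUCTS format.
products = [
    # (features, quantity)
    ({"concurrent_builders": 2}, 1),  # "Basic Plan" (the main product, extra=False)
    ({"concurrent_builders": 1}, 3),  # "Extra builder" (an extra product, extra=True)
]

total_builders = sum(
    features["concurrent_builders"] * quantity for features, quantity in products
)
assert total_builders == 5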

Life cycle of a subscription

When a new organization is created, a Stripe customer is created for that organization, and this customer is subscribed to the trial product (RTD_ORG_DEFAULT_STRIPE_SUBSCRIPTION_PRICE).

After the trial period is over, the subscription is canceled and the organization is disabled.

During or after the trial a user can upgrade their subscription to a paid plan (RTDProduct(listed=True)).

Custom products

We provide three paid plans that users can subscribe to: Basic, Advanced, and Pro. Additionally, we provide an Enterprise plan; this plan is customized for each customer and is manually created by the RTD core team.

To create a custom plan, you need to create a new product in Stripe, and add the product id to the RTD_PRODUCTS setting mapped to the features that the plan will provide. After that, you can create a subscription for the organization with the custom product; our application will automatically relate this new product to the organization.

Extra products

We have one extra product: Extra builder.

To create a new extra product, you need to create a new product in Stripe, and add the product id to the RTD_PRODUCTS setting mapped to the features that the extra product will provide. This product should have the extra attribute set to True.

To subscribe an organization to an extra product, you just need to add the product to its subscription with the desired quantity; our application will automatically relate this new product to the organization.

Interesting settings

DOCKER_LIMITS

A dictionary of limits to virtual machines. These limits include:

time

An integer representing the total allowed time limit (in seconds) for build processes. This limit applies to the parent process of the virtual machine and will force the virtual machine to die if a build is still running after the allotted time expires.

memory

The maximum memory allocated to the virtual machine. If this limit is hit, build processes will be automatically killed. Examples: ‘200m’ for 200MB of total memory, or ‘2g’ for 2GB of total memory.
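
A minimal example, with illustrative values only:

# Illustrative values; tune them for your own deployment.
DOCKER_LIMITS = {
    "memory": "200m",  # maximum memory for the build virtual machine
    "time": 600,       # total allowed build time, in seconds
}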

PRODUCTION_DOMAIN

This is the domain that is used by the main application dashboard (not documentation pages).

RTD_INTERSPHINX_URL

This is the domain that is used to fetch the intersphinx inventory file. If not set explicitly this is the PRODUCTION_DOMAIN.

DEFAULT_PRIVACY_LEVEL

The privacy level that projects default to. Generally set to public. Also acts as a proxy setting for blocking certain historically insecure options, like serving generated artifacts directly from the media server.

PUBLIC_DOMAIN

A special domain for serving public documentation. If set, public docs will be linked here instead of the PRODUCTION_DOMAIN.

PUBLIC_DOMAIN_USES_HTTPS

If True and PUBLIC_DOMAIN is set, that domain will default to serving public documentation over HTTPS. By default, documentation is served over HTTP.

ALLOW_ADMIN

Whether to include django.contrib.admin in the URLs.

RTD_BUILD_MEDIA_STORAGE

Use this storage class to upload build artifacts to cloud storage (S3, Azure storage). This should be a dotted path to the relevant class (e.g. 'path.to.MyBuildMediaStorage'). Your class should mix in readthedocs.builds.storage.BuildMediaStorageMixin.
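
A minimal sketch, assuming django-storages' S3Boto3Storage as the cloud backend (any backend works; the class name and bucket name are hypothetical):

# Hypothetical example storage class; adjust the backend and bucket to your setup.
from readthedocs.builds.storage import BuildMediaStorageMixin
from storages.backends.s3boto3 import S3Boto3Storage


class MyBuildMediaStorage(BuildMediaStorageMixin, S3Boto3Storage):
    bucket_name = "my-build-media"


# Then point the setting at the dotted path of that class:
RTD_BUILD_MEDIA_STORAGE = "path.to.MyBuildMediaStorage"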

ELASTICSEARCH_DSL

Default:

{
   'default': {
      'hosts': '127.0.0.1:9200'
   },
}

Settings for the Elasticsearch connection. These settings are passed to elasticsearch-dsl-py’s connections.configure.

ES_INDEXES

Default:

{
     'project': {
         'name': 'project_index',
         'settings': {'number_of_shards': 5,
                      'number_of_replicas': 0
                      }
     },
     'page': {
         'name': 'page_index',
         'settings': {
             'number_of_shards': 5,
             'number_of_replicas': 0,
         }
     },
 }

Defines the Elasticsearch index name and settings for each index separately. The key is the type of index, such as project or page, and the value is another dictionary containing name and settings. name is the index name, and settings is used to configure that particular index.

ES_TASK_CHUNK_SIZE

The maximum number of objects sent to each Elasticsearch indexing Celery task. This is used when running the elasticsearch_reindex management command.

ES_PAGE_IGNORE_SIGNALS

This setting determines whether to index each page separately into Elasticsearch. If the setting is True, each HTML page will not be indexed separately, but will be indexed via bulk indexing.

ELASTICSEARCH_DSL_AUTOSYNC

This setting is used for automatically indexing objects to elasticsearch.

Docker pass-through settings

If you run a Docker environment, it is possible to pass some secrets through to the Docker containers from your host system. For security reasons, we do not commit these secrets to our repository. Instead, we individually define these settings for our local environments.

We recommend using direnv for storing local development secrets.

Allauth secrets

It is possible to set the Allauth application secrets for our supported providers using the following environment variables:

RTD_SOCIALACCOUNT_PROVIDERS_GITHUB_CLIENT_ID
RTD_SOCIALACCOUNT_PROVIDERS_GITHUB_SECRET
RTD_SOCIALACCOUNT_PROVIDERS_GITLAB_CLIENT_ID
RTD_SOCIALACCOUNT_PROVIDERS_GITLAB_SECRET
RTD_SOCIALACCOUNT_PROVIDERS_BITBUCKET_OAUTH2_CLIENT_ID
RTD_SOCIALACCOUNT_PROVIDERS_BITBUCKET_OAUTH2_SECRET
RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_CLIENT_ID
RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_SECRET

Stripe secrets

The following secrets are required to use djstripe and our Stripe integration.

RTD_STRIPE_SECRET
RTD_STRIPE_PUBLISHABLE
RTD_DJSTRIPE_WEBHOOK_SECRET

Testing

Before contributing to Read the Docs, make sure your patch passes our test suite and your code style passes our code linting suite.

Read the Docs uses Tox to execute testing and linting procedures. Tox is the only dependency you need to run linting or our test suite; the remainder of our requirements will be installed by Tox into environment-specific virtualenv paths. Before testing, make sure you have Tox installed:

pip install tox

To run the full test and lint suite against your changes, simply run Tox. Tox should return without any errors. You can run Tox against all of our environments by running:

tox

By default, tox won’t run the search tests. In order to run all tests, including the search tests, you need to override tox’s posargs. If you don’t have any additional arguments to pass, you can also set the TOX_POSARGS environment variable to an empty string:

TOX_POSARGS='' tox

Note

If you need to override tox’s posargs, but you still don’t want to run the search tests, you need to include -m 'not search' in your command:

tox -- -m 'not search' -x

To target a specific environment:

tox -e py310

To run a subset of tests:

tox -e py310 -- -k test_celery

The tox configuration has the following environments configured. You can target a single environment to limit the test suite:

py310

Run our test suite using Python 3.10

py310-debug

Same as py310, but there are some useful debugging tools available in the environment.

lint

Run code linting using Prospector. This currently runs pylint, pyflakes, pep8 and other linting tools.

docs

Test documentation compilation with Sphinx.

Pytest marks

The Read the Docs code base is deployed as three instances:

  • Main: where you can see the dashboard.

  • Build: where the builds happen.

  • Serve/proxito: where the documentation pages are served.

Each instance has its own settings. To make sure we test each part as close as possible to its real settings, we use pytest marks. This allows us to run each set of tests with different settings files, or skip some (like the search tests):

DJANGO_SETTINGS_MODULE=custom.settings.file pytest -m mark
DJANGO_SETTINGS_MODULE=another.settings.file pytest -m "not mark"

Current marks are:

  • search (tests that require Elastic Search)

  • proxito (tests from the serve/proxito instance)

Tests without mark are from the main instance.
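
For example, a test that requires Elastic Search could be marked like this (the test itself is hypothetical); it can then be selected with pytest -m search or excluded with pytest -m "not search":

import pytest


@pytest.mark.search  # requires a running Elastic Search instance
def test_search_returns_results():
    ...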

Continuous Integration

The RTD test suite is exercised by Circle CI on every push to our repo at GitHub. You can check out the current build status: https://app.circleci.com/pipelines/github/readthedocs/readthedocs.org