Read the Docs developer documentation
Documentation for running your own local version of Read the Docs for development, or taking the open source Read the Docs codebase for your own custom installation.
Contributing to Read the Docs
Are you here to help on Read the Docs? Awesome! ❤️
Read the Docs, and all of its related projects, are community maintained, open source projects. We hope you feel welcome as you begin contributing to any of these projects. You'll find that development is primarily supported by our core team members, who all work on Read the Docs full-time.
All members of our community are expected to follow our Code of Conduct. Please make sure you are welcoming and friendly in all of our spaces.
Get in touch
If you have a question or comment, we generally suggest the following communication channels:
Ask usage questions (“How do I?”) on StackOverflow.
Report bugs, suggest features, or view the source code on GitHub.
Discuss development topics on Gitter.
Contributing
There are plenty of places to contribute to Read the Docs, but if you are just starting with contributions, we suggest focusing on the following areas:
Contributing to development
If you want to deep dive and help out with development on Read the Docs, then first get the project installed locally according to the installation guide. After that is done we suggest you have a look at tickets in our issue tracker that are labelled Good First Issue. These are meant to be a great way to get a smooth start and won’t put you in front of the most complex parts of the system.
If you are up for more challenging tasks with a bigger scope, there is a set of tickets with a Feature or Improvement tag. These tickets have a general overview and description of the work required to finish. If you want to start somewhere, this would be a good place (make sure that the issue also has the Accepted label). That said, these aren't necessarily the easiest tickets. They are simply things that are explained. If you still haven't found something to work on, search for the Sprintable label. Those tickets are meant to be standalone and can be worked on ad-hoc.
You can read all of our Read the Docs developer documentation to understand more about the development of Read the Docs. When contributing code, please follow the standard Contribution Guidelines set forth at contribution-guide.org.
Contributing to documentation
Documentation for Read the Docs itself is hosted by Read the Docs at https://docs.readthedocs.io (likely the website you are currently reading).
There are guidelines around writing and formatting documentation for the project. For full details, including how to build it, see Building and contributing to documentation.
Contributing to translations
We use Transifex to manage localization for all of our projects that we support localization on. If you are interested in contributing, we suggest joining a team on one of our projects on Transifex. From there, you can suggest translations, and can even be added as a reviewer, so you can correct and approve suggestions.
If you don’t see your language in our list of approved languages for any of our projects, feel free to suggest the language on Transifex to start the process.
Triaging issues
Everyone is encouraged to help improve, refine, verify, and prioritize issues on GitHub. The Read the Docs core team uses the following guidelines for issue triage on all of our projects. These guidelines describe the issue lifecycle step-by-step.
Note
You will need Triage permission on the project in order to do this. You can ask one of the members of the Read the Docs team to give you access.
Tip
Triaging helps identify problems and solutions, and ultimately which issues are ready to be worked on. The core Read the Docs team maintains a separate Roadmap of prioritized issues - issues will only end up on that Roadmap after they have been triaged.
Initial triage
When sitting down to do some triaging work, we start with the list of untriaged tickets. We consider all tickets that do not have a label as untriaged. The first step is to categorize the ticket into one of the following categories and either close the ticket or assign an appropriate label. The reported issue …
- … is not valid
If you think the ticket is invalid, comment why you think it is invalid, then close the ticket. Tickets might be invalid if they were already fixed in the past, or if it was decided that the proposed feature will not be implemented because it does not conform with the overall goal of Read the Docs. Also, if you happen to know that the problem was already reported, reference the other ticket that is already addressing the problem and close the duplicate.
Examples:
Builds fail when using matplotlib: If the described issue was already fixed, then explain and instruct to re-trigger the build.
Provide way to upload arbitrary HTML files: It was already decided that Read the Docs is not a dull hosting platform for HTML. So explain this and close the ticket.
- … does not provide enough information
Add the label Needed: more information if the reported issue does not contain enough information to decide whether it is valid, and ask on the ticket for the information required to go forward. We will re-triage all tickets that have the Needed: more information label assigned. If the original reporter left new information, we can try to re-categorize the ticket. If the reporter does not come back with the required information after a long enough time (roughly about two weeks), we will close the ticket.
Examples:
My builds stopped working. Please help! Ask for a link to the build log and for which project is affected.
- … is a valid feature proposal
If the ticket contains a feature that aligns with the goals of Read the Docs, then add the label Feature. If the proposal seems valid but requires further discussion between core contributors because there might be different possibilities on how to implement the feature, then also add the label Needed: design decision.
Examples:
Provide better integration with service XYZ
Achieve world domination (also needs the label Needed: design decision)
- … is a small change to the source code
Tickets about code cleanup or small changes to existing features should get the Improvement label. The distinction for this label is that these issues have a lower priority than a Bug and aren't implementing new features.
Examples:
Refactor namedtuples to dataclasses
Change font size for the project’s title
- … is a valid problem within the code base:
If it’s a valid bug, then add the label Bug. Try to reference related issues if you come across any.
Examples:
Builds fail if conf.py contains non-ascii letters
- … is a currently valid problem with the infrastructure:
Users might report about web server downtimes or that builds are not triggered. If the ticket needs investigation on the servers, then add the label Operations.
Examples:
Builds are not starting
- … is a question and needs answering:
If the ticket contains a question about the Read the Docs platform or the code, then add the label Support.
Examples:
My account was set inactive. Why?
How to use C modules with Sphinx autodoc?
Why are my builds failing?
- … requires a one-time action on the server:
Tasks that require a one time action on the server should be assigned the two labels Support and Operations.
Examples:
Please change my username
Please set me as owner of this abandoned project
After we finished the initial triaging of new tickets, no ticket should be left without a label.
Additional labels for categorization
In addition to the labels already mentioned in the section above, we have a few more at hand to further categorize issues.
- High Priority
If the issue is urgent, assign this label. Ideally, also go ahead and resolve the ticket yourself as soon as possible.
- Good First Issue
This label marks tickets that are easy to get started with. The ticket should be ideal for beginners to dive into the code base. Ideally, the fix for the issue only involves touching one part of the code.
- Sprintable
Sprintable tickets have the right amount of scope to be handled during a sprint. They are very focused and encapsulated.
For a full list of available labels and their meanings, see Overview of issue labels.
Helpful links for triaging
Here is a list of links for contributors looking for work:
Untriaged tickets: Go and triage them!
Tickets labelled with Needed: more information: Come back to these tickets once in a while and close those that did not get any new information from the reporter. If new information is available, go and re-triage the ticket.
Tickets labelled with Operations: These tickets are for contributors who have access to the servers.
Tickets labelled with Support: Experienced contributors or community members with a broad knowledge about the project should handle those.
Tickets labelled with Needed: design decision: Project leaders must take actions on these tickets. Otherwise no other contributor can go forward on them.
Code of Conduct
Like the technical community as a whole, the Read the Docs team and community is made up of a mixture of professionals and volunteers from all over the world, working on every aspect of the mission - including mentorship, teaching, and connecting people.
Diversity is one of our huge strengths, but it can also lead to communication issues and unhappiness. To that end, we have a few ground rules that we ask people to adhere to. This code applies equally to founders, mentors and those seeking help and guidance.
This isn’t an exhaustive list of things that you can’t do. Rather, take it in the spirit in which it’s intended - a guide to make it easier to enrich all of us and the technical communities in which we participate.
This code of conduct applies to all spaces managed by the Read the Docs project. This includes live chat, mailing lists, the issue tracker, and any other forums created by the project team which the community uses for communication. In addition, violations of this code outside these spaces may affect a person’s ability to participate within them.
If you believe someone is violating the code of conduct, we ask that you report it by emailing dev@readthedocs.org.
Be friendly and patient.
Be welcoming. We strive to be a community that welcomes and supports people of all backgrounds and identities. This includes, but is not limited to members of any race, ethnicity, culture, national origin, colour, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability.
Be considerate. Your work will be used by other people, and you in turn will depend on the work of others. Any decision you take will affect users and colleagues, and you should take those consequences into account when making decisions. Remember that we’re a world-wide community, so you might not be communicating in someone else’s primary language.
Be respectful. Not all of us will agree all the time, but disagreement is no excuse for poor behavior and poor manners. We might all experience some frustration now and then, but we cannot allow that frustration to turn into a personal attack. It’s important to remember that a community where people feel uncomfortable or threatened is not a productive one. Members of the Read the Docs community should be respectful when dealing with other members as well as with people outside the Read the Docs community.
Be careful in the words that you choose. We are a community of professionals, and we conduct ourselves professionally. Be kind to others. Do not insult or put down other participants. Harassment and other exclusionary behavior aren’t acceptable. This includes, but is not limited to:
Violent threats or language directed against another person.
Discriminatory jokes and language.
Posting sexually explicit or violent material.
Posting (or threatening to post) other people’s personally identifying information (“doxing”).
Personal insults, especially those using racist or sexist terms.
Unwelcome sexual attention.
Advocating for, or encouraging, any of the above behavior.
Repeated harassment of others. In general, if someone asks you to stop, then stop.
When we disagree, try to understand why. Disagreements, both social and technical, happen all the time and Read the Docs is no exception. It is important that we resolve disagreements and differing views constructively. Remember that we’re different. The strength of Read the Docs comes from its varied community, people from a wide range of backgrounds. Different people have different perspectives on issues. Being unable to understand why someone holds a viewpoint doesn’t mean that they’re wrong. Don’t forget that it is human to err and blaming each other doesn’t get us anywhere. Instead, focus on helping to resolve issues and learning from mistakes.
Original text courtesy of the Speak Up! project. This version was adopted from the Django Code of Conduct.
Overview of issue labels
Here is a full list of labels that we use in the GitHub issue tracker and what they stand for.
- Accepted
Issues with this label are issues that the core team has accepted on to the roadmap. The core team focuses on accepted bugs, features, and improvements that are on our immediate roadmap and will give priority to these issues. Pull requests could be delayed or closed if the pull request doesn’t align with our current roadmap. An issue or pull request that has not been accepted should either eventually move to an accepted state, or should be closed. As an issue is accepted, we will find room for it on our roadmap or roadmap backlog.
- Bug
An issue describing unexpected or incorrect behaviour of the readthedocs.org software. A Bug issue differs from an Improvement issue in that Bug issues are given priority on our roadmap. On release, these issues generally only warrant incrementing the patch level version.
- Design
Issues related to the UI of the readthedocs.org website.
- Feature
Issues that describe new features. Issues that do not describe new features, such as code cleanup or fixes that are not related to a bug, should probably be given the Improvement label instead. On release, issues with the Feature label warrant at least a minor version increase.
- Good First Issue
This label marks issues that are easy to get started with. The issue should be ideal for beginners to dive into the code base.
- Priority: high
Issues with this label should be resolved as quickly as possible.
- Priority: low
Issues with this label won’t have the immediate focus of the core team.
- Improvement
An issue with this label is neither a Bug nor a Feature. Code cleanup or small changes to existing features would likely have this label. The distinction for this label is that these issues have a lower priority on our roadmap compared to issues labeled Bug, and aren't implementing new features, as a Feature issue might.
- Needed: design decision
Issues that need a design decision are blocked for development until a project leader clarifies the way in which the issue should be approached.
- Needed: documentation
If an issue involves creating or refining documentation, this label will be assigned.
- Needed: more information
This label indicates that a reply with more information is required from the bug reporter. If no response is given by the reporter, the issue is considered invalid after 2 weeks and will be closed. See the documentation about our triage process for more information.
- Needed: patch
This label indicates that a patch is required in order to resolve the issue. A fix should be proposed via a pull request on GitHub.
- Needed: tests
This label indicates that a better test coverage is required to resolve the issue. New tests should be proposed via a pull request on GitHub.
- Needed: replication
This label indicates that a bug has been reported, but has not been successfully replicated by another user or contributor yet.
- Operations
Issues that require changes in the server infrastructure.
- PR: work in progress
Pull requests that are not complete yet. A final review is not possible yet, but every pull request is open for discussion.
- PR: hotfix
Pull request was applied directly to production after a release. These pull requests still need review to be merged into the next release.
- Sprintable
Sprintable issues have the right amount of scope to be handled during a sprint. They are very focused and encapsulated.
- Status: blocked
The issue cannot be resolved until some other issue has been closed. See the issue’s log for which issue is blocking this issue.
- Status: stale
An issue is stale if there has been no activity on it for 90 days. Once an issue is determined to be stale, it will be closed after 2 weeks unless there is activity on the issue.
- Support
Questions that need answering but do not require code changes, or issues that only require a one-time action on the server, will have this label. See the documentation about our triage process for more information.
Roadmap
Process
We organize our product roadmap publicly on our GitHub Roadmap. There, you will find several views into our roadmap:
- Current sprint
Work that the core team is currently responsible for.
- Backlog
Work that we have planned for future sprints. Items with an assigned timeframe have generally been discussed already by the team. Items that do not yet have a timeframe assigned are not yet a priority of the core team.
The focus of the core team will be on roadmap and sprint items. These items are promoted from our backlog before each sprint begins.
Triaging issues for the Roadmap
Issues are triaged before they are worked on, involving a number of steps that are covered in Contributing to Read the Docs. Everyone can take part in helping to triage issues, read more in Triaging issues. Additionally, issues are considered for the Roadmap according to the following process:
New issues coming in will be triaged, but won’t yet be considered part of our roadmap.
If the issue is a valid bug, it will be assigned the Accepted label and will be prioritized, likely on an upcoming release.
If the issue is a feature or improvement, it might go through a design decision phase before being accepted and assigned to our roadmap. This is a good time to discuss how to address the problem technically. Skipping this phase might result in your PR being blocked, sent back to design decision, or perhaps even discarded. It's best to be active here before submitting a PR for a feature or improvement.
The core team will only work on accepted issues, and will give PR review priority to accepted roadmap/sprint issues. Pull requests addressing issues that are not on our roadmap are welcome, but we cannot guarantee review response, even for small or easy to review pull requests.
Where to contribute
It’s best to pick off issues from our roadmap, and specifically from our backlog, to ensure your pull request gets attention. If you find an issue that is not currently on our roadmap, we suggest asking about the priority of the issue. In some cases, we might put the issue on our roadmap to give it priority.
Design documents
This is where we outline the design of major parts of our project. Generally this is only available for features that have been built in the recent past, but we hope to write more of them over time.
Warning
These documents may not match the final implementation, or may be out of date.
API v3 design document
This document describes the design, some decisions already made and built (the current Version 1 of APIv3), and an implementation plan for the next versions of APIv3.
APIv3 will be designed with two main goals: being easy to use, and being useful for performing read and write operations.
It will be based on Resources, as APIv2 is, but with the Project resource as the main one, from which most of the other endpoints will be nested.
Goals
Easy to use for our users (access most resources by slug)
Useful to perform read and write operations
Authentication/Authorization
Authentication based on scoped-tokens
Handle Authorization nicely using an abstraction layer
Cover most useful cases:
Integration on CI (check build status, trigger new build, etc)
Usage from public Sphinx/MkDocs extensions
Allow creation of flyout menu client-side
Simplify migration from other services (import projects, create multiple redirects, etc)
Non-Goals
Filter by arbitrary and useless fields
“Builds with exit_code=1”
“Builds containing ERROR on their output”
“Projects created after X datetime”
“Versions with tag python”
Cover all the actions available from the WebUI
Problems with APIv2
There are several problems with our current APIv2 that we can list:
No authentication
It’s read-only
Not designed for slugs
Useful APIs not exposed (only for internal usage currently)
Error reporting is a mess
Relationships between API resources are not obvious
Footer API endpoint returns HTML
Implementation stages
Version 1
The first implementation of APIv3 will cover the following aspects:
Authentication
- all endpoints require authentication via the Authorization: request header
- detail endpoints are available for all authenticated users
- only a Project's maintainers can access listing endpoints
- personalized listing
Read and Write
- edit attributes from Version (only active and privacy_level)
- trigger a Build for a specific Version
Accessible by slug
- Projects are accessed by slug
- Versions are accessed by slug
- the /projects/ endpoint is the main one and all of the others are nested under it
- Builds are accessed by id, as an exception to this rule
- access all (active/non-active) Versions of a Project by slug
- get the latest Build for a Project (and Version) by slug
- filter by relevant fields
Proper status codes to report errors
Browse-able endpoints (see the client sketch after this list)
- browsing is allowed hitting /api/v3/projects/ as the starting point
- ability to navigate by clicking on other resources under the _links attribute
Rate limited
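For illustration, here is a minimal client-side sketch of what Version 1 enables, using the Python requests library. The token value is a placeholder and the paginated "results" key is an assumption, not a documented guarantee:

import requests

API = "https://readthedocs.org/api/v3"

session = requests.Session()
# Authentication via the ``Authorization:`` request header (placeholder token).
session.headers["Authorization"] = "Token <your-token>"

# Browse-able starting point: list accessible projects and follow ``_links``
# to nested resources such as versions and builds.
response = session.get(f"{API}/projects/")
response.raise_for_status()
for project in response.json().get("results", []):
    print(project["slug"], project["_links"])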
Version 2
Note
This is currently implemented and live.
The second iteration will polish issues found during the first step and, as its main goal, add new endpoints to allow importing a project and configuring it without needing to use the WebUI.
After Version 2 is deployed, we will invite users who reach out to us to be beta testers, to receive more feedback and continue improving it by supporting more use cases.
This iteration will include:
Minor changes to fields returned in the objects
Import Project endpoint
Edit Project attributes (“Settings” and “Advanced settings-Global settings” in the WebUI)
Trigger Build for default version
Allow CRUD for Redirect, Environment Variables and Notifications (WebHook and EmailHook)
Create/Delete a Project as a subproject of another Project
Documentation
Version 3
The third iteration will implement granular permissions, keeping in mind how Sphinx extensions will use it:
sphinx-version-warning needs to get all active Versions of a Project
An extension that creates a landing page will need all the subprojects of a Project
To fulfill these requirements, this iteration will include:
Scope-based authorization token
Version 4
Specific endpoint for our flyout menu (returning JSON instead of HTML)
Out of roadmap
These are some features that we may want to build at some point. However, they are currently out of our near-term roadmap because they don't affect too many users, or are for internal usage only.
CRUD for Domain
Add User as maintainer
Give access to a documentation page (objects.inv, /design/core.html)
Internal Build process
Nice to have
Request-ID header
JSON minified by default (maybe with ?pretty=true)
Better handling of docs URLs
proxito is the component of our code base in charge of serving documentation to users and handling any other URLs from the user documentation domain.
The current implementation has some problems that are discussed in this document, and an alternative implementation is proposed to solve those problems.
Goals
Simplifying our parsing logic for URLs
Removing reserved paths and ambiguities from URLs
Allow serving docs from a different prefix and subproject prefix.
Non-goals
Allowing fully arbitrary URL generation for projects, like changing the order of the elements or removing them.
Current implementation
The current implementation is based on Django URLs trying to match a pattern that looks like a single project, a versioned project, or a subproject. This means that a couple of URLs are reserved and won't resolve to the correct file even if it exists (https://github.com/readthedocs/readthedocs.org/issues/8399, https://github.com/readthedocs/readthedocs.org/issues/2292); this usually happens with single version projects.
And to support custom URLs we are hacking into Django's urlconf to override it at runtime, which doesn't allow us to implement custom URLs for subprojects easily (https://github.com/readthedocs/readthedocs.org/pull/8327).
Alternative implementation
Instead of trying to map a URL to a view, we first analyze the root project (given from the subdomain), and based on that we map each part of the URL to the current project and version.
This will allow us to re-use this code in our unresolver without the need to override Django's urlconf at runtime, or to guess a project only by the structure of its URL.
Terminology:
- Root project
The project from where the documentation is served (usually the parent project of a subproject or translation).
- Current project
The project that owns the current file being served (a subproject, a translation, etc).
- Requested file
The final path to the file that we need to serve from the current project.
Look up process
Proxito will process all documentation requests from a single docs serve view, excluding /_ URLs.
This view then will process the current URL using the root project as follows:
Check if the root project has translations (the project itself is a translation if it isn't a single version project), and if the first part of the URL is a language code and the second is a version.
If the lang code doesn’t match, we continue.
If the lang code matches, but the version doesn’t, we return 404.
Check if it has subprojects and the first part of the URL matches the subprojects prefix (projects), and if the second part of the URL matches a subproject alias.
If the subproject prefix or the alias don't match, we continue.
If they match, we try to match the rest of the URL for translations/versions and single versions (i.e. we don't search for subprojects) and we use the subproject as the new root project.
Check if the project is a single version. Here we just try to serve the rest of the URL as the file.
Check if the first part of the URL is page, then this is a page redirect. Note that this is done after we have discarded the project being a single version project, since it doesn't make sense to use that redirect with single version projects, and it could collide with the project having a page/ directory.
404 if none of the above rules match.
Custom URLs
We are using custom URLs mainly to serve the documentation from a different directory:
deeplearning/nemo/user-guide/docs/$language/$version/$filename
deeplearning/frameworks/nvtx-plugins/user-guide/docs/$language/$version/$filename
We always keep the lang/version/filename order; do we need/want to support changing this order? It doesn't seem useful to do so.
So, what we need is a way to specify a prefix only. We would have one prefix used for translations and another one used for subprojects. These prefixes will be set in the root project.
The look up order would be as follows:
If the root project has a custom prefix, and the current URL matches that prefix, remove the prefix and follow the translations and single version look up process. We exclude subprojects from it, i.e. we don't check for {prefix}/projects.
If the root project has subprojects and a custom subprojects prefix (projects by default), and the current URL matches that prefix, and the next part of the URL matches a subproject alias, continue with the subproject look up process.
Examples
The next examples are organized in the following way:
First there is a list of the projects involved, with their available versions.
The first project would be the root project.
The other projects will be related to the root project (their relationship is given by their name).
Next we will have a table of the requests, and their result.
Project with versions and translations
Projects:
project (latest, 1.0)
project-es (latest, 1.0)
Requests:
| Request | Requested file | Current project | Note |
|---|---|---|---|
| /en/latest/manual/index.html | /latest/manual/index.html | project | |
| /en/1.0/manual/index.html | /1.0/manual/index.html | project | |
| /en/1.0/404 | 404 | project | The file doesn't exist |
| /en/2.0/manual/index.html | 404 | project | The version doesn't exist |
| /es/latest/manual/index.html | /latest/manual/index.html | project-es | |
| /es/1.0/manual/index.html | /1.0/manual/index.html | project-es | |
| /es/1.0/404 | 404 | project-es | The translation exists, but not the file |
| /es/2.0/manual/index.html | 404 | project-es | The translation exists, but not the version |
| /pt/latest/manual/index.html | 404 | project | The translation doesn't exist |
Project with subprojects and translations
Projects:
project (latest, 1.0)
project-es (latest, 1.0)
subproject (latest, 1.0)
subproject-es (latest, 1.0)
| Request | Requested file | Current project | Note |
|---|---|---|---|
| /projects/subproject/en/latest/manual/index.html | /latest/manual/index.html | subproject | |
| /projects/subproject/en/latest/404 | 404 | subproject | The subproject exists, but not the file |
| /projects/subproject/en/2.x/manual/index.html | 404 | subproject | The subproject exists, but not the version |
| /projects/subproject/es/latest/manual/index.html | /latest/manual/index.html | subproject-es | |
| /projects/subproject/br/latest/manual/index.html | 404 | subproject | The subproject exists, but not the translation |
| /projects/nothing/en/latest/manual/index.html | 404 | project | The subproject doesn't exist |
| /manual/index.html | 404 | project | |
Single version project with subprojects
Projects:
project (latest)
subproject (latest, 1.0)
subproject-es (latest, 1.0)
| Request | Requested file | Current project | Note |
|---|---|---|---|
| /projects/subproject/en/latest/manual/index.html | /latest/manual/index.html | subproject | |
| /projects/subproject/en/latest/404 | 404 | subproject | The subproject exists, but the file doesn't |
| /projects/subproject/en/2.x/manual/index.html | 404 | subproject | The subproject exists, but the version doesn't |
| /projects/subproject/es/latest/manual/index.html | /latest/manual/index.html | subproject-es | |
| /projects/subproject/br/latest/manual/index.html | 404 | subproject | The subproject exists, but the translation doesn't |
| /projects/nothing/en/latest/manual/index.html | 404 | project | The subproject doesn't exist |
| /manual/index.html | /latest/manual/index.html | project | |
| /404 | 404 | project | The file doesn't exist |
| /projects/index.html | /latest/projects/index.html | project | The project has a projects directory |
| /en/index.html | /latest/en/index.html | project | The project has an en directory |
Project with single version subprojects
Projects:
project (latest, 1.0)
project-es (latest, 1.0)
subproject (latest)
| Request | Requested file | Current project | Note |
|---|---|---|---|
| /projects/subproject/manual/index.html | /latest/manual/index.html | subproject | |
| /projects/subproject/en/latest/manual/index.html | 404 | subproject | The subproject is single version |
| /projects/subproject/404 | 404 | subproject | The subproject exists, but the file doesn't |
| /projects/subproject/br/latest/manual/index.html | /latest/br/latest/manual/index.html | subproject | The subproject has a br directory |
| /projects/nothing/manual/index.html | 404 | project | The subproject doesn't exist |
| /en/latest/manual/index.html | /latest/manual/index.html | project | |
| /404 | 404 | project | |
Project with custom prefix
project (latest, 1.0)
subproject (latest, 1.0)
project has prefix as its custom prefix, and sub as its subproject prefix.
| Request | Requested file | Current project | Note |
|---|---|---|---|
| /en/latest/manual/index.html | 404 | project | The prefix doesn't match |
| /prefix/en/latest/manual/index.html | /latest/manual/index.html | project | |
| /projects/subproject/en/latest/manual/index.html | 404 | project | The subproject prefix doesn't match |
| /sub/subproject/en/latest/manual/index.html | /latest/manual/index.html | subproject | |
| /sub/nothing/en/latest/manual/index.html | 404 | project | The subproject doesn't exist |
Project with custom subproject prefix (empty)
project (latest, 1.0)
subproject (latest, 1.0)
project has / as its subproject prefix; this allows us to serve subprojects without using a prefix.
| Request | Requested file | Current project | Note |
|---|---|---|---|
| /en/latest/manual/index.html | /latest/manual/index.html | project | |
| /projects/subproject/en/latest/manual/index.html | 404 | project | The subproject prefix doesn't match |
| /subproject/en/latest/manual/index.html | /latest/manual/index.html | subproject | |
| /nothing/en/latest/manual/index.html | /latest/manual/index.html | project | The subproject/file doesn't exist |
Implementation example
This is a simplified version of the implementation, there are some small optimizations and validations that will be in the final implementation.
In the final implementation we will be using regular expressions to extract the parts from the URL.
from readthedocs.projects.models import Project

LANGUAGES = {"es", "en"}


def pop_parts(path, n):
    # Split off the first ``n`` components of ``path`` and return them together
    # with the remaining path. The components are padded with empty strings so
    # callers can always unpack exactly ``n`` values.
    path = path.lstrip("/")
    parts = path.split("/", maxsplit=n)
    start, end = parts[:n], parts[n:]
    start += [""] * (n - len(start))
    end = end[0] if end else ""
    return start, end


def resolve(canonical_project: Project, path: str, check_subprojects=True):
    prefix = "/"
    if canonical_project.prefix:
        prefix = canonical_project.prefix
    subproject_prefix = "/projects"
    if canonical_project.subproject_prefix:
        subproject_prefix = canonical_project.subproject_prefix

    # Multiversion project.
    if path.startswith(prefix):
        new_path = path.removeprefix(prefix)
        parts, new_path = pop_parts(new_path, 2)
        language, version_slug = parts
        if not canonical_project.single_version and language in LANGUAGES:
            if canonical_project.language == language:
                project = canonical_project
            else:
                project = canonical_project.translations.filter(language=language).first()
            if project:
                version = project.versions.filter(slug=version_slug).first()
                if version:
                    return project, version, new_path
            return project, None, None

    # Subprojects.
    if check_subprojects and path.startswith(subproject_prefix):
        new_path = path.removeprefix(subproject_prefix)
        parts, new_path = pop_parts(new_path, 1)
        project_slug = parts[0]
        project = canonical_project.subprojects.filter(alias=project_slug).first()
        if project:
            # Resolve the rest of the URL with the subproject as the new root
            # project; subprojects can't be nested, hence check_subprojects=False.
            return resolve(
                canonical_project=project,
                path="/" + new_path,
                check_subprojects=False,
            )

    # Single project.
    if path.startswith(prefix):
        new_path = path.removeprefix(prefix)
        if canonical_project.single_version:
            version = canonical_project.versions.filter(
                slug=canonical_project.default_version
            ).first()
            if version:
                return canonical_project, version, new_path
            return canonical_project, None, None

    return None, None, None


def view(canonical_project, path):
    current_project, version, file = resolve(
        canonical_project=canonical_project,
        path=path,
    )
    if current_project and version:
        return serve(current_project, version, file)
    if current_project:
        return serve_404(current_project)
    return serve_404(canonical_project)


def serve_404(project, version=None):
    pass


def serve(project, version, file):
    pass
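For illustration, the path-splitting helper above behaves as follows (pure Python, no database access; the expected results are shown as comments):

# A multiversion URL: the first two components are language and version,
# the rest is the requested file.
pop_parts("/en/latest/manual/index.html", 2)
# -> (["en", "latest"], "manual/index.html")

# A subproject URL after stripping the "/projects" prefix: the first
# component is the subproject alias.
pop_parts("/subproject/en/latest/manual/index.html", 1)
# -> (["subproject"], "en/latest/manual/index.html")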
Performance
Performance is mainly driven by the number of database lookups. There is an additional impact from performing a regex lookup.
A single version project:
- /index.html: 1, the version.
- /projects/guides/index.html: 2, the version and one additional lookup for a path that looks like a subproject.
A multi version project:
- /en/latest/index.html: 1, the version.
- /es/latest/index.html: 2, the translation and the version.
- /br/latest/index.html: 1, the translation (it doesn't exist).
A project with single version subprojects:
- /projects/subproject/index.html: 2, the subproject and its version.
A project with multi version subprojects:
- /projects/subproject/en/latest/index.html: 2, the subproject and its version.
- /projects/subproject/es/latest/index.html: 3, the subproject, the translation, and its version.
- /projects/subproject/br/latest/index.html: 2, the subproject and the translation (it doesn't exist).
As seen, the number of database lookups is the minimum required to get the current project and version: at least 1, and at most 3.
Questions
When using custom URLs, should we support changing the URLs that aren’t related to doc serving?
These are:
Health check
Proxied APIs
robots and sitemap
The page redirect
This can be useful for people that proxy us from another path.
Should we use the urlconf from the subproject when processing it? This is a URL like /projects/subproject/custom/prefix/en/latest/index.html. I don't think that's useful, but it should be easy to support if needed.
Should we support the page redirect when using a custom subproject prefix? This is /{prefix}/subproject/page/index.html.
Build images
This document describes how Read the Docs uses Docker images and how they are named. It also proposes a path forward: a new way to create, name and use our Docker build images that reduces their complexity and supports installing other languages (e.g. nodejs, rust, go) as extra requirements.
Introduction
We use Docker images to build users' documentation. Each time a build is triggered, one of our VMs picks up the task and goes through the following steps:
run some application code to spin up a Docker image into a container
execute git inside the container to clone the repository
analyze and parse files (.readthedocs.yaml) from the repository outside the container
spin up a new Docker container based on the config file
create the environment and install docs’ dependencies inside the container
execute build commands inside the container
push the output generated by build commands to the storage
All those steps depend on specific command versions: git, python, virtualenv, conda, etc.
Currently, we are pinning only a few of them in our Docker images, and that has caused issues when re-deploying these images with bugfixes: the images are not reproducible over time.
Note
We have been improving the reproducibility of our images by adding some test cases. These are run inside the Docker image after it's built and check that it contains the versions we expect.
To allow users to pin the image, we ended up exposing three images: stable, latest and testing.
With that naming, we were able to fix bugs and add more features to each image without asking users to change the image selected in their config file.
Then, when a completely different image appeared and after testing the testing image enough, we discarded stable, the old latest became the new stable, and the old testing became the new latest.
This caused issues for people pinning their images to any of these names, because after this change we changed all the images for all the users, and many build issues arose!
Goals
release completely new Docker images without forcing users to change their pinned image (stable, latest, testing)
allow users to select language requirements instead of an image name
use a base image with the dependencies that don't change frequently (OS and base requirements)
tie base image naming to the OS version (e.g. Ubuntu LTS)
allow us to add/update a Python version without affecting the base image
reduce size on builder VM disks by sharing Docker image layers
allow users to specify extra languages (e.g. nodejs, rust, go)
de-motivate the usage of stable, latest and testing; and promote declaring language requirements instead
new images won't contain old/deprecated OS (e.g. Ubuntu 18) and Python versions (e.g. 3.5, miniconda2)
install language requirements at build time using asdf and its plugins
create local mirrors for all supported languages
deleting a pre-built image won't make builds fail; it will only make them slower
support only the latest Ubuntu LTS version and keep the previous one as long as it's officially supported
Non goals
allow creation/usage of custom Docker images
allow executing arbitrary commands via hooks (e.g. pre_build)
automatically build & push all images on commit
pre-build multiple images for all the language combinations
Pre-built build image structure
The new pre-built images will depend only on the Ubuntu OS. They will contain all the requirements to add extra language support at build time via the asdf command.
ubuntu20-base
labels
environment variables
system dependencies
install requirements
LaTeX dependencies (for PDF generation)
language version manager (asdf) and its plugins for each language
UID and GID
Instead of building one Docker image per combination of language versions, it will be easier to install all of them at build time using the same steps. Installing a language only adds a few seconds when binaries are provided. However, to reduce the time to install these languages as much as possible, a local mirror hosted on S3 will be created for each language.
It's important to note that Python does not provide binaries and compiling a version takes around ~2 minutes. However, the Python versions could be pre-compiled and their binaries exposed to builders via S3. Then, at build time, the builder will only download the binary and copy it into the correct path.
Note
Depending on the demand, Read the Docs may pre-build the most common combinations of languages used by users.
For example, ubuntu20+python39+node14 or ubuntu20+python39+node14+rust1. However, this is seen as an optimization for the future and it's not required for this document.
Build steps
With this new approach, the steps followed by a builder will be:
run some application code to spin up the -base Docker image into a container
execute git inside the container to clone the repository
analyze and parse files (.readthedocs.yaml) from the repository outside the container
spin up a new Docker container based on the Ubuntu OS specified in the config file
install all language dependencies from the cache
create the environment and install docs’ dependencies inside the container
execute build commands inside the container
push the output generated by build commands to the storage
The main differences from the current approach are:
the image to spin up is selected depending on the OS version
all language dependencies are installed at build time (sketched below)
languages not offering binaries are pre-compiled by Read the Docs and stored in the cache
miniconda/mambaforge are now managed with the same management tool (e.g. asdf install python miniconda3-4.7.12)
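As a rough sketch of what installing languages at build time could look like, the builder might shell out to asdf along these lines. The helper names and the version-resolution strategy are assumptions for illustration, not the final implementation; in practice the resolution step would first consult the S3 cache described later and only then fall back to asdf:

import subprocess


def asdf_latest(tool, version_prefix):
    # "asdf latest <tool> <prefix>" resolves a partial version such as "3.9"
    # to the newest matching release known to the plugin.
    result = subprocess.run(
        ["asdf", "latest", tool, version_prefix],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def install_languages(languages):
    # ``languages`` comes from the parsed ``build.languages`` config,
    # e.g. {"python": "3.9", "nodejs": "14"}.
    for tool, version_prefix in languages.items():
        version = asdf_latest(tool, str(version_prefix))
        subprocess.run(["asdf", "install", tool, version], check=True)
        subprocess.run(["asdf", "global", tool, version], check=True)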
Specifying extra languages requirements
Different users may have different requirements.
People with specific language dependencies will be able to install them by using the .readthedocs.yaml config file.
Example:
build:
  os: ubuntu20
  languages:
    python: "3.9"  # supports "pypy3", "miniconda3" and "mambaforge"
    nodejs: "14"
    rust: "1.54"
    golang: "1.17"
Important highlights:
do not treat the Python language differently from the others (this will help us support other non-Python doctools in the future)
specifying build.languages.python: "3" will use Python version 3.x.y, and may differ between builds
specifying build.languages.python: "3.9" will use Python version 3.9.y, and may differ between builds
specifying build.languages.nodejs: "14" will use nodejs version 14.x.y, and may differ between builds
if no full version is declared, we will first try the latest version available in our cache, and then the latest on asdf (it has to match the first part of the declared version)
specifying patch language versions is not allowed (e.g. 3.7.11)
not specifying build.os will make the config file parsing fail
not specifying build.languages will make the config file parsing fail (at least one language is required)
specifying only build.languages.nodejs and using Sphinx to build the docs will make the build fail (e.g. "Command not found")
build.image is incompatible with build.os or build.languages and will produce an error (the incompatibility rules are sketched below)
python.version is incompatible with build.os or build.languages and will produce an error
Ubuntu 18 will still be available via the stable and latest images, but not in the new ones
only a subset (not yet defined) of the python, nodejs, rust and go versions available on asdf can be selected
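A hedged sketch of how the config file parser could enforce the incompatibility and version rules above; the exception type and function name are invented for illustration and don't reflect the actual parser code:

class ConfigError(Exception):
    """Hypothetical error raised for invalid build configurations."""


def validate_build(config):
    build = config.get("build", {})
    python = config.get("python", {})

    # build.os and at least one build.languages entry are required.
    if "os" not in build:
        raise ConfigError("build.os is required")
    if not build.get("languages"):
        raise ConfigError("build.languages requires at least one language")

    # The old-style settings are incompatible with the new ones.
    if "image" in build:
        raise ConfigError("build.image is incompatible with build.os/build.languages")
    if "version" in python:
        raise ConfigError("python.version is incompatible with build.os/build.languages")

    # Only partial versions are accepted: "3" or "3.9", but not "3.7.11".
    for tool, version in build["languages"].items():
        if str(version).count(".") > 1:
            raise ConfigError(f"{tool}: patch versions are not allowed")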
Note
We are moving away from users specifying a particular Docker image. With the new approach, users will specify the language requirements they need, and Read the Docs will decide whether to use a pre-built image or to spin up the base one and install these languages on the fly.
However, build.image will still be available for backward compatibility with stable, latest and testing, but it won't support the new build.languages config.
Note that knowing exactly what packages users are installing could allow us to pre-build the most commonly used combinations as images, e.g. ubuntu20+py39+node14.
Time required to install languages at build time
Tests using the time command on ASG instances to install extra languages took these "real" times:
build-default
python 3.9.6: 2m21.331s
mambaforge 4.10.1: 0m26.291s
miniconda3 4.7.12: 0m9.955s
nodejs 14.17.5: 0m5.603s
rust 1.54.0: 0m13.587s
golang 1.17: 1m30.428s
build-large
python 3.9.6: 2m33.688s
mambaforge 4.10.1: 0m28.781s
miniconda3 4.7.12: 0m10.551s
nodejs 14.17.5: 0m6.136s
rust 1.54.0: 0m14.716s
golang 1.17: 1m36.470s
Note that the only one that required compilation was Python. All the others spent 100% of their time downloading the binary. These download times are way better from the EU with a home internet connection.
In the worst-case scenario, where none of the specified language versions has a pre-built image, the build will require ~5 minutes to install all the language requirements. By providing pre-built images with only the Python version (the most time consuming), builds will only require ~2 minutes to install the others. However, requiring one version of each language is not a common case.
Cache language binaries on S3
asdf scripts can be altered to download the .tar.gz dist files from a mirror other than the official one.
Read the Docs can make use of this to create a mirror hosted locally on S3 to get faster download speeds.
This will be a good improvement for the languages that offer binaries: nodejs, rust and go:
nodejs uses NODEJS_ORG_MIRROR: https://github.com/asdf-vm/asdf-nodejs/blob/f9957f3f256ebbb3fdeebcaed5082ad305222be6/lib/utils.sh#L5
rust uses RUSTUP_UPDATE_ROOT: https://github.com/rust-lang/rustup/blob/499e582bc8ba34fa7e84d5120001aae31151d3c8/rustup-init.sh#L23
go has the URL hardcoded: https://github.com/kennyp/asdf-golang/blob/cc8bc47d4877beed61e10815d46669e1eaaa0bbe/bin/download#L54
However, Python currently does not offer binaries, so a different solution is needed. Python versions can be pre-compiled once and the output exposed on S3 for the builders to download and extract into the correct PATH.
Tip
Since we are building a special cache for pre-compiled Python versions, we could use the same method for all the other languages instead of creating a full mirror (many gigabytes). This simple bash script downloads the language sources, compiles them, and uploads them to S3 without requiring a mirror. Note that it works in the same way for all the languages, not just for Python.
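As a rough sketch of the builder side of that cache, assuming pre-compiled tarballs are published under a hypothetical bucket and key layout (none of these names are final):

import tarfile
import urllib.request

# Hypothetical public endpoint of the S3 cache.
CACHE_URL = "https://rtd-languages-cache.s3.amazonaws.com"


def install_precompiled_python(version, os_slug, install_dir):
    # Download the pre-compiled tarball and unpack it into the path where the
    # version manager expects the installation to live, e.g.
    # ~/.asdf/installs/python/<version>.
    url = f"{CACHE_URL}/python/{version}-{os_slug}.tar.gz"
    filename, _ = urllib.request.urlretrieve(url)
    with tarfile.open(filename) as tar:
        tar.extractall(path=install_dir)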
Questions
What Python versions will be pre-compiled and cached?
At the start, only a small subset of Python versions will be pre-compiled:
2.7.x
3.7.x
3.8.x
3.9.x
3.10.x
pypy3.x
How do we upgrade a Python version?
Python patch versions can be upgraded by re-compiling the new patch version and making it available in our cache. For example, if version 3.9.6 is the one available and 3.9.7 is released, after updating our cache:
users specifying build.languages.python: "3.9" will get the 3.9.7 version
users specifying build.languages.python: "3" will get the 3.9.7 version
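For illustration, resolving a partial version against the cache could look like this (pure Python; the cached list is made up):

def resolve_cached_version(requested, cached):
    # Pick the newest cached version whose leading components match the
    # request, so "3.9" resolves to 3.9.7 and "3" to the newest 3.x.y.
    prefix = tuple(int(part) for part in requested.split("."))
    candidates = [tuple(int(part) for part in version.split(".")) for version in cached]
    matching = [candidate for candidate in candidates if candidate[:len(prefix)] == prefix]
    if not matching:
        return None
    return ".".join(str(part) for part in max(matching))


print(resolve_cached_version("3.9", ["3.8.12", "3.9.6", "3.9.7"]))  # "3.9.7"
print(resolve_cached_version("3", ["3.8.12", "3.9.6", "3.9.7"]))    # "3.9.7"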
As we will have control over these versions, we can decide when to upgrade (if ever required), and we can roll back if the new pre-compiled version was built with a problem.
Note
Python versions may need to be re-compiled each time the -base image is re-built. This is because some underlying libraries that Python depends on may have changed.
Note
Always installing the latest version is harder to maintain. It would require building the newest version each time a new patch version is released. Because of that, Read the Docs will always be behind official releases. Besides, it will give projects different versions more often.
Exposing the patch version to the user would require us to cache many different versions ourselves, and if the user mistakenly selects a patch version that we don't have cached, those builds will add extra build time.
How do we add a Python version?
Adding a new Python version requires:
pre-compile the desired version for each Ubuntu OS version supported
upload the compressed output to S3
add the supported version to the config file validator
How do we remove an old Python version?
At some point, an old version of Python will be deprecated (eg. 3.4) and will be removed. To achieve this, we can just remove the pre-compiled Python version from the cache.
However, unless it's strictly needed for some specific reason, we shouldn't need to remove support for a Python version as long as we support the Ubuntu OS version it was compiled for.
In any case, we will know which projects are using these versions because they are pinning these specific versions in the config file. We could show a message in the build output page and also send them an email with the EOL date for this image.
However, removing a pre-compiled Python version that is currently being used by some users won't make their builds fail. Instead, that Python version will be compiled and installed at build time, adding a "penalization" time to those projects and motivating them to move to a newer version.
How do we upgrade system versions?
We usually don’t upgrade these dependencies unless we upgrade the Ubuntu version. So, they will be only upgraded when we go from Ubuntu 18.04 LTS to Ubuntu 20.04 LTS for example.
Examples of these versions are:
doxygen
git
subversion
pandoc
swig
latex
This case will introduce a new base image, for example ubuntu22-base in 2022.
Note that these images will be completely isolated from the rest and don't require the others to be rebuilt.
This also allows us to start testing a newer Ubuntu version (e.g. 22.04 LTS) without breaking people's builds, even before it's officially released.
How do we add an extra requirement?
In case we need to add an extra requirement to the base image, we will need to rebuild all of them.
The new image may have different package versions, since there may be updates in the Ubuntu repositories.
This carries some risk, but in general we shouldn't need to add packages to the base images.
In case we need an extra requirement for all our images, I'd recommend adding it when creating a new base image.
If it's strongly needed and we can't wait for a new base image, we could install it at build time, in a similar way as we do with build.apt_packages, as a temporary workaround.
How do we create a mirror for each language?
A mirror can be created with wget together with rclone:
Download all the files from the official mirror:
# https://stackoverflow.com/questions/29802579/create-private-mirror-of-http-nodejs-org-dist
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent -e robots=off http://nodejs.org/dist
Upload all the files to S3:
# https://rclone.org/s3/
rclone sync -i nodejs.org s3:languages
Note
Downloading a copy of the official mirror took 15m and 52Gb.
How will local development work with the new approach?
Local development will require scripts to clone the official mirrors for each language and upload them to MinIO (S3), plus a script to define a set of Python versions, pre-compile them, and also upload them to S3.
This is already covered by this simple bash script and tested in this PR with a POC: https://github.com/readthedocs/readthedocs.org/pull/8453
Deprecation plan
After this design document gets implemented and tested, all our current images (stable, latest, testing) will be deprecated and their usage will be de-motivated.
However, we could keep them on our builders to give users enough time to migrate their projects to the new ones.
We may want to keep only the latest Ubuntu LTS release available in production, with a special consideration for our current Ubuntu 18.04 LTS on stable, latest and testing, because 100% of the projects depend on them currently.
Once Ubuntu 22.04 LTS is released, we should communicate that Ubuntu 20.04 LTS is deprecated, and keep it available on our servers for as long as it's officially supported by Ubuntu during the "Maintenance updates" period (see "Long term support and interim releases" in https://ubuntu.com/about/release-cycle).
As an example, Ubuntu 22.04 LTS will be officially released in April 2022 and we will offer support for it until 2027.
Warning
Deleting -base images from the build servers will make projects' builds fail.
We want to keep supporting them for as long as we can, but having a well-defined deprecation policy is a win.
Work required and rollout plan
The following steps are required to support the full proposal of this document.
allow users to install extra language requirements via the config file
update the config file to support the build.os and build.languages config
modify the builder code to run asdf install for all supported languages
build a new base Docker image with the new structure (ubuntu20-base)
build the new image with Ubuntu 20.04 LTS and asdf pre-installed with all its plugins
do not install any language version on the base image
deploy builders with the new base image
At this point, we will have a fully working setup. It will be opt-in by using the new build.os and build.languages configs.
However, all languages will be installed at build time, which will "penalize" all projects because all of them will have to install Python.
After testing this for some time, we can continue with the following steps, which provide a cache to optimize installation times:
create mirrors on S3 for all supported languages
create a mirror with the latest 3 Python versions pre-compiled, plus Python 2.7 and PyPy3
Conclusion
There is no need to differentiate the images by their state (stable, latest, testing), but rather by their main base difference: the OS. The OS version changes many library versions, LaTeX dependencies, and basic required commands like git, so it doesn't seem useful to have the same OS version with different states.
Allowing users to install extra languages via the config file will cover most of the support requests we have had in the past. It will also allow us to learn more about how our users use the platform, so we can make future decisions based on this data. Showing users how we want them to use our platform will allow us to maintain it for longer than giving them the option to select a specific Docker image by name, which we can't guarantee will stay frozen.
Finally, having the ability to deprecate and remove pre-built images from our builders over time will reduce the maintenance work required from the core team.
We can always support all the language versions by installing them at build time.
The only pre-built images required for this are the OS -base images.
In fact, even after deciding to deprecate and remove a pre-built image from the builders, we can re-build it if we find that it's affecting many projects and slowing down their builds too much, causing us problems.
Embed APIv3
The Embed API allows users to embed content from documentation pages in other sites.
It has been treated as an experimental feature without public documentation or real applications,
but recently it started to be used widely (mainly because we created the hoverxref Sphinx extension).
The main goal of this document is to design a new version of the Embed API to be more user friendly, make it more stable over time, support embedding content from pages not hosted at Read the Docs, and remove some quirkiness that makes it hard to maintain and difficult to use.
Note
This work is part of the CZI grant that Read the Docs received.
Current implementation
The current implementation of the API is partially documented in How to embed content from your documentation. It has some known problems:
There are different ways of querying the API: ?url= (generic) and ?doc= (relies on Sphinx's specific concept)
Doesn't support MkDocs
Lookups are slow (~500 ms)
IDs returned aren't well formed (like empty IDs "headers": [{"title": "#"}])
The content is always an array of one element
It tries different variations of the original ID
It doesn't return valid HTML for definition lists (dd tags without a dt tag)
Goals
We plan to add new features and define a contract that works the same for all HTML. This project has the following goals:
Support embedding content from pages hosted outside Read the Docs
Do not depend on Sphinx .fjson files
Query and parse the .html file directly (from our storage or from an external request)
Rewrite all links returned in the content to make them absolute
Require a valid HTML id selector
Accept only the ?url= request GET argument to query the endpoint
Support ?nwords= and ?nparagraphs= to return chunked content
Handle special cases for particular doctools (e.g. Sphinx requires returning the .parent() element for dl)
Make explicit that the client is asking to handle the special cases (e.g. send ?doctool=sphinx&version=4.0.1&writer=html4)
Delete HTML tags from the original document (for well-defined special cases)
Add HTTP cache headers to cache responses
Allow CORS from everywhere only for public projects
The contract
Return the HTML tag (and its children) with the id selector requested,
and replace all the relative links in its content, making them absolute.
Note
Any other case outside this contract will be considered special and will be implemented
only under the ?doctool=, ?version= and ?writer= arguments.
If no id selector is sent with the request, the content of the first meaningful HTML tag
(<main>, <div role="main"> or other well-defined standard tags) found is returned.
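As an illustration only, this contract could be implemented roughly as follows, assuming BeautifulSoup is used for the HTML parsing; the function name and fallback selectors are placeholders, not the final implementation:
# Sketch of the contract described above (illustrative, not production code).
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def embed_content(html, page_url, identifier=None):
    soup = BeautifulSoup(html, "html.parser")

    if identifier:
        node = soup.find(id=identifier)
    else:
        # No id requested: fall back to the first meaningful tag.
        node = soup.find("main") or soup.find("div", role="main")

    if node is None:
        return None

    # Rewrite relative links to absolute ones, using the page URL as base.
    for anchor in node.find_all("a", href=True):
        anchor["href"] = urljoin(page_url, anchor["href"])

    return str(node)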
Embed endpoints
This is the list of endpoints to be implemented in APIv3:
- GET /api/v3/embed/
Returns the exact HTML content for a specific identifier (id). If no anchor identifier is specified, the content of the first one is returned.
Example request:
$ curl https://readthedocs.org/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment
Example response:
{ "project": "docs", "version": "latest", "language": "en", "path": "development/install.html", "title": "Development Installation", "url": "https://docs.readthedocs.io/en/latest/install.html#set-up-your-environment", "id": "set-up-your-environment", "content": "<div class=\"section\" id=\"development-installation\">\n<h1>Development Installation<a class=\"headerlink\" href=\"https://docs.readthedocs.io/en/stable/development/install.html#development-installation\" title=\"Permalink to this headline\">¶</a></h1>\n ..." }
- Query Parameters:
url (required) – Full URL for the documentation page, with an optional anchor identifier.
- GET /api/v3/embed/metadata/
Returns all the available metadata for a specific page.
Note
As it's not trivial to get the title associated with a particular id, and it's not easy to get a nested list of identifiers, we may not implement this endpoint in the initial version.
The endpoint as-is is mainly useful to explore/discover which identifiers are available for a particular page, which is handy in the development process of a new tool that consumes the API. Because of this, we don't have too much traction to add it in the initial version.
Example request:
$ curl https://readthedocs.org/api/v3/embed/metadata/?url=https://docs.readthedocs.io/en/latest/development/install.html
Example response:
{ "identifiers": { "id": "set-up-your-environment", "url": "https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment" "_links": { "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#set-up-your-environment" } }, { "id": "check-that-everything-works", "url": "https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works" "_links": { "embed": "https://docs.readthedocs.io/_/api/v3/embed/?url=https://docs.readthedocs.io/en/latest/development/install.html#check-that-everything-works" } }, }
- Query Parameters:
url (required) – Full URL for the documentation page
Handle specific Sphinx cases
We are currently handling some special cases for Sphinx due to how it writes the HTML output structure.
In some cases, we look for the HTML tag with the identifier requested but we return
the .next() HTML tag or the .parent() tag instead of the requested one.
Currently, we have identified that this happens for definition tags (dl, dt, dd),
but there may be other cases we don't know about yet.
Sphinx adds the id= attribute to the dt tag, which contains only the title of the definition,
but as a user, we expect the description of it as well.
In the following example we will return the whole dl HTML tag instead of
the HTML tag with the identifier id="term-name" as requested by the client,
because otherwise the "Term definition for Term Name" content won't be included and the response would be useless.
<dl class="glossary docutils">
<dt id="term-name">Term Name</dt>
<dd>Term definition for Term Name</dd>
</dl>
If the definition list (dl) has more than one definition, it will return only the term requested.
Consider the following example, with the request ?url=glossary.html#term-name:
<dl class="glossary docutils">
...
<dt id="term-name">Term Name</dt>
<dd>Term definition for Term Name</dd>
<dt id="term-unknown">Term Unknown</dt>
<dd>Term definition for Term Unknown </dd>
...
</dl>
It will return the whole dl with only the dt and dd for the id requested:
<dl class="glossary docutils">
<dt id="term-name">Term Name</dt>
<dd>Term definition for Term Name</dd>
</dl>
However, these assumptions may not apply to documentation pages built with a doctool other than Sphinx.
For this reason, we need to communicate to the API that we want it to handle these special cases in the backend.
This will be done by appending a GET argument to the Embed API endpoint: ?doctool=sphinx&version=4.0.1&writer=html4.
In this case, the backend will know that it has to deal with these special cases.
Note
This leaves the door open to be able to support more special cases (e.g. for other doctools) without breaking the actual behavior.
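A hedged sketch of how the definition-list special case could look in the backend, assuming BeautifulSoup is used; the helper name is made up for illustration:
# Sketch of the Sphinx dl/dt special case (illustrative only).
def resolve_sphinx_definition(soup, identifier):
    node = soup.find(id=identifier)
    if node is None or node.name != "dt":
        return node

    # Sphinx puts the id on the <dt>, but the user expects the definition too,
    # so return a <dl> containing only the requested <dt>/<dd> pair.
    original_dl = node.find_parent("dl")
    definition = node.find_next_sibling("dd")
    new_dl = soup.new_tag("dl")
    new_dl.attrs = dict(original_dl.attrs)
    new_dl.append(node)
    if definition is not None:
        new_dl.append(definition)
    return new_dl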
Support for external documents
When the ?url= argument passed belongs to a documentation page not hosted on Read the Docs,
the endpoint will make an external request to download the HTML file,
parse it and return the content for the identifier requested.
The whole logic should be the same; the only difference is where the source HTML comes from.
Warning
We should be careful with the URL received from the user, because it may be an internal URL and we could be leaking some data.
Examples: ?url=http://localhost/some-weird-endpoint or ?url=http://169.254.169.254/latest/meta-data/ (see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html).
This is related to SSRF (https://en.wikipedia.org/wiki/Server-side_request_forgery). It doesn't seem to be a huge problem, but something to consider.
Also, the endpoint may need to limit the requests per-external domain to avoid using our servers to take down another site.
Note
Due to the potential security issues mentioned, we will start with an allowed list of domains for common Sphinx docs projects,
like Django and Python, where sphinx-hoverxref users might commonly want to embed from.
We aren't planning to allow arbitrary HTML from any website.
Handle project’s domain changes
The proposed Embed APIv3 implementation only allows the ?url= argument to embed content from that page.
That URL can be:
a URL for a project hosted under <project-slug>.readthedocs.io
a URL for a project with a custom domain
In the first case, we can easily get the project's slug directly from the URL.
However, in the second case we get the project's slug by querying our database for a Domain object
with the full domain from the URL.
Now, consider that all the links in the documentation page that uses Embed APIv3 point to
docs.example.com and the author decides to change the domain to docs.newdomain.com.
At this point there are different possible scenarios:
The user creates a new Domain object with docs.newdomain.com as the domain's name. In this case, old links will keep working because we still have the old Domain object in our database and we can use it to get the project's slug.
The user deletes the old Domain besides creating the new one. In this scenario, our database query for a Domain with name docs.example.com will fail. We will need to make a request to docs.example.com, check for a 3xx response status code and, in that case, read the Location: HTTP header to find the new domain's name for the documentation. Once we have the new domain from the redirect response, we can query our database again to find out the project's slug.
Note
We will follow up to 5 redirects to find out the project's domain.
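A rough sketch of that redirect-following lookup, assuming the requests library; the helper name and timeout are illustrative:
# Sketch: resolve a moved custom domain by following up to 5 redirects.
from urllib.parse import urljoin, urlparse

import requests


def resolve_current_domain(url, max_redirects=5):
    for _ in range(max_redirects):
        response = requests.head(url, allow_redirects=False, timeout=5)
        if response.status_code not in (301, 302, 303, 307, 308):
            break
        location = response.headers.get("Location")
        if not location:
            break
        url = urljoin(url, location)
    # The final hostname can be used to query the Domain model again.
    return urlparse(url).netloc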
Embed APIv2 deprecation
The v2 is currently widely used by projects using the sphinx-hoverxref
extension.
Because of that, we need to keep supporting it as-is for a long time.
Next steps in this direction should be:
Add a note in the documentation mentioning this endpoint is deprecated
Promote the usage of the new Embed APIv3
Migrate the sphinx-hoverxref extension to use the new endpoint
Once we have done that, we can check our NGINX logs to find out if there are people still using APIv2, contact them, and let them know that they have some months to migrate since the endpoint is deprecated and will be removed.
Unanswered questions
How do we distinguish between our APIv3 for resources (models in the database) from these “feature API endpoints”?
Future builder
This document is a continuation of Santos’ work about “Explicit Builders”. It builds on top of that document some extra features and makes some decisions about the final goal, proposing a clear direction to move forward with intermediate steps keeping backward and forward compatibility.
Note
A lot of things have changed since this document was written.
We have had multiple discussions where we already took some decisions and discarded some of the ideas/details proposed here.
The document was merged as-is without a cleanup, so there may be some inconsistencies.
Note that build.jobs and build.commands are already implemented, without a defined contract yet,
and with small differences from the idea described here.
Please, refer to the following links to read more about all the discussions we already had:
Public discussions:
Private discussions:
Goals
Keep the current builder working as-is
Keep backward and forward (with intermediate steps) compatibility
Define clear support for newbie, intermediate and advanced users
Allow users to override a command, run pre/post hook commands or define all commands by themselves
Remove the Read the Docs requirement of having access to the build process
Translate our current magic at build time into a defined contract with the user
Provide a way to add a command argument without implementing it as a config file option (e.g. fail_on_warning)
Define a path forward towards supporting other tools
Re-write all readthedocs-sphinx-ext features as post-processing HTML features
Reduce complexity maintained by Read the Docs' core team
Make Read the Docs responsible for Sphinx support and delegate other tools to the community
Eventually support uploading pre-built docs
Allow us to add a feature with a defined contract without worrying about breaking old builds
Introduce the build.builder: 2 config (does not install pre-defined packages) for these new features
Motivate users to migrate to v2 to finally deprecate this magic by educating users
Steps ran by the builder
Read the Docs currently controls the entire build process.
Users are only allowed to modify very limited behavior by using a .readthedocs.yaml file.
This drove us to implement features like sphinx.fail_on_warning, submodules, among others,
at a high implementation and maintenance cost to the core team.
Besides, this hasn't been enough for more advanced users that require more control over these commands.
This document proposes to clearly define the steps the builder runs and allow users to override them depending on their needs:
Newbie user / simple platform usage: Read the Docs controls all the commands (current builder)
Intermediate user: ability to override one or more commands, plus running pre/post hooks
Advanced user: controls all the commands executed by the builder
The steps identified so far are:
Checkout
Expose project data via environment variables (*)
Create environment (virtualenv / conda)
Install dependencies
Build documentation
Generate the defined contract (metadata.yaml)
Post-process HTML (*)
Upload to storage (*)
Steps marked with (*) are managed by Read the Docs and can’t be overwritten.
Defined contract
Projects building on Read the Docs must provide a metadata.yaml file after running their last command.
This file contains all the data required by Read the Docs to be able to add its integrations.
If this file is not provided or is malformed, Read the Docs will fail the build and stop the process,
communicating to the user that there was a problem with the metadata.yaml and that we require them to fix it.
Note
There is no restriction on how this file is generated (e.g. generated with Python, Bash, statically uploaded to the repository, etc). Read the Docs does not have control over it and is only responsible for generating it when building with Sphinx.
The following is an example of a metadata.yaml generated by Read the Docs when building Sphinx documentation:
# metadata.yaml
version: 1
tool:
name: sphinx
version: 3.5.1
builder: html
readthedocs:
html_output: ./_build/html/
pdf_output: ./_build/pdf/myproject.pdf
epub_output: ./_build/pdf/myproject.epub
search:
enabled: true
css_identifier: #search-form > input[name="q"]
analytics: false
flyout: false
canonical: docs.myproject.com
language: en
Warning
The metadata.yaml contract is not defined yet.
This is just an example of what we could expect from it to be able to add our integrations.
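To make the idea concrete, a minimal sketch of how Read the Docs could load and sanity-check this file is shown below; the required keys follow the example above, while the function and exception names are made up:
# Sketch: load and minimally validate the metadata.yaml contract.
import yaml


class BuildContractError(Exception):
    pass


def load_metadata(path="metadata.yaml"):
    try:
        with open(path) as f:
            metadata = yaml.safe_load(f)
    except (OSError, yaml.YAMLError) as exc:
        raise BuildContractError("metadata.yaml is missing or malformed") from exc

    for key in ("version", "tool", "readthedocs"):
        if key not in (metadata or {}):
            raise BuildContractError(f"metadata.yaml is missing the '{key}' key")
    return metadata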
Config file
As we mentioned, we want all users to use the same config file and have a clear way to override commands as they need.
This will be done by using the current .readthedocs.yaml file that we already have, adding two new keys: build.jobs and build.commands.
If neither build.jobs nor build.commands is present in the config file,
Read the Docs will execute the builder we currently support without modification,
keeping compatibility with all projects already building successfully.
When users make use of the jobs: or commands: keys, we are not responsible for their commands in case they fail.
In these cases, we only check for a metadata.yaml file and run our code to add the integrations.
build.jobs
It allows users to execute one or multiple pre/post hooks and/or overwrite one or multiple commands. These are some examples where this is useful:
User wants to pass an extra argument to sphinx-build
Project requires executing a command before building
User has a personal/private PyPI URL
Install project with pip install -e (see https://github.com/readthedocs/readthedocs.org/issues/6243)
Disable git shallow clone (see https://github.com/readthedocs/readthedocs.org/issues/5989)
Call pip install with --constraint (see https://github.com/readthedocs/readthedocs.org/issues/7258)
Do something _before_ install (see https://github.com/readthedocs/readthedocs.org/issues/6662)
Use a conda lock file to create the environment (see https://github.com/readthedocs/readthedocs.org/issues/7772)
Run a check after the build is done (e.g. sphinx-build -W -b linkcheck . _build/html)
Create virtualenv with --system-site-packages
etc
# .readthedocs.yaml
build:
builder: 2
jobs:
pre_checkout:
checkout: git clone --branch main https://github.com/readthedocs/readthedocs.org
post_checkout:
pre_create_environment:
create_environment: python -m virtualenv venv
post_create_environment:
pre_install:
install: pip install -r requirements.txt
post_install:
pre_build:
build:
html: sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html
pdf: latexmk -r latexmkrc -pdf -f -dvi- -ps- -jobname=test-builds -interaction=nonstopmode
epub: sphinx-build -T -j auto -b epub -d _build/doctrees -D language=en . _build/epub
post_build:
pre_metadata:
metadata: ./metadata_sphinx.py
post_metadata:
Note
All these commands are executed passing all the exposed environment variables.
If the user only provides a subset of these jobs, we run our default commands for the ones not provided (see Steps ran by the builder). For example, the following YAML is enough when the project requires running Doxygen as a pre-build step:
# .readthedocs.yaml
build:
builder: 2
jobs:
# https://breathe.readthedocs.io/en/latest/readthedocs.html#generating-doxygen-xml-files
pre_build: cd ../doxygen; doxygen
build.commands
It allows users to have full control over the commands executed in the build process. These are some examples where this is useful:
project with a custom build process that doesn't map to ours
specific requirements that we can't/don't want to cover as a general rule
build documentation with a different tool than Sphinx
# .readthedocs.yaml
build:
builder: 2
commands:
- git clone --branch main https://github.com/readthedocs/readthedocs.org
- pip install -r requirements.txt
- sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html
- ./metadata.py
Intermediate steps for rollout
Remove all the exposed data in the conf.py.tmpl file and move it to metadata.yaml
Define the structure required for metadata.yaml as a contract
Define the environment variables required (e.g. some from html_context) and execute all commands with them
Build documentation using this contract
Leave readthedocs-sphinx-ext as the only package and extension installed in conf.py.tmpl
Add the build.builder: 2 config without any magic
Build everything needed to support the build.jobs and build.commands keys
Write guides about how to use the new keys
Re-write readthedocs-sphinx-ext features as post-processing HTML features
Final notes
The migration path from v1 to v2 will require users to explicitly specify their requirements (we don't install pre-defined packages anymore)
We probably do not want to support build.jobs on v1, to reduce the core team's time spent maintaining code they cannot update due to projects randomly breaking.
We would be able to start building documentation using new tools without having to integrate them.
Building on Read the Docs with a new tool will require: - the user to execute a different set of commands by overriding the defaults. - the project/build/user to expose a metadata.yaml with the contract that Read the Docs expects. - none, some or all the integrations will be added to the HTML output (these have to be implemented at Read the Docs core)
We are not responsible for extra formats (e.g. PDF, ePub, etc) on other tools.
Focus on supporting Sphinx with nice integrations made in a tool-agnostic way that can be re-used.
Removing the manipulation of conf.py.tmpl does not require us to implement the same manipulation for projects using the potential new sphinx.yaml feature file.
In-doc search UI
Giving readers the ability to easily search for the information
they are looking for is important to us.
We have already upgraded to the latest version of Elasticsearch and
we plan to implement a search-as-you-type feature for all the documentation hosted by us.
It will be designed to provide instant results as soon as the user starts
typing in the search bar, with a clean and minimal frontend.
This design document aims to provide the details of it.
This is a GSoC’19 project.
Warning
This design document details future features that are not yet implemented. To discuss this document, please get in touch in the issue tracker.
The final result may look something like this:

[Image: short demo of the proposed search-as-you-type UI]
Goals and non-goals
Project goals
Support a search-as-you-type/autocomplete interface.
Support across all (or virtually all) Sphinx themes.
Support for the JavaScript user experience down to IE11 or graceful degradation where we can’t support it.
Project maintainers should have a way to opt-in/opt-out of this feature.
(Optional) Project maintainers should have the flexibility to change some of the styles using custom CSS and JS files.
Non-goals
For the initial release, we are targeting only Sphinx documentation, as we don't index MkDocs documentation in our Elasticsearch index.
Existing search implementation
We have detailed documentation explaining the underlying architecture of our search backend and how we index documents to our Elasticsearch index. You can read about it here.
Proposed architecture for in-doc search UI
Frontend
Technologies
The frontend is to be designed in a theme-agnostic way. For that, we explored various libraries which may be of use, but none of them fits our needs. So, we might be using vanilla JavaScript for this purpose. This will provide us some advantages over using any third party library:
Better control over the DOM.
Performance benefits.
Proposed architecture
We plan to select the search bar, which is present in every theme,
using the querySelector() method of JavaScript.
Then we add an event listener to it to listen for changes and
fire a search query to our backend as soon as there is any change.
Our backend will then return the suggestions,
which will be shown to the user in a clean and minimal UI.
We will be using the document.createElement() and node.removeChild() methods
provided by JavaScript, as we don't want empty <div> elements hanging around in the DOM.
We have a few ways to include the required JavaScript and CSS files in all the projects:
Add CSS into readthedocs-doc-embed.css and JS into readthedocs-doc-embed.js and it will get included.
Package the in-doc search into its own self-contained CSS and JS files and include them in a similar manner to readthedocs-doc-embed.*.
It might be possible to package up the in-doc CSS/JS as a Sphinx extension. This might be nice because then it's easy to enable it on a per-project basis. When we are ready to roll it out to a wider audience, we can make a decision to just turn it on for everybody (put it in here) or we could enable it as an opt-in feature like the 404 extension.
UI/UX
We have two ways which can be used to show suggestions to the user.
Show suggestions below the search bar.
Open a full page search interface when the user clicks on the search field.
Backend
We have a few options to support the search-as-you-type feature,
but we need to decide which option would be best for our use case.
Edge NGram Tokenizer
Pros
More effective than Completion Suggester when it comes to autocompleting words that can appear in any order.
It is considerably fast because most of the work is done at index time, hence the time taken for autocompletion is reduced.
Supports highlighting of the matching terms.
Cons
Requires greater disk space.
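For reference, an edge n-gram setup could look roughly like this with elasticsearch-dsl; the analyzer name and gram sizes are illustrative, not tuned values:
# Sketch of an edge n-gram analyzer for index-time autocomplete.
from elasticsearch_dsl import analyzer, tokenizer

edge_ngram_tokenizer = tokenizer(
    "edge_ngram_tokenizer",
    type="edge_ngram",
    min_gram=2,
    max_gram=15,
    token_chars=["letter", "digit"],
)

edge_ngram_analyzer = analyzer(
    "edge_ngram_analyzer",
    tokenizer=edge_ngram_tokenizer,
    filter=["lowercase"],
)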
Completion suggester
Pros
Really fast as it is optimized for speed.
Does not require large disk space.
Cons
Matching always starts at the beginning of the text. So, for example, “Hel” will match “Hello, World” but not “World Hello”.
Highlighting of the matching words is not supported.
According to the official docs for Completion Suggester, fast lookups are costly to build and are stored in-memory.
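For comparison, the Completion Suggester approach would map a dedicated completion field and query it with a suggest request; a rough elasticsearch-dsl sketch follows, where the document and field names are made up:
# Sketch: completion field plus a suggest query (names are illustrative).
from elasticsearch_dsl import Completion, Document, Search, Text


class PageDocument(Document):
    title = Text()
    title_suggest = Completion()

    class Index:
        name = "page"


def autocomplete(prefix):
    search = Search(index="page").suggest(
        "title_suggestions", prefix, completion={"field": "title_suggest"}
    )
    response = search.execute()
    return [option.text for option in response.suggest.title_suggestions[0].options]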
Milestones
Milestone                                                                          | Due Date
-----------------------------------------------------------------------------------|------------------
A local implementation of the project.                                             | 12th June, 2019
In-doc search on a test project hosted on Read the Docs using the RTD Search API.  | 20th June, 2019
In-doc search on docs.readthedocs.io.                                              | 20th June, 2019
Friendly user trial where users can add this on their own docs.                    | 5th July, 2019
Additional UX testing on the top-10 Sphinx themes.                                 | 15th July, 2019
Finalize the UI.                                                                   | 25th July, 2019
Improve the search backend for efficient and fast search results.                  | 10th August, 2019
Open questions
Should we rely on jQuery, a third party library, or pure vanilla JavaScript?
Should subprojects be searched as well?
Is our existing Search API sufficient?
Should we go for edge n-grams or the completion suggester?
Notification system: a new approach after lots of discussions
Notifications have been a recurrent topic in the last years. We have talked about different problems and solution approaches during these years. However, due to the complexity of the change, and without having a clear path, it has been hard to prioritize.
We've written a lot about the problems and potential solutions for the current notification system. This is a non-exhaustive list of them, just for reference:
At the offsite in Portland, Anthony and I were able to talk deeply about this and wrote a bunch of thoughts in a Google Doc. We had pretty similar ideas and we thought we were already solving most of the problems we identified.
I read all of these issues and all the discussions I found and wrote this document that summarizes my proposal: create a new notification system that we can customize and expand as we need in the future:
A Django model to store the notifications’ data
API endpoints to retrieve the notifications for a particular resource (User, Build, Project, Organization)
Frontend code to display them (outside the scope of this document)
Goals
Keep raising exceptions for errors from the build process
Ability to add non-error notifications from the build process
Add extra metadata associated with the notification: icon, header, body, etc
Support different types of notifications (e.g. error, warning, note, tip)
Re-use the new notification system for product updates (e.g. new features, deprecated config keys)
Message content lives on Python classes that can be translated and formatted with objects (e.g. Build, Project)
Message could have richer content (e.g. HTML code) to generate links and emphasis
Notifications have trackable state (e.g. unread (default)=never shown, read=shown, dismissed=don’t show again, cancelled=auto-removed after user action)
An object (e.g. Build, Organization) can have more than 1 notification attached
Remove hardcoded notifications from the templates
Notifications can be attached to Project, Organization, Build and User models
Specific notifications can be shown under the user’s bell icon
Easy way to cleanup notification on status changes (e.g. subscription failure notification is auto-deleted after CC updated)
Notifications attached to an Organization/Project disappear for all the users once they are dismissed by anyone
Non-goals
Create new Build “state” or “status” option for these fields
Implement the new notification in the old dashboard
Define front-end code implementation
Replace email or webhook notifications
Small notes and other considerations
Django message system is not enough for this purpose.
Use a new model to store all the required data (expandable in the future)
How do we handle translations? We should use _("This is the message shown to the user") in Python code and return the proper translation when they are read.
Reduce complexity on the Build object (remove Build.status and Build.error fields, among others).
Since the Build object could have more than 1 notification, when showing them, we will sort them by importance: errors, warnings, note, tip. In case we need a pretty specific order, we can add an extra field for that, but it adds unnecessary complexity at this point.
For those notifications that are attached to the Project or Organization, should they be shown to all the members even if they don't have admin permissions? If yes, this is good because all of them will be notified, but only some of them will be able to take an action. If no, non-admin users won't see the notification and won't be able to communicate this to the admins.
Notifications could be attached to a BuildCommand in case we want to display a specific message on a command itself. We don't know how useful this will be, but it's something we can consider in the future.
Notification preferences: what kind of notifications do I want to see under my own bell icon?
Build errors
Build tips
Product updates
Blog post news
Organization updates
Project updates
Implementation ideas
This section shows all the classes and models involved for the notification system as well as some already known use-cases.
Note
Accessing the database from the build process
Builders doesn’t have access to the database due to security reasons.
We had solved this limitation by creating an API endpoint the builder hits once they need to interact with the databse to get a Project
, Version
and Build
resources, create a BuildCommand
resource, etc.
Besides, the build process is capable to trigger Celery tasks that are useful for managing more complex logic that also require accessing from and writing to the database.
Currently, readthedocs.doc_builder.director.Director and
readthedocs.doc_builder.environments.DockerBuildEnvironment
have access to the API client and can use it to create Notification resources.
I plan to use the same pattern to create Notification resources by hitting the API from the director or the build environment.
In case we require hitting the API from other places, we will need to pass the API client instance to those other classes as well.
Message class definition
This class encapsulates the content of the notification (e.g. header, body, icon, etc) –the message that is shown to the user– and some helper logic to return it in the API response.
class Message:
    def __init__(self, header, body, type, icon=None, icon_style=SOLID):
        self.header = header
        self.body = body
        self.type = type  # One of: ERROR, WARNING, NOTE, TIP
        self.icon = icon
        self.icon_style = icon_style  # One of: SOLID, DUOTONE

    def get_display_icon(self):
        if self.icon:
            return self.icon
        if self.type == ERROR:
            return "fa-exclamation"
        if self.type == WARNING:
            return "fa-triangle-exclamation"
Definition of notifications to display to users
This constant defines all the possible notifications to be displayed to the user.
Each notification has to be defined here using the Message class previously defined.
NOTIFICATION_MESSAGES = {
    "generic-with-build-id": Message(
        header=_("Unknown problem"),
        # Note the message receives the instance it's attached to
        # and can use it to inject related data
        body=_(
            """
            There was a problem with Read the Docs while building your documentation.
            Please try again later.
            If this problem persists,
            report this error to us with your build id ({instance[pk]}).
            """
        ),
        type=ERROR,
    ),
    "build-os-required": Message(
        header=_("Invalid configuration"),
        body=_(
            """
            The configuration key "build.os" is required to build your documentation.
            <a href='https://docs.readthedocs.io/en/stable/config-file/v2.html#build-os'>Read more.</a>
            """
        ),
        type=ERROR,
    ),
    "cancelled-by-user": Message(
        header=_("User action"),
        body=_(
            """
            Build cancelled by the user.
            """
        ),
        type=ERROR,
    ),
    "os-ubuntu-18.04-deprecated": Message(
        header=_("Deprecated OS selected"),
        body=_(
            """
            Ubuntu 18.04 is deprecated and will be removed soon.
            Update your <code>.readthedocs.yaml</code> to use a newer image.
            """
        ),
        type=TIP,
    ),
}
Notification model definition
This class is the representation of a notification attached to a resource (e.g. User, Build, etc) in the database.
It contains an identifier (message_id) pointing to one of the messages defined in the previous section (a key in the NOTIFICATION_MESSAGES constant).
import textwrap

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models
from django.utils.translation import gettext_noop as _
from django_extensions.db.models import TimeStampedModel
class Notification(TimeStampedModel):
# Message identifier
message_id = models.CharField(max_length=128)
# UNREAD: the notification was not shown to the user
# READ: the notification was shown
# DISMISSED: the notification was shown and the user dismissed it
# CANCELLED: removed automatically because the user has done the action required (e.g. paid the subscription)
state = models.CharField(
choices=[UNREAD, READ, DISMISSED, CANCELLED],
default=UNREAD,
db_index=True,
)
# Makes the notification impossible to dismiss (useful for Build notifications)
dismissable = models.BooleanField(default=False)
# Show the notification under the bell icon for the user
news = models.BooleanField(default=False, help_text="Show under bell icon")
# Notification attached to
#
# Uses ContentType for this.
# https://docs.djangoproject.com/en/4.2/ref/contrib/contenttypes/#generic-relations
#
attached_to_content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
attached_to_id = models.PositiveIntegerField()
attached_to = GenericForeignKey("attached_to_content_type", "attached_to_id")
# If we don't want to use ContentType, we could define all the potential models
# the notification could be attached to
#
# organization = models.ForeignKey(Organization, null=True, blank=True, default=None)
# project = models.ForeignKey(Project, null=True, blank=True, default=None)
# build = models.ForeignKey(Build, null=True, blank=True, default=None)
# user = models.ForeignKey(User, null=True, blank=True, default=None)
    def get_display_message(self):
        message = NOTIFICATION_MESSAGES.get(self.message_id)
        return textwrap.dedent(
            message.body.format(
                instance=self.attached_to,  # Build, Project, Organization, User
            )
        )
Attach an error Notification during the build process
During the build, we will keep raising exceptions to do two things:
stop the build process immediately
communicate back to the doc_builder.director.Director class that the build failed.
The director is the one in charge of creating the error Notification,
in a similar way to how it works now.
The only difference is that instead of saving the error under Build.error as it works today,
it will create a Notification object and attach it to the particular Build.
Note the director does not have access to the DB, so it will need to create/associate the object via an API endpoint/Celery task.
Example of how the BuildCancelled exception creates an error Notification:
class UpdateDocsTask(...):
def on_failure(self):
self.data.api_client.build(self.data.build["id"]).notifications.post(
{
"message_id": "cancelled-by-user",
# Override default fields if required
"type": WARNING,
}
)
Attach a non-error Notification during the build process
During the build, we will be able to attach non-error notifications with the following pattern:
check something in particular (e.g. using a deprecated key in readthedocs.yaml)
create a non-error Notification and attach it to the particular Build object
class DockerBuildEnvironment(...):
def check_deprecated_os_image(self):
if self.config.build.os == "ubuntu-18.04":
self.api_client.build(self.data.build["id"]).notifications.post(
{
"message_id": "os-ubuntu-18.04-deprecated",
}
)
Show a Notification under the user's bell icon
If we want to show a notification on a user's profile, we can create the notification as follows, maybe from a simple script run in the Django shell's console after publishing a blog post:
users_to_show_notification = User.objects.filter(...)
for user in users_to_show_notification:
Notification.objects.create(
message_id="blog-post-beta-addons",
dismissable=True,
news=True,
        attached_to=user,
)
Remove notification on status change
When we show a notification for an unpaid subscription, we want to remove it once the user has updated and paid the subscription. We can do this with the following code:
@handler("customer.subscription.updated", "customer.subscription.deleted")
def subscription_updated_event(event):
if subscription.status == ACTIVE:
organization = Organization.objects.get(slug="read-the-docs")
        Notification.objects.filter(
            message_id="subscription-update-your-cc-details",
            state__in=[UNREAD, READ],
            attached_to_content_type=ContentType.objects.get_for_model(Organization),
            attached_to_id=organization.id,
        ).update(state=CANCELLED)
API definition
I will follow the same pattern we have on APIv3 that uses nested endpoints.
This means that we will add a /notifications/ postfix to most of the resource endpoints
where we want to be able to attach/list notifications.
Notifications list
- GET /api/v3/users/(str: user_username)/notifications/
Retrieve a list of all the notifications for this user.
- GET /api/v3/projects/(str: project_slug)/notifications/
Retrieve a list of all the notifications for this project.
- GET /api/v3/organizations/(str: organization_slug)/notifications/
Retrieve a list of all the notifications for this organization.
- GET /api/v3/projects/(str: project_slug)/builds/(int: build_id)/notifications/
Retrieve a list of all the notifications for this build.
Example response:
{ "count": 25, "next": "/api/v3/projects/pip/builds/12345/notifications/?unread=true&sort=type&limit=10&offset=10", "previous": null, "results": [ { "message_id": "cancelled-by-user", "state": "unread", "dismissable": false, "news": false, "attached_to": "build", "message": { "header": "User action", "body": "Build cancelled by the user.", "type": "error", "icon": "fa-exclamation", "icon_style": "duotone", } } ] }
- Query Parameters:
unread (boolean) – return only unread notifications
type (string) – filter notifications by type (error, note, tip)
sort (string) – sort the notifications (type, date (default))
Notification create
- POST /api/v3/projects/(str: project_slug)/builds/(int: build_id)/notifications/
Create a notification for the resource. In this example, for a Build resource.
Example request:
{ "message_id": "cancelled-by-user", "type": "error", "state": "unread", "dismissable": false, "news": false, }
Note
Similar API endpoints will be created for each of the resources
we want to attach a Notification to (e.g. User, Organization, etc)
Notification update
- PATCH /api/v3/projects/(str: project_slug)/builds/(int: build_id)/notifications/(int: notification_id)/
Update an existing notification. Mainly used to change the state from the front-end.
Example request:
{ "state": "read", }
Note
Similar API endpoints will be created for each of the resources
we want to attach a Notification to (e.g. User, Organization, etc)
Backward compatibility
It’s not strickly required, but if we want, we could extract the current notification logic from:
Django templates
“Don’t want
setup.py
called?”build.image
config key is deprecatedConfiguration file is required
build.commands
is a beta feature
Build.error
fieldsBuild cancelled by user
Unknown exception
build.os
is not foundNo config file
No checkout revision
Failed when cloning the repository
etc
and iterate over all the Build objects to create a Notification object for each of them.
I’m not planning to implement the “new notification system” in the old templates. It doesn’t make sense to spend time in them since we are deprecating them.
Old builds will keep using the current notification approach based on the build.error field.
New builds won't have build.error anymore and they will use the new notification system on ext-theme.
New search API
Goals
Allow configuring search at the API level, instead of having the options in the database.
Allow searching a group of projects/versions at the same time.
Bring the same syntax to the dashboard search.
Syntax
The parameters will be given in the query using the key:value syntax,
inspired by GitHub and other services.
Currently the values of all parameters don't include spaces,
so surrounding the value with quotes won't be supported (key:"value").
To avoid interpreting a query as a parameter,
an escape character can be put in place;
for example project\:docs won't be interpreted as
a parameter, but as the search term project:docs.
This is only necessary if the query includes a valid parameter;
unknown parameters (foo:bar) don't require escaping.
All other tokens that don't match a valid parameter will be joined to form the final search term.
Parameters
- project:
Indicates the project and version to include results from (this doesn't include subprojects). If the version isn't provided, the default version is used.
Examples:
project:docs/latest
project:docs
There can be one or more project parameters. At least one is required.
If the user doesn't have permission over one version or if the version doesn't exist, we don't include results from that version. We don't fail the search; this is so users can use one endpoint for all their users, without worrying about what permissions each user has or updating it after a version or project has been deleted.
The / is used as a separator, but it could be any other character that isn't present in the slug of a version or project. : was considered (project:docs:latest), but it could be hard to read since : is already used to separate the key from the value.
- subprojects:
This allows specifying from what project exactly we are going to return subprojects, and also includes the version we are going to try to match. This includes the parent project in the results.
As with the project parameter, the version is optional, and defaults to the default version of the parent project.
- user:
Include results from projects the given user has access to. The only supported value is @me, which is an alias for the current user.
Including subprojects
Now that we are returning results only from the given projects, we need an easy way to include results from subprojects. Some ideas for implementing this feature are:
include-subprojects:true
This doesn't make it clear from what projects we are going to include subprojects. We could make it so it returns subprojects for all projects. Users will probably use this with one project only.
subprojects:project/version (inclusive)
This allows specifying from what project exactly we are going to return subprojects, and also includes the version we are going to try to match. This includes the parent project in the results.
As with the project parameter, the version is optional, and defaults to the default version of the parent project.
subprojects:project/version (exclusive)
This is the same as the above, but it doesn't include the parent project in the results. If we want to include the results from the project, then the query will be project:project/latest subprojects:project/latest. Is this useful?
The second option was chosen, since that's the current behavior of our search when searching on a project with subprojects, and it avoids having to repeat the project if the user wants to include it in the search too.
Cache
Since the request could be attached to more than one project,
we will return the full list of projects and versions as the cache tags,
i.e. project1, project1:version, project2, project2:version.
CORS
Since the request could be attached to more than one project, we can't easily decide from the middleware whether we should enable CORS or not on a given request, so we won't allow cross-site requests when using the new API for now. We would need to refactor our CORS code so every view can decide if CORS should be allowed or not; in this case, cross-site requests would be allowed only if all versions of the final search are public. Another alternative could be to always allow cross-site requests, but when a request is cross-site, only return results from public versions.
Analytics
We will record the same query for each project that was used in the final search.
Response
The response will be similar to the old one, but will include extra information about the search, like the projects, versions, and the query that were used in the final search.
And the version, project, and project_alias attributes will now be objects.
We could just re-use the old response too, since the only breaking change would be the attributes now being objects, and we aren't adding any new information to those objects (yet). But also, re-using the current serializers shouldn't be a problem either.
{
"count": 1,
"next": null,
"previous": null,
"projects": [
{
"slug": "docs",
"versions": [
{
"slug": "latest"
}
]
}
],
"query": "The final query used in the search",
"results": [
{
"type": "page",
"project": {
"slug": "docs",
"alias": null
},
"version": {
"slug": "latest"
},
"title": "Main Features",
"path": "/en/latest/features.html",
"domain": "https://docs.readthedocs.io",
"highlights": {
"title": []
},
"blocks": [
{
"type": "section",
"id": "full-text-search",
"title": "Full-Text Search",
"content": "We provide search across all the projects that we host. This actually comes in two different search experiences: dashboard search on the Read the Docs dashboard and in-doc search on documentation sites, using your own theme and our search results. We offer a number of search features: Search across subprojects Search results land on the exact content you were looking for Search across projects you have access to (available on Read the Docs for Business) A full range of search operators including exact matching and excluding phrases. Learn more about Server Side Search.",
"highlights": {
"title": [
"Full-<span>Text</span> Search"
],
"content": []
}
},
{
"type": "domain",
"role": "http:post",
"name": "/api/v3/projects/",
"id": "post--api-v3-projects-",
"content": "Import a project under authenticated user. Example request: BashPython$ curl \\ -X POST \\ -H \"Authorization: Token <token>\" https://readthedocs.org/api/v3/projects/ \\ -H \"Content-Type: application/json\" \\ -d @body.json import requests import json URL = 'https://readthedocs.org/api/v3/projects/' TOKEN = '<token>' HEADERS = {'Authorization': f'token {TOKEN}'} data = json.load(open('body.json', 'rb')) response = requests.post( URL, json=data, headers=HEADERS, ) print(response.json()) The content of body.json is like, { \"name\": \"Test Project\", \"repository\": { \"url\": \"https://github.com/readthedocs/template\", \"type\": \"git\" }, \"homepage\": \"http://template.readthedocs.io/\", \"programming_language\": \"py\", \"language\": \"es\" } Example response: See Project details Note Read the Docs for Business, also accepts",
"highlights": {
"name": [],
"content": [
", json=data, headers=HEADERS, ) print(response.json()) The content of body.json is like, "name": "<span>Test</span>"
]
}
}
]
}
]
}
Examples
project:docs project:dev/latest test: search for test in the default version of the docs project, and in the latest version of the dev project.
a project:docs/stable search term: search for a search term in the stable version of the docs project.
project:docs project\:project/version: search for project:project/version in the default version of the docs project.
search: invalid, at least one project is required.
Dashboard search
This is the search feature that you can access from the readthedocs.org/readthedocs.com domains.
We have two types:
- Project scoped search:
Search files and versions of the current project only.
- Global search:
Search files and versions of all projects in .org, and only the projects the user has access to in .com.
Global search also allows searching projects by name/description.
This search also allows you to see the number of results from other projects/versions/sphinx domains (facets).
Project scoped search
Here the new syntax won’t have effect, since we are searching for the files of one project only!
Another approach could be linking to the global search
with project:{project.slug}
filled in the query.
Global search (projects)
We can keep the project search as is, without using the new syntax (since it doesn’t make sense there).
Global search (files)
The same syntax from the API will be allowed; by default it will search all projects in .org, and all projects the user has access to in .com.
Another approach could be to allow filtering by user on .org, this is user:stsewd or user:@me, so a user can search all their projects easily.
We could allow just @me to start.
Facets
We will support only the projects facet to start.
We can keep the facets, but they would be a little different,
since with the new syntax we need to specify a project in order to search for
a version, i.e. we can't search all latest versions of all projects.
By default we will use/show the project facet,
and after the user has filtered by a project,
we will use/show the version facet.
If the user searches more than one project,
things get complicated: should we keep showing the version facet?
If clicked, should we change the version on all the projects?
If that is too complicated to explain/implement,
we should be fine by just supporting the project facet for now.
Backwards compatibility
We should be able to keep the old URLs working in the global search,
but we could also just ignore the old syntax, or transform
the old syntax to the new one and redirect the user to it;
for example ?q=test&project=docs&version=latest would be transformed to ?q=test project:docs/latest.
Future features
Allow searching on several versions of the same project (the API response is prepared to support this).
Allow searching on all versions of a project easily, with a syntax like project:docs/* or project:docs/@all.
Allow specifying the type of search:
Multi match (query as is)
Simple query string (allows using the ES query syntax)
Fuzzy search (same as multi match, but with fuzziness)
Add the org filter, so users can search by all projects that belong to an organization. We would show results of the default versions of each project.
Proposed contents for new Sphinx guides
Note
This work is in progress, see discussion on this Sphinx issue and the pull requests linked at the end.
The two main objectives are:
Contributing a good Sphinx tutorial for beginners. This should introduce the readers to all the various Sphinx major features in a pedagogical way, and be mostly focused on Markdown using MyST. We would try to find a place for it in the official Sphinx documentation.
Write a new narrative tutorial for Read the Docs that complements the existing guides and offers a cohesive story of how to use the service.
Sphinx tutorial
Appendixes are optional, i.e. not required to follow the tutorial, but highly recommended.
The Sphinx way
Preliminary section giving an overview of what Sphinx is, how it works, how reStructuredText and Markdown/MyST are related to it, some terminology (toctree, builders), what can be done with it.
About this tutorial
A section explaining the approach of the tutorial, as well as how to download the result of each section for closer inspection or for skipping parts of it.
Getting started
Creating our project
Present a fictitious goal for a documentation project
Create a blank README.md to introduce the most basic elements of Markdown (headings and paragraph text)
Installing Sphinx and cookiecutter in a new development environment
Install Python (or miniforge)
Create a virtual environment (and/or conda environment)
Activate our virtual environment (it will always be the first step)
Install Sphinx inside the virtual environment
Check that sphinx-build --help works (yay!)
Creating the documentation layout
Apply our cookiecutter to create a minimal docs/ directory (similar to what sphinx-quickstart does, but with source and build separation by default, project release 0.1, English language, and a MyST index, if at all) [1]
Check that the correct files are created (yay!)
Appendix: Using version control
Install git (we will not use it during the tutorial)
Add a proper .gitignore file (copied from gitignore.io)
Create the first commit for the project (yay!)
First steps to document our project using Sphinx
Converting our documentation to local HTML
Create (or minimally tweak) index.md
Build the HTML output using sphinx-build -W -b html doc doc/_build/html [2]
Navigate to doc/_build/html and launch an HTTP server (python -m http.server)
Open http://localhost:8000 in a web browser, and see the HTML documentation (yay!)
Converting our documentation to other formats
Build PseudoXML using make pseudoxml
Build Text using make text
See how the various formats change the output (yay!)
Appendix: Simplify documentation building by using Make [3]
Install Make (nothing is needed on Windows, make.bat is standalone)
Add more content to index.md
Build HTML doing cd doc && make html
Observe that the HTML docs have changed (yay!)
Appendix: PDF without LaTeX using rinoh (beta)
Customizing Sphinx configuration
Changing the HTML theme
Install https://pypi.org/project/furo/
Change the html_theme in conf.py
Rebuild the HTML documentation and observe that the theme has changed (yay!)
Changing the PDF appearance
Add a latex_theme and set it to howto
Rebuild with make latexpdf
Check that the appearance changed (yay!)
Enable an extension
Add a string to the extensions list in conf.py for sphinx.ext.duration
Rebuild the HTML docs with make html and notice that now the times are printed (yay!)
Writing narrative documentation with Sphinx
First focus on index.md, diving more into Markdown and mentioning Semantic Line Breaks.
Then add another .md file to teach how toctree works.
Then continue introducing elements of the syntax to add pictures, cross-references, and the like.
Describing code in Sphinx
Explain the Python domain as part of narrative documentation to interleave code with text, include doctests, and justify the usefulness of the next section.
Autogenerating documentation from code in Sphinx
Deploying a Sphinx project online
A bit of background on the options: GitHub/GitLab Pages, custom server, Netlify, Read the Docs
Make reference to Read the Docs tutorial
Appendix: Using Jupyter notebooks inside Sphinx
Appendix: Understanding the docutils document tree
Appendix: Where to go from here
Refer the user to the Sphinx, reST and MyST references, prominent projects already using Sphinx, compilations of themes and extensions, the development documentation.
Read the Docs tutorial
The Read the Docs way
Getting started
Preparing our project on GitHub
Fork a starter GitHub repository (something like our demo template, as a starting point that helps mimic the sphinx-quickstart or cookiecutter step without having to check out the code locally)
Importing our project to Read the Docs
Sign up with GitHub on RTD
Import the project (don’t “Edit advanced project options”, we will do this later)
The project is created on RTD
Browse “builds”, open the build live logs, wait a couple of minutes, open the docs (yay!)
Basic configuration changes
Add a description, homepage, and tags
Configure your email for build failure notification (until we turn them on by default)
Enable “build pull requests for this project” in the advanced settings
Edit a file from the GitHub UI as part of a new branch, and open a pull request
See the RTD check on the GitHub PR UI, wait a few minutes, open result (yay!)
Customizing the build process
Use readthedocs.yaml (rather than the web UI) to customize build formats, change build requirements and Python version, and enable fail-on-warnings
Versioning documentation
Explain how to manage versions on RTD: create release branches, activate the corresponding version, browse them in the version selector, selectively build versions
Intermediate topics: hide versions, create Automation Rules
Getting insights from your projects
Move around the project, explore results in Traffic Analytics
Play around with server-side search, explore results in Search Analytics
Managing translations
Where to go from here
Reference our existing guides, prominent projects already using RTD, domain configuration, our support form, our contributing documentation
Possible new how-to Guides
Some ideas for extra guides on specific topics, still for beginners but more problem-oriented documents, covering a wide range of use cases:
How to turn a bunch of Markdown files into a Sphinx project
How to turn a bunch of Jupyter notebooks into a Sphinx project
How to localize an existing Sphinx project
How to customize the appearance of the HTML output of a Sphinx project
How to convert existing reStructuredText documentation to Markdown
How to use Doxygen autogenerated documentation inside a Sphinx project
How to keep a changelog of your project
Reference
All the references should be external: the Sphinx reference, the MyST and reST syntax specs, and so forth.
Organizations
Currently we don’t support organizations in the community site (a way to group different projects), we only support individual accounts.
Several integrations that we support like GitHub and Bitbucket have organizations, where users group their repositories and manage them in groups rather than individually.
Why move organizations in the community site?
We support organizations in the commercial site; having no organizations in the community site makes code maintenance difficult for Read the Docs developers. Having organizations in the community site will make the differences between both easier to manage.
Users from the community site can have organizations on external sites from which we import their projects (like GitHub, GitLab). Currently users have all projects from different organizations in their account, without a clear way to group/separate those.
We are going to first move the code, and after that enable the feature on the community site.
How are we going to support organizations?
Currently only users can own projects in the community site. With organizations this is going to change to: users and organizations can own projects.
With this, the migration process would be straightforward for the community site.
For the commercial site we are only going to allow organizations to own projects for now (since we only have subscriptions per organization).
What features of organizations are we going to support?
We have the following features in the commercial site that we don’t have on the community site:
Owners
Teams
Permissions
Subscriptions
Owners should be included to represent the owners of the current organization.
Teams are also handy to manage access to different projects under the same organization.
Permissions: currently we have two types of permissions for teams, admin and read only. Read-only permissions don't make sense on the community site, since we only support public projects/versions (we do support private versions now, but we are planning to remove those). So, we should only support admin permissions for teams.
Subscriptions are only valid for the corporate site, since we don't charge for usage on the community site.
How to migrate current projects
Since we are not replacing the current implementation, we don’t need to migrate current projects from the community site nor from the corporate site.
How to migrate the organizations app
The migration can be split into:
Remove/simplify code from the organizations app on the corporate site.
Isolate/separate models and code that isn’t going to be moved.
Start by moving the models, managers, and figure out how to handle migrations.
Move the rest of the code as needed.
Activate organizations app on the community site.
Integrate the code from the community site to the new code.
UI changes
We should start by removing unused features and dead code from the organizations app on the corporate site, and simplifying existing code if possible (some of this was already done).
Then isolate/separate the models to be moved from the ones that aren't going to be moved. The models that aren't going to be moved should go into another app:
Plan
PlanFeature
Subscription
This app can be named subscriptions.
We can get around the table names and migrations by explicitly setting the table name to organizations_<model> and doing a fake migration. Following the suggestions in https://stackoverflow.com/questions/48860227/moving-multiple-models-from-one-django-app-to-another, this way we avoid any downtime during the migration and any inconvenience caused by renaming the tables manually.
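As a rough illustration (the model fields here are hypothetical; only the Meta.db_table trick matters), the moved model can keep pointing at the existing table, and the new app's initial migration can then be applied with --fake:

# Sketch only: a model moved into the new "subscriptions" app that keeps
# using the table created by the old "organizations" app.
from django.db import models


class Subscription(models.Model):
    # Hypothetical fields, for illustration purposes only.
    organization = models.OneToOneField(
        "organizations.Organization",
        on_delete=models.CASCADE,
    )
    status = models.CharField(max_length=64)

    class Meta:
        # Keep the original table name so no data needs to be moved; the
        # initial migration is then marked as applied with
        # `manage.py migrate subscriptions --fake`.
        db_table = "organizations_subscription"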
Code related to subscriptions should be moved out from the organizations app.
After that, it should be easier to move the organizations app (or part of it) to the community site (and no changes to table names would be required).
We start by moving the models.
Organization
OrganizationOwner
Team
TeamInvite
TeamMember
Migrations aren't moved, since all current migrations depend on other models that aren't going to be moved. On the community site we run an initial migration; on the corporate site we run a fake migration. The migrations left on the commercial site can be removed after that.
For managers and querysets that depend on subscriptions, we can use our pattern for overridable classes (inheriting from SettingsOverrideObject).
Templates, URLs, views, forms, notifications, signals, and tasks can be moved later (we just need to make use of the models from the readthedocs.organizations module).
If we decide to integrate organizations in the community site, we can add/move the UI elements and enable the app.
After the app is moved, we can move more code that depends on organizations to the community site.
Namespace
Currently we use the project's slug as the namespace. On the commercial site we use the combination of organization.slug + project.slug as the namespace, since there we don't care so much about a unique namespace between all users, but rather about a unique namespace per organization.
For the community site this approach probably isn't the best, since we always serve docs publicly from slug.readthedocs.io and most users don't have a custom domain.
The corporate site will use organization.slug + project.slug as the slug, and the community site will always use project.slug as the slug, even if the project belongs to an organization.
We need to refactor the way we get the namespace so it's easier to manage on both sites.
Future Changes
Changes that aren’t needed immediately after the migration, but that should be done:
UI for organizations in the community site.
Add new endpoints to the API (v3 only).
Make the relationship between the Organization and Project models one-to-many (currently many-to-many).
Design of pull request builder
Background
This will focus on automatically building documentation for pull requests on Read the Docs projects. This is one of the most requested features of Read the Docs. This document will serve as a design document for discussing how to implement this feature.
Scope
Making pull requests work like a temporary Version
Excluding PR versions from Elasticsearch indexing
Adding a PR Builds tab in the project dashboard
Updating the footer API
Adding a warning banner to docs
Serving PR docs
Excluding PR versions from search engines
Receiving pull_request webhook events from GitHub
Fetching data from pull requests
Storing PR version build data
Creating PR versions when a pull request is opened, and triggering a build
Triggering builds on new commits on a PR
Status reporting to GitHub
Fetching data from pull requests
We already get pull request events from GitHub webhooks. We can utilize that to fetch data from pull requests: when a pull_request event is triggered, we can fetch the data of that pull request.
We can fetch the pull request by doing something similar to travis-ci, i.e.:
git fetch origin +refs/pull/<pr_number>/merge:
Modeling pull requests as a type of version
Pull requests can be treated as a type of temporary Version.
We might consider adding VERSION_TYPES to the Version model. If we go with VERSION_TYPES, we can add something like pull_request alongside tag and branch.
We should add Version and Build model managers for PR and regular versions and builds. The proposed names for the PR and regular managers are external and internal.
We can then use Version.internal.all() to get all regular versions and Version.external.all() to get all PR versions.
Similarly, Build.internal.all() gets all regular version builds and Build.external.all() gets all PR version builds.
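A rough sketch of what this could look like (these are not the actual Read the Docs models, just an illustration of the VERSION_TYPES choices and the internal/external managers):

from django.db import models

BRANCH = "branch"
TAG = "tag"
PULL_REQUEST = "pull_request"

VERSION_TYPES = (
    (BRANCH, "Branch"),
    (TAG, "Tag"),
    (PULL_REQUEST, "Pull request"),
)


class InternalVersionManager(models.Manager):
    """Regular (branch/tag) versions only."""

    def get_queryset(self):
        return super().get_queryset().exclude(type=PULL_REQUEST)


class ExternalVersionManager(models.Manager):
    """Pull request versions only."""

    def get_queryset(self):
        return super().get_queryset().filter(type=PULL_REQUEST)


class Version(models.Model):
    slug = models.CharField(max_length=255)
    type = models.CharField(max_length=20, choices=VERSION_TYPES, default=BRANCH)

    objects = models.Manager()
    internal = InternalVersionManager()
    external = ExternalVersionManager()

Build would get the same pair of managers, so Build.internal.all() and Build.external.all() work as described above.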
Excluding PR versions from Elasticsearch indexing
We should exclude PR versions from being indexed in Elasticsearch. We need to update the queryset to exclude PR versions.
Adding a PR builds tab in the project dashboard
We can add a tab in the project dashboard that will list the PR builds of that project. We can name it PR Builds.
Creating versions for pull requests
If the GitHub webhook event is pull_request and the action is opened, this means a pull request was opened in the project's repository. We can create a Version from the payload data and trigger an initial build for the version. A version will be created whenever RTD receives an event like this.
Triggering build for new commits in a pull request
We might want to trigger a new build for the PR version if there is a new commit on the PR. If the GitHub webhook event is pull_request and the action is synchronize, this means a new commit was added to the pull request.
Status reporting to GitHub
We could send build status reports to GitHub: whether the build succeeded or failed, along with the build URL. This way we could show whether the build passed or failed on GitHub, similar to what travis-ci does.
As we already have the repo:status scope on our OAuth App, we can send the status report to GitHub using the GitHub Status API.
Sending the status report would be something like this:
- POST /repos/:owner/:repo/statuses/:sha

  {
      "state": "success",
      "target_url": "<pr_build_url>",
      "description": "The build succeeded!",
      "context": "continuous-documentation/read-the-docs"
  }
Storing pull request docs
We need to think about how and where to store data after a PR Version build is finished. We can store the data in a blob storage.
Excluding PR versions from search engines
We should exclude PR versions from search engines, because users might land on a pull request's docs instead of the project's main docs, which would cause confusion.
Serving PR docs
We need to think about how we want to serve the PR docs.
We could serve the PR docs from another domain, or serve them under a <pr_number> namespace on the same domain:
Using pr-<pr_number> as the version slug: https://<project_slug>.readthedocs.io/<language_code>/pr-<pr_number>/
Using a pr subdomain: https://pr.<project_slug>.readthedocs.io/<pr_number>/
Privacy levels
This document describes how to handle and unify privacy levels on the community and commercial version of Read the Docs.
Current state
Currently, we have three privacy levels for projects and versions:
Public
Private
Protected (currently hidden)
These privacy levels aren't clear and bring confusion to our users. Also, the private level doesn't make sense on the community site, since we only support public projects.
Places where we use the privacy levels are:
On serving docs
Footer
Dashboard
Project level privacy
Project level privacy was meant to control the dashboard's visibility.
This privacy level brings confusion when users want to make a version public. We should remove all the project privacy levels.
For the community site the dashboard would be always visible, and for the commercial site, the dashboard would be always hidden.
The project privacy level is also used to serve the 404.html page, show robots.txt, and show sitemap.xml. The privacy level of versions should be used instead.
Another idea for keeping the privacy level is to have it dictate the default privacy level of new versions, while removing all other logic related to it. This can be (or is going to be) possible with automation rules, so we can just remove the field.
Version level privacy
Version level privacy is mainly used to restrict access to documentation. With the public level, everyone can access the documentation. With the private level, only users that are maintainers or that belong to a team with access (on the commercial site) can access the documentation.
The protected privacy level was meant to hide versions from listings and search. On the community site these versions are treated like public versions, and on the commercial site they are treated like private ones.
The protected privacy level is currently hidden.
To keep the behavior of hiding versions from listings and search, a new field should be added to the Version model and forms: hidden (#5321). The privacy level (public or private) would still be respected to determine access to the documentation.
For the community site, the privacy level would be public and can't be changed. The default privacy level of new versions for the commercial site would be private (this is the DEFAULT_PRIVACY_LEVEL setting).
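A minimal sketch of the Version model change (the fields other than hidden are illustrative, not the actual model):

from django.db import models

PUBLIC = "public"
PRIVATE = "private"
PRIVACY_CHOICES = ((PUBLIC, "Public"), (PRIVATE, "Private"))


class Version(models.Model):
    slug = models.CharField(max_length=255)
    active = models.BooleanField(default=True)
    # Controls *access* to the docs (always public on the community site).
    privacy_level = models.CharField(
        max_length=20, choices=PRIVACY_CHOICES, default=PUBLIC
    )
    # New field: controls *visibility* on the footer and in search,
    # without deactivating the version or restricting access.
    hidden = models.BooleanField(default=False)

Footer and search querysets would then filter on something like Version.objects.filter(active=True, hidden=False).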
Overview
For the community site:
The project’s dashboard is visible to all users.
All versions are always public.
The footer shows links to the project’s dashboard (build, downloads, home) to all users.
Only versions with hidden = False are listed on the footer and appear in search results.
If a project has a 404.html file on the default version, it's served.
If a project has a robots.txt file on the default version, it's served.
A sitemap.xml file is always served.
For the commercial site:
The project's dashboard is visible only to users that have read permission over the project.
The footer shows links to the project's dashboard (build, downloads, home) only to admin users.
Only versions with hidden = False are listed on the footer and appear in search results.
If a project has a 404.html file on the default version, it's served if the user has permission over that version.
If a project has a robots.txt file on the default version, it's served if the user has permission over that version.
A sitemap.xml file is served if the project has at least one public version, and it will only list public versions.
Migration
To differentiate between allowing privacy levels or not, we need to add a setting: RTD_ALLOW_PRIVACY_LEVELS (False by default).
For the community and commercial site, we need to:
Remove/change code that depends on the project's privacy level, and use the global RTD_ALLOW_PRIVACY_LEVELS setting and the default version's privacy level instead for:
- Displaying robots.txt
- Serving the 404.html page
- Displaying sitemap.xml
- Querysets
Remove the Project.privacy_level field.
Migrate all protected versions to have the attribute hidden = True (data migration; a sketch follows this list), and set their privacy level to public for the community site and private for the commercial site.
Change all querysets used to list versions on the footer and in search to use the hidden attribute.
Update docs.
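A sketch of that data migration for the community site, assuming the Version model lives in a "builds" app and the old protected value is still present in the database:

from django.db import migrations


def hide_protected_versions(apps, schema_editor):
    Version = apps.get_model("builds", "Version")
    # Protected versions keep working, but become hidden from listings and search.
    Version.objects.filter(privacy_level="protected").update(
        hidden=True,
        privacy_level="public",  # "private" on the commercial site
    )


class Migration(migrations.Migration):

    dependencies = [
        ("builds", "0001_initial"),  # placeholder dependency
    ]

    operations = [
        migrations.RunPython(hide_protected_versions, migrations.RunPython.noop),
    ]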
For the community site:
Hide all privacy level related settings from the version form.
Don’t expose privacy levels on API v3.
Mark all versions as public.
For the commercial site:
Always hide the dashboard
Show links to the dashboard (downloads, builds, project home) on the footer only to admin users.
Upgrade path overview
Community site
The default privacy level for the community site is public for versions and the dashboard is always public.
Public project (community)
Public version: Normal use case, no changes required.
Protected version: Users didn't want to list this version on the footer, but also didn't want to deactivate it. We can do a data migration of those versions to the new hidden setting and make them public.
Private version: Users didn't want to show this version to their users yet, or they were testing something. This can be solved with the pull request builder feature and the hidden setting. We migrate those to public with the hidden setting. If we are worried about leaking anything from the version, we can email users before making the change.
Protected project (community)
Protected projects are not listed publicly. Probably users were hosting a WIP project, or a personal public project. A public project should work for them, as we are removing the public listing of all projects (except for search).
The migration path for versions of protected projects is the same as a public project.
Private project (community)
Probably these users want to use our enterprise solution instead. Or they were hosting a personal project.
The migration path for versions of private projects is the same as a public project.
If we are worried about leaking anything from the dashboard or build page, we can email users before doing the change.
Commercial site
The default privacy level for the commercial site is private for versions, and the dashboard is shown only to admin users.
Private project (commercial)
Private version: Normal use case, no changes required.
Protected version: Users didn't want to list this version on the footer, but also didn't want to deactivate it. This can be solved by using the new hidden setting. We can do a data migration of those versions to the new hidden setting and make them private.
Public version: The user has private code, but wants to make their docs public. No changes required.
Protected project (commercial)
I can’t think of a use case for protected projects, since they aren’t listed publicly on the commercial site.
The migration path for versions of protected projects is the same as a private project.
Public project (commercial)
Currently we show links back to project dashboard if the project is public, which probably users shouldn’t see. With the implementation of this design doc, public versions don’t have links to the project dashboard (except for admin users) and the dashboard is always under login.
Private versions: Users under the organization can see links to the dashboard. No changes required.
Protected versions: Users under the organization can see links to the dashboard. We can do a data migration of those versions to the new hidden setting and make them private.
Public versions: All users can see links to the dashboard. They probably have an open source project, but still want to manage access using the same teams of the organization. No changes are required.
A breaking change here is: users outside the organization would not be able to see the dashboard of the project.
Improving redirects
Redirects are a core feature of Read the Docs; they allow users to keep old URLs working when they rename or move a page.
The current implementation lacks some features and has some undefined/undocumented behaviors.
Goals
Improve the user experience when creating redirects.
Improve the current implementation without big breaking changes.
Non-goals
Replicate every feature of other services without having a clear use case for them.
Improve the performance of redirects. This can be discussed in an issue or pull request. Performance should be considered when implementing new improvements.
Allow importing redirects. We should push users to use our API instead.
Allow specifying redirects in the RTD config file. We have had several discussions around this, but we haven’t reached a consensus.
Current implementation
We have five types of redirects:
- Prefix redirect: Allows redirecting all URLs that start with a prefix to a new URL using the default version and language of the project. For example, a prefix redirect with the value /prefix/ will redirect /prefix/foo/bar to /en/latest/foo/bar. They are basically the same as an exact redirect with a wildcard at the end; they are a shortcut for a redirect like:
  - From: /prefix/$rest
  - To: /en/latest/
  Or maybe we could use a prefix redirect to replace the exact redirect with a wildcard?
- Page redirect: Allows redirecting a single page to a new URL using the current version and language. For example, a page redirect with the value /old/page.html will redirect /en/latest/old/page.html to /en/latest/new/page.html. Cross-domain redirects are not allowed in page redirects. They apply to all versions; if you want a redirect to apply only to a specific version, you can use an exact redirect. A whole directory can't be redirected with a page redirect; an exact redirect with a wildcard at the end needs to be used instead. A page redirect on a single-version project is the same as an exact redirect.
- Exact redirect: Allows redirecting an exact URL to a new URL, and allows a wildcard at the end. For example, an exact redirect with the value /en/latest/page.html will redirect /en/latest/page.html to the new URL. If an exact redirect with the value /en/latest/dir/$rest is created, it will redirect all paths that start with /en/latest/dir/, and the rest of the path will be added to the new URL automatically. Cross-domain redirects are allowed in exact redirects. They apply to all versions. A wildcard is allowed at the end of the URL; if a wildcard is used, the rest of the path is added to the new URL automatically.
- Sphinx HTMLDir to HTML: Allows redirecting clean URLs to HTML URLs. Useful in case a project changed the style of its URLs. They apply to all projects, not just Sphinx projects.
- Sphinx HTML to HTMLDir: Allows redirecting HTML URLs to clean URLs. Useful in case a project changed the style of its URLs. They apply to all projects, not just Sphinx projects.
How other services implement redirects
Gitbook's implementation is very basic; they only allow page redirects.
https://docs.gitbook.com/integrations/git-sync/content-configuration#redirects
Cloudflare Pages allows capturing placeholders and one wildcard (in any part of the URL). They also allow you to set the status code of the redirect, and redirects can be specified in a _redirects file.
https://developers.cloudflare.com/pages/platform/redirects/
They have a limit of 2100 redirects. In case of multiple matches, the topmost redirect will be used.
Netlify allows capturing placeholders and a wildcard (only allowed at the end). They also allow you to set the status code of the redirect, and redirects can be specified in a _redirects file. They also support:
Forced redirects
Matching query arguments
Matching by country/language and cookies
Per-domain and protocol redirects
In case of multiple matches, the topmost redirect will be used.
Rewrites: serving a different file without redirecting.
GitLab Pages supports the same syntax as Netlify, and supports a subset of their features:
_redirects config file
Status codes
Rewrites
Wildcards (splats)
Placeholders
https://docs.gitlab.com/ee/user/project/pages/redirects.html
Improvements
General improvements
The following improvements will be applied to all types of redirects.
Allow choosing the status code of the redirect. We already have a field for this, but it’s not exposed to users.
Allow explicitly defining the order of redirects. This will be similar to the automation rules feature, where users can reorder the rules so the most specific ones come first. We currently rely on the implicit order of the redirects (updated_at).
Allow disabling redirects. This is useful when testing redirects or debugging a problem: instead of having to re-create the redirect, we can just disable it and re-enable it later.
Allow adding a short description. It's useful to document why the redirect was created.
Don’t run redirects on domains from pull request previews
We currently run redirects on domains from pull request previews; this is a problem when moving a whole project to a new domain. We don't need to run redirects on external domains, they should be treated as temporary domains.
Normalize paths with trailing slashes
Currently, if users want to redirect a path both with a trailing slash and without it, they need to create two separate redirects (/page/ and /page).
We can simplify this by normalizing the path before matching it, or before saving it.
For example:
- From: /page/
- To: /new/page
The from path will be normalized to /page, and the filename to match will also be normalized before matching it.
This is similar to what Netlify does: https://docs.netlify.com/routing/redirects/redirect-options/#trailing-slash.
Page and exact redirects without a wildcard at the end will be normalized; all other redirects need to be matched as is. This makes it impossible to match a path with a trailing slash.
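A small sketch of that normalization (not the actual implementation):

def normalize_path(path: str) -> str:
    """Strip the trailing slash so /page/ and /page match the same redirect."""
    if path != "/" and path.endswith("/"):
        return path.rstrip("/")
    return path


assert normalize_path("/page/") == "/page"
assert normalize_path("/page") == "/page"
assert normalize_path("/") == "/"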
Use * and :splat for wildcards
Currently we use $rest at the end of the From URL to indicate that the rest of the path should be added to the target URL. A similar feature is implemented in other services using * and :splat.
Instead of using $rest in the URL for the suffix wildcard, we will now use *, and :splat as a placeholder in the target URL, to be more consistent with other services. Existing redirects can be migrated automatically.
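A sketch of how an existing redirect could be translated automatically (plain string handling, not tied to the actual Redirect model):

def migrate_wildcard(from_url, to_url):
    """Translate the old ``$rest`` suffix wildcard into ``*`` and ``:splat``."""
    if from_url.endswith("$rest"):
        from_url = from_url[: -len("$rest")] + "*"
        # The old behaviour appended the matched suffix automatically, so the
        # equivalent new redirect needs an explicit :splat in the target URL.
        to_url += ":splat"
    return from_url, to_url


assert migrate_wildcard("/old/dir/$rest", "/new/dir/") == ("/old/dir/*", "/new/dir/:splat")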
Explicit :splat placeholder
Explicitly place the :splat placeholder in the target URL, instead of adding it automatically.
Sometimes users want to redirect to a different path, and we have been adding a query parameter in the target URL to prevent the old path from being appended to the final path, for example /new/path/?_=.
Instead of adding the path automatically, users have to add the :splat placeholder to the target URL. For example:
- From: /old/path/*
- To: /new/path/:splat
- From: /old/path/*
- To: /new/path/?page=:splat&foo=bar
Improving page redirects
Allow redirecting to external domains. This can be useful to apply a redirect of a well-known path in all versions to another domain, for example, /security/ to a security policy page on another domain. This new feature isn't strictly needed, but it will be useful to simplify the explanation of the feature (one less restriction to explain).
Example:
- From: /security/
- To: https://example.com/security/
Allow a wildcard at the end of the from path. This will allow users to migrate a whole directory to a new path without having to create an exact redirect for each version. Similar to exact redirects, users need to add the :splat placeholder explicitly. This means that page redirects become the same as exact redirects, with the only difference being that they apply to all versions.
Example:
- From: /old/path/*
- To: /new/path/:splat
Merge prefix redirects with exact redirects
Prefix redirects are the same as exact redirects with a wildcard at the end. We will migrate all prefix redirects to exact redirects with a wildcard at the end.
For example:
- From: /prefix/
will be migrated to:
- From: /prefix/*
- To: /en/latest/:splat
where /en/latest is the default version and language of the project.
For single version projects, the redirect will be:
- From: /prefix/*
- To: /:splat
Improving Sphinx redirects
These redirects are useful, but we should rename them to something more general, since they apply to all types of projects, not just Sphinx projects.
Proposed names:
HTML URL to clean URL redirect (file.html to file/)
Clean URL to HTML URL redirect (file/ to file.html)
Other ideas to improve redirects
The following improvements will not be implemented in the first iteration.
Run forced redirects before built-in redirects. We currently run built-in redirects before forced redirects, which is a problem when moving a whole project to a new domain. For example, a forced redirect like /$rest won't work for the root URL of the project, since / will first redirect to /en/latest/. But this shouldn't be a real problem, since users will still need to handle the /en/latest/file/ paths.
Run redirects on the edge. Cloudflare allows us to create redirects on the edge, but they have some limitations around the number of redirect rules that can be created. They would also be useful for forced exact redirects only, since we can't match a redirect based on the response of the origin server.
Merge all redirects into a single type. This may simplify the implementation, but it will make it harder to explain the feature to users. And to replace some redirects we would need to implement some new features.
Placeholders. I haven't seen users requesting this feature. We can consider adding it in the future. Maybe we can expose the current language and version as placeholders.
Per-protocol redirects. We should push users to always use HTTPS.
Allow a prefix wildcard. We currently only allow a suffix wildcard; adding support for a prefix wildcard should be easy. But do users need this feature?
Per-domain redirects. The main problem that originated this request was that we were applying redirects on external domains; if we stop doing that, there is no need for this feature. We can also try to improve how our built-in redirects work (especially our canonical domain redirect).
Allow matching query arguments
We can do this in three ways:
At the DB level, with some restrictions. If done at the DB level, we would need to have one field with just the path, and another with the query arguments normalized and sorted. For example, if we have a redirect with the value /foo?blue=1&yellow=2&red=3, it would be normalized in the DB as /foo and blue=1&red=3&yellow=2. This implies that the URL to be matched must have exactly the same query arguments; it can't have more or fewer. I believe the implementation described here is the same as the one used by Netlify, since they have that same restriction:
"If the URL contains other parameters in addition to or instead of id, the request doesn't match that rule."
https://docs.netlify.com/routing/redirects/redirect-options/#query-parameters
At the DB level, using a JSONField. All query arguments would be saved normalized as a dictionary. When matching the URL, we would need to normalize the query arguments, and use a combination of has_keys and contained_by to match the exact set of query arguments.
At the Python level. If done this way, we would need to have one field with just the path, and another with the query arguments. The matching of the path would be done at the DB level, and the matching of the query arguments would be done at the Python level. Here we can be more flexible, allowing any query arguments in the matched URL. We had some performance problems in the past, but I believe that was mainly due to the use of regex instead of string operations, and matching the path would still be done at the DB level. We could limit the number of redirects that can be created with query arguments, or the number of redirects in general.
We have had only one user requesting this feature, so this is not a priority.
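For reference, the normalization described in the first option could look like this sketch, using only the standard library:

from urllib.parse import parse_qsl, urlencode


def split_and_normalize(url):
    """Split a redirect "from" URL into (path, normalized query string)."""
    path, _, query = url.partition("?")
    # Sort query arguments so /foo?b=2&a=1 and /foo?a=1&b=2 match the same rule.
    normalized_query = urlencode(sorted(parse_qsl(query)))
    return path, normalized_query


assert split_and_normalize("/foo?blue=1&yellow=2&red=3") == (
    "/foo",
    "blue=1&red=3&yellow=2",
)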
Migration
Most of the proposed improvements are backwards compatible, and just need a data migration to normalize existing redirects.
The exception is requiring the explicit :splat placeholder in the target URL: users need to re-learn how this feature works, i.e., they may be expecting the path to be added automatically to the target URL. We can write a small blog post explaining the changes.
Refactor RemoteRepository object
This document describes the current usage of RemoteRepository objects and proposes a new normalized modeling.
Goals
De-duplicate data stored in our database.
Save only one RemoteRepository per GitHub repository.
Use an intermediate table between RemoteRepository and User to store associated remote data for the specific user.
Make this model usable from our SSO implementation (adding a remote_id field to Remote objects).
Use Postgres' JSONField to store the associated remote JSON data.
Make Project connect directly to RemoteRepository without being linked to a specific User.
Do not disconnect Project and RemoteRepository when a user deletes/disconnects their account.
Non-goals
Keep RemoteRepository in sync with GitHub repositories.
Delete RemoteRepository objects deleted from GitHub.
Listen to GitHub events to detect full_name changes and update our objects.
Note
We may need/want some of these non-goals in the future. They are just outside the scope of this document.
Current implementation
When a user connects their account to a social account, we create:
- an allauth.socialaccount.models.SocialAccount, with basic information (provider, last login, etc.) and provider-specific data saved as JSON under extra_data
- an allauth.socialaccount.models.SocialToken, the token used to hit the API on behalf of the user
We don't create any RemoteRepository at this point. They are created when the user jumps into the "Import Project" page and hits the circled arrows. That triggers the sync_remote_repositories task in the background, which updates or creates RemoteRepository objects, but does not delete them (after #7183 and #7310 get merged, they will be deleted).
One RemoteRepository is created per repository the User has access to.
Note
In corporate, we automatically sync RemoteRepository and RemoteOrganization at signup (foreground) and login (background) via a signal. We should eventually move these to community.
Where is RemoteRepository used?
List of available repositories to import under "Import Project"
Showing a "+", "External Arrow" or a "Lock" sign next to each element in the list:
- +: it's available to be imported
- External Arrow: the repository is already imported (see the RemoteRepository.matches method)
- Lock: the user doesn't have (admin) permissions to import this repository (uses RemoteRepository.private and RemoteRepository.admin)
Avatar URL in the list of projects available to import
Updating the webhook when the user clicks "Resync webhook" from the Admin > Integrations tab
Sending build status when building pull requests
New normalized implementation
The ManyToMany relation RemoteRepository.users will be changed to ManyToMany(through='RemoteRelation') to add extra fields to the relation that are specific to the user.
This allows us to have only one RemoteRepository per GitHub repository, with multiple relationships to User.
With this modeling, we can avoid disconnecting Project and RemoteRepository by removing only the RemoteRelation.
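A rough sketch of the normalized modeling (field names are illustrative, not the final schema):

from django.contrib.auth.models import User
from django.db import models


class RemoteRepository(models.Model):
    # One row per repository on the VCS provider, shared by all users.
    remote_id = models.CharField(max_length=128)
    full_name = models.CharField(max_length=255)
    users = models.ManyToManyField(User, through="RemoteRelation")


class RemoteRelation(models.Model):
    # Data that is specific to one user's connection to the repository.
    remote_repository = models.ForeignKey(RemoteRepository, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    admin = models.BooleanField(default=False)
    json = models.JSONField(default=dict)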
Note
All the points mentioned in the previous section may need to be adapted to use the new normalized modeling. However, it may be only field renaming or small query changes over new fields.
Use this modeling for SSO
We can get the list of Projects where a user has access:
admin_remote_repositories = RemoteRepository.objects.filter(
    # Both conditions span the same RemoteRelation row.
    remoterelation__user=request.user,
    remoterelation__admin=True,  # False for read-only access
)
Project.objects.filter(remote_repository__in=admin_remote_repositories)
Rollout plan
Due to the constraints we have on the RemoteRepository table and its size, we can't just do the data migration at the same time as the deploy. Because of this we need to be more creative and find a way to re-sync the data from the VCS providers while the site continues working.
To achieve this, we thought of following these steps:
1. Modify all the Python code to use the new modeling in .org and .com (this will help us find bugs locally in an easier way)
2. QA this locally with test data
3. Enable a Django signal to re-sync RemoteRepository on login asynchronously (we already have this in .com). New active users will have updated data immediately
4. Spin up a new instance with the new refactored code
5. Run migrations to create a new table for RemoteRepository
6. Re-sync everything from the VCS providers into the new table for a week or so
7. Dump and load the Project - RemoteRepository relations
8. Create a migration to use the new table with the synced data
9. Deploy the new code once the sync is finished
See these issues for more context:
https://github.com/readthedocs/readthedocs.org/pull/7536#issuecomment-724102640
https://github.com/readthedocs/readthedocs.org/pull/7675#issuecomment-732756118
Secure API access from builders
Goals
Provide a secure way for builders to access the API.
Limit the access of the tokens to the minimum required.
Non-goals
Migrate builds to use API V3
Implement this mechanism in API V3
Expose it to users
All these changes can be made in the future, if needed.
Current state
Currently, we access API V2 from the builders using the credentials of the "builder" user. This user is a superuser: it has access to all projects, write access to the API, and access to restricted endpoints and fields.
The credentials are hardcoded in our settings file, so if there is a vulnerability that allows users to access the settings file, the attacker will have the credentials of the "builder" user, giving them full access to the API and all projects.
Proposed solution
Instead of using the credentials of a superuser to access the API, we will create a temporary token attached to a project and to one of the owners of the project. This way the token will have access to the given project only, for a limited period of time.
This token will be generated from the webs and passed to the builders via the celery task, where it can be used to access the API. Once the build has finished, the token will be revoked.
Technical implementation
We will use the rest-knox package. This package is recommended by the DRF documentation, since the default token implementation of DRF is very basic. Some relevant features of knox are:
Support for several tokens per user.
Tokens are stored in a hashed format in the database. We don't have access to the tokens after they are created.
Tokens can have an expiration date.
Tokens can be created with a prefix (rtd_xxx) (unreleased)
Support for custom token model (unreleased)
We won’t expose the token creation view directly, since we can create the tokens from the webs, and this isn’t exposed to users.
The view to revoke the token will be exposed, since we need it to revoke the token once the build has finished.
From the API, we just need to add the proper permission and authentication classes to the views we want to support.
To differentiate a normal user from a token-authed user, we will have access to the token via the request.auth attribute in the API views; this will also be used to get the attached projects to filter the querysets.
The knox package allows us to provide our own token model. This will be useful to add our own fields to the token model, such as the projects attached to the token, or whether it grants access to all the projects the user has access to, etc.
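As a sketch (the token.projects relation and the queryset fields are assumptions, not the actual Read the Docs code), an API view could combine both authentication paths like this:

from rest_framework import viewsets

from readthedocs.projects.models import Project  # assumed import path


class ProjectViewSet(viewsets.ReadOnlyModelViewSet):
    def get_queryset(self):
        token = self.request.auth
        if token is not None:
            # Token-authed builder: only the projects attached to the token.
            return Project.objects.filter(
                pk__in=token.projects.values_list("pk", flat=True)
            )
        # Regular authentication (e.g. the old superuser flow) keeps
        # the previous behaviour.
        return Project.objects.filter(users=self.request.user)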
Flow
The flow of creation and usage of the token will be:
Create a token from the webs when a build is triggered. The project being built will be attached to the token; if the build was triggered by a user, that user will be attached to the token, otherwise the token will be attached to one of the owners of the project.
The token will be created with an expiration date of 3 hours, which should be enough for the build to finish. We could also make this dynamic depending on the project.
Pass the token to the builder via the celery task.
Pass the token to all places where the API is used.
Revoke the token when the build has finished. This is done by hitting the revoke endpoint.
In case the revoke endpoint fails, the token will expire in 3 hours.
Why attach tokens to users?
Attaching tokens to users will ease the implementation, since we can re-use the code from the knox package.
Attaching tokens to projects only is possible, but it would require managing the authentication manually. This is because knox requires a user to be attached to the token, and this user is used in their TokenAuthentication class.
An alternative is to use the DRF API key package, which doesn't require a user, but then if we wanted to extend this functionality to our normal APIs, we would have to implement the authentication manually.
Keeping backwards compatibility
Access to the write API V2 is restricted to superusers, and it was used only from the builders. So we don't need to keep backwards compatibility for authed requests, but we do need to keep the old implementation working while we deploy the new one.
Possible issues
Some of the features that we need are not released yet; we especially need the custom token model feature.
There is a race condition when using the token if the user attached to that token is removed from the project: if the user is removed while the build is running, the builders won't be able to access the API. We could avoid this by not relying on the user attached to the token, only on the projects attached to it (this would be for our build APIs only).
Alternative implementation with Django REST Framework API Key
Instead of using knox, we can use DRF API key. It has the same features as knox, with the following exceptions:
It is only used for authorization; it can't be used for authentication (or at least not out of the box).
It doesn't expose views to revoke the tokens (but this should be easy to implement manually).
Changing the behaviour of some things requires sub-classing instead of defining settings.
It supports several token models (not just one like knox).
All the features that we need are already released.
The implementation will be very similar to the one described for knox, with the exception that tokens won't be attached to users, just to a project. And we won't need to handle authentication, since the token itself will grant access to the projects.
To avoid breaking the builders, we need to make the old and the new implementation work together, that is, allow authentication and handle tokens at the same time. This means passing valid user credentials together with the token; this "feature" can be removed in the next deploy (with knox we also need to handle both implementations, but it doesn't require passing credentials with the token, since it also handles authentication).
Decision
Since the required features from knox are not released yet, we have decided to use DRF API key instead.
Future work
This work can be extended to API V3 and exposed to users in the future. We only need to take into consideration that the token model would be shared by API V2 and API V3 if using knox; if we use API key, we can have different token models for each use case.
sphinxcontrib-jquery
jQuery will be removed from Sphinx 6.0.0. We can expect 6.0.0 to ship in late 2022.
See also
This is a “request for comments” for a community-owned Sphinx extension that bundles jQuery.
Overview
- Comment deadline:
November 1st, 2022
- Package-name:
sphinxcontrib-jquery
- Python package:
sphinxcontrib.jquery
- Dependencies:
Python 3+, Sphinx 1.8+ (or perhaps no lower bound?)
- Ownership:
Read the Docs core team will implement the initial releases of an otherwise community-owned package that lives in https://github.com/sphinx-contrib/jquery
- Functionality:
sphinxcontrib-jquery is a Sphinx extension that provides a simple mechanism for other Sphinx extensions and themes to ensure that jQuery is included in the HTML build outputs and loaded in the HTML DOM itself. More specifically, the extension ensures that jQuery is loaded exactly once, no matter how many themes and extensions request to include jQuery, nor the version of Sphinx.
- Scope:
This extension assumes that it’s enough to provide a single version of jQuery for all of its dependent extensions and themes. As the name implies, this extension is built to handle jQuery only. It’s not a general asset manager and it’s not looking to do dependency resolution of jQuery versions.
Usage
The primary users of this package are theme and extension developers and documentation project owners.
Theme and extension developers
The following two steps need to be completed:
A Sphinx theme or extension should depend on the Python package sphinxcontrib-jquery.
In your extension's or theme's setup(app), call app.setup_extension("sphinxcontrib.jquery").
In addition to this, we recommend that extension and theme developers log to the browser's console.error in case jQuery isn't found. The log message could for instance say:
if (typeof $ == "undefined") console.error("<package-name> depends on sphinxcontrib-jquery. Please ensure that <package-name>.setup(app) is called or add 'sphinxcontrib-jquery' to your conf.py extensions setting.")
Documentation project owners
If you are depending on a theme or extension that did not itself address the removal of jQuery from Sphinx 6, you can patch up your project like this:
Add sphinxcontrib-jquery to your installed dependencies.
Add sphinxcontrib.jquery to your extensions setting in conf.py.
Calling app.setup_extension("sphinxcontrib.jquery")
When a Sphinx theme or extension calls setup_extension(), a call to sphinxcontrib.jquery.setup(app) will happen. Adding sphinxcontrib.jquery to a documentation project's conf.extensions will also call sphinxcontrib.jquery.setup(app) (at most once).
In sphinxcontrib.jquery.setup(app), jQuery is added. The default behaviour is to detect the Sphinx version and include jQuery via app.add_js_file when Sphinx is version 6 and up. jQuery is added at most once.
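From a theme or extension author's point of view, the integration is then a one-liner in setup() (a sketch; the returned metadata dict is optional):

def setup(app):
    # Make sure jQuery ends up in the build output exactly once,
    # regardless of the Sphinx version and of other themes/extensions.
    app.setup_extension("sphinxcontrib.jquery")

    # ... the rest of the theme's/extension's setup goes here ...

    return {
        "parallel_read_safe": True,
        "parallel_write_safe": True,
    }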
Config value: jquery_force_enable
When setting jquery_force_enable=True, jQuery is added no matter the Sphinx version, but at most once. This is useful if you want to handle alternative conditions for adding jQuery.
Warning
If you set jquery_force_enable=True, you most likely should also add Sphinx>=6 to your theme's/extension's dependencies, since versions before this already bundle jQuery!
jQuery version and inclusion
jQuery should be shipped together with the Python package and not be referenced from a CDN.
Sphinx has kept relatively up to date with jQuery, and this package intends to follow. The most recently bundled jQuery version was v3.5.1 and only two releases have happened since: 3.6.0 and 3.6.1. The 3.6.0 release had a very small backwards incompatibility which illustrates how harmless these upgrades are for the general purpose Sphinx package.
Therefore, we propose to start the releases of sphinxcontrib-jquery at 3.5.1 (the currently shipped version) and subsequently release 3.6.1 in an update. This will give users that need 3.5.1 a choice of a lower version.
The bundled jQuery version will be NPM pre-minified and distributed together with the PyPI package.
The minified jQuery JS file is ultimately included by calling app.add_js_file, which is passed the following arguments:
app.add_js_file(
    get_jquery_url_path(),
    loading_method="defer",
    priority=200,
    integrity="sha256-{}".format(get_jquery_sha256_checksum()),
)
Note
It’s possible to include jQuery in other ways, but this ultimately doesn’t require this extension and is therefore not supported.
Allow installation of system packages
Currently we don't allow executing arbitrary commands in the build process. The most common use case is installing extra dependencies.
Current status
There is a workaround to run arbitrary commands when using Sphinx, which is executing the commands inside the conf.py file.
There isn't a workaround for MkDocs, but this problem is more common in Sphinx, since users need to install some extra dependencies in order to use autodoc or build Jupyter Notebooks. However, installing some of those dependencies requires root access, or is easier using apt.
Most CI services allow using apt or executing any command with sudo, so users are more familiar with that workflow.
Some users use Conda instead of pip to install dependencies in order to avoid these problems, but not all pip users are familiar with Conda, or want to migrate to Conda just to use Read the Docs.
Security concerns
Builds are run in a Docker container, but the app controlling that container lives on the same server. Allowing execution of arbitrary commands with superuser privileges may introduce security issues.
Exposing apt install
For the previous reasons we won't allow executing arbitrary commands as root (yet), but we will instead only allow installing extra packages using apt.
We would expose this through the config file. Users will provide a list of packages to install, and under the hood we would run:
apt update -y
apt install -y {packages}
These commands will be run before the Python setup step and after the clone step.
Note
All package names must be validated to avoid injection of extra options (like -v).
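A sketch of what that validation could look like (the exact rules are to be decided):

import re

# Debian package names use lowercase letters, digits and ``+ - .``;
# crucially, a leading ``-`` is rejected so a package entry can't be
# turned into an extra apt option.
PACKAGE_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.+-]+$")


def validate_apt_packages(packages):
    for name in packages:
        if not PACKAGE_NAME_RE.fullmatch(name):
            raise ValueError(f"Invalid APT package name: {name!r}")
    return packages


validate_apt_packages(["cmatrix", "mysql-server"])  # ok
# validate_apt_packages(["-v"])  # raises ValueError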
Using docker exec
Currently we use docker exec to execute commands in a running container. This command also allows passing a user which is used to run the commands (#8058). We can run the apt commands in our current containers using a superuser momentarily.
Config file
The config file can add an additional mapping build.apt_packages with a list of packages to install.
version: 2

build:
  apt_packages:
    - cmatrix
    - mysql-server
Note
Other names that were considered were:
build.packages
build.extra_packages
build.system_packages
These were rejected to avoid confusion with existing keys, and to be explicit about the type of package.
Possible problems
Some users may require passing additional flags or installing from a PPA.
Some packages may require additional setup after installation.
Other possible solutions
We could allow running the containers as root by doing something similar to what Travis does: they have one tool that converts the config file to a shell script (travis-build), and another that spins up a Docker container, executes that shell script, and streams the logs back (travis-worker).
A similar solution could be implemented using AWS Lambda.
This would of course require a large amount of work, but may be useful in the future.
Collect data about builds
We may want to make some decisions in the future about deprecations and supported versions. Right now we don't have data about the usage of packages and their versions on Read the Docs to be able to make an informed decision.
Tools
- Kibana: We can import data from ES. Cloud service provided by Elastic.
- Superset: We can import data from several DBs (including Postgres and ES). Easy to set up locally, but it doesn't look like there is a cloud provider for it.
- Metabase: We can import data from several DBs (including Postgres). Cloud service provided by Metabase.
Summary: we have several tools that can inspect data from a Postgres DB, and we also have Kibana, which works only with Elasticsearch. The data to be collected can be saved in a Postgres or ES database. Currently we are making use of Metabase to get other information, so it's probably the right choice for this task.
Data to be collected
The following data can be collected after installing all dependencies.
Configuration file
We are saving the config file in our database, but to save some space we save it only if it's different from the one of a previous build (if it's the same, we save a reference to it).
The config file being saved isn't the original one used by the user, but the result of merging it with its default values. We may also want to have the original config file, so we know which settings users are actually setting.
PIP packages
We can get a JSON document with all dependencies, or with root dependencies only, using pip list. This will give us the names of the packages and the versions used in the build.
$ pip list --pre --local --format json | jq
# and
$ pip list --pre --not-required --local --format json | jq
[
    {
        "name": "requests-mock",
        "version": "1.8.0"
    },
    {
        "name": "requests-toolbelt",
        "version": "0.9.1"
    },
    {
        "name": "rstcheck",
        "version": "3.3.1"
    },
    {
        "name": "selectolax",
        "version": "0.2.10"
    },
    {
        "name": "slumber",
        "version": "0.7.1"
    },
    {
        "name": "sphinx-autobuild",
        "version": "2020.9.1"
    },
    {
        "name": "sphinx-hoverxref",
        "version": "0.5b1"
    }
]
With the --not-required option, pip will list only the root dependencies.
Conda packages
We can get a JSON document with all dependencies using conda list --json. That command returns all the root dependencies and their dependencies (there is no way to list only the root dependencies), so we may be collecting some noise, but we can use pip list as a secondary source.
$ conda list --json --name conda-env
[
    {
        "base_url": "https://conda.anaconda.org/conda-forge",
        "build_number": 0,
        "build_string": "py_0",
        "channel": "conda-forge",
        "dist_name": "alabaster-0.7.12-py_0",
        "name": "alabaster",
        "platform": "noarch",
        "version": "0.7.12"
    },
    {
        "base_url": "https://conda.anaconda.org/conda-forge",
        "build_number": 0,
        "build_string": "pyh9f0ad1d_0",
        "channel": "conda-forge",
        "dist_name": "asn1crypto-1.4.0-pyh9f0ad1d_0",
        "name": "asn1crypto",
        "platform": "noarch",
        "version": "1.4.0"
    },
    {
        "base_url": "https://conda.anaconda.org/conda-forge",
        "build_number": 3,
        "build_string": "3",
        "channel": "conda-forge",
        "dist_name": "python-3.5.4-3",
        "name": "python",
        "platform": "linux-64",
        "version": "3.5.4"
    }
]
APT packages
We can get the list from the config file, or we can list the packages installed with dpkg --get-selections. That command would list all pre-installed packages as well, so we may be getting some noise.
$ dpkg --get-selections
adduser install
apt install
base-files install
base-passwd install
bash install
binutils install
binutils-common:amd64 install
binutils-x86-64-linux-gnu install
bsdutils install
build-essential install
We can get the installed version with:
$ dpkg --status python3
Package: python3
Status: install ok installed
Priority: optional
Section: python
Installed-Size: 189
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: allowed
Source: python3-defaults
Version: 3.8.2-0ubuntu2
Replaces: python3-minimal (<< 3.1.2-2)
Provides: python3-profiler
Depends: python3.8 (>= 3.8.2-1~), libpython3-stdlib (= 3.8.2-0ubuntu2)
Pre-Depends: python3-minimal (= 3.8.2-0ubuntu2)
Suggests: python3-doc (>= 3.8.2-0ubuntu2), python3-tk (>= 3.8.2-1~), python3-venv (>= 3.8.2-0ubuntu2)
Description: interactive high-level object-oriented language (default python3 version)
Python, the high-level, interactive object oriented language,
includes an extensive class library with lots of goodies for
network programming, system administration, sounds and graphics.
.
This package is a dependency package, which depends on Debian's default
Python 3 version (currently v3.8).
Homepage: https://www.python.org/
Original-Maintainer: Matthias Klose <doko@debian.org>
Or with
$ apt-cache policy python3
Installed: 3.8.2-0ubuntu2
Candidate: 3.8.2-0ubuntu2
Version table:
*** 3.8.2-0ubuntu2 500
500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
100 /var/lib/dpkg/status
Python
We can get the Python version from the config file when using a Python environment, and from the conda list output when using a Conda environment.
OS
We can infer the OS version from the build image used in the config file, but since it changes with time, we can get it from the OS itself:
$ lsb_release --description
Description: Ubuntu 18.04.5 LTS
# or
$ cat /etc/issue
Ubuntu 18.04.5 LTS \n \l
Format
The final information to be saved would consist of:
organization: the organization id/slug
project: the project id/slug
version: the version id/slug
build: the build id, date, length, status.
user_config: Original user config file
final_config: Final configuration used (merged with defaults)
packages.pip: List of pip packages with name and version
packages.conda: List of conda packages with name, channel, and version
packages.apt: List of apt packages
python: Python version used
os: Operating system used
{
    "organization": {
        "id": 1,
        "slug": "org"
    },
    "project": {
        "id": 2,
        "slug": "docs"
    },
    "version": {
        "id": 1,
        "slug": "latest"
    },
    "build": {
        "id": 3,
        "date/start": "2021-04-20-...",
        "length": "00:06:34",
        "status": "normal",
        "success": true,
        "commit": "abcd1234"
    },
    "config": {
        "user": {},
        "final": {}
    },
    "packages": {
        "pip": [{
            "name": "sphinx",
            "version": "3.4.5"
        }],
        "pip_all": [
            {
                "name": "sphinx",
                "version": "3.4.5"
            },
            {
                "name": "docutils",
                "version": "0.16.0"
            }
        ],
        "conda": [{
            "name": "sphinx",
            "channel": "conda-forge",
            "version": "0.1"
        }],
        "apt": [{
            "name": "python3-dev",
            "version": "3.8.2-0ubuntu2"
        }]
    },
    "python": "3.7",
    "os": "ubuntu-18.04.5"
}
Storage
All this information can be collected after the build has finished, and we can store it in a dedicated database (telemetry), using Django's models.
Since this information isn't sensitive, we should be fine saving this data even if the project/version is deleted. As we don't care about historical data, we can save the information per version, from its latest build only, and delete old data if it grows too much.
Should we make heavy use of JSON fields, or try to avoid nesting structures as much as possible (like config.user/config.final vs user_config/final_config)? Or have several fields in our model instead of just one big JSON field?
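One possible shape for such a model, as a sketch (the names and the single JSON field are exactly the open questions raised above):

from django.db import models


class BuildData(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    # Plain identifiers instead of foreign keys, so the data survives
    # project/version deletion.
    organization_slug = models.CharField(max_length=255, null=True, blank=True)
    project_slug = models.CharField(max_length=255)
    version_slug = models.CharField(max_length=255)
    build_id = models.PositiveIntegerField()
    # The rest of the collected data (config, packages, python, os) as JSON.
    data = models.JSONField(default=dict)

    class Meta:
        # We only keep the data from the latest build of each version,
        # so rows would be written with update_or_create().
        unique_together = ("project_slug", "version_slug")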
Read the Docs data passed to Sphinx build context
Before calling sphinx-build to render your docs, Read the Docs injects some extra context in the templates by using the html_context Sphinx setting in the conf.py file.
This extra context can be used to build some awesome features in your own theme.
Warning
This design document details future features that are not yet implemented. To discuss this document, please get in touch in the issue tracker.
Note
The Read the Docs Sphinx Theme uses this context to add additional features to the built documentation.
Context injected
Here is the full list of values injected by Read the Docs, as a Python dictionary. Note that this dictionary is injected under the main key readthedocs:
{
"readthedocs": {
"v1": {
"version": {
"id": int,
"slug": str,
"verbose_name": str,
"identifier": str,
"type": str,
"build_date": str,
"downloads": {"pdf": str, "htmlzip": str, "epub": str},
"links": [
{
"href": "https://readthedocs.org/api/v2/version/{id}/",
"rel": "self",
}
],
},
"project": {
"id": int,
"name": str,
"slug": str,
"description": str,
"language": str,
"canonical_url": str,
"subprojects": [
{
"id": int,
"name": str,
"slug": str,
"description": str,
"language": str,
"canonical_url": str,
"links": [
{
"href": "https://readthedocs.org/api/v2/project/{id}/",
"rel": "self",
}
],
}
],
"links": [
{
"href": "https://readthedocs.org/api/v2/project/{id}/",
"rel": "self",
}
],
},
"sphinx": {"html_theme": str, "source_suffix": str},
"analytics": {"user_analytics_code": str, "global_analytics_code": str},
"vcs": {
"type": str, # 'bitbucket', 'github', 'gitlab' or 'svn'
"user": str,
"repo": str,
"commit": str,
"version": str,
"display": bool,
"conf_py_path": str,
},
"meta": {
"API_HOST": str,
"MEDIA_URL": str,
"PRODUCTION_DOMAIN": str,
"READTHEDOCS": True,
},
}
}
}
Warning
Read the Docs passes information to sphinx-build that may change in the future (e.g. at the moment of building version 0.6 it was the latest version, but then 0.7 and 0.8 were added to the project and also built on Read the Docs), so it's your responsibility to use this context in a proper way.
In case you want fresh data at the moment of reading your documentation, you should consider using the Read the Docs Public API via Javascript.
Using Read the Docs context in your theme
In case you want to access this data from your theme, you can use it like this:
{% if readthedocs.v1.vcs.type == 'github' %}
<a href="https://github.com/{{ readthedocs.v1.vcs.user }}/{{ readthedocs.v1.vcs.repo }}
/blob/{{ readthedocs.v1.vcs.version }}{{ readthedocs.v1.vcs.conf_py_path }}{{ pagename }}.rst">
Show on GitHub</a>
{% endif %}
Note
In this example, we are using pagename, which is a Sphinx variable representing the name of the page you are on. More information about Sphinx variables can be found in the Sphinx documentation.
Customizing the context
In case you want to add some extra context you will have to declare your own
html_context
in your conf.py
like this:
import datetime

html_context = {
    "author": "My Name",
    "date": datetime.date.today().strftime("%d/%m/%y"),
}
and use it inside your theme as:
<p>This documentation was written by {{ author }} on {{ date }}.</p>
Warning
Take into account that the Read the Docs context is injected after your definition of html_context, so it’s not possible to override Read the Docs context values.
YAML configuration file
Background
The current YAML configuration file is in beta state. There are many options and features that it doesn’t support yet. This document will serve as a design document for discussing how to implement the missing features.
Scope
Finish the spec to include all the missing options
Have consistency around the spec
Proper documentation for the end user
Allow specifying the spec version used in the YAML file
Collect/show metadata about the YAML file and build configuration
Promote the adoption of the configuration file
RTD settings
Not all the RTD settings are applicable to the YAML file; some are applicable to each build (or version), and others to the global project.
Not applicable settings
These settings can’t be in the YAML file because they may depend on the initial project setup, are planned to be removed, or involve security and privacy concerns.
Project Name
Repo URL
Repo type
Privacy level (this feature is planned to be removed [1])
Project description (this feature is planned to be removed [2])
Single version
Default branch
Default version
Domains
Active versions
Translations
Subprojects
Integrations
Notifications
Language
Programming Language
Project homepage
Tags
Analytics code
Global redirects
Global settings
To keep consistency with the per-version settings and avoid confusion, these settings will not be stored in the YAML file and will be stored in the database only.
Local settings
These settings will be read from the YAML file of the version that is being built.
Several settings are already implemented and documented at https://docs.readthedocs.io/en/latest/yaml-config.html, so they aren’t covered in much detail here.
Documentation type
Project installation (virtual env, requirements file, sphinx configuration file, etc)
Additional builds (pdf, epub)
Python interpreter
Per-version redirects
Configuration file
Format
The file format is based on the YAML spec 1.2 [3] (the latest version at the time of this writing).
The file must be in the root directory of the repository, and must be named one of:
readthedocs.yml
readthedocs.yaml
.readthedocs.yml
.readthedocs.yaml
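As an illustration of the lookup (the helper below is hypothetical, not the actual implementation), the first matching file name found in the repository root would win:
import os

# Accepted file names, in the order listed above.
CONFIG_FILENAMES = [
    "readthedocs.yml",
    "readthedocs.yaml",
    ".readthedocs.yml",
    ".readthedocs.yaml",
]


def find_config_file(checkout_path):
    """Return the path of the first configuration file found, or None."""
    for name in CONFIG_FILENAMES:
        candidate = os.path.join(checkout_path, name)
        if os.path.isfile(candidate):
            return candidate
    return None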
Conventions
The spec of the configuration file must use these conventions:
Use [] to indicate an empty list.
Use null to indicate a null value.
Use all (internal string keyword) to indicate that all options are included on a list with predetermined choices.
Use true and false as the only options on boolean fields.
Spec
The current spec is documented at https://docs.readthedocs.io/en/latest/yaml-config.html. It will be used as the base for the future spec. The spec will be written using a validation schema such as https://json-schema-everywhere.github.io/yaml.
Versioning the spec
The version of the spec that the user wants to use will be specified in the YAML file. The spec will only have major versions (1.0, not 1.2) [4]. To keep compatibility with older projects using a configuration file without a version, the latest compatible version will be used (1.0).
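For example, the fallback could behave like this minimal sketch (the function name is hypothetical):
DEFAULT_SPEC_VERSION = "1.0"


def resolve_spec_version(parsed_yaml):
    """Return the declared spec version, defaulting to 1.0 when absent."""
    return str(parsed_yaml.get("version", DEFAULT_SPEC_VERSION))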
Adoption of the configuration file
When a user creates a new project or visits the settings page, we could suggest an example of a functional configuration file with a minimal setup, making clear where to put global configurations.
For users who already have a project, we can suggest a configuration file on each build, based on the current settings.
Configuration file and database
The settings used in the build from the configuration file (and other metadata) need to be stored in the database. This is for later usage only, not to populate existing fields.
The build process
The repository is updated
Check out the current version
Retrieve the settings from the database
Try to parse the YAML file (the build fails if there is an error)
Merge both settings (YAML file and database; see the sketch after this list)
The version is built according to the settings
The settings used to build the documentation can be seen by the user
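The merge step could look roughly like this sketch, where per-version values from the YAML file take precedence over the stored database settings (names are hypothetical):
def merge_settings(database_settings, yaml_settings):
    """Return the final settings used for the build.

    Values from the YAML file override the database defaults,
    but only for the keys the YAML file is allowed to control.
    """
    final = dict(database_settings)
    final.update(yaml_settings)
    return final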
Dependencies
Current repository which contains the code related to the configuration file: https://github.com/readthedocs/readthedocs-build
Footnotes
Development installation
These are the development setup and standards followed by the core development team. If you are a contributor to Read the Docs, it might be a good idea to follow these guidelines as well.
Requirements
A development setup can be hosted by your laptop, in a VM, on a separate server etc. Any such scenario should work fine, as long as it can satisfy the following:
Is a Unix-like system (Linux, BSD, Mac OSX) which supports Docker. Windows systems should have WSL+Docker or Docker Desktop.
Has 10 GB or more of free disk space on the drive where Docker’s cache and volumes are stored. If you want to experiment with customizing Docker containers, you’ll likely need more.
Can spare 2 GB of system memory for running Read the Docs; this typically means that a development laptop should have 8 GB or more of memory in total.
Your system should ideally match the production system, which uses the latest official+stable Docker distribution for Ubuntu (the docker-ce package). If you are on Windows or Mac, you may also want to try Docker Desktop.
Note
Take into account that this setup is intended for development purposes. We do not recommend following this guide to deploy an instance of Read the Docs for production.
Install external dependencies (Docker, Docker Compose, gVisor)
Install Docker by following the official guide.
Install Docker Compose with the official instructions.
Install and set up gVisor following gVisor installation.
Set up your environment
Clone the readthedocs.org repository:
git clone --recurse-submodules https://github.com/readthedocs/readthedocs.org/
Install or clone additional repositories:
Note
This step is only required for Read the Docs core team members.
Core team members should at the very least have all required packages installed in their development image. To install these packages you must define a GitHub token before building your image:
export GITHUB_TOKEN="..."
export GITHUB_USER="..."
In order to make development changes on any of our private repositories, such as readthedocs-ext or ext-theme, you will also need to check these repositories out:
git clone --recurse-submodules https://github.com/readthedocs/readthedocs-ext/
Install the requirements from the common submodule:
pip install -r common/dockerfiles/requirements.txt
Build the Docker image for the servers:
Warning
This command could take a while to finish since it will download several Docker images.
inv docker.build
Pull down Docker images for the builders:
inv docker.pull
Start all the containers:
inv docker.up --init # --init is only needed the first time
Go to http://devthedocs.org to access your local instance of Read the Docs.
Check that everything works
Visit http://devthedocs.org
Login as admin/admin and verify that the project list appears.
Go to the “Read the Docs” project, under the section Build a version, click on the Build version button selecting “latest”, and wait until it finishes (this can take several minutes).
Warning
Read the Docs will compile the Python/Node.js/Rust/Go versions on the fly each time it builds the documentation.
To speed things up, you can pre-compile and cache all these versions by using the inv docker.compilebuildtool command.
We strongly recommend pre-compiling these versions if you want to build documentation on your development instance.
Click on the “View docs” button to browse the documentation, and verify that it shows the Read the Docs documentation page.
Working with Docker Compose
We wrote a wrapper with invoke around docker-compose to provide some shortcuts and save typing when running docker compose commands. This section explains these invoke commands:
inv docker.build
Builds the generic Docker image used by our servers (web, celery, build and proxito).
inv docker.up
Starts all the containers needed to run Read the Docs completely.
--no-search can be passed to disable search
--init is used the first time this command is run, to run initial migrations, create an admin user, etc.
--no-reload makes all Celery processes and the Django runserver run without auto-reload, so they do not watch for file changes
--no-django-debug runs all containers with DEBUG=False
--http-domain configures an external domain for the environment (useful for Ngrok or another HTTP proxy). Note that HTTPS proxies aren’t supported. There will also be issues with “suspicious domain” failures on Proxito.
--ext-theme to use the new dashboard templates
--webpack to start the Webpack dev server for the new dashboard templates
inv docker.shell
Opens a shell in a container (web by default).
--no-running spins up a new container and opens a shell
--container specifies in which container the shell is opened
inv docker.manage {command}
Executes a Django management command in a container.
Tip
Useful when modifying models to run makemigrations.
inv docker.down
Stops and removes all running containers.
--volumes will remove the volumes as well (database data will be lost)
inv docker.restart {containers}
Restarts the containers specified (automatically restarts NGINX when needed).
inv docker.attach {container}
Grab STDIN/STDOUT control of a running container.
Tip
Useful to debug with pdb. Once the program has stopped at your pdb line, you can run inv docker.attach web and jump into a pdb session (it also works with ipdb and pdb++).
Tip
You can hit CTRL-p CTRL-p to detach it without stopping the running process.
inv docker.test
Runs all the test suites inside the container.
--arguments will pass arguments to the Tox command (e.g. --arguments "-e py310 -- -k test_api")
inv docker.pull
Downloads and tags all the Docker images required for builders.
--only-required pulls only the image ubuntu-20.04.
inv docker.buildassets
Builds all the assets and “deploys” them to the storage.
inv docker.compilebuildtool
Pre-compiles and caches tools that can be specified in build.tools to speed up builds. It requires inv docker.up running in another terminal to be able to upload the pre-compiled versions to the cache.
Adding a new Python dependency
The Docker image for the servers is built with the requirements defined in the current checked out branch.
In case you need to add a new Python dependency while developing,
you can use the common/dockerfiles/entrypoints/common.sh
script as shortcut.
This script is run at startup on all the servers (web, celery, builder, proxito) which
allows you to test your dependency without re-building the whole image.
To do this, add the pip command required for your dependency to the common.sh file:
# common.sh
pip install my-dependency==1.2.3
Once the PR that adds this dependency has been merged, you can rebuild the image so the dependency is added to the Docker image itself and doesn’t need to be installed each time the container spins up.
Debugging Celery
In order to step into the worker process, you can’t use pdb
or ipdb
, but
you can use celery.contrib.rdb
:
from celery.contrib import rdb
rdb.set_trace()
When the breakpoint is hit, the Celery worker will pause on the breakpoint and
will alert you on STDOUT of a port to connect to. You can open a shell into the container
with inv docker.shell celery
(or build
) and then use telnet
or netcat
to connect to the debug process port:
nc 127.0.0.1 6900
The rdb
debugger is similar to pdb
, there is no ipdb
for remote
debugging currently.
Configuring connected accounts
These are optional steps to set up the connected accounts (GitHub, Bitbucket, and GitLab) in your development environment. This will allow you to log in to your local development instance using your GitHub, Bitbucket, or GitLab credentials, and it makes the process of importing repositories easier.
However, because these services will not be able to connect back to your local development instance, incoming webhooks will not function correctly. For some services, the webhooks will fail to be added when the repository is imported. For others, the webhook will simply fail to connect when there are new commits to the repository.

(Screenshot: Configuring an OAuth consumer for local development on Bitbucket)
Configure the applications on GitHub, Bitbucket, and GitLab. For each of these, the callback URI is http://devthedocs.org/accounts/<provider>/login/callback/ where <provider> is one of github, gitlab, or bitbucket_oauth2. When set up, you will be given a “Client ID” (also called an “Application ID” or just “Key”) and a “Secret”.
Take the “Client ID” and “Secret” for each service and enter them in your local Django admin at http://devthedocs.org/admin/socialaccount/socialapp/. Make sure to apply it to the “Site”.
Troubleshooting
Warning
The environment is developed and mainly tested on Docker Compose v1.x.
If you are running Docker Compose 2.x, please make sure you have COMPOSE_COMPATIBILITY=true
set.
This is automatically loaded via the .env
file.
If you want to ensure that the file is loaded, run:
source .env
Builds fail with a generic error
There are projects that do not use the default Docker image downloaded when setting up the development environment. These extra images are not downloaded by default because they are big and they are not required in all cases. However, if you are seeing the following error

(Screenshot: Build failing with a generic error)
and in the console where the logs are shown you see something like BuildAppError: No such image: readthedocs/build:ubuntu-22.04
,
that means the application wasn’t able to find the Docker image required to build that project and it failed.
In this case, you can run a command to download all the optional Docker images:
inv docker.pull
However, if you prefer to download only the specific image required for that project and save some disk space, follow these steps:
find the latest tag for the image shown in the logs (in this example it is readthedocs/build:ubuntu-22.04, and the current latest tag on that page is ubuntu-22.04-2022.03.15)
run the Docker command to pull it:
docker pull readthedocs/build:ubuntu-22.04-2022.03.15
tag the downloaded Docker image so the app can find it:
docker tag readthedocs/build:ubuntu-22.04-2022.03.15 readthedocs/build:ubuntu-22.04
Once this is done, you should be able to trigger a new build on that project and it should succeed.
Core team standards
Core team members expect to have a development environment that closely approximates our production environment, in order to spot bugs and logical inconsistencies before they make their way to production.
This solution gives us many features that allows us to have an environment closer to production:
- Celery runs as a separate process
Avoids masking bugs that could be introduced by Celery tasks in race conditions.
- Celery runs multiple processes
We run celery with multiple worker processes to discover race conditions between tasks.
- Docker for builds
Docker is used for a build backend instead of the local host build backend. There are a number of differences between the two execution methods in how processes are executed, what is installed, and what can potentially leak through and mask bugs – for example, a local SSH agent allowing code checkouts not normally possible.
- Serve documentation under a subdomain
There are a number of resolution bugs and cross-domain behavior that can only be caught by using a PUBLIC_DOMAIN setting different from the PRODUCTION_DOMAIN setting.
- PostgreSQL as a database
It is recommended that Postgres be used as the default database whenever possible, as SQLite has issues with our Django version and we use Postgres in production. Differences between Postgres and SQLite should be masked for the most part however, as Django does abstract database procedures, and we don’t do any Postgres-specific operations yet.
- Celery is isolated from database
Celery workers on our build servers do not have database access and need to be written to use API access instead.
- Use NGINX as web server
All the site is served via NGINX with the ability to change some configuration locally.
- MinIO as Django storage backend
All static and media files are served using MinIO, an emulator of S3, which is what we use in production.
- Serve documentation via El Proxito
El Proxito is a small application put in front of the documentation to serve files from the Django Storage Backend.
- Use Cloudflare Wrangler
Documentation pages are proxied by NGINX to Wrangler, which executes a JavaScript worker to fetch the response from El Proxito and inject HTML tags (for addons) based on HTTP headers.
- Search enabled by default
Elasticsearch is properly configured and enabled by default. All the documentation indexes are updated after a build is finished.
Development guides
These are guides to aid local development and common development procedures.
gVisor installation
You can mostly get by just following installation instructions in the gVisor Docker Quick Start guide.
There are a few caveats to installation, which likely depend on your local
environment. For systemd
based OS, you do need to configure the Docker
daemon to avoid systemd cgroups.
Follow the installation and quick start directions like normal:
% yay -S gvisor-bin
...
% sudo runsc install
You do need to instruct Docker to avoid systemd cgroups. You will need
to make further changes to /etc/docker/daemon.json
and restart the
Docker service:
{
"runtimes": {
"runsc": {
"path": "/usr/bin/runsc"
}
},
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
Install Docker from Docker’s own repositories; the version included in Fedora doesn’t work. Using their convenience script is an easy way to do it.
Install gVisor manually; the version included in Fedora doesn’t work.
Enable cgroups v1:
% sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
Docker is correctly configured when you can run this command from the quick start guide:
% docker run --rm -ti --runtime=runsc readthedocs/build dmesg
[ 0.000000] Starting gVisor...
...
Testing gVisor
You can enable the gVisor feature flag on a project and you should see the
container created with runtime=runsc
now.
Designing Read the Docs
So you’re thinking of contributing some of your time and design skills to Read the Docs? That’s awesome. This document will lead you through a few features available to ease the process of working with Read the Docs’ CSS and static assets.
To start, you should follow the Development installation instructions to get a working copy of the Read the Docs repository locally.
Style catalog
Once you have RTD running locally, you can open http://localhost:8000/style-catalog/
for a quick overview of the currently available styles.

This way you can quickly get started writing HTML – or if you’re modifying existing styles you can get a quick idea of how things will change site-wide.
Readthedocs.org changes
Styles for the primary RTD site are located in media/css
directory.
These styles only affect the primary site – not any of the generated documentation using the default RTD style.
Contributing
Contributions should follow the Contributing to Read the Docs guidelines where applicable – ideally you’ll create a pull request against the Read the Docs GitHub project from your forked repo and include a brief description of what you added / removed / changed, as well as an attached image (you can just take a screenshot and drop it into the PR creation form) of the effects of your changes.
There’s not a hard browser range, but your design changes should work reasonably well across all major browsers, IE8+ – that’s not to say it needs to be pixel-perfect in older browsers! Just avoid making changes that render older browsers utterly unusable (or provide a sane fallback).
Brand guidelines
Find our branding guidelines in our guidelines documentation: https://read-the-docs-guidelines.readthedocs-hosted.com.
Building and contributing to documentation
As one might expect,
the documentation for Read the Docs is built using Sphinx and hosted on Read the Docs.
The docs are kept in the docs/
directory at the top of the source tree,
and are divided into developer and user-facing documentation.
Contributing through the Github UI
If you’re making small changes to the documentation, you can verify those changes through the documentation preview that is generated when you open a PR and can be accessed using the GitHub UI.
click the checkmark next to your commit and it will expand to have multiple options
click the “details” link next to the “docs/readthedocs.org:docs” item
navigate to the section of the documentation you worked on to verify your changes
Contributing from your local machine
If you’re making large changes to the documentation, you may want to verify those changes locally before pushing upstream.
clone the readthedocs.org repository:
$ git clone --recurse-submodules https://github.com/readthedocs/readthedocs.org/
create a virtual environment with Python 3.8 (preferably the latest release, 3.8.12 at the time of writing), activate it, and upgrade pip:
$ cd readthedocs.org
$ python3.8 -m venv .venv
$ source .venv/bin/activate
(.venv) $ python -m pip install -U pip
install documentation requirements
(.venv) $ pip install -r requirements/testing.txt
(.venv) $ pip install -r requirements/docs.txt
build the documents
To build the user-facing documentation:
(.venv) $ cd docs
(.venv) $ make livehtml
To build the developer documentation:
(.venv) $ cd docs
(.venv) $ RTD_DOCSET=dev make livehtml
the documents will be available at http://127.0.0.1:4444/ and will rebuild each time you edit and save a file.
Documentation style guide
This document will serve as the canonical place to define how we write documentation at Read the Docs. The goal is to have a shared understanding of how things are done, and document the conventions that we follow.
Let us know if you have any questions or something isn’t clear.
The brand
We are called Read the Docs.
The the is not capitalized.
We do however use the acronym RTD.
Titles
For page titles we use sentence case. This means only proper nouns and the first word are capitalized:
# Good ✅
How we handle support on Read the Docs
# Bad 🔴
How we Handle Support on Read the Docs
If the page includes multiple sub-headings (H2, H3), we use sentence case there as well.
Content
Use :menuselection: when referring to an item or sequence of items in navigation.
Use :guilabel: when referring to a visual element on the screen, such as a button, drop down or input field.
Use **bold text** when referring to a non-interactive text element, such as a header.
Do not break the content across multiple lines at 80 characters, but rather break them on semantic meaning (e.g. periods or commas). Read more about this here.
If you are cross-referencing to a different page within our website, use the doc role and not a hyperlink.
If you are cross-referencing to a section within our website, use the ref role with the label from the autosectionlabel extension.
Use <abstract concept> and <variable> as placeholders in code and URLs. For instance:
https://<slug>.readthedocs.io
:guilabel:`<your username>` dropdown
Make sure that all bullet list items end with a period, and don’t mix periods with no periods.
Word list
We have a specific way that we write common words:
build command is the name of each step in the file. We try to avoid confusion with pipelines, jobs and steps from other CIs, as we do not have a multi-dimensional build sequence.
build job is the name of the general and pre-defined steps that can be overridden. They are similar to “steps” in pipelines, but on Read the Docs they are pre-defined. So it’s important to have a unique name.
Git should be upper case, except when referring to the git command, then it should be written as :program:`git`.
Git repository for the place that stores Git repos. We used to use VCS, but this is deprecated.
Git provider for generic references to GitHub/Bitbucket/GitLab/Gitea etc. We avoid “host” and “platform” because they are slightly more ambiguous.
how to do the thing is explained in a how-to guide (notice hyphen and spaces).
lifecycle is spelled without hyphen nor space.
open source should be lower case, unless you are definitely referring to OSI’s Open Source Definition.
.readthedocs.yaml is the general name of the build configuration file. Even though we allow custom paths to the config file, we only validate .readthedocs.yaml as the file name. Older variations of the name are considered legacy. We do not refer to it in general terms.
Substitutions
The following substitutions are used in our documentation to guarantee consistency and make it easy to apply future changes.
|org_brand| is used for mentioning .org. Example: Read the Docs Community
|com_brand| is used for mentioning .com. Example: Read the Docs for Business
|git_providers_and| is used to mention currently supported Git providers with “and”. Example: GitHub, Bitbucket, and GitLab
|git_providers_or| is used to mention currently supported Git providers with “or”. Example: GitHub, Bitbucket, or GitLab
Glossary
Since the above Word List is for internal reference,
we also maintain a Glossary with terms that have canonical definitions in our docs.
Terms that can otherwise have multiple definitions
or have a particular meaning in Read the Docs context
should always be added to the Glossary and referenced using the :term:
role.
Using a glossary helps us (authors) to have consistent definitions but even more importantly, it helps and includes readers by giving them quick and easy access to terms that they may be unfamiliar with.
Use an external link or Intersphinx reference when a term is clearly defined elsewhere.
Cross-references
Cross-references are great to have as inline links. Because of sphinx-hoverxref, inline links also have a nice tooltip displayed.
We like to cross-reference other articles with a definition list inside a seealso::
admonition box.
It looks like this:
.. seealso::
:doc:`/other/documentation/article`
You can learn more about <concept> in this (how-to/description/section/article)
Differentiating .org and .com
When there are differences on .org and .com,
you can use a note::
admonition box with a definition list.
Notice the use of substitutions in the example:
.. note::
|org_brand|
You need to be *maintainer* of a subproject in order to choose it from your main project.
|com_brand|
You need to have *admin access* to the subproject in order to choose it from your main project.
If the contents aren’t suitable for a note::
, you can also use tabs::
.
We are using sphinx-tabs,
however since sphinx-design also provides tabs,
it should be noted that we don’t use that feature of sphinx-design.
Headlines
Sphinx is very relaxed about how headlines are applied and will digest different notations. We try to stick to the following:
Header 1
========
Header 2
--------
Header 3
~~~~~~~~
Header 4
^^^^^^^^
In the above, Header 1
is the title of the article.
Diátaxis Framework
We apply the methodology and concepts of the Diátaxis Framework. This means that both content and navigation path for all sections should fit a single category of the 4 Diátaxis categories:
Tutorial
Explanation
How-to
Reference
See also
- https://diataxis.fr/
The official website of Diátaxis is the main resource. It’s best to check this out before guessing what the 4 categories mean.
Warning
Avoid minimal changes
If your change has a high coherence with another proposed or planned change, propose the changes in the same PR.
By multi-tasking on several articles about the same topic, such as an explanation and a how-to, you can easily design your content to end up in the right place Diátaxis-wise. This is great for the author and the reviewers and it saves coordination work.
Minimal or isolated changes generally raise more questions and concerns than changes that seek to address a larger perspective.
Explanation
Title convention: Use words indicating explanation in the title. Like Understanding <subject>, Dive into <subject>, Introduction to <subject> etc.
Introduce the scope in the first paragraph: “This article introduces …”. Write this as the very first thing, then re-read it and potentially shorten it later in your writing process.
Cross-reference the related How-to Guide. Put a seealso:: somewhere visible. It should likely be placed right after the introduction, and if the article is very short, maybe at the bottom.
Consider adding an Examples section.
Can you add screenshots or diagrams?
How-to guides
Title should begin with “How to …”. If the how-to guide is specific for a tool, make sure to note it in the title.
Navigation titles should not contain the “How to” part. Navigation title for “How to create a thing” is Creating a thing.
Introduce the scope: “In this guide, we will…”
Introduction paragraph suggestions:
“This guide shows <something>. <motivation>”
“<motivation>. This guide shows you how.”
Cross-reference the related explanation. Put a seealso:: somewhere visible. It should likely be placed right after the introduction and, if the article is very short, maybe at the bottom.
Try to avoid a “trivial” how-to, i.e. a step-by-step guide that just states what is on a page without further information. You can ask questions like:
Can this how-to contain recommendations and practical advice without breaking the how-to format?
Can this how-to be expanded with relevant troubleshooting?
Worst-case: Is this how-to describing a task that’s so trivial and self-evident that we might as well remove it?
Consider if an animation can be embedded: Here is an article about ‘gif-to-video’
Reference
We have not started organizing the Reference section yet, guidelines pending.
Tutorial
Note
We don’t really have tutorials targeted in the systematic refactor, so this checklist isn’t very important right now.
“Getting started with <subject>” is likely a good start!
Cross-reference related explanation and how-to.
Try not to explain things too much, and instead link to the explanation content.
Refactor other resources so you can use references instead of disturbing the flow of the tutorial.
Front-end development
Background
Note
This information is for the current dashboard templates and JavaScript source files and will soon be replaced by the new dashboard templates. This information will soon be mostly out of date.
Our modern front end development stack includes the following tools:
And soon, LESS
We use the following libraries:
Previously, JavaScript development was done in monolithic files or inside templates. jQuery was added as a global object via an include in the base template to an external source. There are currently no standards for JavaScript libraries; this aims to solve that.
The requirements for modernizing our front end code are:
Code should be modular and testable. One-off chunks of JavaScript in templates or in large monolithic files are not easily testable. We currently have no JavaScript tests.
Reduce code duplication.
Easy JavaScript dependency management.
Modularizing code with Browserify is a good first step. In this development workflow, major dependencies commonly used across JavaScript includes are installed with Bower for testing, and vendorized as standalone libraries via Gulp and Browserify. This way, we can easily test our JavaScript libraries against jQuery/etc, and have the flexibility of modularizing our code. See JavaScript Bundles for more information on what and how we are bundling.
To ease deployment and contributions, bundled JavaScript is checked into the repository for now. This ensures new contributors don’t need an additional front end stack just for making changes to our Python code base. In the future, this may change, so that assets are compiled before deployment, however as our front end assets are in a state of flux, it’s easier to keep absolute sources checked in.
Getting started
You will need to follow our guide to install a development Read the Docs instance first.
The sources for our bundles are found in the per-application path
static-src
, which has the same directory structure as static
. Files in
static-src
are compiled to static
for static file collection in Django.
Don’t edit files in static
directly, unless you are sure there isn’t a
source file that will compile over your changes.
To compile your changes and make them available in the application you need to run:
inv docker.buildassets
Once you are happy with your changes,
make sure to check in both files under static
and static-src
,
and commit those.
Making changes
If you are creating a new library, or a new library entry point, make sure to
define the application source file in gulpfile.js
, this is not handled
automatically right now.
If you are bringing in a new vendor library, make sure to define the bundles you
are going to create in gulpfile.js
as well.
Tests should be included per-application, in a path called tests
, under the
static-src/js
path you are working in. Currently, we still need a test
runner that accumulates these files.
Deployment
If merging several branches with JavaScript changes, it’s important to do a final post-merge bundle. Follow the steps above to rebundle the libraries, and check in any changed libraries.
JavaScript bundles
There are several components to our bundling scheme:
- Vendor library
We repackage these using Browserify, Bower, and Debowerify to make these libraries available by a
require
statement. Vendor libraries are packaged separately from our JavaScript libraries, because we use the vendor libraries in multiple locations. Libraries bundled this way with Browserify are available to our libraries viarequire
and will back down to finding the object on the globalwindow
scope.Vendor libraries should only include libraries we are commonly reusing. This currently includes
jQuery
andKnockout
. These modules will be excluded from libraries by special includes in ourgulpfile.js
.- Minor third party libraries
These libraries are maybe used in one or two locations. They are installed via Bower and included in the output library file. Because we aren’t reusing them commonly, they don’t require a separate bundle or separate include. Examples here would include jQuery plugins used on one off forms, such as jQuery Payments.
- Our libraries
These libraries are bundled up excluding vendor libraries ignored by rules in our gulpfile.js. These files should be organized by function and can be split up into multiple files per application.
Entry points to libraries must be defined in gulpfile.js for now. We don’t have a defined directory structure that would make it easy to imply the entry point to an application library.
Internationalization
This document covers the details regarding internationalization and localization that are applied in Read the Docs. The guidelines described are mostly based on Kitsune’s localization documentation.
As with most of the Django applications out there, Read the Docs’ i18n/l10n framework is based on GNU gettext. Crowd-sourced localization is optionally available at Transifex.
For more information about the general ideas, look at this document: http://www.gnu.org/software/gettext/manual/html_node/Concepts.html
Making strings localizable
Making strings in templates localizable is exceptionally easy. Making strings
in Python localizable is a little more complicated. The short answer, though,
is to just wrap the string in _()
.
Interpolation
A string is often a combination of a fixed string and something changing, for
example, Welcome, James
is a combination of the fixed part Welcome,
,
and the changing part James
. The naive solution is to localize the first
part and then follow it with the name:
_('Welcome, ') + username
This is wrong!
In some locales, the word order may be different. Use Python string formatting to interpolate the changing part into the string:
_('Welcome, {name}').format(name=username)
Python gives you a lot of ways to interpolate strings. The best way is to use Py3k formatting and kwargs. That’s the clearest for localizers.
Localization comments
Sometimes, it can help localizers to describe where a string comes from,
particularly if it can be difficult to find in the interface, or is not very
self-descriptive (e.g. very short strings). If you immediately precede the
string with a comment that starts with Translators:
, the comment will be
added to the PO file, and visible to localizers.
Example:
DEFAULT_THEME_CHOICES = (
# Translators: This is a name of a Sphinx theme.
(THEME_DEFAULT, _('Default')),
# Translators: This is a name of a Sphinx theme.
(THEME_SPHINX, _('Sphinx Docs')),
# Translators: This is a name of a Sphinx theme.
(THEME_TRADITIONAL, _('Traditional')),
# Translators: This is a name of a Sphinx theme.
(THEME_NATURE, _('Nature')),
# Translators: This is a name of a Sphinx theme.
(THEME_HAIKU, _('Haiku')),
)
Adding context with msgctxt
Strings may be the same in English, but different in other languages. English, for example, has no grammatical gender, and sometimes the noun and verb forms of a word are identical.
To make it possible to localize these correctly, we can add “context” (known in
gettext as msgctxt) to differentiate two otherwise identical strings. Django
provides a pgettext()
function for this.
For example, the string Search may be a noun or a verb in English. In a heading, it may be considered a noun, but on a button, it may be a verb. It’s appropriate to add a context (like button) to one of them.
Generally, we should only add context if we are sure the strings aren’t used in the same way, or if localizers ask us to.
Example:
from django.utils.translation import pgettext
month = pgettext("text for the search button on the form", "Search")
Plurals
You have 1 new messages grates on discerning ears. Fortunately, gettext gives
us a way to fix that in English and other locales, the
ngettext()
function:
ngettext('singular sentence', 'plural sentence', count)
A more realistic example might be:
ngettext('Found {count} result.',
'Found {count} results.',
len(results)).format(count=len(results))
This method takes three arguments because English only needs three, i.e., zero is considered “plural” for English. Other languages may have different plural rules, and require different phrases for, say 0, 1, 2-3, 4-10, >10. That’s absolutely fine, and gettext makes it possible.
Strings in templates
When putting new text into a template, all you need to do is wrap it in a
{% trans %}
template tag:
<h1>{% trans "Heading" %}</h1>
Context can be added, too:
<h1>{% trans "Heading" context "section name" %}</h1>
Comments for translators need to precede the internationalized text and must start with the Translators: keyword:
{# Translators: This heading is displayed in the user's profile page #}
<h1>{% trans "Heading" %}</h1>
To interpolate, you need to use the alternative and more verbose {%
blocktrans %}
template tag — it’s actually a block:
{% blocktrans %}Welcome, {{ name }}!{% endblocktrans %}
Note that the {{ name }}
variable needs to exist in the template context.
In some situations, it’s desirable to evaluate template expressions such as
filters or accessing object attributes. You can’t do that within the {%
blocktrans %}
block, so you need to bind the expression to a local variable
first:
{% blocktrans trimmed with revision.created_date|timesince as timesince %}
{{ revision }} {{ timesince }} ago
{% endblocktrans %}
{% blocktrans with project.name as name %}Delete {{ name }}?{% endblocktrans %}
{% blocktrans %}
also provides pluralization. For that you need to bind a
counter with the name count
and provide a plural translation after the {%
plural %}
tag:
{% blocktrans trimmed with amount=article.price count years=i.length %}
That will cost $ {{ amount }} per year.
{% plural %}
That will cost $ {{ amount }} per {{ years }} years.
{% endblocktrans %}
Note
The previous multi-lines examples also use the trimmed
option.
This removes newline characters and replaces any whitespace at the beginning and end of a line,
helping translators when translating these strings.
Strings in Python
Note
Whenever you are adding a string in Python, ask yourself if it really needs to be there, or if it should be in the template. Keep logic and presentation separate!
Strings in Python are more complex for two reasons:
We need to make sure we’re always using Unicode strings and the Unicode-friendly versions of the functions.
If you use the
gettext()
function in the wrong place, the string may end up in the wrong locale!
Here’s how you might localize a string in a view:
from django.utils.translation import gettext as _
def my_view(request):
if request.user.is_superuser:
msg = _(u'Oh hi, staff!')
else:
msg = _(u'You are not staff!')
Interpolation is done through normal Python string formatting:
msg = _(u'Oh, hi, {user}').format(user=request.user.username)
Context information can be supplied by using the
pgettext()
function:
msg = pgettext('the context', 'Search')
Translator comments are normal one-line Python comments:
# Translators: A message to users.
msg = _(u'Oh, hi there!')
If you need to use plurals, import the
ungettext()
function:
from django.utils.translation import ungettext
n = len(results)
msg = ungettext('Found {0} result', 'Found {0} results', n).format(n)
Lazily translated strings
You can use gettext()
or
ungettext()
only in views or functions called
from views. If the function will be evaluated when the module is loaded, then
the string may end up in English or the locale of the last request!
Examples include strings in module-level code, arguments to functions in class
definitions, strings in functions called from outside the context of a view. To
internationalize these strings, you need to use the _lazy
versions of the
above methods, gettext_lazy()
and
ungettext_lazy()
. The result doesn’t get
translated until it is evaluated as a string, for example by being output or
passed to unicode()
:
from django.utils.translation import gettext_lazy as _
class UserProfileForm(forms.ModelForm):
first_name = CharField(label=_('First name'), required=False)
last_name = CharField(label=_('Last name'), required=False)
In case you want to provide context to a lazily-evaluated gettext string, you
will need to use pgettext_lazy()
.
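For example, a hypothetical form where the label needs both laziness and context could look like this:
from django import forms
from django.utils.translation import pgettext_lazy


class SearchForm(forms.Form):
    # The first argument is the context shown to translators, the second is the string.
    query = forms.CharField(label=pgettext_lazy("search form field label", "Search"))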
Administrative tasks
Updating localization files
To update the translation source files (eg if you changed or added translatable
strings in the templates or Python code) you should run python manage.py
makemessages -l <language>
in the project’s root directory (substitute
<language>
with a valid language code).
The updated files can now be localized in a PO editor or crowd-sourced online translation tool.
Compiling to MO
Gettext doesn’t parse any text files, it reads a binary format for faster
performance. To compile the latest PO files in the repository, Django provides
the compilemessages
management command. For example, to compile all the
available localizations, just run:
python manage.py compilemessages -a
You will need to do this every time you want to push updated translations to the live site.
Also, note that it’s not a good idea to track MO files in version control,
since they would need to be updated at the same pace PO files are updated, so
it’s silly and not worth it. They are ignored by .gitignore
, but please
make sure you don’t forcibly add them to the repository.
Transifex integration
To push updated translation source files to Transifex, run tx
push -s
(for English) or tx push -t <language>
(for non-English).
To pull changes from Transifex, run tx pull -a
. Note that Transifex does
not compile the translation files, so you have to do this after the pull (see
the Compiling to MO section).
For more information about the tx
command, read the Transifex client’s
help pages.
Note
For the Read the Docs community site, we use Invoke with a tasks.py file to follow this process:
Update files and push sources (English) to Transifex:
invoke l10n.push
Pull the updated translations from Transifex:
invoke l10n.pull
Database migrations
We use Django migrations to manage database schema changes, and the django-safemigrate package to ensure that migrations are run in a given order to avoid downtime.
To make sure that migrations don’t cause downtime, the following rules should be followed for each case.
Adding a new field
When adding a new field to a model, it should be nullable. This way, the database can be migrated without downtime, and the field can be populated later. Don’t forget to make the field non-nullable in a separate migration after the data has been populated. You can achieve this by following these steps:
Set the new field as null=True and blank=True in the model:
class MyModel(models.Model):
    new_field = models.CharField(
        max_length=100, null=True, blank=True, default="default"
    )
Make sure that the field is always populated with a proper value in the new code, and the code handles the case where the field is null.
if my_model.new_field in [None, "default"]:
    pass

# If it's a boolean field, make sure that the null option is removed from the form.
class MyModelForm(forms.ModelForm):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.fields["new_field"].widget = forms.CheckboxInput()
        self.fields["new_field"].empty_value = False
Create the migration file (let’s call this migration app 0001), and mark it as Safe.before_deploy:
from django.db import migrations, models
from django_safemigrate import Safe


class Migration(migrations.Migration):
    safe = Safe.before_deploy
Create a data migration to populate all null values of the new field with a proper value (let’s call this migration app 0002), and mark it as Safe.after_deploy:
from django.db import migrations
from django_safemigrate import Safe


def migrate(apps, schema_editor):
    MyModel = apps.get_model("app", "MyModel")
    MyModel.objects.filter(new_field=None).update(new_field="default")


class Migration(migrations.Migration):
    safe = Safe.after_deploy

    operations = [
        migrations.RunPython(migrate),
    ]
After the deploy has been completed, create a new migration to set the field as non-nullable (let’s call this migration app 0003). Run this migration on a new deploy; you can mark it as Safe.before_deploy or Safe.always.
Remove any handling of the null case from the code.
At the end, the deploy should look like this:
Deploy web-extra.
Run django-admin safemigrate to run the migration app 0001.
Deploy the webs.
Run django-admin migrate to run the migration app 0002.
Create a new migration to set the field as non-nullable, and apply it on the next deploy.
Removing a field
When removing a field from a model, all usages of the field should be removed from the code before the field is removed from the model, and the field should be nullable. You can achieve this by following these steps:
Remove all usages of the field from the code.
Set the field as null=True and blank=True in the model:
class MyModel(models.Model):
    field_to_delete = models.CharField(max_length=100, null=True, blank=True)
Create the migration file (let’s call this migration app 0001), and mark it as Safe.before_deploy:
from django.db import migrations, models
from django_safemigrate import Safe


class Migration(migrations.Migration):
    safe = Safe.before_deploy
Create a migration to remove the field from the database (let’s call this migration app 0002), and mark it as Safe.after_deploy:
from django.db import migrations, models
from django_safemigrate import Safe


class Migration(migrations.Migration):
    safe = Safe.after_deploy
At the end, the deploy should look like this:
Deploy web-extra.
Run django-admin safemigrate to run the migration app 0001.
Deploy the webs.
Run django-admin migrate to run the migration app 0002.
Server side search
Read the Docs uses Elasticsearch instead of the built-in Sphinx search to provide better search results. Documents are indexed in the Elasticsearch index and the search is made through the API. All the search code is open source and lives in the GitHub repository. Currently we are using Elasticsearch 6.3.
Local development configuration
Elasticsearch is installed and run as part of the development installation guide.
Indexing into Elasticsearch
To use search, you need to index data into the Elasticsearch index. Run the reindex_elasticsearch management command:
inv docker.manage reindex_elasticsearch
For performance optimization, we implemented our own version of the management command rather than the built-in one provided by the django-elasticsearch-dsl package.
Auto indexing
By default, auto indexing is turned off in development mode. To turn it on, change the ELASTICSEARCH_DSL_AUTOSYNC setting to True in the readthedocs/settings/dev.py file.
After that, whenever documentation builds successfully or a project gets added, the search index will update automatically.
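For example, the change is a single setting (a sketch; the exact placement depends on how the settings module is organized):
# readthedocs/settings/dev.py
ELASTICSEARCH_DSL_AUTOSYNC = True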
Architecture
The search architecture is divided into 2 parts:
One part is responsible for indexing the documents and projects (documents.py).
The other part is responsible for querying the index to show the proper results to users (faceted_search.py).
We use the django-elasticsearch-dsl package for our Document abstraction. django-elasticsearch-dsl is a wrapper around elasticsearch-dsl for easy configuration with Django.
Indexing
All the Sphinx documents are indexed into Elasticsearch after the build is successful. Currently, we do not index MkDocs documents to elasticsearch, but any kind of help is welcome.
Troubleshooting
If you get an error like:
RequestError(400, 'search_phase_execution_exception', 'failed to create query: ...
You can fix this by deleting the page index and re-indexing:
inv docker.manage 'search_index --delete'
inv docker.manage 'reindex_elasticsearch --queue web'
How we index documentation
After any build is successfully finished, HTMLFile objects are created for each of the HTML files and the old version’s HTMLFile objects are deleted. By default, the django-elasticsearch-dsl package listens to the post_create/post_delete signals to index/delete documents, but this has performance drawbacks as it sends an HTTP request whenever any HTMLFile object is created or deleted. To optimize the performance, bulk_post_create and bulk_post_delete signals are dispatched with a list of HTMLFile objects so it’s possible to bulk index documents in Elasticsearch (the bulk_post_create signal is dispatched for created objects and bulk_post_delete for deleted objects). Both of the signals are dispatched with the list of HTMLFile instances in the instance_list parameter.
We listen to the bulk_post_create
and bulk_post_delete
signals in our Search
application
and index/delete the documentation content from the HTMLFile
instances.
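A minimal sketch of such a receiver is shown below; the import path of the signals is an assumption for illustration, and the bulk indexing helpers are hypothetical:
from django.dispatch import receiver

from readthedocs.search.signals import bulk_post_create, bulk_post_delete  # assumed path


@receiver(bulk_post_create)
def index_html_files(sender, instance_list, **kwargs):
    # Index every HTMLFile from this build in a single bulk request
    # instead of one HTTP request per object.
    bulk_index(instance_list)  # hypothetical helper


@receiver(bulk_post_delete)
def delete_html_files(sender, instance_list, **kwargs):
    # Remove the old version's HTMLFile documents from the index in bulk.
    bulk_delete(instance_list)  # hypothetical helper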
How we index projects
We also index project information in our search index so that the user can search for projects from the main site. We listen to the post_create and post_delete signals of the Project model and index/delete into Elasticsearch accordingly.
Elasticsearch document
elasticsearch-dsl provides a model-like wrapper for the Elasticsearch document.
As per requirements of django-elasticsearch-dsl, it is stored in the
readthedocs/search/documents.py
file.
ProjectDocument: It is used for indexing projects. The signal listener of django-elasticsearch-dsl listens to the post_save signal of the Project model and then indexes/deletes into Elasticsearch.
PageDocument: It is used for indexing documentation of projects. As mentioned above, our Search app listens to the bulk_post_create and bulk_post_delete signals and indexes/deletes documentation into Elasticsearch. The signal listeners are in the readthedocs/search/signals.py file. Both of the signals are dispatched after a successful documentation build.
The fields and ES datatypes are specified in the PageDocument. The indexable data is taken from the processed_json property of HTMLFile. This property provides a Python dictionary with document data like title, sections, path etc.
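For reference, a stripped-down sketch of what such a document class looks like with django-elasticsearch-dsl is shown below; the index name and field list are illustrative, not the real ProjectDocument definition:
from django_elasticsearch_dsl import Document
from django_elasticsearch_dsl.registries import registry

from readthedocs.projects.models import Project  # assumed import path


@registry.register_document
class ProjectDocument(Document):
    class Index:
        name = "project"  # illustrative index name

    class Django:
        model = Project
        # Model fields mirrored into Elasticsearch.
        fields = ["name", "slug", "description"]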
Server side search integration
Read the Docs provides server side search (SSS) in place of the default search engine of your site. To accomplish this, Read the Docs parses the content directly from your HTML pages [*].
If you are the author of a theme or a static site generator you can read this document, and follow some conventions in order to improve the integration of SSS with your theme/site.
Indexing
The content of the page is parsed into sections. In general, the indexing process happens in three steps:
Identify the main content node.
Remove any irrelevant content from the main node.
Parse all sections inside the main node.
Read the Docs makes use of ARIA roles and other heuristics in order to process the content.
Tip
Following the ARIA conventions will also improve the accessibility of your site. See also https://webaim.org/techniques/semanticstructure/.
Main content node
The main content should be inside a <main>
tag or an element with role=main
,
and there should only be one per page.
This node is the one that contains all the page content to be indexed. Example:
<html>
<head>
...
</head>
<body>
<div>
This content isn't processed
</div>
<div role="main">
All content inside the main node is processed
</div>
<footer>
This content isn't processed
</footer>
</body>
</html>
If a main node isn’t found,
we try to infer the main node from the parent of the first section with a h1
tag.
Example:
<html>
<head>
...
</head>
<body>
<div>
This content isn't processed
</div>
<div id="parent">
<h1>First title</h1>
<p>
The parent of the h1 title will
be taken as the main node,
this is the div tag.
</p>
<h2>Second title</h2>
<p>More content</p>
</div>
</body>
</html>
If a section title isn’t found, we default to the body
tag.
Example:
<html>
<head>
...
</head>
<body>
<p>Content</p>
</body>
</html>
Irrelevant content
If you have content inside the main node that isn’t relevant to the page (like navigation items, menus, or search box), make sure to use the correct role or tag for it.
Roles to be ignored:
navigation
search
Tags to be ignored:
nav
Special rules that are derived from specific documentation tools applied in the generic parser:
.linenos
,.lineno
(line numbers in code-blocks, comes from both MkDocs and Sphinx).headerlink
(added by Sphinx to links in headers).toctree-wrapper
(added by Sphinx to the table of contents generated from thetoctree
directive)
Example:
<div role="main">
...
<nav role="navigation">
...
</nav>
...
</div>
Sections
Sections are stored in a dictionary composed of an id, title and content key.
Sections are defined as:
h1-h7: all content between one heading level and the next header on the same level is used as content for that section.
dt elements with an id attribute: we map the title to the dt element and the content to the dd element.
All sections have to be identified by a DOM container’s id
attribute,
which will be used to link to the section.
How the id is detected varies with the type of element:
h1-h7 elements use the id attribute of the header itself if present, or its section parent (if exists).
dt elements use the id attribute of the dt element.
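As an illustration of these rules (this is not the actual Read the Docs parser), a sketch using BeautifulSoup could look like this:
from bs4 import BeautifulSoup


def section_ids(html):
    """Collect the ids that would be used to link to each section."""
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    for heading in soup.select("h1, h2, h3, h4, h5, h6, h7"):
        # Prefer the heading's own id, then fall back to its <section> parent.
        section = heading.find_parent("section")
        ids.append(heading.get("id") or (section.get("id") if section else None))
    for dt in soup.select("dt[id]"):
        ids.append(dt.get("id"))
    return ids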
To avoid duplication and ambiguous section references,
all indexed dl
elements are removed from the DOM before indexing of other sections happen.
Here is an example of how all content below the title, until a new section is found, will be indexed as part of the section content:
<div role="main">
<h1 id="section-title">
Section title
</h1>
<p>
Content to be indexed
</p>
<ul>
<li>This is also part of the section and will be indexed as well</li>
</ul>
<h2 id="2">
This is the start of a new section
</h2>
<p>
...
</p>
...
<header>
<h1 id="3">This is also a valid section title</h1>
</header>
<p>
This is the content of the third section.
</p>
</div>
Sections can be contained in up to two nested tags, and can contain other sections (nested sections). Note that the section content still needs to be below the section title. Example:
<div role="main">
<div class="section">
<h1 id="section-title">
Section title
</h1>
<p>
Content to be indexed
</p>
<ul>
<li>This is also part of the section</li>
</ul>
<div class="section">
<div id="nested-section">
<h2>
This is the start of a sub-section
</h2>
<p>
With the h tag within two levels
</p>
</div>
</div>
</div>
</div>
Note
The title of the first section will be the title of the page,
falling back to the title
tag.
Other special nodes
Anchors: If the title of your section contains an anchor, wrap it in a
headerlink
class, so it won’t be indexed as part of the title.
<h2>
Section title
<a class="headerlink" title="Permalink to this headline">¶</a>
</h2>
Code blocks: If a code block contains line numbers, wrap them in a
linenos
orlineno
class, so they won’t be indexed as part of the code.
<table class="highlighttable">
<tr>
<td class="linenos">
<div class="linenodiv">
<pre>1 2 3</pre>
</div>
</td>
<td class="code">
<div class="highlight">
<pre>First line
Second line
Third line</pre>
</div>
</td>
</tr>
</table>
Overriding the default search
Static sites usually have their own static search index, and search results are retrieved via JavaScript. Read the Docs overrides the default search for Sphinx projects only, and provides a fallback to the original search in case of an error or no results.
Sphinx
Sphinx’s basic theme provides the static/searchtools.js file,
which initializes search with the Search.init()
method.
Read the Docs overrides the Search.query
method and makes use of Search.output.append
to add the results.
A simplified example looks like this:
var original_search = Search.query;
function search_override(query) {
var results = fetch_results(query);
if (results) {
for (var i = 0; i < results.length; i += 1) {
var result = process_result(results[i]);
Search.output.append(result);
}
} else {
original_search(query);
}
}
Search.query = search_override;
$(document).ready(function() {
Search.init();
});
Highlights from results will be in a span
tag with the highlighted
class
(This is a <span class="highlighted">result</span>
).
If your theme works with the search from the basic theme, it will work with Read the Docs’ server side search.
Other static site generators
All projects that have HTML pages that follow the conventions described in this document can make use of the server side search from the dashboard or by calling the API.
Supporting more themes and static site generators
All themes that follow these conventions should work as expected. If you think other generators or conventions should be supported, if some content should be ignored or given special treatment, or if you find an error with our indexing, let us know in our issue tracker.
For Sphinx projects, the content of the main node is provided by an intermediate step in the build process, but the HTML components from the node are preserved.
Subscriptions
Subscriptions are available on Read the Docs for Business. We make use of Stripe to handle payments and subscriptions, and we use dj-stripe to handle the integration with Stripe.
Local testing
To test subscriptions locally, you need to have access to the Stripe account, and define the following environment variables with the keys from Stripe test mode:
RTD_STRIPE_SECRET: https://dashboard.stripe.com/test/apikeys
RTD_DJSTRIPE_WEBHOOK_SECRET: https://dashboard.stripe.com/test/webhooks
To test the webhook locally, you need to run your local instance with ngrok, for example:
ngrok http 80
inv docker.up --http-domain xxx.ngrok.io
If this is your first time setting up subscriptions, you will need to re-sync djstripe with Stripe:
inv docker.manage djstripe_sync_models
The subscription settings (RTD_PRODUCTS) are already mapped to match the Stripe prices from test mode.
To subscribe to any plan, you can use any test card from Stripe,
for example: 4242 4242 4242 4242
(use any future date and any value for the other fields).
Modeling
Subscriptions are attached to an organization (customer), and can have multiple products attached to them. A product can have multiple prices, usually monthly and yearly.
When a user subscribes to a plan (product), they are subscribing to a price of a product, for example, the monthly price of the “Basic plan” product.
A subscription has a “main” product (RTDProduct(extra=False)
),
and can have several “extra” products (RTDProduct(extra=True)
).
For example, an organization can have a subscription with a “Basic Plan” product, and an “Extra builder” product.
Each product is mapped to a set of features (RTD_PRODUCTS
) that the user will have access to
(different prices of the same product have the same features).
If a subscription has multiple products, the features are multiplied by the quantity and added together.
For example, if a subscription has a “Basic Plan” product that includes two concurrent builders,
and an “Extra builder” product with quantity three, the total number of concurrent builders the
organization has will be five.
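As a minimal sketch of that arithmetic (the product structures below are simplified assumptions, not the real RTD_PRODUCTS mapping or RTDProduct model):
# Simplified, assumed structures illustrating quantity * feature aggregation.
products = [
    {"name": "Basic Plan", "extra": False, "quantity": 1, "features": {"concurrent_builders": 2}},
    {"name": "Extra builder", "extra": True, "quantity": 3, "features": {"concurrent_builders": 1}},
]

total_builders = sum(
    product["features"]["concurrent_builders"] * product["quantity"]
    for product in products
)
print(total_builders)  # 5 -> 2 * 1 from the plan, plus 1 * 3 from the extra builders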
Life cycle of a subscription
When a new organization is created, a Stripe customer is created for that organization,
and this customer is subscribed to the trial product (RTD_ORG_DEFAULT_STRIPE_SUBSCRIPTION_PRICE
).
After the trial period is over, the subscription is canceled, and the organization is disabled.
During or after the trial a user can upgrade their subscription to a paid plan
(RTDProduct(listed=True)
).
Custom products
We provide three paid plans that users can subscribe to: Basic, Advanced, and Pro. Additionally, we provide an Enterprise plan; this plan is customized for each customer and is created manually by the RTD core team.
To create a custom plan, you need to create a new product in Stripe,
and add the product id to the RTD_PRODUCTS
setting mapped to the features that the plan will provide.
After that, you can create a subscription for the organization with the custom product;
our application will automatically relate this new product to the organization.
Extra products
We have one extra product: Extra builder.
To create a new extra product, you need to create a new product in Stripe,
and add the product id to the RTD_PRODUCTS setting mapped to the features that the
extra product will provide. This product should have the extra attribute set to True.
To subscribe an organization to an extra product, you just need to add the product to its subscription with the desired quantity; our application will automatically relate this new product to the organization.
Interesting settings
DOCKER_LIMITS
A dictionary of limits to virtual machines. These limits include:
- time
An integer representing the total allowed time limit (in seconds) for build processes. This time limit affects the parent process of the virtual machine and will force the virtual machine to die if a build is still running after the allotted time expires.
- memory
The maximum memory allocated to the virtual machine. If this limit is hit, build processes will be automatically killed. Examples: ‘200m’ for 200MB of total memory, or ‘2g’ for 2GB of total memory.
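For example, a local settings override might look like the following sketch (the values are illustrative, not recommended defaults):
# Illustrative values: 2 GB of memory and a 15-minute build time limit.
DOCKER_LIMITS = {
    'memory': '2g',
    'time': 900,
}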
PRODUCTION_DOMAIN
This is the domain that is used by the main application dashboard (not documentation pages).
RTD_INTERSPHINX_URL
This is the domain that is used to fetch the intersphinx inventory file.
If not set explicitly, this defaults to the PRODUCTION_DOMAIN.
DEFAULT_PRIVACY_LEVEL
The privacy level that projects default to. Generally set to public. Also acts as a proxy setting for blocking certain historically insecure options, like serving generated artifacts directly from the media server.
PUBLIC_DOMAIN
A special domain for serving public documentation.
If set, public docs will be linked here instead of the PRODUCTION_DOMAIN
.
PUBLIC_DOMAIN_USES_HTTPS
If True
and PUBLIC_DOMAIN
is set, that domain will default to
serving public documentation over HTTPS. By default, documentation is
served over HTTP.
ALLOW_ADMIN
Whether to include django.contrib.admin in the URLs.
RTD_BUILD_MEDIA_STORAGE
Use this storage class to upload build artifacts to cloud storage (S3, Azure storage).
This should be a dotted path to the relevant class (e.g. 'path.to.MyBuildMediaStorage').
Your class should mix in readthedocs.builds.storage.BuildMediaStorageMixin.
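As a sketch, assuming django-storages’ S3 backend (the module path, class name, and bucket below are hypothetical):
# myapp/storage.py -- hypothetical module
from readthedocs.builds.storage import BuildMediaStorageMixin
from storages.backends.s3boto3 import S3Boto3Storage

class MyBuildMediaStorage(BuildMediaStorageMixin, S3Boto3Storage):
    # Store build artifacts (HTML zips, PDFs, ePubs) in an S3 bucket.
    bucket_name = 'my-build-media'

# In your settings file:
# RTD_BUILD_MEDIA_STORAGE = 'myapp.storage.MyBuildMediaStorage'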
ELASTICSEARCH_DSL
Default:
{
'default': {
'hosts': '127.0.0.1:9200'
},
}
Settings for the Elasticsearch connection. These settings are then passed to elasticsearch-dsl-py’s connections.configure.
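Roughly, this is equivalent to the following sketch (using the documented elasticsearch-dsl API; not the exact Read the Docs code):
# Each alias in the setting becomes a named connection.
from elasticsearch_dsl import connections

connections.configure(default={'hosts': '127.0.0.1:9200'})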
ES_INDEXES
Default:
{
'project': {
'name': 'project_index',
'settings': {'number_of_shards': 5,
'number_of_replicas': 0
}
},
'page': {
'name': 'page_index',
'settings': {
'number_of_shards': 5,
'number_of_replicas': 0,
}
},
}
Defines the Elasticsearch index name and settings for each index separately.
The key is the type of index, like project or page, and the value is another
dictionary containing name and settings. Here name is the index name,
and settings is used to configure that particular index.
ES_TASK_CHUNK_SIZE
The maximum number of objects sent to each Elasticsearch indexing Celery task.
This is used when running the elasticsearch_reindex management command.
ES_PAGE_IGNORE_SIGNALS
This setting determines whether to index each page separately into Elasticsearch.
If the setting is True, each HTML page will not be indexed separately,
but will instead be indexed via bulk indexing.
ELASTICSEARCH_DSL_AUTOSYNC
This setting controls whether objects are automatically indexed into Elasticsearch.
Docker pass-through settings
If you run a Docker environment, it is possible to pass some secrets through to the Docker containers from your host system. For security reasons, we do not commit these secrets to our repository. Instead, we individually define these settings for our local environments.
We recommend using direnv for storing local development secrets.
Allauth secrets
It is possible to set the Allauth application secrets for our supported providers using the following environment variables:
- RTD_SOCIALACCOUNT_PROVIDERS_GITHUB_CLIENT_ID
- RTD_SOCIALACCOUNT_PROVIDERS_GITHUB_SECRET
- RTD_SOCIALACCOUNT_PROVIDERS_GITLAB_CLIENT_ID
- RTD_SOCIALACCOUNT_PROVIDERS_GITLAB_SECRET
- RTD_SOCIALACCOUNT_PROVIDERS_BITBUCKET_OAUTH2_CLIENT_ID
- RTD_SOCIALACCOUNT_PROVIDERS_BITBUCKET_OAUTH2_SECRET
- RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_CLIENT_ID
- RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_SECRET
Stripe secrets
The following secrets are required to use djstripe
and our Stripe integration.
- RTD_STRIPE_SECRET
- RTD_STRIPE_PUBLISHABLE
- RTD_DJSTRIPE_WEBHOOK_SECRET
Testing
Before contributing to Read the Docs, make sure your patch passes our test suite and your code style passes our code linting suite.
Read the Docs uses Tox to execute testing and linting procedures. Tox is the only dependency you need to run linting or our test suite; the remainder of our requirements will be installed by Tox into environment-specific virtualenv paths. Before testing, make sure you have Tox installed:
pip install tox
To run the full test and lint suite against your changes, simply run Tox. Tox should return without any errors. You can run Tox against all of our environments by running:
tox
By default, tox won’t run the search tests.
To run all tests, including the search tests,
you need to override tox’s posargs.
If you don’t have any additional arguments to pass,
you can set the TOX_POSARGS environment variable to an empty string:
TOX_POSARGS='' tox
Note
If you need to override tox’s posargs, but you still don’t want to run the search tests,
you need to include -m 'not search' in your command:
tox -- -m 'not search' -x
To target a specific environment:
tox -e py310
To run a subset of tests:
tox -e py310 -- -k test_celery
The tox
configuration has the following environments configured. You can
target a single environment to limit the test suite:
- py310
Run our test suite using Python 3.10
- py310-debug
Same as py310, but there are some useful debugging tools available in the environment.
- lint
Run code linting using Prospector. This currently runs pylint, pyflakes, pep8 and other linting tools.
- docs
Test documentation compilation with Sphinx.
Pytest marks
The Read the Docs code base is deployed as three instances:
Main: where you can see the dashboard.
Build: where the builds happen.
Serve/proxito: in charge of serving the documentation pages.
Each instance has its own settings. To make sure we test each part as close as possible to its real settings, we use pytest marks. This allows us to run each set of tests with different settings files, or to skip some (like the search tests):
DJANGO_SETTINGS_MODULE=custom.settings.file pytest -m mark
DJANGO_SETTINGS_MODULE=another.settings.file pytest -m "not mark"
Current marks are:
search (tests that require Elasticsearch)
proxito (tests from the serve/proxito instance)
Tests without a mark belong to the main instance.
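For illustration, a marked test might look like this sketch (the test names and bodies are placeholders):
import pytest

@pytest.mark.search
def test_search_returns_results():
    # Selected only when running the Elasticsearch-backed tests (pytest -m search).
    ...

@pytest.mark.proxito
def test_serves_documentation_page():
    # Run against the serve/proxito settings (pytest -m proxito).
    ...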
Continuous Integration
The RTD test suite is exercised by CircleCI on every push to our repo at GitHub. You can check out the current build status: https://app.circleci.com/pipelines/github/readthedocs/readthedocs.org