Server-side search integration

Read the Docs provides server-side search (SSS) as a replacement for the default search engine of your site. To accomplish this, Read the Docs parses the content directly from your HTML pages [*].

If you are the author of a theme or a static site generator, you can follow the conventions in this document to improve the integration of SSS with your theme or site.

Indexing

The content of the page is parsed into sections. In general, the indexing process happens in three steps:

  1. Identify the main content node.

  2. Remove any irrelevant content from the main node.

  3. Parse all sections inside the main node.

Read the Docs uses ARIA roles and other heuristics to process the content.

Tip

Following the ARIA conventions will also improve the accessibility of your site. See also https://webaim.org/techniques/semanticstructure/.

Main content node

The main content should be inside a <main> tag or an element with role=main, and there should only be one per page. This node is the one that contains all the page content to be indexed. Example:

<html>
   <head>
      ...
   </head>
   <body>
      <div>
         This content isn't processed
      </div>

      <div role="main">
         All content inside the main node is processed
      </div>

      <footer>
         This content isn't processed
      </footer>
   </body>
</html>

If a main node isn’t found, we try to infer it from the parent of the first section with an h1 tag. Example:

<html>
   <head>
      ...
   </head>
   <body>
      <div>
         This content isn't processed
      </div>

      <div id="parent">
         <h1>First title</h1>
         <p>
            The parent of the h1 title will
            be taken as the main node,
            this is the div tag.
         </p>

         <h2>Second title</h2>
         <p>More content</p>
      </div>
   </body>
</html>

If a section title isn’t found, we default to the body tag. Example:

<html>
   <head>
      ...
   </head>
   <body>
      <p>Content</p>
   </body>
</html>
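
The lookup order described above can be sketched in Python. This is only an illustration, not the actual Read the Docs implementation: the function name find_main_node is hypothetical, the sketch uses xml.etree.ElementTree (which only accepts well-formed markup, unlike a real HTML parser), and it simplifies "the first section with an h1 tag" to the direct parent of the first h1 element.

```python
import xml.etree.ElementTree as ET


def find_main_node(root):
    """Return the element to index, trying each fallback in order.

    Hypothetical helper: `root` is assumed to be the parsed <html> element.
    """
    # 1. A <main> tag or any element with role="main".
    for element in root.iter():
        if element.tag == "main" or element.get("role") == "main":
            return element

    # 2. The parent of the first h1 (ElementTree has no parent pointers,
    #    so build a child -> parent map first).
    parents = {child: parent for parent in root.iter() for child in parent}
    first_h1 = next(root.iter("h1"), None)
    if first_h1 is not None and first_h1 in parents:
        return parents[first_h1]

    # 3. Default to the body tag.
    return root.find("body")


page = ET.fromstring(
    "<html><head/><body>"
    "<div>This content isn't processed</div>"
    '<div id="parent"><h1>First title</h1><p>Content</p></div>'
    "</body></html>"
)
print(find_main_node(page).get("id"))
```

In this page there is no main node, so the sketch falls through to the second step and selects the div with id="parent".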

Irrelevant content

If you have content inside the main node that isn’t relevant to the page (like navigation items, menus, or a search box), make sure to use the correct role or tag for it.

Roles to be ignored:

  • navigation

  • search

Tags to be ignored:

  • nav

Example:

<div role="main">
   ...
   <nav role="navigation">
      ...
   </nav>
   ...
</div>
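
As a rough sketch of this filtering step, the following Python removes the ignored roles and tags from a main node. The helper name remove_irrelevant is hypothetical, the markup is assumed to be well-formed so that xml.etree.ElementTree can parse it, and the real indexer may work differently.

```python
import xml.etree.ElementTree as ET

# Roles and tags listed above as ignored.
IGNORED_ROLES = {"navigation", "search"}
IGNORED_TAGS = {"nav"}


def remove_irrelevant(node):
    """Remove ignored descendants of `node` in place (hypothetical helper)."""
    for parent in list(node.iter()):
        previous = None
        for child in list(parent):
            if child.tag in IGNORED_TAGS or child.get("role") in IGNORED_ROLES:
                # ElementTree stores the text that follows an element in its
                # `tail`; move it so only the ignored element itself is lost.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child


main = ET.fromstring(
    '<div role="main"><p>Keep</p>'
    '<nav role="navigation"><a>Menu</a></nav>'
    '<div role="search">Search box</div>'
    "<p>Also keep</p></div>"
)
remove_irrelevant(main)
print([child.tag for child in main])
```

After the call, only the two p elements remain as children of the main node.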

Sections

Sections are composed of a title and content. A section title can be an h tag, or a header tag containing an h tag. The h tag or its parent can contain an id attribute, which will be used to link to the section.

All content below the title, until a new section is found, will be indexed as part of the section content. Example:

<div role="main">
   <h1 id="section-title">
      Section title
   </h1>
   <p>
      Content to be indexed
   </p>
   <ul>
      <li>This is also part of the section and will be indexed as well</li>
   </ul>

   <h2 id="2">
      This is the start of a new section
   </h2>
   <p>
      ...
   </p>

   ...

   <header>
      <h1 id="3">This is also a valid section title</h1>
   </header>
   <p>
      This is the content of the third section.
   </p>
</div>
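
To make the flat case concrete, here is a minimal Python sketch that splits the direct children of a main node into sections. It is an illustration under stated assumptions rather than the indexer's real code: split_sections, heading_of, and text_of are hypothetical names, the markup must be well-formed for xml.etree.ElementTree, content that appears before the first title is dropped, and nested sections are not handled.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h[1-6]$")


def text_of(element):
    """Return the element's text with runs of whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def heading_of(element):
    """Return the h tag if `element` is a section title, else None."""
    if HEADING.match(element.tag):
        return element
    if element.tag == "header":
        return next((c for c in element if HEADING.match(c.tag)), None)
    return None


def split_sections(main):
    """Group the direct children of `main` into title/content sections."""
    sections = []
    for child in main:
        heading = heading_of(child)
        if heading is not None:
            # Take the id from the h tag, else from the wrapping header.
            section_id = heading.get("id") or child.get("id")
            sections.append(
                {"id": section_id, "title": text_of(heading), "content": []}
            )
        elif sections:
            # Everything below a title belongs to that section.
            sections[-1]["content"].append(text_of(child))
    return sections


main = ET.fromstring(
    '<div role="main">'
    '<h1 id="section-title">Section title</h1>'
    "<p>Content to be indexed</p>"
    '<header><h1 id="3">This is also a valid section title</h1></header>'
    "<p>Content of the last section</p>"
    "</div>"
)
for section in split_sections(main):
    print(section["id"], "|", section["title"], "|", section["content"])
```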

Sections can be contained in up to two nested tags, and can contain other sections (nested sections). Note that the section content still needs to be below the section title. Example:

<div role="main">
   <div class="section">
      <h1 id="section-title">
         Section title
      </h1>
      <p>
         Content to be indexed
      </p>
      <ul>
         <li>This is also part of the section</li>
      </ul>

      <div class="section">
         <div id="nested-section">
            <h2>
               This is the start of a sub-section
            </h2>
            <p>
               With the h tag within two levels
            </p>
         </div>
      </div>
   </div>
</div>

Note

The title of the first section will be the title of the page, falling back to the title tag.
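
The fallback in this note could look roughly like the sketch below. The function page_title and its section_titles argument (a list of already extracted section titles) are assumptions made for illustration, and the page is parsed with xml.etree.ElementTree, so it must be well-formed.

```python
import xml.etree.ElementTree as ET


def page_title(root, section_titles):
    """Pick the page title: first section title, else the <title> tag."""
    if section_titles and section_titles[0].strip():
        return section_titles[0].strip()
    title = root.find("head/title")
    if title is not None and title.text:
        return title.text.strip()
    return None


page = ET.fromstring(
    "<html><head><title>Fallback title</title></head><body/></html>"
)
print(page_title(page, ["First section"]))
print(page_title(page, []))
```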

Other special nodes

  • Anchors: If the title of your section contains an anchor, give the anchor the headerlink class, so it won’t be indexed as part of the title.

<h2>
   Section title
   <a class="headerlink" title="Permalink to this headline"></a>
</h2>

  • Code blocks: If a code block contains line numbers, wrap them in an element with the linenos or lineno class, so they won’t be indexed as part of the code.

<table class="highlighttable">
   <tr>
      <td class="linenos">
         <div class="linenodiv">
            <pre>1 2 3</pre>
         </div>
      </td>

      <td class="code">
         <div class="highlight">
            <pre>First line
Second line
Third line</pre>
         </div>
      </td>
   </tr>
</table>
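
Both special cases amount to skipping elements by class while collecting text. The Python below is a minimal sketch of that idea, not the real implementation: indexable_text and raw_text are hypothetical names, the input must be well-formed for xml.etree.ElementTree, and whitespace (including newlines in code) is collapsed to single spaces for simplicity.

```python
import xml.etree.ElementTree as ET

# Classes named above whose content should not be indexed.
SKIPPED_CLASSES = {"headerlink", "linenos", "lineno"}


def raw_text(element):
    """Concatenate text, skipping children that carry a skipped class."""
    parts = [element.text or ""]
    for child in element:
        classes = set((child.get("class") or "").split())
        if not classes & SKIPPED_CLASSES:
            parts.append(raw_text(child))
        # Text after the child belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


def indexable_text(element):
    """Return the text to index, with whitespace collapsed."""
    return " ".join(raw_text(element).split())


heading = ET.fromstring(
    '<h2>Section title <a class="headerlink" '
    'title="Permalink to this headline">#</a></h2>'
)
print(indexable_text(heading))
```

For the heading above, the permalink anchor is skipped and only the title text is kept.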

Supporting more themes and static site generators

Currently, Read the Docs supports building documentation from Sphinx and MkDocs. All themes that follow these conventions should work as expected. If you think other generators or conventions should be supported, if there is content that should be ignored or given special treatment, or if you find an error in our indexing, let us know in our issue tracker.