This document describes the current usage of
RemoteRepository objects and proposes a new normalized modeling.
Goals
De-duplicate data stored in our database.
Save only one RemoteRepository per GitHub repository.
Use an intermediate table between RemoteRepository and User to store associated remote data for the specific user.
Make this model usable from our SSO implementation.
Use a JSONField to store associated remote data.
Make Project connect directly to RemoteRepository without being linked to a specific User.
Do not disconnect Project and RemoteRepository when a user deletes or disconnects their account.
Non-goals
Keep RemoteRepository in sync with GitHub repositories.
Delete RemoteRepository objects deleted from GitHub.
Listen to GitHub events to detect full_name changes and update our objects.
We may need/want some of these non-goals in the future. They are just outside the scope of this document.
When a user connects their account to a social account, we create:
* allauth.socialaccount.models.SocialAccount
  * basic information (provider, last login, etc.)
  * provider-specific data saved as JSON under extra_data
* allauth.socialaccount.models.SocialToken
  * token to hit the API on behalf of the user
We don’t create any RemoteRepository at this point.
They are created when the user goes to the “Import Project” page and hits the circled arrows.
This triggers the sync_remote_repositories task in the background, which updates or creates RemoteRepository objects,
but it does not delete them (once #7183 and #7310 are merged, they will be deleted).
One RemoteRepository is created per repository the User has access to.
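The "update or create, but never delete" behavior described above can be sketched in plain Python. This is only an illustration, not the actual task: `RepoRecord`, `sync_repositories`, and the dictionary keyed by `full_name` are hypothetical stand-ins for the stored rows and the provider's API response.

```python
from dataclasses import dataclass


@dataclass
class RepoRecord:
    """Hypothetical stand-in for a stored RemoteRepository row."""

    full_name: str
    avatar_url: str


def sync_repositories(stored, fetched):
    """Update or create records from the provider's listing; never delete.

    `stored` maps full_name -> RepoRecord and is modified in place.
    `fetched` is a list of dicts, as a provider API might return them.
    Returns the (created, updated) counts.
    """
    created = updated = 0
    for item in fetched:
        record = stored.get(item["full_name"])
        if record is None:
            stored[item["full_name"]] = RepoRecord(item["full_name"], item["avatar_url"])
            created += 1
        else:
            record.avatar_url = item["avatar_url"]
            updated += 1
    # Records that are missing from `fetched` are deliberately left in place.
    return created, updated


stored = {
    "org/kept": RepoRecord("org/kept", "https://example.com/a.png"),
    "org/gone": RepoRecord("org/gone", "https://example.com/g.png"),
}
fetched = [
    {"full_name": "org/kept", "avatar_url": "https://example.com/b.png"},
    {"full_name": "org/new", "avatar_url": "https://example.com/c.png"},
]
counts = sync_repositories(stored, fetched)
```

After the call, `org/gone` is still in `stored` even though the provider no longer lists it, which is the stale-data situation the text refers to.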
In corporate, we automatically sync RemoteRepository objects
at signup (foreground) and login (background) via a signal. We should eventually move this to community.
Where is RemoteRepository used?
List of available repositories to import under “Import Project”
Show a “+”, “External Arrow” or a “Lock” sign next to the element in the list
* +: it’s available to be imported
* External Arrow: the repository is already imported (see RemoteRepository.matches method)
* Lock: user doesn’t have (admin) permissions to import this repository (uses
Avatar URL in the list of projects available to import
Update webhook when user clicks “Resync webhook” from the Admin > Integrations tab
Send build status when building Pull Requests
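The choice between the three signs in the import list can be sketched as a small function. The function name, the string values it returns, and the order in which the checks run are assumptions made for illustration; the text above only states what each sign means.

```python
def import_list_sign(is_admin, already_imported):
    """Return the sign shown next to a repository in the import list.

    Hypothetical helper: checking "already imported" before the
    permission check is an assumption, not documented behavior.
    """
    if already_imported:
        return "external-arrow"  # the repository is already imported
    if not is_admin:
        return "lock"  # the user lacks admin permissions on the repository
    return "+"  # available to be imported


signs = [
    import_list_sign(is_admin=True, already_imported=False),
    import_list_sign(is_admin=False, already_imported=False),
    import_list_sign(is_admin=True, already_imported=True),
]
```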
New normalized implementation
RemoteRepository.users will be changed to a many-to-many relation that goes through an intermediate model,
so we can add extra fields to the relation that are specific to the User.
This allows us to have only one RemoteRepository per GitHub repository, with multiple relationships to User.
With this modeling, we can avoid disconnecting Project and RemoteRepository:
only the relation between the User and the RemoteRepository needs to be removed.
All the points mentioned in the previous section may need to be adapted to use the new normalized modeling. However, it may be only field renaming or small query changes over new fields.
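To make the normalized modeling concrete, here is a minimal sketch using the standard library's sqlite3 module instead of Django models. All table and column names (`auth_user`, `remote_repository`, `remote_relation`, `project`, `admin`) are hypothetical and chosen only for illustration. It shows one repository row shared by two users, and that removing a user's relation row leaves the project link untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE remote_repository (
        id INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL UNIQUE
    );
    -- Intermediate table: per-user data about a repository lives here.
    CREATE TABLE remote_relation (
        user_id INTEGER NOT NULL REFERENCES auth_user(id),
        remote_repository_id INTEGER NOT NULL REFERENCES remote_repository(id),
        admin INTEGER NOT NULL DEFAULT 0,
        UNIQUE (user_id, remote_repository_id)
    );
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        slug TEXT NOT NULL,
        remote_repository_id INTEGER REFERENCES remote_repository(id)
    );
    """
)
conn.executemany(
    "INSERT INTO auth_user (id, username) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.execute("INSERT INTO remote_repository (id, full_name) VALUES (1, 'org/docs')")
conn.executemany(
    "INSERT INTO remote_relation (user_id, remote_repository_id, admin) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 0)],
)
conn.execute("INSERT INTO project (id, slug, remote_repository_id) VALUES (1, 'docs', 1)")

# alice disconnects her account: only her relation row is removed.
conn.execute("DELETE FROM remote_relation WHERE user_id = 1")

repo_count = conn.execute("SELECT COUNT(*) FROM remote_repository").fetchone()[0]
linked_repo = conn.execute(
    "SELECT remote_repository_id FROM project WHERE slug = 'docs'"
).fetchone()[0]
remaining_users = [row[0] for row in conn.execute("SELECT user_id FROM remote_relation")]
```

The repository row and the project's link to it survive the disconnect; only the per-user row goes away.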
Use this modeling for SSO
We can get the list of Project objects a user has access to, starting from the RemoteRepository objects where the user is an admin:
admin_remote_repositories = RemoteRepository.objects.filter(
    users__remoterelation__admin=True,  # False for read-only access
)
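The same lookup can be sketched in plain SQL through sqlite3, which makes the join over the intermediate table explicit. The schema, the sample rows, and the `project_slugs_for` helper are hypothetical; this is one possible way to express the query, not the ORM query the final implementation would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE remote_relation (user_id INTEGER, remote_repository_id INTEGER, admin INTEGER);
    CREATE TABLE project (slug TEXT, remote_repository_id INTEGER);
    INSERT INTO remote_relation VALUES (10, 1, 1), (10, 2, 0), (20, 2, 1);
    INSERT INTO project VALUES ('docs', 1), ('api', 2);
    """
)


def project_slugs_for(user_id, admin):
    """Slugs of projects whose repository the user reaches with the given access level."""
    rows = conn.execute(
        """
        SELECT project.slug
        FROM project
        JOIN remote_relation
          ON remote_relation.remote_repository_id = project.remote_repository_id
        WHERE remote_relation.user_id = ? AND remote_relation.admin = ?
        ORDER BY project.slug
        """,
        (user_id, int(admin)),
    )
    return [slug for (slug,) in rows]


admin_projects = project_slugs_for(10, admin=True)
read_only_projects = project_slugs_for(10, admin=False)
```

Both conditions (the user and the admin flag) are checked on the same relation row, so a user who is admin on one repository does not gain admin access to another.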
Due to the constraints we have on the RemoteRepository table and its size,
we can’t just do the data migration at the same time as the deploy.
Because of this, we need to be more creative and find a way to re-sync the data from VCS providers
while the site continues working.
To achieve this, we plan to follow these steps:
1. modify all the Python code to use the new modeling in .org and .com (this will help us find bugs locally more easily)
2. QA this locally with test data
3. enable the Django signal to re-sync RemoteRepository asynchronously on login (we already have this in .com). New active users will have updated data immediately
4. spin up a new instance with the new refactored code
5. run migrations to create a new table for RemoteRepository
6. re-sync everything from VCS providers into the new table for one week or so
7. carry over the Project - RemoteRepository relations
8. create a migration to use the new table with synced data
9. deploy new code once the sync is finished
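The step that carries over the Project - RemoteRepository relations could look roughly like the sketch below. It assumes that old (duplicated) rows and new (deduplicated) rows can be matched on `full_name`. That matching key, the function name, and the dictionary shapes are all assumptions. Since tracking `full_name` changes is out of scope, renamed repositories would end up in the unmatched list and need manual review.

```python
def remap_project_relations(old_repos, new_repos, project_links):
    """Re-point project links from duplicated old rows to deduplicated new rows.

    old_repos: {old_id: full_name}, possibly several ids per full_name.
    new_repos: {full_name: new_id}, one id per repository.
    project_links: {project_slug: old_id}.
    Returns (new_links, unmatched_slugs).
    """
    new_links, unmatched = {}, []
    for slug, old_id in project_links.items():
        # A missing old row or a missing new row both resolve to None here.
        new_id = new_repos.get(old_repos.get(old_id))
        if new_id is None:
            unmatched.append(slug)
        else:
            new_links[slug] = new_id
    return new_links, unmatched


old_repos = {1: "org/docs", 2: "org/docs", 3: "org/renamed"}
new_repos = {"org/docs": 100}
project_links = {"docs": 2, "legacy": 3}
new_links, unmatched = remap_project_relations(old_repos, new_repos, project_links)
```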
See these issues for more context:
* https://github.com/readthedocs/readthedocs.org/pull/7536#issuecomment-724102640
* https://github.com/readthedocs/readthedocs.org/pull/7675#issuecomment-732756118