Mirroring GitHub with Gitolite

This note explains how to establish a mirror of a GitHub repository on a locally maintained Gitolite server.

Choose a Gitolite mirror location

Decide on a repository path, for example github/<gh-user>/<gh-repo>, where <gh-user> is the GitHub user and <gh-repo> is the GitHub repository to be mirrored. That is:

https://github.com/<gh-user>/<gh-repo>

maps to

git@git:github/<gh-user>/<gh-repo>

which, on the Gitolite server, is a filesystem path like

~git/repositories/github/<gh-user>/<gh-repo>.git/

where git@git is the SSH user and hostname used to connect to Gitolite. To confirm access to Gitolite, run the following, which returns a list of the repositories you can access:

$ ssh git@git info

The above scheme places all mirrored GitHub repositories into a github subdirectory, with a further subdirectory for each mirrored GitHub user.
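The mapping can be sketched as a small shell function. The function name gitolite_path and the default git@git host are illustrative assumptions, not part of Gitolite; adjust them to your own server.

```shell
# Map a GitHub URL to the Gitolite mirror location described above.
# Usage: gitolite_path <github-url> [ssh-user@host]
gitolite_path() {
    url=$1
    remote=${2:-git@git}               # SSH user and host of the Gitolite server (assumed default)
    path=${url#https://github.com/}    # strip the GitHub prefix
    path=${path%/}                     # tolerate a trailing slash
    path=${path%.git}                  # tolerate a trailing ".git"
    printf '%s:github/%s\n' "$remote" "$path"
}

# prints git@git:github/example-user/example-repo
gitolite_path https://github.com/example-user/example-repo
```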

Create the Gitolite Repository

Begin by creating a new, empty repository on the Gitolite server in the usual way: add configuration for it to the gitolite-admin repository, then commit and push. For example:

repo github/<gh-user>/<gh-repo>
    desc = "Repository description [https://github.com/<gh-user>/<gh-repo>]"
    owner = "My Name"
    category = "GitHub"
    RW+ = my_key
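If several repositories are to be mirrored, the configuration can be generated rather than typed. The following is a minimal sketch: the function name mirror_stanza is hypothetical, the description text is a placeholder, and it emits only the repo, desc and access lines.

```shell
# Print a minimal gitolite.conf stanza for a mirrored GitHub repository.
# Usage: mirror_stanza <gh-user> <gh-repo> <key-name>
mirror_stanza() {
    gh_user=$1
    gh_repo=$2
    key=$3
    cat <<EOF

repo github/$gh_user/$gh_repo
    desc = "Mirror of https://github.com/$gh_user/$gh_repo"
    RW+ = $key
EOF
}

# Typical use, from a clone of gitolite-admin (not executed here):
#   mirror_stanza <gh-user> <gh-repo> my_key >> conf/gitolite.conf
#   git commit -am "Add GitHub mirror" && git push
```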

Now populate the new, empty, repository by doing a mirror clone followed by a mirror push (the local clone is temporary and can be deleted afterwards):

$ git clone --mirror https://github.com/<gh-user>/<gh-repo>
$ cd <gh-repo>.git
$ sed -i -e '/refs\/pull\//d' packed-refs
$ git push --mirror git@git:github/<gh-user>/<gh-repo>
$ cd ..
$ rm -rf <gh-repo>.git

The sed command is optional. It removes GitHub pull request refs from the temporary clone before it is pushed to the local mirror. The rationale behind this optional step is discussed below, followed by a discussion about setting the default branch of the mirrored repository.

GitHub Pull Requests

GitHub repositories that have had pull requests made to them will include synthetic refs to those pull requests.

This may be problematic as described here and here.

These refs are present in the temporary clone from which the Gitolite mirror is pushed, and they can be deleted from there prior to pushing. They will be either files in subdirectories of refs or entries in packed-refs. Below is a sed command that removes the packed refs:

$ sed -i -e '/refs\/pull\//d' packed-refs
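To cover both cases (loose ref files and packed-refs entries) in one step, something like the following could be used. It is a sketch under the assumption that the argument is the bare temporary clone; the function name strip_pull_refs is made up for illustration, and the packed-refs file is rewritten through a temporary file rather than with sed -i.

```shell
# Remove GitHub pull request refs from a bare clone, both loose and packed.
# Usage: strip_pull_refs <path-to-bare-clone>
strip_pull_refs() {
    repo=$1
    # Loose refs are plain files under refs/pull/.
    rm -rf "$repo/refs/pull"
    # Packed refs are lines in packed-refs; rewrite the file without them.
    if [ -f "$repo/packed-refs" ]; then
        sed -e '/refs\/pull\//d' "$repo/packed-refs" > "$repo/packed-refs.tmp" &&
            mv "$repo/packed-refs.tmp" "$repo/packed-refs"
    fi
}
```

For example, run strip_pull_refs <gh-repo>.git from the directory containing the temporary clone, before the mirror push.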

Default HEAD

If the repository being mirrored does not have a master branch then an error may occur when a client clones the mirror:

$ git clone git@git:github/<gh-user>/<gh-repo>
Cloning into ...
remote: Counting objects: 5250, done.
remote: Compressing objects: 100% (2044/2044), done.
remote: Total 5250 (delta 3167), reused 5250 (delta 3167)
Receiving objects: 100% (5250/5250), 27.34 MiB | 5.19 MiB/s, done.
Resolving deltas: 100% (3167/3167), done.
Checking connectivity... done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.

The warning appears because Git attempts to check out the remote HEAD, which on the mirror points to refs/heads/master, but the mirror has no master branch. The repository is still cloned, but no working copy is checked out.

This was once a rare occurrence because repositories had a master branch by default and it was unusual to remove or rename it. It is more common now that GitHub names the default branch of new repositories main. Either way, there are two solutions.

The first solution is to specify the branch to check out:

$ git clone -b some_branch git@git:github/<gh-user>/<gh-repo>

The alternative is to use a text editor to modify the HEAD file on the server so it points to the required branch, such as:

ref: refs/heads/some_branch

which should allow the usual clone to work as expected:

$ git clone git@git:github...
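The same change can be made without a text editor by using git symbolic-ref, and the branch name can be read from GitHub with git ls-remote --symref. The sketch below assumes it runs on the Gitolite server as the hosting user; the function names default_branch and set_mirror_head are hypothetical helpers.

```shell
# Extract the default branch name from the output of
#   git ls-remote --symref <remote> HEAD
# read on standard input. Prints nothing if there is no symref line.
default_branch() {
    sed -n 's|^ref: refs/heads/\(.*\)[[:space:]]HEAD$|\1|p'
}

# Point HEAD of the bare repository at <repo-dir> to <branch>.
# Usage: set_mirror_head <repo-dir> <branch>
set_mirror_head() {
    git --git-dir="$1" symbolic-ref HEAD "refs/heads/$2"
}

# Typical use (not executed here):
#   branch=$(git ls-remote --symref https://github.com/<gh-user>/<gh-repo> HEAD | default_branch)
#   set_mirror_head <path-to-mirror-on-server> "$branch"
```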

This problem occurs when the repository being mirrored does not have a branch called master and its HEAD is something other than master. Because HEAD is not included in the push, the bare repository's default head (master) persists and is what a cloning operation attempts (in vain) to check out.

This explains how to fix HEAD on a GitHub repository, and I asked this question to see if Git can mirror-push HEAD to the mirror.

Syncing the mirror

To sync the Gitolite mirror with GitHub, log on to the Gitolite server and perform the following configuration:

$ cd /srv/git/repositories/github/<gh-user>/<gh-repo>.git
$ git remote add origin https://github.com/<gh-user>/<gh-repo>
$ git config remote.origin.fetch "+*:*"

However, this syncs all refs from the GitHub repository, including any GitHub pull request refs. If these are unwanted, replace the catch-all refspec with more specific ones that include all heads and tags but not the pulls, as this SO question suggests:

fetch = +refs/heads/*:refs/heads/*
fetch = +refs/tags/*:refs/tags/*
fetch = +refs/change/*:refs/change/*

which you can add from the command-line like this:

$ git config remote.origin.fetch "+refs/heads/*:refs/heads/*"
$ git config --add remote.origin.fetch "+refs/tags/*:refs/tags/*"
$ git config --add remote.origin.fetch "+refs/change/*:refs/change/*"

(Note the use of --add on the second and third commands; without it, git config would not append an additional value.)
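To illustrate which refs these narrower refspecs select, the left-hand patterns can be imitated with a shell case statement. This is only a model of the matching for the three patterns above, not how Git implements refspecs, and the function name is_mirrored_ref is made up for the example.

```shell
# Succeed if a ref name matches the left-hand side of one of the three
# refspecs above (heads, tags, change); fail otherwise.
is_mirrored_ref() {
    case $1 in
        refs/heads/*|refs/tags/*|refs/change/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample ref names, chosen for illustration only.
for ref in refs/heads/master refs/tags/v1.0 refs/pull/12/head; do
    if is_mirrored_ref "$ref"; then
        echo "fetch $ref"
    else
        echo "skip  $ref"
    fi
done
```

The pull request ref falls through to the skip branch because none of the three patterns covers refs/pull/.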

Then, to sync the repo:

$ git fetch --prune

The fetch could be automated via a cron job. If you use a remote name other than origin (say, github) then it needs to be given explicitly:

$ git fetch github --prune

There appears to be no way to change the default remote from origin to something else (but see remotes.default).
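A cron-driven sync could be sketched as follows. The function name sync_mirrors, the REMOTE and DRY_RUN variables, and the idea of passing repository directories as arguments are all assumptions made for this example. It would need to run as the Gitolite hosting user, since that user owns the repositories.

```shell
# Fetch and prune each mirror given as an argument.
# REMOTE selects the remote name (default: origin).
# Set DRY_RUN=1 to print the git commands instead of running them.
sync_mirrors() {
    remote=${REMOTE:-origin}
    status=0
    for repo in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "git --git-dir=$repo fetch --prune $remote"
        elif ! git --git-dir="$repo" fetch --prune "$remote"; then
            # Report the failure but carry on with the remaining mirrors.
            echo "sync failed: $repo" >&2
            status=1
        fi
    done
    return $status
}
```

Saved in a script that ends by calling sync_mirrors with the mirror directories, this could be scheduled from the hosting user's crontab, for example hourly.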

Additional Notes

To review refs in a local repository:

$ git show-ref

To review refs in its remote repository:

$ git ls-remote

To clear out (empty) a Gitolite repo (so that it can be pushed again), log on to the server as the Gitolite admin user (e.g. su - git) and then:

$ rm -rf path/to/repositories/github/<gh-user>/<gh-repo>.git
$ gitolite setup
Initialized empty Git repository in /srv/git/repositories/github/<gh-user>/<gh-repo>.git/

(This technique is based on fixing botched repos.)

A side note about backing up Git repositories and the Great KDE Disaster.

Syncing with GitHub. This is worth a second look on a rainy day.
Another technique is described in this gist and, come here, there's more!

This write-up is summarised as a Stack Overflow Answer.