Version control

Basic ideas

Any time multiple versions of a document exist, whether because the document changes over time or because several authors are working on it, some kind of version control is needed.

Version control allows:

  • accessing any version from the original to the most recent;
  • seeing what has changed from one version to the next;
  • giving a label of some kind to distinguish a particular version.
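To make these three ideas concrete, here is a toy sketch in Python, the language of the example code used later. It is not how any real system stores its data. The hypothetical VersionStore class keeps every version of a single text document in memory and returns any version on request. It also uses the standard-library difflib module to show what changed between versions, and lets you attach labels to them.

```python
import difflib


class VersionStore:
    """Toy, in-memory store for every version of one text document."""

    def __init__(self):
        self._versions = []  # index = version number, value = full text
        self._labels = {}    # label -> version number

    def save(self, text):
        """Record a new version and return its number (0 is the original)."""
        self._versions.append(text)
        return len(self._versions) - 1

    def get(self, version):
        """Access any version, from the original to the most recent."""
        return self._versions[version]

    def diff(self, old, new):
        """Show what changed between two versions, as unified-diff lines."""
        return list(difflib.unified_diff(
            self._versions[old].splitlines(),
            self._versions[new].splitlines(),
            fromfile="version %d" % old,
            tofile="version %d" % new,
            lineterm="",
        ))

    def label(self, name, version):
        """Attach a memorable label to a particular version."""
        self._labels[name] = version

    def get_labelled(self, name):
        """Access a version by its label instead of its number."""
        return self._versions[self._labels[name]]


store = VersionStore()
v0 = store.save("taum = 20\nshow()\n")
v1 = store.save("taum = 20\nsavefig('out.png')\n")
store.label("submitted", v1)
print("\n".join(store.diff(v0, v1)))
```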

Examples of version control systems

The simplest method of version control is probably the most widely used in science: changing the file name.

Cartoon from “Piled Higher and Deeper” by Jorge Cham, www.phdcomics.com

Other examples include:
  • “track changes” in Microsoft Word
  • Time Machine in Mac OS X
  • versioning in Dropbox, Google Drive
  • formal version control systems such as CVS, Subversion, Mercurial, Git

The importance of tracking projects, not individual files

Early version control systems, such as CVS, track each file separately: each file has its own version number. The same is true of Dropbox and Microsoft Word.

This is a problem when you make changes to several files at once, and the changes in one file depend on changes in another.

In modern version control systems, and in backup-based systems such as Time Machine, entire directory trees are tracked as a unit, which means that each version corresponds to the state of an entire project at a point in time.
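The sketch below illustrates what a project-level version means. It walks a directory tree and folds the path and content hash of every file into a single ID, so a change to any file changes the ID of the whole project. The function snapshot_id is hypothetical, and real systems compute their IDs differently. The .hg directory is skipped on the assumption that repository metadata should not count as project content.

```python
import hashlib
import os
import tempfile


def snapshot_id(root):
    """Return a single ID for the state of every file under `root`.

    Each file's relative path and content hash feed into one overall hash, so
    changing, adding or removing any file changes the ID of the project.
    """
    overall = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        # Visit subdirectories in a fixed order and skip repository metadata.
        dirnames[:] = sorted(d for d in dirnames if d != ".hg")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                file_hash = hashlib.sha1(f.read()).hexdigest()
            overall.update(("%s\0%s\n" % (rel, file_hash)).encode("utf-8"))
    return overall.hexdigest()[:12]


# Usage: two placeholder files; editing one of them changes the project-level ID.
with tempfile.TemporaryDirectory() as project:
    for name in ("COBA.py", "CUBA.py"):
        with open(os.path.join(project, name), "w") as f:
            f.write("# placeholder\n")
    before = snapshot_id(project)
    with open(os.path.join(project, "CUBA.py"), "a") as f:
        f.write("savefig('CUBA_output.png')\n")
    after = snapshot_id(project)

print(before, after, before != after)
```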

Advantages of formal version control systems

  • explicit version number for each version
  • easy to switch between versions
  • easy to see changes between versions
  • tools to help merge incompatible changes

In the next sections, we will use Mercurial, one of the most commonly used modern version control systems, to introduce the principles of version control. We will use Mercurial’s command-line interface because it is easy to use and widely used. Following this, we will briefly discuss Git and Subversion, two other widely used version control systems, as well as graphical user interfaces for version control.

Installing Mercurial

Mercurial is available for Linux, Mac OS X, Windows and assorted flavours of UNIX. On Linux, it is almost certainly available from your distribution’s package manager. For Windows and Mac OS X, download it from http://mercurial.selenic.com/wiki/Download

Once you’ve installed it, create a file named .hgrc in your home directory containing your name and e-mail address, which Mercurial will record with every commit you make:

[ui]
username = Andrew Davison <andrew.davison@unic.cnrs-gif.fr>

Creating a repository

We start by introducing two concepts:

Working copy
the set of files that you are currently working on
Repository
a “database” containing the entire history of your project (all versions)

As an example, we will use the Brian code from this paper:

Brette R, Rudolph M, Carnevale T, Hines M, Beeman D, Bower JM, Diesmann M, Morrison A, et al. (2007) Simulation of networks of spiking neurons: A review of tools and strategies. J Comp Neurosci 23:349-98

available from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319

$ unzip destexhe_benchmarks.zip
$ cd destexhe_benchmarks
$ cp -r Brian ~/my_network_model
$ cd ~/my_network_model
$ ls
COBA.py             COBAHH.py       CUBA.py         README.txt

We’re going to take this code as the starting point for our own project, and we want to keep track of the changes we make.

The first step is to create a repository, where all the versions will be stored. This is very simple:

$ hg init

Nothing seems to have happened. In fact, the hg init command has created a new subdirectory:

$ ls -a
.           ..              .hg             COBA.py         COBAHH.py       CUBA.py         README.txt

You almost never need to care about what is in this directory: this is where Mercurial will store all the information about the repository.

Adding files to the repository

Now we need to tell Mercurial which files are part of our project:

$ hg add
adding COBA.py
adding COBAHH.py
adding CUBA.py
adding README.txt

$ hg status
A COBA.py
A COBAHH.py
A CUBA.py
A README.txt

Committing changes

These files are now queued to be added to the repository, but they are not yet there. Nothing is definitive until we make a commit (also known as a “check-in”).

$ hg commit

This opens a text editor where I can enter a message describing the purpose of the commit:

HG: Enter commit message.  Lines beginning with 'HG:' are removed.
HG: Leave message empty to abort commit.
HG: --
HG: user: Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
HG: branch 'default'
HG: added COBA.py
HG: added COBAHH.py
HG: added CUBA.py
HG: added README.txt

In this case I am using vi, but you can use any editor. I type the commit message at the top of the file, then save and quit:

Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

HG: Enter commit message.  Lines beginning with 'HG:' are removed.
HG: Leave message empty to abort commit.
HG: --
HG: user: Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
HG: branch 'default'
HG: added COBA.py
HG: added COBAHH.py
HG: added CUBA.py
HG: added README.txt

Viewing the history of changes

The log command lists all the different versions stored in the repository. For now, of course, we have only one:

$ hg log
changeset:   0:ef57b1c87c6a
tag:         tip
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 12:33:21 2012 +0200
summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

Now let’s run the code:

$ python COBAHH.py
Network construction time: 0.814524173737 seconds
Simulation running...
Simulation time: 45.7264661789 seconds
126014 excitatory spikes
29462 inhibitory spikes

This pops up a window with the following figure:

[Figure: plot produced by COBAHH.py]

We’d prefer to save the figure to a file for further use, rather than work with the model interactively, so let’s change the last lines of the script from:

plot(trace.times/ms,trace[1]/mV)
plot(trace.times/ms,trace[10]/mV)
plot(trace.times/ms,trace[100]/mV)
show()

to

plot(trace.times/ms,trace[1]/mV)
plot(trace.times/ms,trace[10]/mV)
plot(trace.times/ms,trace[100]/mV)
savefig("COBAHH_output.png")

Seeing what’s changed

Now if we run hg status we see:

$ hg status
M COBAHH.py

The “M” indicates that the file has been modified. To see the changes:

$ hg diff
diff -r ef57b1c87c6a COBAHH.py
--- a/COBAHH.py     Wed Jul 11 12:33:21 2012 +0200
+++ b/COBAHH.py     Wed Jul 11 15:56:05 2012 +0200
@@ -93,4 +93,4 @@
 plot(trace.times/ms,trace[1]/mV)
 plot(trace.times/ms,trace[10]/mV)
 plot(trace.times/ms,trace[100]/mV)
-show()
+savefig("COBAHH_output.png")
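This is the unified diff format, which is not specific to Mercurial. As an illustration, Python’s standard difflib module can produce the same kind of output for the edit described above. The sketch compares only the last four lines of COBAHH.py, so the line numbers in the @@ hunk header differ from Mercurial’s. The a/ and b/ prefixes simply mimic Mercurial’s labels.

```python
import difflib

# Last four lines of COBAHH.py before and after replacing show() with savefig().
before = [
    "plot(trace.times/ms,trace[1]/mV)\n",
    "plot(trace.times/ms,trace[10]/mV)\n",
    "plot(trace.times/ms,trace[100]/mV)\n",
    "show()\n",
]
after = before[:3] + ['savefig("COBAHH_output.png")\n']

diff_lines = list(difflib.unified_diff(before, after,
                                       fromfile="a/COBAHH.py",
                                       tofile="b/COBAHH.py"))
print("".join(diff_lines), end="")
```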

Now let’s commit the changes, and look at the log again:

$ hg commit -m 'Save figure to file'
$ hg log
changeset:   1:e323d363742a
tag:         tip
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 15:59:02 2012 +0200
summary:     Save figure to file

changeset:   0:ef57b1c87c6a
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 12:33:21 2012 +0200
summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

Switching between versions

To switch between versions, use hg update (if you have modified any of the files, commit your changes first):

$ hg update 0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

This will change the files in your working copy to reflect the state they had when you committed that particular version.

Using hg summary we can see which version we are currently using:

$ hg summary
parent: 0:ef57b1c87c6a
 Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012
branch: default
commit: 1 unknown (clean)
update: 1 new changesets (update)

When specifying the version number to switch to, you can use either the short form (a decimal integer, like 0 or 1) or the hexadecimal form (like ef57b1c87c6a). The difference between these two forms is discussed below, in Working on multiple computers.

With no version number, hg update switches to the most recent version:

$ hg up
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ hg sum
parent: 1:b0275b66ad2b tip
 Save figure to file
branch: default
commit: 1 unknown (clean)
update: (current)

Note also that all Mercurial commands can be abbreviated, provided the abbreviation is unambiguous: above, hg up is short for hg update and hg sum for hg summary.
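The abbreviation rule can be sketched as prefix matching: an abbreviation is accepted if exactly one command name starts with it. The function and the command list below are hypothetical and cover only a subset of Mercurial’s commands. The real tool also defines explicit short aliases such as ci for commit, which plain prefix matching would not find.

```python
def resolve_command(abbrev, commands):
    """Expand an abbreviated command name, provided it is unambiguous."""
    if abbrev in commands:
        return abbrev  # an exact match always wins
    matches = [c for c in commands if c.startswith(abbrev)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown command %r" % abbrev)
    raise ValueError("command %r is ambiguous: %s"
                     % (abbrev, ", ".join(sorted(matches))))


# A small, hypothetical subset of Mercurial's command names.
COMMANDS = ["add", "commit", "diff", "init", "log", "merge", "pull", "push",
            "status", "summary", "tag", "update"]

print(resolve_command("up", COMMANDS))
print(resolve_command("sum", COMMANDS))
```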

Giving informative names to versions

Remembering the version number for a particular version of interest (for example, the version used to generate a particular figure in your manuscript) can be difficult. For this reason, the hg tag command can be used to give descriptive and memorable names to significant versions:

$ hg tag "Figure 1"

Note that this automatically makes a new commit:

$ hg log
changeset:   2:416ac8894202
tag:         tip
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Thu Jul 12 14:28:19 2012 +0200
summary:     Added tag Figure 1 for changeset b0275b66ad2b

changeset:   1:b0275b66ad2b
tag:         Figure 1
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 16:01:32 2012 +0200
summary:     Save figure to file

changeset:   0:ef57b1c87c6a
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 12:33:21 2012 +0200
summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

You can now switch to a tagged version using the tag name:

$ hg update "Figure 1"
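The tag commit is needed because Mercurial keeps tags in an ordinary version-controlled file, .hgtags, at the top of the working copy. Each line holds a full 40-digit changeset ID, a space and the tag name, which may itself contain spaces. The sketch below reads that format into a dictionary, with later lines taking precedence over earlier ones for the same name. The function name is hypothetical. The ID in the sample is a placeholder padded with zeros, since the full ID of b0275b66ad2b is not shown in the log.

```python
def parse_hgtags(text):
    """Map tag names to changeset IDs, given the contents of an .hgtags file."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        node, name = line.split(" ", 1)  # split once: tag names may contain spaces
        tags[name] = node                # a later line for the same tag wins
    return tags


# Placeholder ID (not a real changeset): the short ID padded with zeros.
PLACEHOLDER_ID = "b0275b66ad2b" + "0" * 28
tags = parse_hgtags(PLACEHOLDER_ID + " Figure 1\n")
print(tags["Figure 1"][:12])
```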

Recap #1

So far, we have learned how to:

  • Create a repository
  • Add files to a repository
  • Commit changes
  • Move your code-base backwards and forwards in time

These operations are so easy and so useful that there is no reason not to use them for almost any work you do as a scientist. Any time I start a new project, whether writing code or writing a paper with LaTeX, I now run hg init as soon as I’ve created a new directory for the project.

Making backups

As well as helping to keep track of different versions of a project, version control systems are hugely useful for keeping backups of your code with minimal hassle.

Making a copy of your repository is as simple as moving to the location where the backup will be, and then using the hg clone command.

$ cd /Volumes/USB_DRIVE
$ hg clone ~/my_network_model

$ cd ~/Dropbox
$ hg clone ~/my_network_model

$ ssh cluster.example.edu
(cluster)$ hg clone ssh://my_laptop.example.edu/my_network_model

You can then keep the backup in sync with the main repository, either by using hg pull in the backup location or by using hg push in your working directory:

$ cd ~/my_network_model
$ hg push /Volumes/USB_DRIVE/my_network_model
pushing to /Volumes/USB_DRIVE/my_network_model
searching for changes
no changes found
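With several backup clones, the push can be scripted. The sketch below assumes that Mercurial is on your PATH and that the destination paths, taken from the clone examples above, exist. The names REPO, BACKUPS, push_commands and push_to_backups are hypothetical. The script treats exit code 1 as success, because hg push exits with 1 when there is nothing to push, as in the output above.

```python
import os
import subprocess

# Hypothetical locations, matching the clones made above; adjust to your set-up.
REPO = os.path.expanduser("~/my_network_model")
BACKUPS = [
    "/Volumes/USB_DRIVE/my_network_model",
    os.path.expanduser("~/Dropbox/my_network_model"),
]


def push_commands(repo, destinations):
    """Build one `hg push` command per backup location (-R selects the repository)."""
    return [["hg", "-R", repo, "push", dest] for dest in destinations]


def push_to_backups(repo, destinations):
    """Push to every backup and return a dict mapping destination -> status."""
    status = {}
    for cmd in push_commands(repo, destinations):
        dest = cmd[-1]
        # hg push exits with 0 if changesets were pushed, 1 if there was nothing to push.
        code = subprocess.call(cmd)
        status[dest] = "ok" if code in (0, 1) else "failed (exit code %d)" % code
    return status


# Usage (requires Mercurial and the backup clones):
# print(push_to_backups(REPO, BACKUPS))
```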

Working on multiple computers

As an extension of the idea of backups, version control systems are excellent for keeping code in sync between multiple computers. Suppose you have a copy of your repository on your laptop, and you have been working on the code at the airport.

(laptop)$ hg diff
diff -r 416ac8894202 -r 0467691f7881 CUBA.py
--- a/CUBA.py       Thu Jul 12 14:28:19 2012 +0200
+++ b/CUBA.py       Thu Jul 12 15:18:09 2012 +0200
@@ -72,4 +72,4 @@
 print Me.nspikes,"excitatory spikes"
 print Mi.nspikes,"inhibitory spikes"
 plot(M.times/ms,M.smooth_rate(2*ms,'gaussian'))
-show()
+savefig("CUBA_output.png")
(laptop)$ hg commit -m 'CUBA script now saves figure to file'

The log on your laptop now looks like this:

(laptop)$ hg log
changeset:   3:0467691f7881
tag:         tip
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Thu Jul 12 15:18:09 2012 +0200
summary:     CUBA script now saves figure to file

changeset:   2:416ac8894202
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Thu Jul 12 14:28:19 2012 +0200
summary:     Added tag Figure 1 for changeset b0275b66ad2b

changeset:   1:b0275b66ad2b
tag:         Figure 1
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 16:01:32 2012 +0200
summary:     Save figure to file

changeset:   0:ef57b1c87c6a
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 12:33:21 2012 +0200
summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

Meanwhile, you’ve started running some simulations on a local cluster, and you’re investigating the effect of changing some parameters:

(cluster)$ hg diff
diff -r 416ac8894202 CUBA.py
--- a/CUBA.py       Thu Jul 12 14:28:19 2012 +0200
+++ b/CUBA.py       Thu Jul 12 15:19:49 2012 +0200
@@ -25,9 +25,9 @@
 import time

 start_time=time.time()
-taum=20*ms
-taue=5*ms
-taui=10*ms
+taum=15*ms
+taue=3*ms
+taui=5*ms
 Vt=-50*mV
 Vr=-60*mV
 El=-49*mV
(cluster)$ hg commit -m 'Changed time constants in CUBA model'
(cluster)$ hg log
changeset:   3:243c20657dc4
tag:         tip
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Thu Jul 12 15:20:17 2012 +0200
summary:     Changed time constants in CUBA model

changeset:   2:416ac8894202
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Thu Jul 12 14:28:19 2012 +0200
summary:     Added tag Figure 1 for changeset b0275b66ad2b

changeset:   1:b0275b66ad2b
tag:         Figure 1
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 16:01:32 2012 +0200
summary:     Save figure to file

changeset:   0:ef57b1c87c6a
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Wed Jul 11 12:33:21 2012 +0200
summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

Now the repositories on the two machines are out of sync. The first three commits are the same on both, but the fourth is different on the two machines. Note that versions 0, 1, and 2 have the same hexadecimal version number on both machines, but that version 3 has a different hex number:

Laptop            Cluster
0:ef57b1c87c6a    0:ef57b1c87c6a
1:b0275b66ad2b    1:b0275b66ad2b
2:416ac8894202    2:416ac8894202
3:0467691f7881    3:243c20657dc4

This is the reason for having both the short, integer number and the hex version: the short integer is local to each machine, while the hex number is global.
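Why is the hex number the same everywhere? Mercurial derives each changeset ID from a SHA-1 hash covering the changeset’s parents as well as its own contents and metadata, so identical histories give identical IDs. The short integer just records the order in which changesets arrived in a given repository. The toy model below is a simplified stand-in for that scheme, not Mercurial’s exact formula. The Repo class and the change descriptions are hypothetical.

```python
import hashlib


def changeset_id(parent_ids, author, message, changes):
    """Simplified global ID: a hash of the parent IDs plus the changeset's content."""
    h = hashlib.sha1()
    for pid in sorted(parent_ids):
        h.update(pid.encode("ascii") + b"\n")
    for part in (author, message, changes):
        h.update(part.encode("utf-8") + b"\0")
    return h.hexdigest()[:12]  # 12 hex digits, like the short form hg displays


class Repo:
    """Toy repository: local numbers are positions in this repository's own list."""

    def __init__(self):
        self.ids = []       # local revision number -> global changeset ID
        self.parent = None  # changeset the working copy is based on

    def commit(self, message, changes, author="Andrew Davison"):
        parents = [self.parent] if self.parent else []
        cid = changeset_id(parents, author, message, changes)
        self.ids.append(cid)
        self.parent = cid
        return len(self.ids) - 1

    def pull(self, other):
        """Copy missing changesets; like `hg pull` without -u, the working copy stays put."""
        for cid in other.ids:
            if cid not in self.ids:
                self.ids.append(cid)


laptop, cluster = Repo(), Repo()
for repo in (laptop, cluster):
    repo.commit("Initial version", "add COBA.py COBAHH.py CUBA.py README.txt")
    repo.commit("Save figure to file", "COBAHH.py: show() -> savefig()")
    repo.commit("Added tag Figure 1", "add .hgtags")
laptop.commit("CUBA script now saves figure to file", "CUBA.py: show() -> savefig()")
cluster.commit("Changed time constants in CUBA model", "CUBA.py: new taum, taue, taui")

for rev in range(4):
    print(rev, laptop.ids[rev], cluster.ids[rev])

laptop.pull(cluster)
print("cluster's revision 3 is revision", laptop.ids.index(cluster.ids[3]), "on the laptop")
```

Running it shows matching IDs for revisions 0 to 2 and different IDs for revision 3. After the pull, the cluster’s changeset keeps its ID but becomes revision 4 on the laptop, as in the merged log shown further below.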

So, how do we get the two machines in sync? This can be done from either machine. Here, we’ll do it from the laptop.

(laptop)$ hg pull -u ssh://cluster.example.edu/my_network_model
pulling from ssh://cluster.example.edu/my_network_model
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files (+1 heads)
not updating: crosses branches (merge branches or update --check to force update)

Note that hg pull -u is equivalent to running hg pull followed by hg update. “Pull” pulls changes into the local repository, but does not change the working copy, i.e. it does not change your files. “Update” is the part that changes your files.

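Here, the pull succeeded but the update did not, because we made two different commits on top of version 2 on the two machines: the repository now has two heads, which must be merged before the working copy can be updated.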

(laptop)$ hg merge
merging CUBA.py
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)

Because Mercurial is clever enough to realize that we had edited different parts of the file CUBA.py, it can merge the two changes automatically. If there had been a conflict (if we had edited the same lines on both machines), the merge would have failed and we would have had to merge the files manually (see below).
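The merge works because Mercurial compares each side with the most recent version they have in common (here version 2). A line changed on only one side takes that side’s version, and only lines changed differently on both sides are conflicts. The sketch below shows this three-way logic under a strong simplifying assumption: edits replace lines but never insert or delete them, so the versions can be compared line by line. Real merge tools first align the lines with a diff algorithm. The shortened CUBA.py lines are hypothetical.

```python
def merge3(base, local, other):
    """Three-way merge of lists of lines, compared line by line.

    Simplifying assumption: edits only replace lines, so all three versions have
    the same number of lines. Returns (merged lines, number of conflict blocks).
    """
    if not len(base) == len(local) == len(other):
        raise ValueError("this simplified merge needs inputs of equal length")
    merged, conflicts = [], 0
    i = 0
    while i < len(base):
        base_line, local_line, other_line = base[i], local[i], other[i]
        if local_line == other_line or other_line == base_line:
            merged.append(local_line)  # both sides agree, or only local changed
            i += 1
        elif local_line == base_line:
            merged.append(other_line)  # only other changed
            i += 1
        else:
            # Both sides changed this line differently: gather the run of such lines.
            j = i
            while j < len(base) and len({base[j], local[j], other[j]}) == 3:
                j += 1
            merged.append("<<<<<<< local")
            merged.extend(local[i:j])
            merged.append("=======")
            merged.extend(other[i:j])
            merged.append(">>>>>>> other")
            conflicts += 1
            i = j
    return merged, conflicts


base = ["taum=20*ms", "taue=5*ms", "show()"]
laptop = ["taum=20*ms", "taue=5*ms", 'savefig("CUBA_output.png")']
cluster = ["taum=15*ms", "taue=3*ms", "show()"]
clean, n_clean = merge3(base, laptop, cluster)  # edits in different lines

colleague = ["taum=25*ms", "taue=5*ms", 'savefig("firing_rate_CUBA.eps")']
clashing, n_clash = merge3(base, clean, colleague)  # edits in the same lines

print("\n".join(clean))
print("\n".join(clashing))
```

The first call merges cleanly. The second, in which both sides changed the same lines, returns two conflict blocks marked with the same local/other labels that Mercurial uses.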

Mercurial does not automatically commit after the merge, so we have the chance to check we are happy with how Mercurial has merged the files before committing.

(laptop)$ hg commit -m 'merge'

Now we can see the full history, with all changes:

(laptop)$ hg log -r5:2
changeset:   5:12fddba7aaa7
tag:         tip
parent:      3:16e621976c95
parent:      4:243c20657dc4
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Thu Jul 12 15:54:29 2012 +0200
summary:     merge

changeset:   4:243c20657dc4
parent:      2:416ac8894202
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Thu Jul 12 15:20:17 2012 +0200
summary:     Changed time constants in CUBA model

changeset:   3:16e621976c95
user:        apdavison
date:        Thu Jul 12 15:40:04 2012 +0200
summary:     CUBA script now saves figure to file

changeset:   2:416ac8894202
user:        Andrew Davison <andrew.davison@unic.cnrs-gif.fr>
date:        Thu Jul 12 14:28:19 2012 +0200
summary:     Added tag Figure 1 for changeset b0275b66ad2b

(Note that we’ve truncated the output by asking for only a subset of the commits.)

To complete the sync, we now push the merged repository back to the cluster:

(laptop)$ hg push ssh://cluster.example.edu/my_network_model
pushing to ssh://cluster.example.edu/my_network_model
searching for changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 1 files

Collaborating with others

Using version control systems to collaborate with others is essentially no different from working solo on multiple machines, except that you may know less about exactly what changes others have made.

Suppose my colleague Barbara has also been working on the same code: she cloned my repository at version 0, and since then has been working independently. I’m a little wary of pulling in her changes, so first I can take a look at what she’s changed:

$ hg incoming /Users/barbara/our_network_model
comparing with /Users/barbara/our_network_model
searching for changes
changeset:   1:40f575c2c5a4
user:        Barbara Bara <barbara@example.com>
date:        Thu Jul 12 16:16:58 2012 +0200
summary:     Changed some parameters in CUBA.py, and saved figure to postscript

changeset:   2:2024998fd5ec
tag:         tip
user:        Barbara Bara <barbara@example.com>
date:        Thu Jul 12 16:17:58 2012 +0200
summary:     Save COBAHH figure to postscript

It looks like there may be some problems, since I’ve also changed parameters in that file, and I’m saving figures in PNG format. Oh well, deep breath, let’s plunge in:

$ hg pull -u /Users/barbara/our_network_model
pulling from /Users/barbara/our_network_model
searching for changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 2 files (+1 heads)
not updating: crosses branches (merge branches or update --check to force update)
$ hg merge
merging COBAHH.py
warning: conflicts during merge.
merging COBAHH.py failed!
merging CUBA.py
warning: conflicts during merge.
merging CUBA.py failed!
0 files updated, 0 files merged, 0 files removed, 2 files unresolved
use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon

Unlike last time, when our changes were in different parts of the file, and so could be merged automatically, here Barbara has changed some of the same lines as me, and Mercurial can’t choose which changes to keep.

If we now look at CUBA.py, we can see the conflicts marked with <<<<<<< and >>>>>>>:

...
from brian import *
import time

start_time=time.time()
<<<<<<< local
taum=15*ms
taue=3*ms
taui=5*ms
=======
taum=25*ms
taue=5*ms
taui=10*ms
>>>>>>> other
Vt=-50*mV
Vr=-65*mV
El=-49*mV

...

<<<<<<< local
savefig("CUBA_output.png")
=======
savefig("firing_rate_CUBA.eps")
>>>>>>> other

Well, it makes sense for both me and Barbara to explore different parameters, and it makes sense to allow different file formats, so let’s move the parameters into a separate file, and parameterize the file format. The file now looks like this:

...
from brian import *
import time
from parameters import TAU_M, TAU_E, TAU_I, FILE_FORMAT

start_time=time.time()
taum = TAU_M*ms  # time constants come from parameters.py, in milliseconds
taue = TAU_E*ms
taui = TAU_I*ms
Vt=-50*mV
Vr=-65*mV
El=-49*mV

...

assert FILE_FORMAT in ('eps', 'png', 'jpg')
savefig("firing_rate_CUBA.%s" % FILE_FORMAT)
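The contents of parameters.py are not shown. A minimal hypothetical version, using the values from my side of the conflict above, might look like this. The time constants are plain numbers because CUBA.py multiplies them by ms.

```python
# parameters.py (hypothetical contents): values shared by the model scripts.
TAU_M = 15  # membrane time constant, in ms
TAU_E = 3   # excitatory synaptic time constant, in ms
TAU_I = 5   # inhibitory synaptic time constant, in ms
FILE_FORMAT = "png"  # must be one of 'eps', 'png', 'jpg' (checked in CUBA.py)
```

Barbara’s copy would hold her own values, for example 25, 5 and 10 ms with FILE_FORMAT = "eps".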

After manually editing COBAHH.py as well, I need to tell Mercurial that all the conflicts have been resolved, before I do a commit:

$ hg resolve -m
$ hg add parameters.py
$ hg commit -m "Merged Barbara's changes; moved parameters to separate file"
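Leftover conflict markers are easy to miss when editing by hand, so it can be worth checking for them before marking files as resolved. The sketch below scans text for lines that start with seven <, = or > characters, like the markers shown above. The function names are hypothetical, and the check will also flag any legitimate line that happens to look like a marker.

```python
import re

# A marker is seven '<', '=' or '>' characters at the start of a line, followed
# by a space (and a label such as "local") or by the end of the line.
MARKER = re.compile(r"^(<{7}|={7}|>{7})( |$)")


def find_conflict_markers(text):
    """Return (line number, line) pairs for lines that look like conflict markers."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if MARKER.match(line)]


def check_files(paths):
    """Print any leftover markers in the given files; return True if there are none."""
    clean = True
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for number, line in find_conflict_markers(f.read()):
                print("%s:%d: %s" % (path, number, line))
                clean = False
    return clean
```

For example, check_files(["CUBA.py", "COBAHH.py"]) prints any suspicious lines and returns False if it finds some.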

I’ve decided to add the new parameters.py to the repository. This means Barbara and I will still have conflicts in future if we’re using different parameters, but at least the conflicts will be localized to this one file. It might have been better not to put parameters.py under version control, since it changes so often, but then we would need another mechanism, in addition to version control, to keep track of our parameters. For more on this issue, see the section on Provenance tracking.

I send Barbara an e-mail to tell her what I’ve done. Now all she has to do is run hg pull -u.

(barbara)$ cd ~/our_network_model
(barbara)$ hg pull -u /Users/andrew/my_network_model
pulling from /Users/andrew/my_network_model
searching for changes
adding changesets
adding manifests
adding file changes
added 6 changesets with 8 changes to 4 files
4 files updated, 0 files merged, 0 files removed, 0 files unresolved

Now she has the new file, parameters.py, as well as the modified versions of CUBA.py and COBAHH.py.

Recap #2

You should now be able to use Mercurial for:

  • quick and easy backups of your code
  • keeping your work in sync between multiple computers
  • collaborating with colleagues

A comparison of Git and Mercurial

Git is another popular version control system, which shares many concepts and even command names with Mercurial. For simple use there is little to choose between them. The main difference is that Git has the additional concept of a staging area for arranging exactly what gets committed. With Mercurial, hg commit will commit all modified files, while with Git, modified files have to be added to the staging area using git add, otherwise they will not be committed.

The following table shows the approximate equivalence between the most common Mercurial and Git commands.

Mercurial                 Git
hg clone <url>            git clone <url>
hg diff                   git diff HEAD
hg status                 git status
hg commit                 git commit -a
hg help <command>         git help <command>
hg paths                  git remote -v
hg add                    git add
hg rm                     git rm
hg push                   git push
hg pull                   git fetch
hg pull -u                git pull --rebase
hg revert -a              git reset --hard
hg revert <some_file>     git checkout <some_file>
hg outgoing               git fetch ; git log FETCH_HEAD..master
hg incoming               git fetch ; git log master..FETCH_HEAD
hg update <version>       git checkout <version>
.hg/hgrc                  .git/config
.hgignore                 .gitignore

Read up on both and pick one; if you collaborate a lot with others, you will probably end up using both anyway.

A comparison of Subversion and Mercurial

Subversion is a centralized, rather than distributed, version control system: the repository sits on a central server and each user has only a working copy (in contrast to Mercurial and Git, where each user has both a repository and a working copy). This means that a network connection is required for operations such as log and commit. It is also generally considered to be weaker at merging than Git and Mercurial.

A few years ago, Subversion was by far the most popular open-source version control system, but it is now losing ground to distributed tools such as Git, Mercurial and Bazaar.

The following table shows the approximate equivalence between the most common Subversion and Mercurial commands.

Subversion           Mercurial
svn checkout <url>   hg clone <url>
svn update           hg pull -u
svn commit           hg commit; hg push
svn log              hg log
svn status           hg status
svn info             hg summary
svn rm               hg rm

Graphical tools

As well as the command-line interface, graphical tools are available for all major version control systems. The following screenshot shows MacHg, a tool for working with Mercurial on Mac OS X. Note the branch graph, which shows where the laptop, cluster and Barbara’s repositories diverged and were later merged back together.

[Screenshot: MacHg showing the branch graph of the example repository]

The following Wikipedia entries may provide good starting points for investigating graphical version control clients:

Web-based tools

There are many web-based services for hosting version control repositories, for example Google Code, Sourceforge, GitHub and BitBucket. The following table shows which version control systems are supported by these four services, and whether they provide free private repositories (all support free public repositories):

Service       Subversion  Mercurial  Git  Bazaar  Free private repositories
Sourceforge   x           x          x    x
Google Code   x           x          x
BitBucket                 x          x            x
GitHub                               x