Version control
===============

Basic ideas
-----------

Any time multiple versions of a document exist, whether because the document changes over time or because multiple authors are working on it, some kind of version control is needed.

Version control allows:

* accessing any version from the original to the most recent;
* seeing what has changed from one version to the next;
* giving a label of some kind to distinguish a particular version.

Examples of version control systems
-----------------------------------

The simplest method of version control is probably the most widely used in science: changing the file name.

.. figure:: http://www.phdcomics.com/comics/archive/phd052810s.gif
   :alt: Cartoon from "Piled Higher and Deeper" by Jorge Cham
   :target: http://www.phdcomics.com/comics.php?f=1323

.. figure:: http://www.phdcomics.com/comics/archive/phd101212s.gif
   :alt: Cartoon from "Piled Higher and Deeper" by Jorge Cham
   :target: http://www.phdcomics.com/comics.php?f=1531

   from *"Piled Higher and Deeper" by Jorge Cham* www.phdcomics.com

Other examples include:

* "track changes" in Microsoft Word
* Time Machine in Mac OS X
* versioning in Dropbox and Google Drive
* formal version control systems such as CVS, Subversion, Mercurial, Git

The importance of tracking projects, not individual files
---------------------------------------------------------

Early version control systems, such as CVS, track each file separately - each file has its own version number. The same is true of Dropbox and Microsoft Word. This is a problem when you make changes to several files at once, and the changes in one file depend on changes in another.


In modern version control systems, and in backup-based systems such as Time Machine, entire directory trees are tracked as a unit, which means that each version corresponds to the state of an entire project at a point in time.

Advantages of formal version control systems
--------------------------------------------

* explicit version number for each version
* easy to switch between versions
* easy to see changes between versions
* tools to help merge incompatible changes

In the next sections, we will use Mercurial_, one of the most commonly used modern version control systems, to introduce the principles of version control. We will use Mercurial's command-line interface because it is easy to use, and widely used.

Following this, we will briefly discuss Git_ and Subversion_, two other widely-used version control systems, as well as graphical user interfaces for version control.

Installing Mercurial
--------------------

Mercurial is available for Linux, Mac OS X, Windows, and assorted flavours of UNIX. For Linux, it will almost certainly be available in your package manager. For Windows and Mac OS X, download from http://mercurial.selenic.com/wiki/Download

Once you've installed it, you should create a file named :file:`.hgrc` in your home directory, as follows:

.. code-block:: bash

    [ui]
    username = Andrew Davison

Creating a repository
---------------------

We start by introducing two concepts:

**Working copy**
    the set of files that you are currently working on

**Repository**
    a "database" containing the entire history of your project (all versions)

As an example, we will use the Brian code from this paper:

    Brette R, Rudolph M, Carnevale T, Hines M, Beeman D, Bower JM, Diesmann M, Morrison A, et al. (2007) Simulation of networks of spiking neurons: A review of tools and strategies. *J Comp Neurosci* **23**:349-98

available from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319

.. code-block:: bash

    $ unzip destexhe_benchmarks.zip
    $ cd destexhe_benchmarks
    $ cp -r Brian ~/my_network_model
    $ cd ~/my_network_model
    $ ls
    COBA.py COBAHH.py CUBA.py README.txt

We're going to take this code as the starting point for our own project, and we want to keep track of the changes we make.

The first step is to create a repository, where all the versions will be stored. This is very simple:

.. code-block:: bash

    $ hg init

Nothing seems to have happened. In fact, the :command:`hg init` command has created a new subdirectory:

.. code-block:: bash

    $ ls -a
    . .. .hg COBA.py COBAHH.py CUBA.py README.txt

You almost never need to care about what is in this directory: this is where Mercurial will store all the information about the repository.

Adding files to the repository
------------------------------

Now we need to tell Mercurial which files are part of our project:

.. code-block:: bash

    $ hg add
    adding COBA.py
    adding COBAHH.py
    adding CUBA.py
    adding README.txt
    $ hg status
    A COBA.py
    A COBAHH.py
    A CUBA.py
    A README.txt

Committing changes
------------------

These files are now *queued* to be added to the repository, but they are not yet there. Nothing is definitive until we make a *commit* (also known as a "*check-in*").

.. code-block:: bash

    $ hg commit

This pops me into a text editor where I can enter a message describing the purpose of the commit:

.. code-block:: bash

    HG: Enter commit message. Lines beginning with 'HG:' are removed.
    HG: Leave message empty to abort commit.
    HG: --
    HG: user: Andrew Davison
    HG: branch 'default'
    HG: added COBA.py
    HG: added COBAHH.py
    HG: added CUBA.py
    HG: added README.txt

In this case, I am using :command:`vi`, but you can use any editor.

.. code-block:: bash

    Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012
    HG: Enter commit message. Lines beginning with 'HG:' are removed.
    HG: Leave message empty to abort commit.
    HG: --
    HG: user: Andrew Davison
    HG: branch 'default'
    HG: added COBA.py
    HG: added COBAHH.py
    HG: added CUBA.py
    HG: added README.txt

Viewing the history of changes
------------------------------

The :command:`hg log` command lists all the different versions stored in the repository. For now, of course, we have only one:

.. code-block:: bash

    $ hg log
    changeset:   0:ef57b1c87c6a
    tag:         tip
    user:        Andrew Davison
    date:        Wed Jul 11 12:33:21 2012 +0200
    summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

Now let's run the code:

.. code-block:: bash

    $ python COBAHH.py
    Network construction time: 0.814524173737 seconds
    Simulation running...
    Simulation time: 45.7264661789 seconds
    126014 excitatory spikes
    29462 inhibitory spikes

This pops up a window with the following figure:

.. image:: images/brian_figure.png

We'd prefer to save the figure to a file for further use, rather than work with the model interactively, so let's change the last lines of the script from:

.. code-block:: python

    plot(trace.times/ms,trace[1]/mV)
    plot(trace.times/ms,trace[10]/mV)
    plot(trace.times/ms,trace[100]/mV)
    show()

to

.. code-block:: python

    plot(trace.times/ms,trace[1]/mV)
    plot(trace.times/ms,trace[10]/mV)
    plot(trace.times/ms,trace[100]/mV)
    savefig("COBAHH_output.png")

Seeing what's changed
---------------------

Now if we run :command:`hg status` we see:

.. code-block:: bash

    $ hg status
    M COBAHH.py

The "M" indicates that the file has been modified. To see the changes:

.. code-block:: bash

    $ hg diff
    diff -r ef57b1c87c6a COBAHH.py
    --- a/COBAHH.py Wed Jul 11 12:33:21 2012 +0200
    +++ b/COBAHH.py Wed Jul 11 15:56:05 2012 +0200
    @@ -93,4 +93,4 @@
     plot(trace.times/ms,trace[1]/mV)
     plot(trace.times/ms,trace[10]/mV)
     plot(trace.times/ms,trace[100]/mV)
    -show()
    +savefig("COBAHH_output.png")

Now let's commit the changes, and look at the log again:

.. code-block:: bash

    $ hg commit -m 'Save figure to file'
    $ hg log
    changeset:   1:e323d363742a
    tag:         tip
    user:        Andrew Davison
    date:        Wed Jul 11 15:59:02 2012 +0200
    summary:     Save figure to file

    changeset:   0:ef57b1c87c6a
    user:        Andrew Davison
    date:        Wed Jul 11 12:33:21 2012 +0200
    summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

Switching between versions
--------------------------

To switch between versions, use :command:`hg update`. You should not do this if you have modified any of the files: commit your changes first.

.. code-block:: bash

    $ hg update 0
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved

This will change the files in your working copy to reflect the state they had when you committed that particular version.

Using :command:`hg summary` we can see which version we are currently using:

.. code-block:: bash

    $ hg summary
    parent: 0:ef57b1c87c6a
     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012
    branch: default
    commit: 1 unknown (clean)
    update: 1 new changesets (update)

When specifying the version number to switch to, you can use either the short form (a decimal integer, like ``0`` or ``1``) or the hexadecimal form (like ``ef57b1c87c6a``). The difference between these two forms is discussed below, in `Working on multiple computers`_.

With no version number, :command:`hg update` switches to the most recent version:

.. code-block:: bash

    $ hg up
    1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    $ hg sum
    parent: 1:b0275b66ad2b tip
     Save figure to file
    branch: default
    commit: 1 unknown (clean)
    update: (current)

Also note that all Mercurial commands can be abbreviated, provided the abbreviation is unambiguous.

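
As a rough illustration of what "unambiguous" means for an abbreviation such as ``up`` or ``sum``, the following sketch resolves a prefix against a list of command names. It is a toy model only: the ``COMMANDS`` list and the ``resolve`` function are hypothetical names invented for this example, the list is deliberately incomplete, and Mercurial's real lookup (which also knows about explicit aliases) is not reproduced here.

.. code-block:: python

    # Toy illustration only: COMMANDS is a hypothetical, shortened list of
    # command names, and resolve() is NOT Mercurial's actual implementation.
    COMMANDS = ["add", "clone", "commit", "diff", "log", "merge",
                "pull", "push", "status", "summary", "tag", "update"]


    def resolve(abbrev, commands=COMMANDS):
        """Return the one command name that starts with ``abbrev``."""
        if abbrev in commands:
            # An exact name always wins, even if it is a prefix of another.
            return abbrev
        matches = sorted(c for c in commands if c.startswith(abbrev))
        if len(matches) == 1:
            return matches[0]
        if not matches:
            raise ValueError("unknown command: %s" % abbrev)
        raise ValueError("ambiguous command %s: could be %s"
                         % (abbrev, ", ".join(matches)))


    print(resolve("up"))   # update
    print(resolve("sum"))  # summary

With this list, ``resolve("s")`` raises an error, because both ``status`` and ``summary`` start with ``s``; the abbreviation has to be long enough to pick out a single command.
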

Giving informative names to versions
------------------------------------

Remembering the version number for a particular version of interest (for example, the version used to generate a particular figure in your manuscript) can be difficult. For this reason, the :command:`hg tag` command can be used to give descriptive and memorable names to significant versions:

.. code-block:: bash

    $ hg tag "Figure 1"

Note that this automatically makes a new commit:

.. code-block:: bash

    $ hg log
    changeset:   2:416ac8894202
    tag:         tip
    user:        Andrew Davison
    date:        Thu Jul 12 14:28:19 2012 +0200
    summary:     Added tag Figure 1 for changeset b0275b66ad2b

    changeset:   1:b0275b66ad2b
    tag:         Figure 1
    user:        Andrew Davison
    date:        Wed Jul 11 16:01:32 2012 +0200
    summary:     Save figure to file

    changeset:   0:ef57b1c87c6a
    user:        Andrew Davison
    date:        Wed Jul 11 12:33:21 2012 +0200
    summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

You can now switch to a tagged version using the tag name:

.. code-block:: bash

    $ hg update "Figure 1"

Recap #1
--------

So far, we have learned how to:

* Create a repository
* Add files to a repository
* Commit changes
* Move your code-base backwards and forwards in time

These operations are so easy and so useful that there is no reason not to use them for almost any work you do as a scientist. Any time I start a new project, whether writing code or writing a paper with LaTeX, I now run :command:`hg init` as soon as I've created a new directory for the project.

Making backups
--------------

As well as helping to keep track of different versions of a project, version control systems are hugely useful for keeping backups of your code with minimal hassle.

Making a copy of your repository is as simple as moving to the location where the backup will be, and then using the :command:`hg clone` command.

.. code-block:: bash

    $ cd /Volumes/USB_DRIVE
    $ hg clone ~/my_network_model
    $ cd ~/Dropbox
    $ hg clone ~/my_network_model
    $ ssh cluster.example.edu
    (cluster)$ hg clone ssh://my_laptop.example.edu/my_network_model

You can then keep the backup in sync with the main repository by either using :command:`hg pull` in the backup location, or using :command:`hg push` in your working directory:

.. code-block:: bash

    $ cd ~/my_network_model
    $ hg push /Volumes/USB_DRIVE/my_network_model
    pushing to /Volumes/USB_DRIVE/my_network_model
    searching for changes
    no changes found

Working on multiple computers
-----------------------------

As an extension of the idea of backups, version control systems are excellent for keeping code in sync between multiple computers. Suppose you have a copy of your repository on your laptop, and you were working on the code in the airport.

.. code-block:: bash

    (laptop)$ hg diff
    diff -r 416ac8894202 -r 0467691f7881 CUBA.py
    --- a/CUBA.py Thu Jul 12 14:28:19 2012 +0200
    +++ b/CUBA.py Thu Jul 12 15:18:09 2012 +0200
    @@ -72,4 +72,4 @@
     print Me.nspikes,"excitatory spikes"
     print Mi.nspikes,"inhibitory spikes"
     plot(M.times/ms,M.smooth_rate(2*ms,'gaussian'))
    -show()
    +savefig("CUBA_output.png")
    (laptop)$ hg commit -m 'CUBA script now saves figure to file'

The log on your laptop now looks like this:

.. code-block:: bash

    (laptop)$ hg log
    changeset:   3:0467691f7881
    tag:         tip
    user:        Andrew Davison
    date:        Thu Jul 12 15:18:09 2012 +0200
    summary:     CUBA script now saves figure to file

    changeset:   2:416ac8894202
    user:        Andrew Davison
    date:        Thu Jul 12 14:28:19 2012 +0200
    summary:     Added tag Figure 1 for changeset b0275b66ad2b

    changeset:   1:b0275b66ad2b
    tag:         Figure 1
    user:        Andrew Davison
    date:        Wed Jul 11 16:01:32 2012 +0200
    summary:     Save figure to file

    changeset:   0:ef57b1c87c6a
    user:        Andrew Davison
    date:        Wed Jul 11 12:33:21 2012 +0200
    summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

Meanwhile, you've started running some simulations on a local cluster, and you're investigating the effect of changing some parameters:

.. code-block:: bash

    (cluster)$ hg diff
    diff -r 416ac8894202 CUBA.py
    --- a/CUBA.py Thu Jul 12 14:28:19 2012 +0200
    +++ b/CUBA.py Thu Jul 12 15:19:49 2012 +0200
    @@ -25,9 +25,9 @@
     import time
     start_time=time.time()
    -taum=20*ms
    -taue=5*ms
    -taui=10*ms
    +taum=15*ms
    +taue=3*ms
    +taui=5*ms
     Vt=-50*mV
     Vr=-60*mV
     El=-49*mV
    (cluster)$ hg commit -m 'Changed time constants in CUBA model'
    (cluster)$ hg log
    changeset:   3:243c20657dc4
    tag:         tip
    user:        Andrew Davison
    date:        Thu Jul 12 15:20:17 2012 +0200
    summary:     Changed time constants in CUBA model

    changeset:   2:416ac8894202
    user:        Andrew Davison
    date:        Thu Jul 12 14:28:19 2012 +0200
    summary:     Added tag Figure 1 for changeset b0275b66ad2b

    changeset:   1:b0275b66ad2b
    tag:         Figure 1
    user:        Andrew Davison
    date:        Wed Jul 11 16:01:32 2012 +0200
    summary:     Save figure to file

    changeset:   0:ef57b1c87c6a
    user:        Andrew Davison
    date:        Wed Jul 11 12:33:21 2012 +0200
    summary:     Initial version, downloaded from http://senselab.med.yale.edu/modeldb/showmodel.asp?model=83319 on July 21st 2012

Now the repositories on the two machines are out of sync. The first three commits are the same on both, but the fourth is different on the two machines.


Note that versions 0, 1, and 2 have the same hexadecimal version number on both machines, but that version 3 has a different hex number:

==============  ==============
Laptop          Cluster
==============  ==============
0:ef57b1c87c6a  0:ef57b1c87c6a
1:b0275b66ad2b  1:b0275b66ad2b
2:416ac8894202  2:416ac8894202
3:0467691f7881  3:243c20657dc4
==============  ==============

This is the reason for having both the short, integer number and the hex version: the short integer is local to each machine, while the hex number is global.

So, how do we get the two machines in sync? This can be done from either machine. Here, we'll do it from the laptop.

.. code-block:: bash

    (laptop)$ hg pull -u ssh://cluster.example.edu/my_network_model
    pulling from ssh://cluster.example.edu/my_network_model
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files (+1 heads)
    not updating: crosses branches (merge branches or update --check to force update)

Note that :command:`hg pull -u` is equivalent to running :command:`hg pull` followed by :command:`hg update`. *"Pull"* pulls changes into the local *repository*, but does not change the *working copy*, i.e. it does not change your files. *"Update"* is the part that changes your files.

Here, the pull succeeded, but the update failed, because we made two different commits on different machines.

.. code-block:: bash

    (laptop)$ hg merge
    merging CUBA.py
    0 files updated, 1 files merged, 0 files removed, 0 files unresolved
    (branch merge, don't forget to commit)

Because Mercurial is clever enough to realize that we'd edited different parts of the file :file:`CUBA.py`, it can automatically merge the two changes. If there was a conflict (if we'd edited the same lines on both machines), the merge would fail and we'd have to manually merge the files (see below).

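
The idea behind such an automatic merge can be sketched in a few lines of Python. This is a deliberately simplified model, not Mercurial's merge algorithm: the function name ``merge3`` is hypothetical, the file contents are shortened stand-ins for :file:`CUBA.py`, and the sketch assumes that edits only replace existing lines (no insertions or deletions), so that the common ancestor and both edited versions have the same number of lines.

.. code-block:: python

    def merge3(base, local, other):
        """Line-by-line three-way merge (simplified sketch).

        ``base`` is the common ancestor; ``local`` and ``other`` are the two
        edited versions. Returns (merged_lines, conflicting_line_indices).
        Assumes all three lists have the same length.
        """
        if not (len(base) == len(local) == len(other)):
            raise ValueError("this sketch only handles in-place line edits")
        merged, conflicts = [], []
        for i, (b, l, o) in enumerate(zip(base, local, other)):
            if l == o:        # both sides agree (changed identically or not at all)
                merged.append(l)
            elif l == b:      # only the other side changed this line
                merged.append(o)
            elif o == b:      # only the local side changed this line
                merged.append(l)
            else:             # both sides changed the same line differently
                merged.append(None)
                conflicts.append(i)
        return merged, conflicts


    # Shortened stand-ins for the three versions of CUBA.py
    base = ["taum=20*ms", "Vt=-50*mV", "show()"]
    laptop = ["taum=20*ms", "Vt=-50*mV", 'savefig("CUBA_output.png")']
    cluster = ["taum=15*ms", "Vt=-50*mV", "show()"]

    merged, conflicts = merge3(base, laptop, cluster)
    print(merged)     # ['taum=15*ms', 'Vt=-50*mV', 'savefig("CUBA_output.png")']
    print(conflicts)  # []

Because the laptop and the cluster changed different lines, every line has at most one changed side and the merge needs no human input. If both versions had changed the ``taum`` line to different values, index ``0`` would be reported as a conflict instead.
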

Mercurial does not automatically commit after the merge, so we have the chance to check we are happy with how Mercurial has merged the files before committing.

.. code-block:: bash

    (laptop)$ hg commit -m 'merge'

Now we can see the full history, with all changes:

.. code-block:: bash

    (laptop)$ hg log -r5:2
    changeset:   5:12fddba7aaa7
    tag:         tip
    parent:      3:16e621976c95
    parent:      4:243c20657dc4
    user:        Andrew Davison
    date:        Thu Jul 12 15:54:29 2012 +0200
    summary:     merge

    changeset:   4:243c20657dc4
    parent:      2:416ac8894202
    user:        Andrew Davison
    date:        Thu Jul 12 15:20:17 2012 +0200
    summary:     Changed time constants in CUBA model

    changeset:   3:16e621976c95
    user:        apdavison
    date:        Thu Jul 12 15:40:04 2012 +0200
    summary:     CUBA script now saves figure to file

    changeset:   2:416ac8894202
    user:        Andrew Davison
    date:        Thu Jul 12 14:28:19 2012 +0200
    summary:     Added tag Figure 1 for changeset b0275b66ad2b

(Note that we've truncated the output by asking for only a subset of the commits.)

To complete the sync, we now push the merged repository back to the cluster:

.. code-block:: bash

    (laptop)$ hg push ssh://cluster.example.edu/my_network_model
    pushing to ../my_network_model
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 2 changesets with 2 changes to 1 files

Collaborating with others
-------------------------

Using version control systems to collaborate with others is essentially no different to working solo on multiple machines, except that you perhaps have less knowledge of exactly what changes have been made by others.

Suppose my colleague Barbara has also been working on the same code: she cloned my repository at version 0, and since then has been working independently. I'm a little wary of pulling in her changes, so first I can take a look at what she's changed:

.. code-block:: bash

    $ hg incoming /Users/barbara/our_network_model
    comparing with /Users/barbara/our_network_model
    searching for changes
    changeset:   1:40f575c2c5a4
    user:        Barbara Bara
    date:        Thu Jul 12 16:16:58 2012 +0200
    summary:     Changed some parameters in CUBA.py, and saved figure to postscript

    changeset:   2:2024998fd5ec
    tag:         tip
    user:        Barbara Bara
    date:        Thu Jul 12 16:17:58 2012 +0200
    summary:     Save COBAHH figure to postscript

Looks like there may be some problems, since I've also changed parameters in that file, and I'm saving figures to PNG format. Oh, well, deep breath, let's plunge in:

.. code-block:: bash

    $ hg pull -u /Users/barbara/our_network_model
    pulling from /Users/barbara/our_network_model
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 2 changesets with 2 changes to 2 files (+1 heads)
    not updating: crosses branches (merge branches or update --check to force update)

.. code-block:: bash

    $ hg merge
    merging COBAHH.py
    warning: conflicts during merge.
    merging COBAHH.py failed!
    merging CUBA.py
    warning: conflicts during merge.
    merging CUBA.py failed!
    0 files updated, 0 files merged, 0 files removed, 2 files unresolved
    use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon

Unlike last time, when our changes were in different parts of the file, and so could be merged automatically, here Barbara has changed some of the same lines as me, and Mercurial can't choose which changes to keep.

If we now look at :file:`CUBA.py`, we can see the conflicts marked with ``<<<<<<<`` and ``>>>>>>>``:

.. code-block:: python

    ...
    from brian import *
    import time

    start_time=time.time()
    <<<<<<< local
    taum=15*ms
    taue=3*ms
    taui=5*ms
    =======
    taum=25*ms
    taue=5*ms
    taui=10*ms
    >>>>>>> other
    Vt=-50*mV
    Vr=-65*mV
    El=-49*mV
    ...
    <<<<<<< local
    savefig("CUBA_output.png")
    =======
    savefig("firing_rate_CUBA.eps")
    >>>>>>> other

Well, it makes sense for both me and Barbara to explore different parameters, and it makes sense to allow different file formats, so let's move the parameters into a separate file, and parameterize the file format. The file now looks like this:

.. code-block:: python

    ...
    from brian import *
    import time
    from parameters import TAU_M, TAU_E, TAU_I, FILE_FORMAT

    start_time=time.time()
    taum = TAU_M*ms
    taue = TAU_E*ms
    taui = TAU_I*ms
    Vt=-50*mV
    Vr=-65*mV
    El=-49*mV
    ...
    assert FILE_FORMAT in ('eps', 'png', 'jpg')
    savefig("firing_rate_CUBA.%s" % FILE_FORMAT)

After manually editing :file:`COBAHH.py` as well, I need to tell Mercurial that all the conflicts have been resolved, before I do a commit:

.. code-block:: bash

    $ hg resolve -m
    $ hg add parameters.py
    $ hg commit -m "Merged Barbara's changes; moved parameters to separate file"

I've decided to add the new :file:`parameters.py` to the repository. This means Barbara and I will still have conflicts in future if we're using different parameters, but at least the conflicts will be localized to this one file. It might have been better not to have :file:`parameters.py` under version control, since it changes so often, but then we would need another mechanism, in addition to version control, to keep track of our parameters. For more on this issue, see the section on :doc:`provenance_tracking`.

I send Barbara an e-mail to tell her what I've done. Now all she has to do is run :command:`hg pull -u`.

.. code-block:: bash

    (barbara)$ cd ~/our_network_model
    (barbara)$ hg pull -u /Users/andrew/my_network_model
    pulling from /Users/andrew/my_network_model
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 6 changesets with 8 changes to 4 files
    4 files updated, 0 files merged, 0 files removed, 0 files unresolved

Now she has the new file, :file:`parameters.py`, as well as the modified versions of :file:`CUBA.py` and :file:`COBAHH.py`.

.. todo:: section on fixing mistakes: revert, rollback, backout

Recap #2
--------

You should now be able to use Mercurial for:

* quick and easy backups of your code
* keeping your work in sync between multiple computers
* collaborating with colleagues

.. todo:: section on branching, using hg clone, bookmark, branch

A comparison of Git and Mercurial
---------------------------------

Git_ is another popular version control system, which shares many concepts and even command names with Mercurial. For simple use there is little to choose between them. The main difference is that Git has the additional concept of a staging area for arranging exactly what gets committed. With Mercurial, :command:`hg commit` will commit all modified files, while with Git, modified files have to be added to the staging area using :command:`git add`, otherwise they will not be committed.

The following table shows the approximate equivalence between the most common Mercurial and Git commands.


=======================  ======================================
hg clone                 git clone
hg diff                  git diff HEAD
hg status                git status
hg commit                git commit -a
hg help                  git help
hg paths                 git remote -v
hg add                   git add
hg rm                    git rm
hg push                  git push
hg pull                  git fetch
hg pull -u               git pull --rebase
hg revert -a             git reset --hard
hg revert                git checkout
hg outgoing              git fetch ; git log FETCH_HEAD..master
hg incoming              git fetch ; git log master..FETCH_HEAD
hg update                git checkout
.hg/hgrc                 .git/config
.hgignore                .gitignore
=======================  ======================================

.. todo:: Add branching commands to above table

.. todo:: Mention index, show figures that show difference between hg add and git add, mention fact that modified files are not automatically staged for commit in git

Read up on both and pick one, although if you collaborate a lot with others you will probably end up using both anyway.

A comparison of Subversion and Mercurial
----------------------------------------

Subversion is a centralized, not distributed, version control system, in that the repository sits on a central server and each user has only a working copy (in contrast to Mercurial and Git, where each user has both repository and working copy). This means that a network connection is required for operations such as :command:`log` and :command:`commit`. It is also reputed to be less good at merging than Git or Mercurial.

A few years ago, Subversion was by far the most popular open-source version control system, but it is now losing ground to distributed tools such as Git, Mercurial and Bazaar.

The following table shows the approximate equivalence between the most common Subversion and Mercurial commands.


=====================  ===================
svn checkout           hg clone
svn update             hg pull -u
svn commit             hg commit; hg push
svn log                hg log
svn status             hg status
svn info               hg summary
svn rm                 hg rm
=====================  ===================

Graphical tools
---------------

As well as the command-line interface, graphical tools are available for all major version control systems. The following screenshot shows MacHg, a tool for working with Mercurial on Mac OS X: note the graph of branching, showing where the laptop repository, cluster repository and Barbara's repository branched off and then were merged back together.

.. image:: images/MacHg.png
   :width: 100%

The following Wikipedia entries may provide good starting points for investigating graphical version control clients:

* http://en.wikipedia.org/wiki/Comparison_of_Subversion_clients
* http://en.wikipedia.org/wiki/Mercurial
* http://en.wikipedia.org/wiki/Category:GIT_Tools

Web-based tools
---------------

There are many web-based services for hosting version control repositories, for example `Google Code`_, Sourceforge_, GitHub_ and BitBucket_. The following table shows which version control systems are supported by these four services, and whether they provide free private repositories (all support free public repositories):

===========  ==========  =========  ===  ======  =========================
Service      Subversion  Mercurial  Git  Bazaar  Free private repositories
===========  ==========  =========  ===  ======  =========================
Sourceforge  x           x          x    x
Google Code  x           x          x
BitBucket                x          x            x
GitHub                              x
===========  ==========  =========  ===  ======  =========================

.. todo:: add launchpad

.. _Mercurial: http://mercurial.selenic.com
.. _Git: http://git-scm.com/
.. _Subversion: http://subversion.apache.org/
.. _`Google Code`: http://code.google.com/
.. _Sourceforge: http://sourceforge.net/
.. _GitHub: https://github.com/
.. _BitBucket: https://bitbucket.org/