[How-to:] Merging Git Repositories
For a few months now, I have been slowly converting my subversion repositories into full-fledged git repositories. At first, I used git as a front end to various subversion repositories until I became convinced that git was stable and robust enough for my needs.
However, before the conversion I had to answer some philosophical questions about how I wanted my files organized. Should I have one or more large repositories, or dozens of smaller repositories? How should I organize the projects? On a per-client basis? Categorically? Topically?
I currently have several subversion repositories based on broad topical categories, like "business", "personal", or "archive." Within each one, files are sorted into increasingly specialized subdirectories. A benefit of this structure is that I can check out just the directories I need at the moment, make the changes, then delete my local checkout when I am finished.
I could certainly go forward with this approach, and clone the repositories directly into one or more gargantuan git repositories. But is this the best way? In my opinion it isn't, so I decided to break apart the large subversion repositories into smaller, topically organized git repositories.
Further, in my experiments with very large repositories, once a git repository gets past 7GB you start running into memory issues, primarily when cloning and packing the repository (if you are cloning an already packed repository, then you are fine). In the future, as more operating systems move to 64-bit and leave the 4GB memory limitation of 32-bit systems behind, this will be less of an issue.
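If packing a large repository does exhaust memory, git has a few configuration settings that bound how much the packer uses. Here is a minimal sketch, run against a throwaway repository; the values 256m and 1 are illustrative assumptions, not tuned recommendations:

```shell
#!/bin/sh
set -e

# Throwaway repository so the settings stay out of any real configuration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Cap the memory each packing thread may use for its delta search window.
git config pack.windowMemory 256m

# Pack with a single thread, since the window memory cap applies per thread.
git config pack.threads 1

# Read the settings back to confirm they were stored.
git config --get-regexp '^pack\.'
```

Whether these settings help depends on the repository and the machine, so treat them as a starting point for experimenting.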
Bring out your dead.
In GTD (Getting Things Done), you have a physical file labeled "dead" where you put reference files that are no longer needed, but are just too important to throw away. They are just dead space awaiting their fate. At some interval, such as every year, you clean out the dead file and throw away or re-file its contents.
Likewise, I created a dead.git repository into which I stuff client files (time sheets, invoices, quotes, contracts, etc.) that I no longer need to actively reference but that are still useful. Once a year I will merge them into my archive, which I lovingly named tomb.git.
Get it? tomb houses dead bodies... I'm so clever.
Which brings me to merging.
I need to move the entire contents of one repository (with history) into another repository.
I have two repositories: 1) tomb.git and 2) source_archive.git. I want to merge the contents of source_archive.git into tomb.git.
The process is relatively simple. First, we fetch a copy of the master branch from the source archive repository into a newly created branch named "sourcemerge." Issuing a 'git branch' shows that the new branch has been created:
$ cd tomb.git
$ git fetch ../source_archive.git master:sourcemerge
From ../source_archive
* [new branch] master -> sourcemerge
$ git branch
* master
  sourcemerge
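Next, we check out the newly created branch. Note that the -f flag discards any uncommitted changes in the working tree, so make sure there is nothing there you want to keep:
$ git checkout -f sourcemerge
Checking out files: 100% (20975/20975), done.
Switched to branch "sourcemerge"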
Now, everything from the old repository should be there. You can check with gitk to see that the history has been pulled in. Next, we have to switch back to the master branch and merge:
$ git checkout -f master
Checking out files: 100% (20975/20975), done.
Switched to branch "master"
$ git merge sourcemerge
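See? It is dead simple. Note that Git 2.9 and later refuse to merge two histories that share no common ancestor unless you add --allow-unrelated-histories to the merge command. Once the merge is done, you can delete the temporary branch with 'git branch -d sourcemerge'.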
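Put together, the steps above might look like the following script. It is only a sketch that runs in a throwaway directory: the make_repo helper and the sample file names are hypothetical, the inspection checkout is skipped because it is optional, and the --allow-unrelated-histories flag assumes Git 2.9 or later.

```shell
#!/bin/sh
set -e

# Throwaway sandbox so the sketch never touches real repositories.
work=$(mktemp -d)
cd "$work"

# make_repo is a hypothetical helper: it creates a repository named $1
# containing a single committed file named $2.txt.
make_repo() {
    git init -q "$1"
    (
        cd "$1"
        git config user.name "Example"
        git config user.email "example@example.com"
        echo "$2" > "$2.txt"
        git add "$2.txt"
        git commit -q -m "Add $2.txt"
    )
}

make_repo tomb.git tomb
make_repo source_archive.git source

cd tomb.git

# The source repository's current branch (master or main, depending on Git's defaults).
src_branch=$(git -C ../source_archive.git symbolic-ref --short HEAD)

# Fetch the source history into a temporary local branch.
git fetch -q ../source_archive.git "$src_branch:sourcemerge"

# The two histories share no common ancestor, so Git 2.9+ needs the explicit flag.
git merge -q --allow-unrelated-histories -m "Merge source_archive into tomb" sourcemerge

# The merge commit now carries the fetched history, so the temporary branch can go.
git branch -d sourcemerge

# Show the combined history as a graph.
git log --oneline --graph
```

On real repositories you would drop the sandbox setup and run only the fetch, merge, and branch deletion from inside tomb.git. If both repositories contain a file at the same path, expect the merge to stop with conflicts to resolve by hand.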