While working on a project managed with Git, I needed to split some files out into a new repository. To preserve the Git history of those files, I used git filter-branch. Here are the two methods I used.

First, create a copy of the original repo. git filter-branch is destructive and will rewrite your repo's history!

git clone original-project/.git new-project
cd new-project

In one case, the files I wanted to keep were all isolated in a subdirectory of the original repo (e.g. lib/keep_us). In this case, one can use the --subdirectory-filter option of git filter-branch:

git filter-branch --subdirectory-filter lib/keep_us

Once this command completes, new-project contains only the files from lib/keep_us, and the git history covers only those files as well. The subdirectory becomes the project root: any file previously under lib/keep_us now sits directly in the git root.
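To make the path change concrete, here is a small Ruby sketch that mimics the visible effect of --subdirectory-filter on a list of paths. It is only an illustrative model, not how git implements the filter; the helper name remap_to_subdirectory and the sample paths are made up for the example.

```ruby
# Illustrative model only: mimics the visible effect of
# --subdirectory-filter on a list of paths. Not git's implementation.
def remap_to_subdirectory(paths, subdir)
  prefix = subdir.chomp("/") + "/"
  paths
    .select { |path| path.start_with?(prefix) } # files outside subdir are dropped
    .map { |path| path.delete_prefix(prefix) }  # subdir becomes the new root
end

# Hypothetical paths standing in for the original repo's contents.
before = %w(
  README.md
  lib/keep_us/foo.rb
  lib/keep_us/util/bar.rb
  lib/other/baz.rb
)

puts remap_to_subdirectory(before, "lib/keep_us")
```

Only the two paths under lib/keep_us survive, and both lose the lib/keep_us/ prefix.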

For another repo, I needed to split out files spread across different directories. This is a bit more complicated: you need a list of all the files you want to keep, and then you must traverse each commit and remove any file that is not in that list.

I used the following Ruby script to do this:

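require "shellwords"

# Run from the directory that contains the new-project clone.
Dir.chdir "new-project"

# Every path ever added in the history, de-duplicated.
# --no-renames reports a rename as delete + add, so renamed files are listed too.
all_files_ever = `git log --pretty=format: --name-only --diff-filter=A --no-renames`.
                 split("\n").reject(&:empty?).uniq

# Paths to keep, relative to the repository root.
keepers = %w(
  app/models/foo.rb
  spec/models/foo_spec.rb
  lib/tasks/foo.rake
)

delete_us = all_files_ever - keepers

# Shell-escape the paths so names containing spaces survive the index filter.
index_filter = "git rm -f --cached --ignore-unmatch -- #{delete_us.shelljoin}"

# Passing the arguments separately avoids a second round of shell quoting.
system("git", "filter-branch", "--prune-empty", "--index-filter", index_filter, "HEAD")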

After running this script, you’ll be left with only the files specified as “keepers”. Unlike the --subdirectory-filter option, these files are left in their original location. That is, app/models/foo.rb is still app/models/foo.rb after the rewrite.
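One way to double-check the result is to list every path still mentioned in the rewritten history and compare it against the keepers. The sketch below assumes you feed it the raw output of git log --pretty=format: --name-only from the rewritten clone; the helper name unexpected_paths is hypothetical, and the sample log is a stand-in so the snippet runs without a repository.

```ruby
# Hypothetical helper: given the raw output of
# `git log --pretty=format: --name-only`, list any paths that are
# still in the history but were not meant to be kept.
def unexpected_paths(log_output, keepers)
  log_output.split("\n").reject(&:empty?).uniq - keepers
end

keepers = %w(
  app/models/foo.rb
  spec/models/foo_spec.rb
  lib/tasks/foo.rake
)

# Sample output standing in for a real git invocation.
sample_log = <<~LOG
  app/models/foo.rb
  spec/models/foo_spec.rb

  app/models/foo.rb
  lib/tasks/foo.rake
LOG

leftovers = unexpected_paths(sample_log, keepers)
puts leftovers.empty? ? "history contains only keepers" : leftovers
```

An empty result means no commit in the rewritten history touches a file outside the keepers list.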

Hope it helps someone else!