Wednesday, May 14, 2008

Easy Forks, Hard Forks

I've started playing with GitHub. Basically, it's a hosting site for projects that use the Git source code management system; SourceForge is a good analogy. First of all, this place is very cool. There's a ton of Ruby stuff on here, and some really neat plugins. You can find everything from a plugin for changing URLs for RESTful routes to a little utility that gives you team information for a Japanese basketball team (I did not make that up!).

But... (there's always a "but", isn't there?) GitHub has one thing that just makes me cringe. When you're looking at a project, there's a big button right at the top that says "Fork". Click it, and you get a whole new project based on the existing project.

This just kills me. In my opinion*, this kind of fragmentation is one of the hardest things about Ruby and about Rails. There are about 42 ways to do something, and about 15 plugins, gems, and code samples to accomplish it. It's really hard to build a community around tools that are all a little bit different.

To take a simple example, a lot of people want to run Selenium tests for Rails projects. Selenium gives you that last UI layer that Test::Unit and RSpec don't. When I last went looking for ways to integrate Selenium with my Rails project, I found a lot of options:
  • Selenium used totally separately, without any integration at all
  • Selenium-on-Rails plugin, from the OpenQA repository
  • selenium gem (gem install selenium from the default repos)
  • selenium-fu
  • polonium (newer, renamed, eventually hopefully better selenium-fu)
  • downloadable Ruby client driver from the OpenQA Selenium RC page
This, for the record, is just from the first two pages of search results. So I picked one (I happened to use the Selenium gem). Once you get up and running, you'll invariably have questions - there are definite quirks in this setup.

Here's where things go downhill.

The first place I go when I'm having a problem and can't noodle through it myself is to a search engine. Mailing lists, blogs, etc. may give me a clue to point me in the right direction. The trouble is, I get results from people using all of the different Selenium-Rails integration tools above. So now I not only have to figure out if they're addressing a similar issue, I also have to figure out if they're using a similar tool. In bad cases this can waste hours and hours.

Forking may be necessary sometimes, but I really disagree with making it so easy. Choosing to fork a project is effectively saying, "I think this version is incompatible with my goals and so I'm going to make something similar that works in a different way." If that happens once or twice, it's no big deal. If it happens a lot, you wind up with a lot of different tools that look a lot alike but that all behave a bit differently. Then your users get confused and too frustrated to work effectively. Now you've got a real problem.

So to review: forking is okay, but forking too easily only harms the community as a whole.

* I started to say, "in my humble opinion", but I'm not particularly humble and no one who knows me would pretend otherwise!


  1. One of the things about Git is that it uses terminology you are familiar with in a different way. That is to say, "git checkout" is not "svn checkout" and "git revert" is not "svn revert." Likewise, a git "fork" is not an svn "fork." Every time you "git clone" you fork a project -- this is the nature of git. Forking on GitHub is used to help collaborate back to the parent (which is always easy to find when viewing the fork in question). It's actually a very awesome thing; you should try forking a project and testing the waters!

  2. This is true, as far as it goes. Git does make merging between forks very, very easy.

    My concern is more the mindspace issue. As long as there are forks that try to accomplish the same thing but behave differently, there's potential for user confusion: which one should I use? Is this blog entry about my version of this thing or a different one?

    One thing that I hope to see GitHub help people do successfully is essentially "unfork". When the time for/need of the differing code has passed, bringing the forks back together can certainly help solve this problem.

    The situation is far from hopeless; I'm definitely interested to see how things shake out.

    Oh, and welcome, Chris!