The branch-per-task workflow explained
My goal is to explain how the branch-per-task pattern (a.k.a. issue branches, task branches, feature branches, or bug branches) works and why it is so important.
Branching and merging are key topics these days, especially due to the rise in popularity of DVCSs, so I think you'll find this post on branch and merge strategies worth your time. :)
I'll cover the relationship between this pattern and agile methodologies like Scrum and also highlight why the branch-per-task approach is the core of parallel development and why it's much better than serialized development or trunk development.
What is a task?
Let me ask you a question first: are you using an issue tracking system? If the answer is yes, then skip this section and jump to the next. If not, keep reading!
So, you do not have an issue tracking system in place!!! Let's get this fixed first. Go and grab one. No, don't start a long and boring evaluation, just go and grab one, any system is better than not using one. In fact, read Joel's test: 12 steps to better code and triple check point 4.
There are many systems out there you can use:
- Free ones: Bugzilla, Mantis, Trac
- Commercial: Jira, OnTime, Rally, VersionOne
And many more! The issue tracking systems are meant to track issues, or bugs, but don't limit yourself to only bugs.
Rule of thumb - everything is a new task. Every change you make in your code will have an associated task (or issue).
Yes, it can sound extreme if you're not yet doing it, but please consider it: never do a change in your code again without having an associated task. It doesn't matter whether you're fixing a bug or aligning a button or implementing a brand new feature: create a task for it!
Note: Do not embrace a project management nightmare. Do not force developers to spend unnecessary effort keeping the issue tracking system updated. The objective is just to have a database of tasks, not a PRETTY database of tasks that's super-detailed and complete with nice pictures. Keep it simple.
Note (yep, again): Developers must understand issue tracking as a core team practice, as something that helps them on a daily basis to keep track of what they've been doing. If they perceive it as a management control mechanism, it will turn out to be much less useful than what it needs to be.
Tasks must be short!
Tasks are meant to be short, as are bug fixes. Do not introduce a task that's supposed to take 3 weeks to be completed; instead, split it into shorter tasks. If you're familiar with agile methods like Scrum, you'll remember the key rule is trying to keep tasks under 16 hours. I think this is a very good practice.
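To make the rule concrete, here is a minimal sketch of a check that flags tasks that are too long and should be split. The `Task` class, the `oversized` helper, the task titles, and the estimates are all made up for illustration; only the 16-hour limit comes from the text above.

```python
from dataclasses import dataclass

MAX_TASK_HOURS = 16  # the Scrum-style limit mentioned above


@dataclass
class Task:
    # Hypothetical task record; a real issue tracker will have its own fields.
    task_id: int
    title: str
    estimate_hours: float


def oversized(tasks, limit=MAX_TASK_HOURS):
    """Return the tasks whose estimate exceeds the limit and should be split."""
    return [t for t in tasks if t.estimate_hours > limit]


# Illustrative data only.
tasks = [
    Task(113, "Fix typo in login label", 0.5),
    Task(114, "Rewrite reporting module", 120),  # roughly 3 weeks: too long
    Task(115, "New loading form", 12),
]

print([t.task_id for t in oversized(tasks)])  # prints [114]
```

Anything the check reports is a candidate for being broken down into several shorter tasks before work starts.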
The task workflow
The next figure shows how the task workflow works, and its relationship with the entire Scrum process. The tasks I refer to are the ones you use when you decompose the user stories during sprint planning.
Version control sits at the center of the graphic as coordinator of the entire picture, but let's see what the main components are:
What are tasks? Everything will be a task, and tasks are managed by your favorite issue tracking system, as described in the previous topic. More importantly, tasks come from the product backlog (or whatever list of tasks or work breakdown structure you use).
Who uses tasks? Everyone! A developer can enter a task, just as a project manager can. Testers can introduce bugs, defects, and so on. The task is the central single-point for project coordination.
What are branches? They are the core of the developer's work. Developers will always work on branches, never on the main branch (or trunk). Every task will always start from a well-known point, a baseline, or stable release (which closes the cycle).
Who uses branches? Developers, in their daily work. Integrators, when they need to create releases. Every task is tested and validated before marking it as "resolved" in your issue tracking system.
What is meant by integration? Once a number of tasks are finished, it's time to create a new release. Remember that "release early, release often" is a best practice worth following to avoid big bang integration (the root of all evil in the SCM world).
Who uses it? The integrators. Integrators can be developers playing the role once a week or once a day (depending on the release creation frequency). The integrator role can be a full-time job on big projects. Integrators not only merge the branches back to main, but are also responsible for getting the build done right. They'll take a quick look at the code and they'll ask the developers for more information if something in the code is not clear, or if conflicts arise. Responsibility is key to the role of the integrator.
Note for continuous integration maniacs: Yes, there's life beyond continuous integration, and in fact, the branch-per-task pattern is the answer to the problems of common CI setups. It even takes CI to the next level. Check the previous link for info about the next steps mentioned in Duvall's book on CI.
What are releases? The integration result is a new stable release. Stable means it passes the entire test suite, so no unexpected errors are left (any that remain are known and accounted for).
Who uses them? The entire team. Once a release is finished, it can be passed to the testing group (if it exists). Developers will use the newly created release as the starting point of the new tasks they'll work on during the next working cycle.
Frequently asked questions:
FAQ: What exactly is a task branch?
ANSWER: You've probably heard of concepts like bug branches, topic branches, and so on, right? Ok, even if you haven't, here's the answer: it's a branch you use to implement a given task. It's short-lived and its purpose is to be used only to implement a given feature or bugfix.
FAQ: But aren't branches supposed to be evil incarnate?
ANSWER: Who told you that? I bet you found that on some Subversion guide, forum, or manual, maybe even on some other SCM website, didn't you? Branches are excellent tools for developers, but they're not correctly handled by most of the version control systems out there, including CVS, Subversion, SourceSafe, Perforce, Team Foundation Server (TFS), and many others. That's why they say branching is not good. It's not true -- branching is great and you should use it on a daily basis, but you need the right tool for that.
The branch-per-task approach is not about DVCS.
DVCS (distributed version control system) is the buzzword on all programmers' forums these days. Git and Mercurial did a lot to get tons of developers interested in DVCS. Even more importantly, they got people interested in branching and merging.
The branch-per-task approach is the core workflow used by most of the DVCS practitioners (including Plastic SCM). This has less to do with the fact that these systems can work in a distributed way and more to do with their actual ability to handle branching and (especially) merging correctly.
Many DVCSs handle branching and merging well, but the branch-per-task pattern is not restricted to distributed systems. Centralized systems can follow the same pattern in principle. However, many of them, like Subversion, CVS, TFS, and Perforce, have branching and merging functionality that's questionable at best.
Why is branch per task better?
I've described in detail the branch-per-task pattern and I've also talked about the task cycle and its main elements, so hopefully, you've already concluded why the branch-per-task approach is a good practice.
My intention now is to highlight, in a detailed way, why the branch-per-task pattern is the best way to develop and collaborate for nearly every team, nearly all the time (there will be circumstances where you won't need to branch that often, but believe me, it won't be so common).
Colliding worlds: serial vs parallel development
Let's take a look at a typical project following the serial development pattern, better known as trunk development or mainline development. It just means there's a single branch where everyone checks in his or her changes. It's very easy to set up and very easy to understand. It's the way most developers are used to working with tools like Subversion, CVS, SourceSafe, etc.
As you can see in the figure, the project evolves through check-ins made on a single branch (the trunk or main branch). Every developer does a series of check-ins for his or her changes and, since it's the central point of collaboration for everyone, developers have to be very careful to avoid breaking the build by doing a check-in that doesn't build correctly.
In the example figure, we see how Pat creates a new loading form, but makes a mistake (cset:10476) and then has to fix it in a later check-in (cset:10478). It means the build has been broken between 10476 and 10478. Every developer updating his or her workspace in between would have been hit by the bug, and it most likely happened to Pablo after he checked in cset:10477 and updated his workspace accordingly.
Also, if you look carefully, you'll see we're mixing together important changes like the one made on cset:10474 (big change on one of the core queries, which could potentially break the application in weird ways) with safer ones like the typo fixed in cset:10475. What does that mean? It means that if we had to release right now, the baseline would not be ready or stable enough.
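The broken-build window in this example can be sketched in a few lines. This is only an illustration: the `Changeset` class and the `builds` flags are my assumptions about what the figure shows, and the authors of 10474 and 10475 are not given in the text, so they are marked as unknown.

```python
from dataclasses import dataclass


@dataclass
class Changeset:
    # Hypothetical record of a single check-in on the trunk.
    number: int
    author: str
    comment: str
    builds: bool  # assumed: does the trunk build right after this check-in?


trunk = [
    Changeset(10474, "unknown", "big change in a core query", True),
    Changeset(10475, "unknown", "typo fix", True),
    Changeset(10476, "Pat", "new loading form (with a mistake)", False),
    Changeset(10477, "Pablo", "unrelated change", False),  # inherits the break
    Changeset(10478, "Pat", "fix for the loading form", True),
]


def broken_updates(history):
    """Changesets at which updating a workspace pulls in a broken build."""
    return [c.number for c in history if not c.builds]


print(broken_updates(trunk))  # prints [10476, 10477]
```

Note that Pablo's check-in is flagged even though his own change is fine. On a single shared branch, every check-in sits on top of whatever came before it.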
Let's see how the same scenario would look using parallel development with the branch-per-task method:
As you can see, there are several branches involved, since every task is now a branch, and there are merge arrows (the green lines) and baselines. We could have created baselines before, but by using a branching pattern, you'll find it's much easier to know when to create them.
Using this example as a basis, I'll start going through the problems we can find in serial development, how to fix them with parallel development (branch-per-task), and why the parallel model is better.
Code is always under control
How often do you check in when you're working on trunk (mainline) development? I bet you're very careful before checking in because you don't want your co-workers coming to your desk, shouting about the code not building anymore, right?
So, when you're working on a difficult task -- something hard to change that will take a few days to complete -- where's your code? Is it under source control? Most likely it won't be since you're reluctant to check things in that don't compile, that are not complete, or that simply collide with other changes.
This is a big issue and something pretty normal with mainline development; changes are outside of version control for long periods of time, until developers are completely done. The version control tool is used just as a delivery mechanism instead of a fully-fledged VCS.
With the branch-per-task approach, you won't have this problem: you can check in as often as you want to, in fact, it's encouraged. This enables you to create frequent checkpoints, which preserve your own development process.
Side bar - But it was working 5 minutes ago!! I bet you've said that before! You're working on a change, your code is working, then you change something, it doesn't work all of a sudden, and you lose time trying to figure out what you did wrong (normally commenting and uncommenting code here and there, too). It's pretty common when you're experimenting with changes, learning an API, or carrying out some difficult tasks. If you have your own branch, why don't you check in after each change? Then you don't have to rely again on commenting code in and out for the test.
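Here is a tiny sketch of why those frequent checkpoints pay off: when something stops working, you can walk back through your own check-ins until you reach one that still works. The checkpoint names and the `works` callback are hypothetical; in practice "works" would mean switching your workspace to that check-in and running your tests.

```python
def last_working_checkpoint(checkpoints, works):
    """Return the most recent checkpoint for which works() is true, or None.

    `checkpoints` is ordered from oldest to newest.
    """
    for checkpoint in reversed(checkpoints):
        if works(checkpoint):
            return checkpoint
    return None


# Illustrative data: four check-ins on a task branch, the last two are broken.
checkpoints = ["checkin-1", "checkin-2", "checkin-3", "checkin-4"]
known_good = {"checkin-1", "checkin-2"}

print(last_working_checkpoint(checkpoints, lambda c: c in known_good))
# prints checkin-2
```

Whatever changed between that checkpoint and the next one is where the problem was introduced, and you have the diff to prove it.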
Keep the main branch pristine
Breaking the build is something very common when using mainline development. You check in some code that you didn't test properly, and you end up breaking some tests or even worse, introducing code that doesn't compile anymore.
Keeping the main branch pristine is one of the goals of the branch-per-task method. You carefully control everything that enters the main branch, so there's no easy way to accidentally break the build.
Also keep in mind that the usage of the main branch is totally different with a branch-per-task pattern. Instead of being the single rendezvous point for the entire team, where everyone gets synchronized continuously and instability can happen, the main branch is now a stable point, with well-known baselines.
Have well-known starting points - do not shoot moving targets!
When you're working in mainline mode, it's often not easy to describe the exact starting point of your working copy.
Let me elaborate. You update your workspace to /main at a certain point in time, as you can see in the following picture. What's that point? It's not BL130, because there are a few changes after that. So if you find an error, is it because of the previous changes or due to the ones you just introduced?
You can easily say, "Well, if you're using continuous integration, you'll try to ensure the build is always ok, so whatever you download will be ok." First off, that's a pretty reactive technique -- where you first break the build and later fix it. Secondly, yes, you're right, but still, what is this configuration? If you update at the indicated point, you'll be working with an intermediate configuration, something that's not really well-known -- you'll be shooting a moving target!
Now take a look at the situation using the branch-per-task pattern:
As you can see, there's a big difference. All your tasks start from a well-known point. There's no more guessing, no more unstable configurations. Every task (branch) has a clear starting point, a well-known stable one. It really helps you stay focused on your own changes.
Enforce baseline creation
Creating baselines is a best practice. Using the branch-per-task method, baselines become a central part of your daily work. There's no better way to enforce a best practice than making it an integral part of your workflow.
Every task starts from a well-known starting point: a baseline. Tasks are usually independent of each other. This simple fact gives you huge flexibility. Let's take a look at the following figure. As you can see, it's the same example we used above, but I annotated it with some circles and arrows. What do they mean? Every task is linked to the previous one. Not due to a functional requirement, but simply due to the check-in order.
Why are they linked? Because with every check-in, if you modified some of the files that were modified before, you'll be merging changes from different tasks together. Even if you didn't modify the same set of files, you'll be testing the changes together, creating a soft dependency between them. This simply doesn't happen when you have task branches.
It is also important to note that when using task branches, the entire release process is stronger since you choose what goes into the next release (selecting which branches to merge) instead of just taking what's on the mainline.
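Choosing what goes into a release can be pictured like this. It's a rough sketch, not how any particular tool does it: the `plan_release` function, the branch statuses, and the name BL131 for the next baseline are assumptions made up for the example.

```python
def plan_release(baseline, branches, next_baseline):
    """Pick only the finished, validated task branches for the next release.

    `branches` maps a branch name to its task status in the issue tracker.
    """
    selected = [name for name, status in branches.items() if status == "resolved"]
    return {"from": baseline, "merge": sorted(selected), "creates": next_baseline}


# Illustrative data: task114 is still in progress, so it stays out.
branches = {"task113": "resolved", "task114": "open", "task115": "resolved"}
plan = plan_release("BL130", branches, "BL131")

print(plan["merge"])  # prints ['task113', 'task115']
```

The unfinished task simply waits for a later release. With mainline development there is no equivalent choice, because everything checked in so far is already on the trunk.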
Branches as unit of change
Let's take a deep look into the history of SCM. At the beginning, there weren't changesets; all work was handled on a per-file basis. There wasn't a good way to know which changes (check-ins) were related to which tasks. Stopgap measures included keeping a list of the files in the issue tracking software, setting labels (can you say "overkill"!?), or writing the changes down somewhere (just plain not cool).
Then changesets entered the picture and life got better. Now you can relate change 10474 to the core fix in database query. Often, the issue trackers will keep a pointer to the changeset or vice versa. The problem is that it forces developers to use changesets as units of change, and that's not a very good idea. Let me explain to you why. By definition, a changeset can only contain one revision of one given file or directory. So what if you need to do more than one check-in to the same file while working on your task? You can't. You'll end up checking in less often than best practice dictates. Wrong!
Then a new generation of SCMs entered the scene and they were able to handle branching and merging correctly. That's why now, with Plastic, it's possible to use branches as the real units of change. Branches are not restricted in the number of revisions you create for a certain file or directory. You're welcome to commit as often as you like!
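A toy model makes the difference visible. The `TaskBranch` class below is not Plastic's data model, just a sketch I made up for illustration, and the file names are hypothetical. The point is that the branch collects any number of check-ins per file, while the task as a whole still has one clear list of changed files.

```python
from collections import defaultdict


class TaskBranch:
    """Hypothetical model of a task branch as the unit of change."""

    def __init__(self, name):
        self.name = name
        self.checkins = []  # ordered list of (path, comment) pairs

    def checkin(self, path, comment):
        self.checkins.append((path, comment))

    def revisions_per_file(self):
        """How many revisions of each file were created on this branch."""
        counts = defaultdict(int)
        for path, _ in self.checkins:
            counts[path] += 1
        return dict(counts)

    def changed_files(self):
        """The set of files touched by the task, regardless of check-in count."""
        return sorted({path for path, _ in self.checkins})


branch = TaskBranch("task115")
branch.checkin("loading_form.cs", "first draft of the form")
branch.checkin("loading_form.cs", "handle empty input")
branch.checkin("main.cs", "wire the form into the menu")
branch.checkin("loading_form.cs", "fix layout")

print(branch.revisions_per_file())  # prints {'loading_form.cs': 3, 'main.cs': 1}
print(branch.changed_files())       # prints ['loading_form.cs', 'main.cs']
```

Three revisions of the same file live happily inside one task, which is exactly what a single changeset can't hold.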
Stop bug spreading
People dealing with dangerous materials work in controlled environments, and usually behind closed doors, in order to prevent catastrophes from spreading if they ever happen. We can learn a valuable lesson from them.
Look at the following mainline example: Pat introduces a bug, and immediately everyone hitting the main branch will be affected by it. There's no containment, there's no prevention, and the actions are entirely reactive. You break the code and yes, you fix it, but the build should not have been so easily broken in the first place.
Now let's take a look at the same situation with the branch-per-task pattern. The bug will still be there, but we have a chance to fix it before it ever hits the mainline. There's containment and there's a preventive strategy in place.
Traceability comes almost for free: look at branch task115. Do you know which is the related task on your preferred issue tracking system? I guess 115, right? Cool. Linked. Full CMMI-level traceability achieved. It couldn't be easier.
After that we can implement awesome integrations (and we often do with Plastic) to let you double click on a branch and check the related task, go to the task and check the modified files, or create code reviews, and so on.
But the basic, key capability here is that traceability is maintained by a really simple naming convention. You can't do this with changeset-based approaches or mainline development.
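The naming convention is simple enough to sketch in a few lines. This assumes branches are named `task` followed by the task number, as in the task115 example, and that a branch may be given with a path such as /main/task115. The `task_id_for` helper is hypothetical and not part of any tool.

```python
import re

# Assumed convention: the last path segment is "task" plus the task number.
BRANCH_PATTERN = re.compile(r"^task(\d+)$")


def task_id_for(branch_name):
    """Return the issue tracker task number for a branch, or None."""
    last_segment = branch_name.rsplit("/", 1)[-1]
    match = BRANCH_PATTERN.match(last_segment)
    return int(match.group(1)) if match else None


print(task_id_for("task115"))        # prints 115
print(task_id_for("/main/task115"))  # prints 115
print(task_id_for("/main"))          # prints None
```

Everything else (jumping from a branch to its task, listing the files changed for a task, and so on) can be built on top of that one lookup.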
It's been a long post! I hope I've shared all the ideas I have regarding the task workflow and how to implement it with versioning systems that handle branching and merging well, like Plastic SCM. We've used the pattern internally for years with very good results. Of course, we do combine more than one pattern. The branch-per-task approach is used for short-term (tactical) purposes, while long-term activities like bug fix releases, new maintenance code lines, etc. are kept in longer-lived (strategic) branches. Check out this article for integration strategies.