My team is using Agile and we found a bug… Now what?!

I have noticed a tendency in teams that are relatively new to Agile (especially Scrum) to really struggle with the particulars of how to treat bugs.  I have personally dealt with different attitudes and opinions on this topic in almost every project on which I have used Scrum.  I have also lived this problem from both sides of the table: as a consultant delivering services to a customer under the terms of a rigid contract, and as a customer leading a mixed team of vendors and employees.

I am willing to bet that the discussion that follows is pretty familiar to a lot of you out there who have in one way or another gone through the same process and perhaps come to the same conclusions.

Here are some of the more common questions:

  1. What do we do with bugs that we find on features that we are currently developing?
  2. Should we give story point estimates to bugs?
  3. Should we fix the bugs that were found in one sprint (for instance Sprint 1) in the very next sprint (Sprint 2)?
  4. We are committed to delivering quality software, shouldn’t we fix defects “for free”?

I’ll provide a scenario for looking at these questions in turn:

The Scenario:

Your team has been working very hard all sprint long to deliver on a set of user stories for the company’s new Enterprise Poodle Ranger system.  The sprint is coming to a close in the next day or two and all looks good.  The testers are reporting good results and then… in the very last hour the discovery is made that something doesn’t quite work right when evaluated against the acceptance criteria in the story for registering your poodle’s favorite colors. 

The favorite colors story is a big one, worth 13 points.  There’s not enough time to fix something and create a new release for the testing crew, and not being able to claim those points would really bring down your team’s velocity.  Not getting those points is not going to look good on the ol’ product burndown chart at all.  Everyone worked really hard on the story and it really meets the spirit of the requirements; it just has some minor warts.

Does your team:

A) Mark the favorite colors story as “Done” and introduce a new bug in the backlog for each issue in the story.  Claim 13 points.  Marvel at the beauty of your velocity report.

B) Defer marking the colors story as “Done” and carry the scope over to the next sprint keeping the original user story intact.  Do not claim the 13 points until you have fixed the issues identified.  Explain your velocity report to upper management.

C) “Slice” the story.  One slice for the finished part of the story and one for the yet unfinished part.  Divide the points proportionally and assign them to each slice respectively considering the amount of effort accomplished and the amount of effort remaining.  Mark the slice for the “accomplished” scope “Done”.  Marvel at the beauty of your velocity report.

The issue of “what to do when we find a bug” is something that really can’t be dealt with outside of the more fundamental issue of what your team’s “Definition of Done” is.  Does your team have a clearly posted “Definition of Done” that describes exactly what your team is committed to providing when you say something is finished?  If so (and your team should), my suggestion is that it include language that is absolutely intolerant of calling a story “Done” when it has known issues, bugs, or a lack of precise alignment with both the documented acceptance criteria and the spirit of the story.

Think about it: when your team estimated the story, did your estimate envision a working piece of software at the end or… a mostly working piece of software with some stuff left to do?

The right answer to the above scenario-based question is B.  There will be some short-term pain for the team, and you might have some explaining to do, but the trust between the team and their Product Owner won’t be compromised, and your customers will see that you hold a stringent standard for quality and that you are being transparent about progress.

You will also have learned something valuable as a team: perhaps that your team isn’t capable of completing 13-point stories in one sprint.  Slice the scope into thinner pieces next time.  Plan to succeed!

The issue of what this does to the team’s velocity report is of no consequence when you consider the fact that the effort involved in fixing the issues is probably much smaller than the effort to build the feature.  Your team should in most cases be able to close the item out early in the next sprint.  The law of averages will work in your favor as you should have an above average velocity in the next sprint.
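To make the averaging argument concrete, here is a minimal sketch in Python.  All of the numbers are hypothetical: it assumes a team that typically finishes about 30 points per sprint, and it assumes the fix for the favorite colors story is cheap enough that it barely dents the next sprint.

```python
def average_velocity(points_per_sprint):
    """Mean of the story points actually marked "Done" in each sprint."""
    return sum(points_per_sprint) / len(points_per_sprint)


# Hypothetical figures, not taken from a real project.
TYPICAL_VELOCITY = 30   # assumed points the team usually completes per sprint
COLORS_STORY = 13       # the favorite colors story from the scenario

# Option B: the story is not "Done", so its points are not claimed in Sprint 1.
sprint_1 = TYPICAL_VELOCITY - COLORS_STORY

# Sprint 2: the fix is assumed to be cheap, so the team closes the carried-over
# story early and still completes roughly a normal sprint's worth of work.
sprint_2 = TYPICAL_VELOCITY + COLORS_STORY

print(sprint_1, sprint_2, average_velocity([sprint_1, sprint_2]))
```

Under these assumptions the dip in Sprint 1 and the bump in Sprint 2 cancel out, and the two-sprint average lands back at the team’s typical velocity.  If the fix turns out to be expensive, the average will drop, which is itself useful information.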

Now, let’s modify the scenario above to look at the first question from another angle:

The (Slightly Different) Scenario:

… and then… with one week left in the Sprint, the discovery is made that something doesn’t quite work right when evaluated against the acceptance criteria in the story for registering your poodle’s favorite colors. 

Uh oh, you have encountered a bug, on software that is currently under development.

Do you:

A) File a bug in your team’s ALM system.  Ensure it is point estimated by the team and schedule it for the current Sprint.

B) Don’t file a bug in the team’s ALM system.  Have the tester who found the bug collaborate closely with the developers who implemented the feature and bring the issue to a resolution.

C) File a bug in your team’s ALM system.  Don’t put a point estimate on the bug, and assign it to the developers who implemented the feature.

What you choose here depends to some degree on your team’s work environment and on whether you use your ALM tool as a communication tool (for instance on a split onshore/offshore team) or as a project reporting tool (perhaps where the team is co-located and has other, richer means of communication available).  This variable could cause your team to choose either B or C.

There are some wrong answers though.  Namely A.  The bulk of the wrongness revolves around assigning a point value to the bug.  Again I refer you to the “Definition of Done”.  Story points are an approximation of the Effort, Complexity, and Risk involved in completing scope.  Finding an issue with a feature prior to the completion of its development does not suddenly increase effort, complexity, or risk, and you wouldn’t want the backlog to show that scope has increased (as it would if you assigned points to the bug).  Assuming you haven’t learned anything that fundamentally changes your understanding of scope, you already considered all of these aspects when assigning the original point estimate.

If I had my druthers, I would opt for B.  This is consistent with the guiding principle of valuing individuals and interactions over processes and tools.  Work together to resolve the issue, communicate face to face, and get a working feature to the finish line.

The (Slightly Different with a twist) Scenario:

Let’s change the scenario again to consider when we would want to assign points to a bug.

… and then… with [one week] left in the Sprint, the discovery is made that something doesn’t quite work right with the poodle dating game feature.  We must have broken something in the process of writing our latest code.  This is a real issue.  The dating game feature was one of the first features our team released, and that was two sprints ago.  A lot has changed since then and I fear it will take longer than the time we have remaining to find the root cause and fix the issue.

This is in fact the ideal case to:

A) Drop everything and fix the bug.  We can’t afford to introduce regressions into our product.  We might not finish all the user stories we had forecasted, but we’ll get this one to work right.

B) File a bug in the team’s ALM system.  Allow the product owner to prioritize it against the rest of the backlog.  Ensure that the team estimates the bug in story points.

C) File a bug in the team’s ALM system for tracking purposes.  Do not estimate the bug using story points.  We’ll fix it for “free” as early as we can in the next sprint.

Ok, let’s get something straight.  Bugs happen.  They will exist, and nobody makes a bug-free product out of the gate.  We are fallible humans and the inevitable will happen from time to time.  Of course you can minimize the frequency with which it happens by using good engineering practices (decoupling features, building cohesive components, unit testing your code, refactoring when needed, etc.).  All that aside, though, what do you do now?  Well, as it turns out, this is about as legitimate a case as it gets to create a bug on the backlog.  The reasons I say this are:

1) You didn’t leave anything on the table.  The story that the bug’s scope is most closely related to was not passed off as “Done” knowing that there was something left to do or otherwise wrong with the deliverable.

2) The presence of the bug is not impeding the team’s ability to deliver on the scope of anything they are currently working on as per the acceptance criteria and the spirit of the stories.

3) The fix isn’t obvious and it seems unlikely that you would be able to adequately address the issue during the sprint without compromising the expectations that the team set during sprint planning.

The answer is B, although there are some interesting team dynamics that result from both B and C that merit some discussion.  Let me start by saying that I’ve tried C.  I actually did this on a team and felt good at the time about what that policy said regarding our commitment to quality.

The end result of C was that the team grew to really hate bugs.  The product owners appreciated the gesture and I think it did build some goodwill.  The team, though, felt like they never got ‘credit’ for the quality that they were delivering in fixing bugs, and felt like we couldn’t show the total progress we had made using the available reporting mechanisms.  I did not find that doing this changed team behavior in a way that produced better quality code.  We were a consulting team, and fixing bugs for ‘free’ encouraged team members to stay later than they normally would, logging extra time so that they could fix bugs and still keep the team’s velocity up on user story work that had point values.  This was completely unsustainable.  Since we didn’t estimate bugs using story points, it wasn’t obvious from a backlog perspective that we were amassing a weight of technical debt that made visible forward progress really difficult to show or quantify.  C is bad news, and it is even worse if you combine it with the dysfunction of adding bugs for unfinished scope (for the record, I never tried that, but I’ve seen it fail miserably).

B also has some interesting dynamics that usually require some coaching of the Product Owner to get through, but trust me, this is the way you want to go.  Bugs are scope in every way that a user story is.  Implementing a fix costs money and time.  Record them in your ALM system.  Get the Product Owner to review and prioritize them.  Assign point values to them.  Plan for them.  Product Owners, especially those who have been exposed to the dysfunction of C, will sometimes think it’s unfair that the team should be able to assign a point value to a bug.  They sometimes get the wacky idea that the team would be incentivized to create bugs in order to get the points for fixing them, in some ill-conceived ploy to inflate the team’s average velocity.

The argument against this misunderstanding is to point out that yes, compared to a situation where the team regularly fixes bugs for ‘free’, we might have an increased velocity.  That number by itself is meaningless, though, because it only exists as a tool to measure how much longer it will take the team to finish the remaining scope of work for the product, which, by the way, would also be proportionally inflated in B.
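A small sketch in Python shows why the inflated velocity does not buy the team anything.  The figures are hypothetical and chosen only so that bug work is the same proportion of throughput as it is of the remaining backlog; real projects will not line up this neatly.

```python
def sprints_remaining(remaining_points, velocity):
    """Forecast: remaining estimated scope divided by points completed per sprint."""
    return remaining_points / velocity


# Hypothetical figures, chosen only to illustrate the proportion.
STORY_POINTS_LEFT = 240        # assumed unfinished user story scope
BUG_POINTS_LEFT = 60           # assumed size of the bugs, if they were estimated
STORY_POINTS_PER_SPRINT = 24   # assumed story work completed each sprint
BUG_POINTS_PER_SPRINT = 6      # assumed bug work completed each sprint

# Option C: bugs are fixed for "free", so neither velocity nor backlog counts them.
forecast_c = sprints_remaining(STORY_POINTS_LEFT, STORY_POINTS_PER_SPRINT)

# Option B: bugs carry points, so velocity AND remaining scope both grow.
forecast_b = sprints_remaining(
    STORY_POINTS_LEFT + BUG_POINTS_LEFT,
    STORY_POINTS_PER_SPRINT + BUG_POINTS_PER_SPRINT,
)

print(forecast_c, forecast_b)
```

Both forecasts come out to the same number of sprints.  The velocity in B is higher, but so is the backlog it is measured against, so there is nothing to gain by manufacturing bugs.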

In his book on Kanban, David Anderson introduces an economic model as a metaphor for viewing and discussing waste in Lean.  Bugs translate to “Failure Load” in this model.

“… [failure load is] demand generated by the customer that might have been avoided through delivering higher quality earlier.” – David J. Anderson

You can see in the graphic below (sourced from the book) that Failure Load gradually and additively diminishes the amount of Value-Added work (think new User Stories) that the team can do.  If you are not quantifying the bugs on the backlog using story points, then the failure load remains an invisible force.  If you are devoting some portion of each sprint to addressing bugs, it gradually pulls your team’s velocity down.  Worse yet, if you aren’t, you are creating invisible scope that will lie around until the day your team runs out of new Value-Added work, and then show up as a mountain of “project stabilization” time that needs to get tacked on to the end of the project.

[Graphic from the book: Failure Load and Value-Added work]
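The squeeze that failure load puts on new work can be illustrated with a toy calculation in Python.  This is not Anderson’s model itself; the fixed capacity and the steadily growing failure load are assumptions made up for the example.

```python
def value_added_per_sprint(capacity, failure_load_per_sprint):
    """Points left for new user stories after failure load is served each sprint."""
    return [capacity - load for load in failure_load_per_sprint]


# Hypothetical figures: a fixed capacity and a failure load that grows by
# 2 points every sprint as more bugs come back from earlier work.
CAPACITY = 30
failure_load = [0, 2, 4, 6, 8]

value_added = value_added_per_sprint(CAPACITY, failure_load)

# What a stories-only velocity report would show, sprint by sprint.
print(value_added)
# Bug effort that stays invisible on the backlog if bugs carry no points.
print(sum(failure_load))
```

If the bugs are unestimated, a report built from story points alone shows only the first list drifting downward with no explanation.  Estimating the bugs puts the second number on the backlog where the Product Owner can see and prioritize it.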

Another important point about prioritizing bugs is that in asking your Product Owner to prioritize them, you afford them, rather than deny them, the chance to place new Value-Added work ahead of small, perhaps strategically meaningless, Failure Load.
