Kanban Reboot, Part I

The team I’m working with now has been using Scrum for a few years. Over the first couple of weeks I noticed we were having a bunch of challenges and, based on my experience, I knew Scrum wasn’t a good fit for the type of work we were doing. The work was generally ad hoc by nature, came from multiple input streams, and varied drastically in size.

Some of the outcomes we were experiencing included missed sprint commitments, missed release targets, and missed client deliveries. The team had also done things like running 3-day sprints, or adding an extra 1-day sprint after the 2-week sprint had finished, so they could finish all the work. We determined that these outcomes were largely caused by poorly defined stories and frequent interruptions. Going a bit deeper, we determined that the poorly defined stories came from needing to keep people busy: stories were rushed into the sprint, which in turn stemmed from a general lack of vision for the product. Sprint demos often ended in disaster (“That’s not what I wanted!!”) and it just “felt like” the team was always exhausted and not motivated to do anything differently to get different outcomes. We talked about many other causes but, given the insanely fast growth of the company, we were all doing the best we could with what we had. After all, if you only know how to use a hammer, you’ll use a hammer even if it’s not the best tool for the job.

I had asked the team a few months ago (OK, maybe 5 months ago now) how Scrum was working and whether a more flow-based approach might be better. I found out they had tried Kanban last year and they said it didn’t work. I was curious, so I probed. Some of the reasons were:

  • “with kanban you just keep working and never release” – I attributed that one to a general misunderstanding of how to use Kanban. That was most of the data I needed, but we continued on.
  • kanban became an excuse for no prioritization
  • kanban became an excuse to push all kinds of ad hoc work because there was no sprint to interrupt
  • lack of product direction
  • poorly defined stories
  • no WIP limits
  • no explicit policies

I could go on, but it was clear to me that the problem wasn’t that Kanban didn’t work; there was very little understanding of what it was and how to apply it in our context. I’ve found that to be a common occurrence. On the surface, Scrum and Kanban are brain-dead simple; how to implement them in your context is the tricky part.

I kept probing other areas of the business and uncovered some other challenges:

  • no predictability around releases
  • many workarounds being implemented because “product development” was a black hole
  • half-done or ‘good enough’ changes being applied in production

Some other interesting systemic effects were happening as well. Inputs into the product team were circumventing the “process” by setting all issues as blockers just to get some attention. We had a bad habit of making fires out of pretty much anything just to get the attention of the developer team.

This may sound like a pretty dysfunctional place but, in reality, the business was doing pretty well. I was amazed to see how much was actually getting done. Sales continued to rise and, due to the nature of our business, the demand for new features wasn’t high, so balancing new features vs maintenance wasn’t as stressful as I’ve seen elsewhere.

So where to start? It took about 4 months to cultivate the seed that Scrum wasn’t a good fit; the status quo was, and still is, very strong. This was a challenge for me. As a consultant, you’re brought in with the intent of change. As an employee, it’s more difficult, especially because I am very opinionated and, while I don’t know very much, the few things I do know, I know very well. It’s hard to shut my mouth sometimes!

I managed this problem by being patient: I asked the people affected by these challenges what they saw, compiled the information, and suggested alternatives. That seemed to work, and the people who feed work to the developer team thought we were headed in the right direction. I’ll write more about that in a separate post.

Now the good stuff!

We are starting simple:

  • 1 expedite class of service with an explicit policy:
    • can exceed WIP limit by 1
    • expedited items are agreed upon by team and stakeholders (we usually treat work items as emergencies even when they’re not, so this policy helps us relax and assess an item before jumping on it)
  • WIP limit of 3 for in-development items (there are 4 people on the team; I don’t count because I’m the product owner!)
  • Done defined for each workflow step
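
To make the expedite policy concrete, here is a minimal Python sketch of the "may I start this item?" check. It is illustrative only: the names `WorkItem`, `can_start`, `agreed`, and `expedite_overage` are made up for this example, and it assumes that an item only gets the extra slot once the team and stakeholders have agreed it really is an emergency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    title: str
    expedite: bool = False  # flagged as an emergency by whoever raised it
    agreed: bool = False    # team and stakeholders agreed it really is one


def can_start(in_progress, item, wip_limit=3, expedite_overage=1):
    """Return True if `item` may be pulled into development right now."""
    limit = wip_limit
    # Only an agreed expedite item is allowed to exceed the limit, and only by 1.
    if item.expedite and item.agreed:
        limit += expedite_overage
    return len(in_progress) < limit


in_dev = [WorkItem("story A"), WorkItem("story B"), WorkItem("story C")]
print(can_start(in_dev, WorkItem("story D")))                             # False
print(can_start(in_dev, WorkItem("outage", expedite=True, agreed=True)))  # True
print(can_start(in_dev, WorkItem("loud request", expedite=True)))         # False
```

The third call is the interesting one: an item that is merely labelled urgent, without agreement, is treated like any other item and has to wait for a free slot.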

Our workflow is simple:

Queue (5) → Analysis/Acceptance Tests (2) → Dev/Test (3) → Business Acceptance (5) → Queued for Deployment (5) → Post Production Validation (5) → Done
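
For readers who think in code, the workflow above can be sketched as an ordered list of columns with limits. This is a rough illustration, not a real tool: it assumes the numbers in parentheses are per-column WIP limits and that Done is unlimited, and `WORKFLOW`, `new_board`, and `advance` are hypothetical names. It leaves out the expedite exception to stay short.

```python
# Column names with their assumed WIP limits (None means no limit).
WORKFLOW = [
    ("Queue", 5),
    ("Analysis/Acceptance Tests", 2),
    ("Dev/Test", 3),
    ("Business Acceptance", 5),
    ("Queued for Deployment", 5),
    ("Post Production Validation", 5),
    ("Done", None),
]


def new_board():
    """Create an empty board: one list of items per column."""
    return {name: [] for name, _ in WORKFLOW}


def advance(board, item):
    """Move `item` one column to the right if that column has room.

    Returns True if the item moved, False if it is already in the last
    column or the next column is at its limit.
    """
    names = [name for name, _ in WORKFLOW]
    limits = dict(WORKFLOW)
    current = next((n for n in names if item in board[n]), None)
    if current is None:
        raise ValueError(f"{item!r} is not on the board")
    idx = names.index(current)
    if idx == len(names) - 1:
        return False
    target = names[idx + 1]
    limit = limits[target]
    if limit is not None and len(board[target]) >= limit:
        return False
    board[current].remove(item)
    board[target].append(item)
    return True


board = new_board()
board["Queue"] = ["A", "B", "C"]
print(advance(board, "A"))  # True
print(advance(board, "B"))  # True
print(advance(board, "C"))  # False: Analysis/Acceptance Tests is full
```

The point of the sketch is that work is pulled only when the next step has capacity; a full column is a signal to help finish something downstream rather than start something new.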

Some Notes:

  • while it is completely evil of course, we are batching releases because we can’t continuously deploy yet and our customers couldn’t handle more frequent releases right now
  • the queue will be fed by weekly prioritization meetings with all input streams
  • Analysis/Acceptance step defines ‘done’ for the work item
  • Dev/Test step includes code review, unit and/or functional tests where possible, built and deployed with all tests passing on our integration environment
  • Business Acceptance step includes the A-OK from yours truly
  • Queued for Deployment step includes all production change scripts and the release plan
  • Post Production Validation includes our newly created smoke tests and manual validation for specific work items that need it.
  • we’ll t-shirt size stuff to get a sense of cycle time and lead time for those items
  • bi-weekly retrospectives and standups will continue to happen
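
As a small illustration of the t-shirt sizing note above, here is one way average lead time and cycle time per size could be computed. The definitions are an assumption for this sketch (teams define them differently): lead time runs from entering the Queue to Done, and cycle time runs from when work starts to Done. The function name, the field names, and the sample dates are all made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def average_times_by_size(items):
    """Return average lead and cycle time (in days) per t-shirt size."""
    buckets = defaultdict(lambda: {"lead": [], "cycle": []})
    for it in items:
        # Lead time: queued -> done. Cycle time: started -> done.
        buckets[it["size"]]["lead"].append((it["done"] - it["queued"]).days)
        buckets[it["size"]]["cycle"].append((it["done"] - it["started"]).days)
    return {
        size: {"lead": mean(v["lead"]), "cycle": mean(v["cycle"])}
        for size, v in buckets.items()
    }


# Made-up sample data, purely to show the shape of the input.
sample = [
    {"size": "S", "queued": date(2024, 1, 1), "started": date(2024, 1, 3), "done": date(2024, 1, 5)},
    {"size": "S", "queued": date(2024, 1, 2), "started": date(2024, 1, 4), "done": date(2024, 1, 8)},
    {"size": "L", "queued": date(2024, 1, 1), "started": date(2024, 1, 10), "done": date(2024, 1, 31)},
]
print(average_times_by_size(sample))
```

Averages are only a starting point; with more data, looking at the spread per size says more about what can be promised than a single number does.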

The outcomes we are looking to achieve:

  • smaller and more frequent releases (our current release cycle is > 6 months so by ‘more frequent’ I mean 2 – 3 months)
  • focusing on doing the right work instead of keeping people busy
  • provide better visibility into the work being done
  • using cycle/lead time data for better planning

Some challenges we expect:

  • balancing client vs product vs maintenance work. We thought about multiple input queues with limits in each, or dedicated capacity, but decided to start simple and adjust based on the outcomes we get.
  • not having explicit policies for other classes of service (we’ll figure that out as we go)
  • pissed off stakeholders because they will hear NO or ‘not right now’ more often than before!
  • we release updates to clients in batches, so items aren’t really ‘done’ until all clients have been updated. We may add a ‘release kanban’ or sub-columns under “Post Production Validation” to manage this.

We see this as a positive change; time will tell how well it works. The status quo of dropping work and re-focusing on other work is very strong. Just yesterday an ‘issue’ came up and the first thought was that it’s a high priority!!! Do it now!!! After a quick discussion we realized it could wait because we were at our WIP limit and we figured out a couple of those in-progress items would be finished the next day.

I’ll post updates as we go so you can see how we progress through the transition. The great thing is that we have defined a process based on our context and used business outcomes as the driver, not process adherence. I’m sure it’ll be a bumpy ride! Let me know your thoughts!