Some Metrics are Ok, Most are Stupid

Yesterday at the Quality Conference in Waterloo, Paul Holland gave a great keynote on bad metrics. Anybody who knows me knows I’m not a fan of metrics. Pretty much every metric can be gamed or misused. One aspect missing from the keynote, in my opinion, was the use of metrics at the appropriate level for the right reason.

As an example, in the eyes of a business person, 100% code coverage via tests sounds great. Why wouldn’t you want 100% code coverage? That has to be good, doesn’t it? Well, not really. When a business person hears “100% code coverage”, they can interpret that as “awesome, we have perfect quality!” I worked with a client where the team actually told the stakeholders that quality would be excellent because they had 100% code coverage. What if customers can’t figure out how to use the software and keep calling the helpdesk, which drives up costs? What if the infrastructure is brittle and the software is frequently offline? What if the feature that has “100% code coverage” doesn’t even work because there’s some stupid validation problem with data being entered? Is that ‘awesome quality’?

Nope.

That said, code coverage is a great metric FOR THE TEAM. Teams can use tools like Sonar to analyse the code so they can see which areas have coverage, which areas don’t, which areas are more complex than others, etc. It’s not a metric that should be used by the business and stakeholders as a quality measurement.
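
To make that concrete, here’s a minimal, hypothetical sketch of how a test can produce 100% line coverage without verifying anything. The function and test names are made up for illustration; any pytest-style runner plus a coverage tool would report both tests the same way.

```python
# discount.py -- a hypothetical function under test
def apply_discount(price, percent):
    """Return the price reduced by the given percentage."""
    return price - (price * percent / 100)


# test_discount.py -- both tests execute every line of apply_discount,
# so a coverage tool reports 100% either way.
def test_apply_discount_covers_but_checks_nothing():
    # No assertion: this "test" can never fail, even if apply_discount
    # started adding instead of subtracting.
    apply_discount(100, 10)


def test_apply_discount_actually_verifies():
    # A meaningful test pins down the behaviour the business cares about.
    assert apply_discount(100, 10) == 90
```

The coverage number is identical in both cases, which is exactly why coverage is useful feedback for the team and a poor quality claim for stakeholders.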

A much more effective measurement of ‘quality’ from a business perspective can be inbound helpdesk calls. If your call volume keeps going up, you have a problem. Who knows what that problem is: it could be more usage from more users, it could be legitimate ‘quality’ problems, or something else entirely. Use that metric as a way to start a conversation.

One interesting point I hadn’t considered before was that executives like ‘stupid metrics’ possibly because of liability issues. Let’s say the company gets sued for some reason because the service was offline or broken. They may need some metrics to show what quality controls exist. I can understand that, but I think the metrics used in such a situation don’t have to be the stupid metrics Paul was talking about. In that situation, showing your quality controls is more important than, say, % feature coverage.

Stupid Metrics and How They Can be Gamed

  • pass/fail %: 100% of your tests can pass while the tests aren’t actually verifying anything. It’s easy to write stupid tests to bring that number up (see the sketch after this list).
  • bugs reported per tester: easily gamed, and it promotes unhealthy competition between testers (e.g. “ooh, this pixel is out of place, I’m reporting a bug!”)
  • Total # of Test Cases: more test cases is a meaningless target; you want quality, not quantity
  • Test Cases executed by team/individual: test cases vary by size and complexity
  • Target % of Test Cases Executed: at a go/no-go release meeting, is it really better to have 95% of test cases executed where the last 5% were gamed than to have 90% executed legitimately?
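
Here’s a hypothetical sketch of the kind of gamed test mentioned in the first bullet: a buggy function and a test written so it can never fail, which keeps the pass/fail % at 100% while hiding the defect. The names and numbers are made up.

```python
def transfer_funds(balance, amount):
    # Hypothetical function with an obvious bug: it adds instead of subtracts.
    return balance + amount


def test_transfer_funds_always_green():
    try:
        assert transfer_funds(100, 30) == 70
    except AssertionError:
        pass  # swallow the failure -- the dashboard still shows a pass


def test_transfer_funds_honest():
    # The honest version fails immediately and exposes the bug.
    assert transfer_funds(100, 30) == 70
```
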
These are only a few; Paul had a long list of other stupid metrics. As this conference was tester-focused, the metrics were obviously skewed towards testing. The moral of the keynote for me, which I preach regularly, is to use metrics as a way to start a conversation, not as a conclusion. Your business evolves, so your metrics should too. Challenge why you’re collecting these metrics; otherwise you end up blindly reporting the same metrics, more than likely gamed, for the sake of, well, the status quo I suppose.

What are Better “Quality” Metrics for the Business?

  • Escaped Defects: defects reported from the field
  • Feature Usage: build your system to track important feature usage, and define a tolerance for identifying that something isn’t working quite right. Suppose your system has a file uploader, you usually average 500 file uploads a day, and suddenly it dips to 450. Is that a problem? I dunno; you define the threshold based on what makes sense for your business (see the sketch after this list).
  • Registrations: obviously this is dependent on what your system does. If your consumer SaaS software gets 1000 registrations per day and one day it dips to 200, something is wrong. People will argue “well, that shoulda been caught by QA”. Suppose the ‘registration test cases’ passed but the problem was somewhere else, like a config or deployment problem that wasn’t caught. There could be lots of reasons; the point is, this is meaningful for business people.
  • Inbound Support: Why are customers calling you? Is the software busted or buggy? Are customers having a hard time figuring out how to use a feature? A team I worked with had a feature that appended random characters to uploaded filenames to make them not guessable. This was in a regulated industry where clients would name their file “filename-Q1-2010.pdf”, so snoopers would simply request “filename-Q2-2010.pdf” on the chance that was the file name, and that had resulted in some big companies having sensitive financial data exposed before it was supposed to be. After the change, customers would upload their file to insert onto their website for publishing later, but the file would disappear from the uploader control because the name was different and they couldn’t find it. Worse, they weren’t told about this change. The feature “worked as designed” and the tests passed, but customers were pissed. Is that quality?
  • Defects by Component or Feature: This can help the business prioritize the right work.
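
As promised above, here’s a minimal sketch of a feature-usage tolerance check. The 500-upload baseline, the 20% tolerance, and the function name are all assumptions for illustration; the alerting is just a print, because the point is to trigger a conversation, not to declare a verdict.

```python
BASELINE_UPLOADS_PER_DAY = 500   # the daily average you normally see (assumed)
TOLERANCE = 0.20                 # how big a dip you're willing to ignore (assumed)


def check_feature_usage(todays_count, baseline=BASELINE_UPLOADS_PER_DAY,
                        tolerance=TOLERANCE):
    """Return True if today's usage is within tolerance of the baseline."""
    floor = baseline * (1 - tolerance)
    if todays_count < floor:
        # In a real system this might page someone or open a ticket.
        print(f"Feature usage {todays_count} fell below the floor of {floor:.0f}")
        return False
    return True


# 450 uploads against a 500/day baseline is inside a 20% tolerance, so no
# alarm; 380 trips the check and starts the conversation.
check_feature_usage(450)
check_feature_usage(380)
```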

This list could go on and on; the point is, with any metric you need to discover what makes sense in your context and use it appropriately. Some metrics are best used only by the team for feedback. Make sure you’re not using metrics to measure team or individual productivity or to punish seeming under-performers. Use metrics as a conversation starter, not as a false sense of security about the quality of your product, and make sure your metrics evolve as your business evolves.