Tuesday, June 28, 2016

Curious Customer

I currently work on a pretty small team, 4 devs (including myself). We have no one dedicated strictly to QA. A few years ago we ran into a few unexpected issues with our software. I hesitate to call them bugs, because they only appeared when you did things that made little sense. We write internal-only software, thus we expect a minimum level of competency from our users. In addition, it's tempting justify ignoring problematic nonsensical behavior in the name of not having to write and maintain additional software.

But, when I wasn't in denial, I was willing to admit that these were in fact bugs and they were costing us time.

The problems caused by these bugs were small, e.g. a burst of worthless emails, a blip in data flowing to the application. The emails could be quickly deleted, and the application was eventually consistent. Thus I pretended as though these issues were of low importance, and that the pain was low for both myself and our customers. I imagine that sounds foolish; in retrospect, it was foolish. The cost of developer context switching is often very high, higher if it's done as an interrupt. Introducing noise into your error reporting devalues your error reporting. Users can't as easily differentiate between good data, a blip of bad data due to something they did, and actual bad data, thus they begin to distrust all of the data.

The cost of these bugs created by nonsensical behavior is high, much higher than the cost of writing and maintaining the software that eliminated these bugs.

Once we eliminated these bugs, I spent notably more time happily focused on writing software. For me, delivering features is satisfying; conversely, tracking down issues stemming from nonsensical behavior always feels like a painfully inefficient task. I became very intent on avoiding that inefficiency in the future. The team brainstormed on how to address this behavior, and honestly we came up with very little. We already write unit tests, load tests, and integration tests. Between all of our tests, we catch the majority of our bugs before they hit production. However, this was a different type of bug, created by behavior a developer often wouldn't think of, thus a developer wasn't very likely to write a test that would catch this issue.

I proposed an idea I wasn't very fond of, the Curious Customer (CC): upon delivery of any feature you could ask another developer on the team to use the feature in the staging environment, acting as a user curiously toying with all aspects of the feature.

Over a year later, I'm not sure it's such a bad idea. In that year we've delivered several features, and (prior to release) I've found several bugs while playing the part of CC. I can't remember a single one of them that would have led to a notable problem in production; however all of them would have led to at least one support call, and possibly a bit less trust in our software.

My initial thought was: asking developers to context switch to QAing some software they didn't write couldn't possibly work, could it? Would they give it the necessary effort, or would they half-ass the task and get back to coding?

For fear of half-ass, thus wasted effort, I tried to define the CC's responsibilities very narrowly. CC was an option, not a requirement; if you delivered a feature you could request a CC, but you could also go to production without a CC. A CC was responsible for understanding the domain requirements, not the technical requirements. It's the developers responsibility to get the software to staging, the CC should be able to open staging and get straight to work. If the CC managed to crash or otherwise corrupt staging, it was the developers responsibility to get things back to a good state. The CC doesn't have an official form or process for providing feedback; The CC may chose email, chat, or any mechanism they prefer for providing feedback.

That's the idea, more or less. I've been surprised and pleased at the positive impact CC has had. It's not life changing, but it does reduce the number of support calls and the associated waste with tracking down largely benign bugs, at least, on our team.

You might ask how this differs from QA. At it's core, I'm not sure it does in any notable way. That said, I believe traditional QA differs in a few interesting ways. Traditional QA is often done by someone whose job is exclusively QA. With that in mind, I suppose we could follow the "devops" pattern and call this something like "devqa", but that doesn't exactly roll off the tongue. Traditional QA is also often a required task, every feature and/or build requires QA sign off. Finally, the better QA engineers I've worked with write automated tests that continually run to prevent regression; A CC may write a script or two for a single given task, but those scripts are not expected to be valuable to any other team member now or for anyone (including the author) at any point in the future.

Thursday, June 16, 2016

Maintainability and Expect Literals

Recently, Stephen Schaub asked the following on the wewut group:
Several of the unit test examples in the book verify the construction of both HTML and plain text strings. Jay recommends using literal strings in the assertions. However, this strikes me as not a particularly maintainable approach. If the requirements regarding the formatting of these strings changes (a very likely scenario), every single test that verifies one of these strings using a literal must be updated. Combined with the advice that each test should check only one thing, this leads to a large number of extremely brittle tests.

Am I missing something here? I can appreciate the reasons Jay recommends using literals in the tests. However, it seems that we pay a high maintainability price in exchange for the improved readability.
I responded to Stephen; however, I've seen similar questions asked a few times. Below are my extended thoughts regarding literals as expected values.

In general, given the option of having many similar strings (or any literal) vs a helper function, I would always prefer the literal. When a test is failing I only care about that single failing test. If I have to look at the helper function I no longer have the luxury of staying focused on the single test; now I need to consider what the helper function is giving me and what it's giving all other callers. Suddenly the scope of my work has shifted from one test to all of the tests coupled by this helper function. If this helper function wasn't written by me, this expansion in scope wasn't even my decision, it was forced upon me by the helper function creator. In the best case the helper function could return a single, constant string. The scope expansion becomes even worse when the helper function contains code branches.

As for alternatives, my solution would depend on the problem. If the strings were fairly consistent, I would likely simply duplicate everything knowing that any formatting changes can likely be addressed using a bulk edit via find and replace. If the strings were not consistent, I would look at breaking up the methods in a way that would allow me to verify the code branches using as little duplication as possible, e.g. if I wanted to test a string that dynamically changed based on a few variables, I would look to test those variables independently, and then only have a few tests for the formatting.

A concrete example will likely help here. Say I'm writing a trading system and I need to display messages such as

"paid 10 on 15 APPL. $7 Commission. spent: $157"
"paid 1 on 15 VTI. Commission free. spent: $15"
"sold 15 APPL at 20. $7 Commission. collected: $293"
"sold 15 VTI at 2. Commission free. collected: $30"

There's quite a bit of variation in those messages. You could have 1 function that creates the entire string:
confirmMsg(side, size, px, ticker)

However, I think you'd end up with quite a few verbose tests. Given this problem, I would look to break down those strings into smaller, more focused functions, for example:

describeOrder(side, size, px, ticker)
describeCommission(ticker)
describeTotal(side, size, px, ticker)

Now that you've broken down the function, you're free to test the code paths of the more focused functions, and the test for confirmMsg becomes trivial. Something along the lines of
assertEquals("paid 10 on 15 APPL",
  describeOrder("buy", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("sell 15 APPL at 10",
  describeOrder("sell", 10, 15, {tickerName:"APPL",commission:"standard"}))

assertEquals("$7 Commission", 
  describeCommission({tickerName:"APPL",commission:"standard"}))
assertEquals("Commission free", 
  describeCommission({tickerName:"APPL",commission:"free"}))

assertEquals("spent: $157", 
  describeOrder("buy", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("collected: $143", 
  describeOrder("sell", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("spent: $150", 
  describeOrder("buy", 10, 15, {tickerName:"APPL",commission:"free"}))
assertEquals("collected: $150", 
  describeOrder("sell", 10, 15, {tickerName:"APPL",commission:"free"}))

assertEquals("order. commission. total", 
  confirmMsg("order", "commission", "total"))
I guess I could summarize it by saying, I should be able to easily find and replace my expected literals. If I cannot, then I have an opportunity to further break down a method and write more focused tests on the newly introduced, more granular tests.