Tuesday, June 28, 2016

Curious Customer

I currently work on a pretty small team, 4 devs (including myself). We have no one dedicated strictly to QA. A few years ago we ran into a few unexpected issues with our software. I hesitate to call them bugs, because they only appeared when you did things that made little sense. We write internal-only software, thus we expect a minimum level of competency from our users. In addition, it's tempting to justify ignoring problematic nonsensical behavior in the name of not having to write and maintain additional software.

But, when I wasn't in denial, I was willing to admit that these were in fact bugs and they were costing us time.

The problems caused by these bugs were small, e.g. a burst of worthless emails, a blip in data flowing to the application. The emails could be quickly deleted, and the application was eventually consistent. Thus I pretended these issues were of low importance, and that the pain was low for both myself and our customers. I imagine that sounds foolish; in retrospect, it was foolish. The cost of developer context switching is often very high, and higher still if it's done as an interrupt. Introducing noise into your error reporting devalues your error reporting. Users can't easily differentiate between good data, a blip of bad data due to something they did, and actual bad data, thus they begin to distrust all of the data.

The cost of these bugs created by nonsensical behavior is high, much higher than the cost of writing and maintaining the software that eliminated these bugs.

Once we eliminated these bugs, I spent notably more time happily focused on writing software. For me, delivering features is satisfying; conversely, tracking down issues stemming from nonsensical behavior always feels like a painfully inefficient task. I became very intent on avoiding that inefficiency in the future. The team brainstormed on how to address this behavior, and honestly we came up with very little. We already write unit tests, load tests, and integration tests. Between all of our tests, we catch the majority of our bugs before they hit production. However, this was a different type of bug, created by behavior a developer often wouldn't think of, thus a developer wasn't very likely to write a test that would catch this issue.

I proposed an idea I wasn't very fond of, the Curious Customer (CC): upon delivery of any feature you could ask another developer on the team to use the feature in the staging environment, acting as a user curiously toying with all aspects of the feature.

Over a year later, I'm not sure it's such a bad idea. In that year we've delivered several features, and (prior to release) I've found several bugs while playing the part of CC. I can't remember a single one of them that would have led to a notable problem in production; however, all of them would have led to at least one support call, and possibly a bit less trust in our software.

My initial thought was: asking developers to context switch to QAing some software they didn't write couldn't possibly work, could it? Would they give it the necessary effort, or would they half-ass the task and get back to coding?

For fear of half-assed, and thus wasted, effort, I tried to define the CC's responsibilities very narrowly. CC was an option, not a requirement; if you delivered a feature you could request a CC, but you could also go to production without a CC. A CC was responsible for understanding the domain requirements, not the technical requirements. It's the developer's responsibility to get the software to staging; the CC should be able to open staging and get straight to work. If the CC managed to crash or otherwise corrupt staging, it was the developer's responsibility to get things back to a good state. The CC doesn't have an official form or process for providing feedback; the CC may choose email, chat, or any other mechanism they prefer for providing feedback.

That's the idea, more or less. I've been surprised and pleased at the positive impact CC has had. It's not life changing, but it does reduce the number of support calls and the waste associated with tracking down largely benign bugs, at least on our team.

You might ask how this differs from QA. At its core, I'm not sure it does in any notable way. That said, I believe traditional QA differs in a few interesting ways. Traditional QA is often done by someone whose job is exclusively QA. With that in mind, I suppose we could follow the "devops" pattern and call this something like "devqa", but that doesn't exactly roll off the tongue. Traditional QA is also often a required task: every feature and/or build requires QA sign off. Finally, the better QA engineers I've worked with write automated tests that continually run to prevent regression; a CC may write a script or two for a single given task, but those scripts are not expected to be valuable to any other team member now, or to anyone (including the author) in the future.

Thursday, June 16, 2016

Maintainability and Expect Literals

Recently, Stephen Schaub asked the following on the wewut group:
Several of the unit test examples in the book verify the construction of both HTML and plain text strings. Jay recommends using literal strings in the assertions. However, this strikes me as not a particularly maintainable approach. If the requirements regarding the formatting of these strings changes (a very likely scenario), every single test that verifies one of these strings using a literal must be updated. Combined with the advice that each test should check only one thing, this leads to a large number of extremely brittle tests.

Am I missing something here? I can appreciate the reasons Jay recommends using literals in the tests. However, it seems that we pay a high maintainability price in exchange for the improved readability.
I responded to Stephen; however, I've seen similar questions asked a few times. Below are my extended thoughts regarding literals as expected values.

In general, given the option of having many similar strings (or any literal) vs. a helper function, I would always prefer the literal. When a test is failing I only care about that single failing test. If I have to look at the helper function I no longer have the luxury of staying focused on the single test; now I need to consider what the helper function is giving me and what it's giving all other callers. Suddenly the scope of my work has shifted from one test to all of the tests coupled by this helper function. If this helper function wasn't written by me, this expansion in scope wasn't even my decision; it was forced upon me by the helper function's creator. That's the best case, where the helper function returns a single, constant string. The scope expansion becomes even worse when the helper function contains code branches.

As for alternatives, my solution would depend on the problem. If the strings were fairly consistent, I would likely simply duplicate everything knowing that any formatting changes can likely be addressed using a bulk edit via find and replace. If the strings were not consistent, I would look at breaking up the methods in a way that would allow me to verify the code branches using as little duplication as possible, e.g. if I wanted to test a string that dynamically changed based on a few variables, I would look to test those variables independently, and then only have a few tests for the formatting.

A concrete example will likely help here. Say I'm writing a trading system and I need to display messages such as

"paid 10 on 15 APPL. $7 Commission. spent: $157"
"paid 1 on 15 VTI. Commission free. spent: $15"
"sold 15 APPL at 20. $7 Commission. collected: $293"
"sold 15 VTI at 2. Commission free. collected: $30"

There's quite a bit of variation in those messages. You could have 1 function that creates the entire string:
confirmMsg(side, size, px, ticker)

However, I think you'd end up with quite a few verbose tests. Given this problem, I would look to break down those strings into smaller, more focused functions, for example:

describeOrder(side, px, size, ticker)
describeCommission(ticker)
describeTotal(side, px, size, ticker)

Now that you've broken down the function, you're free to test the code paths of the more focused functions, and the test for confirmMsg becomes trivial. Something along the lines of
assertEquals("paid 10 on 15 APPL",
  describeOrder("buy", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("sold 15 APPL at 10",
  describeOrder("sell", 10, 15, {tickerName:"APPL",commission:"standard"}))

assertEquals("$7 Commission",
  describeCommission({tickerName:"APPL",commission:"standard"}))
assertEquals("Commission free",
  describeCommission({tickerName:"VTI",commission:"free"}))

assertEquals("spent: $157",
  describeTotal("buy", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("collected: $143",
  describeTotal("sell", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("spent: $150",
  describeTotal("buy", 10, 15, {tickerName:"APPL",commission:"free"}))
assertEquals("collected: $150",
  describeTotal("sell", 10, 15, {tickerName:"APPL",commission:"free"}))

assertEquals("order. commission. total", 
  confirmMsg("order", "commission", "total"))
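
To make the sketch concrete, here's one possible implementation of these focused functions. This is purely illustrative: the parameter order, the ticker object's shape, the describeCommission helper, and the flat $7 fee are all inferred from the example assertions, not prescribed.

```typescript
type Ticker = { tickerName: string; commission: "standard" | "free" };

const STANDARD_COMMISSION = 7; // assumed flat fee, per the $7 in the examples

function describeOrder(side: string, px: number, size: number, ticker: Ticker): string {
  return side === "buy"
    ? `paid ${px} on ${size} ${ticker.tickerName}`
    : `sold ${size} ${ticker.tickerName} at ${px}`;
}

function describeCommission(ticker: Ticker): string {
  return ticker.commission === "free"
    ? "Commission free"
    : `$${STANDARD_COMMISSION} Commission`;
}

function describeTotal(side: string, px: number, size: number, ticker: Ticker): string {
  const fee = ticker.commission === "free" ? 0 : STANDARD_COMMISSION;
  return side === "buy"
    ? `spent: $${px * size + fee}`       // commission adds to what you pay
    : `collected: $${px * size - fee}`;  // commission reduces what you collect
}

function confirmMsg(order: string, commission: string, total: string): string {
  return `${order}. ${commission}. ${total}`;
}
```

With this breakdown, the price/size/commission arithmetic is tested through the small functions, and confirmMsg itself only needs the one trivial formatting test.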

I guess I could summarize it by saying: I should be able to easily find and replace my expected literals. If I cannot, then I have an opportunity to further break down a method and write more focused tests against the newly introduced, more granular functions.

Thursday, June 25, 2015

Drop Books

The vast majority of books I purchase are for my own enjoyment, but not all of them. There are a few books that I buy over and over, and drop on the desks of friends and colleagues. These books, all technical, are books that I think most programmers will benefit from reading. I call these books "Drop Books"; I drop them and never expect them to be returned.

My main motivation for dropping books is to spread what I think are great ideas. Specifically, I'm always happy to spread the ideas found in the following books:
I know a few of my friends buy Drop Books as well. Spreading solid ideas and supporting authors seems like a win/win to me; hopefully more and more people will begin to do the same.

Monday, April 06, 2015

Unit Testing Points of View, Probably

Michael Feathers, Brian Marick, and I are collaborating to create a new book: Unit Testing Points of View ... probably.


In 2014 Martin Fowler provided Technical Review for Working Effectively with Unit Tests. As part of his feedback he said something along the lines of: I'm glad you wrote this book, and I'll likely write a bliki entry noting what I agree with and detailing what I would do differently. I'm still looking forward to that entry, and I think the idea is worth extending beyond Martin.

Unit testing is now mainstream, has tons of entry level books, and has a great reference book. The result is a widely accepted idea that you can quickly and easily adopt; unfortunately, I've found little in the way of documentation of pattern trade-offs. I believe that combination leads to a lot of waste. To help avoid some of that waste, I'd like to see more written about why you may want to choose one Unit Testing style over another. That was the inspiration and my goal for Working Effectively with Unit Tests. Growing Object-Oriented Software is another great example of a book that documents both the hows and whys of unit testing. After that, if you're looking for trade-off guidance... good luck.

Is this a problem? I think so. Without discussion of tradeoffs, experience reports, and concrete direction you end up with hours wasted on bad tests and proclamations of TDD's death. The report of TDD's death was an exaggeration, and the largest clue was the implication that TDD and Unit Testing were synonymous. To this day, Unit Testing is still largely misunderstood.

What could be

Working Effectively with Unit Tests starts with one set of opinionated Unit Tests and evolves to the types of Unit Tests that I find more productive. There's no reason this evolution couldn't be extended by other authors. This is the vision we have for Unit Testing Points of View. Michael, Brian, and I began by selecting a common domain model. The first few chapters will detail the hows and whys of how I would test the common domain model. After I've expressed what motivates my tests, Brian or Michael will evolve the tests to a style they find superior. After whoever goes second (Brian or Michael) finishes, the other will continue the evolution. The book will note where we agree, but the majority of the discussion will occur around where our styles differ and what aspect of software development we've chosen to emphasize by using an alternative approach.

Today, most teams fail repeatedly with Unit Tests, leading to (at best) significant wasted time and (at worst) abandoning the idea with almost nothing gained. We aim to provide tradeoff guidance that will help teams select a Unit Testing approach that best fits their context.

As I said above: There's no reason this evolution couldn't be extended by other authors. In fact, that's our long term hope. Ideally, Brian, Michael and I write volume one of Unit Testing Points of View. Volume two could be written by Kevlin Henney, Steve Freeman, and Roy Osherove - or anyone else who has interest in the series. Of course, given the original inspiration, we're all hoping Martin Fowler clears some time in his schedule to take part in the series.

Why "Probably"?

Writing a book is almost always a shot in the dark. Martin, Michael, Brian, and I all think this is a book worth writing (and a series worth starting), but we have no idea if people actually desire such a book. In the past you took the leap expecting failure and praying for success. Michael, Brian, and I believe there's a better way: leanpub interest lists (here's Unit Testing Points of View's). We're looking for 15,000 people to express interest (by providing their email addresses). I'm writing the first 3 chapters, and they'll be made available when we cross the 15k mark. At that point, Michael and Brian will know that the interest is real and that it's time to write their contributions. If we never cross the 15k mark, then we know such a book isn't desired, and we shouldn't waste our time.

If you'd like to see this project happen, please do sign up on the interest list and encourage others to as well (here's a tweet to RT, if you'd like to help).

Sunday, March 15, 2015

My Answers for Microservices Awkward Questions

Earlier this year, Ade published Awkward questions for those boarding the microservices bandwagon. I think the list is pretty solid, and (with a small push from Ade) I decided to write concise details on my experience.

I think it's reasonable to start with a little context building. When I started working on the application I'm primarily responsible for, microservices were very much fringe. Fred George was already giving (great) presentations on the topic, but the idea had gained neither momentum nor hype. I never set out to write "microservices"; I set out to write a few small projects (codebases) that collaborated to provide a (single) solid user experience.

I pushed to move the team to small codebases after becoming frustrated with a monolith I was a part of building and a monolith I ended up inheriting. I have no idea if several small codebases are the right choice for the majority, but I find they help me write more maintainable software. Practically everything is a tradeoff in software development; when people ask me what the best aspect of a small services approach is, I always respond: I find accidental coupling to be the largest productivity drain on projects with monolithic codebases. Independent codebases reduce what can be reasonably accidentally coupled.

As I said, I never set out to create microservices; I set out to reduce accidental coupling. 3 years later, it seems I have around 3 years of experience working with microservices (according to the Wikipedia definition). I have nothing to do with microservices advocacy, and you shouldn't see this entry as confirmation or condemnation of a microservices approach. The entry is an experience report, nothing more, nothing less.

On to Ade's questions (in bold).

Why isn't this a library?:
It is. Each of our services compiles to a jar that can be run independently, but could just as easily be used as a library.

What heuristics do you use to decide when to build (or extract) a service versus building (or extracting) a library?:
Something that's strictly a library provides no business value on its own.

How do you plan to deploy your microservices?:
Shell scripts copy jars around, and we have a web UI that provides easy deployment via a browser on your laptop, phone, or anywhere else.

What is your deployable unit?:
A jar.

Will you be deploying each microservice in isolation or deploying the set of microservices needed to implement some business functionality?:
In isolation. We have a strict rule that no feature should require deploying separate services at the same time. If a new feature requires deploying new versions of two services, one of them must support old and new behavior - that one can be deployed first. After it's run in production for a day the other service can be deployed. Once the second service has run in production for a day the backwards compatibility for the first service can be safely removed.
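
As a sketch of that rule (all names hypothetical): suppose a new feature renames an order field from `qty` to `size`. The consuming service is deployed first, accepting both payloads; only after it has run in production for a day is the producer deployed to send the new field.

```typescript
// Hypothetical transition shim deployed in the first service.
type OldOrder = { qty: number };  // payload the old producer still sends
type NewOrder = { size: number }; // payload the new producer will send

function normalizeOrder(msg: OldOrder | NewOrder): NewOrder {
  // Translate old payloads until the producer has been upgraded and has
  // run in production for a day; then this branch can be safely deleted.
  return "qty" in msg ? { size: msg.qty } : msg;
}
```

The shim is the "support old and new behavior" step; removing it after the second deploy has proven itself is the final cleanup described above.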

Are you capable of deploying different instances (where an instance may represent multiple processes on multiple machines) of the same microservice with different configurations?:
If I'm understanding your question correctly, absolutely. We currently run a different instance for each trading desk.

Is it acceptable for another team to take your code and spin up another instance of your microservice?:

Can team A use team B's microservice or are they only used within rather than between teams?:
Anyone can request a supported instance or fork individual services. If a supported instance is requested the requestor has essentially become a customer, a peer of the trading desks, and will be charged the appropriate costs allocated to each customer. Forking a service is free, but comes with zero guarantees. You fork it, you own it.

Do you have consumer contracts for your microservices or is it the consumer's responsibility to keep up with the changes to your API?:
It is the consumer's responsibility.

Is each microservice a snowflake or are there common conventions?:
There are common conventions, and yet, each is also a snowflake. There are common libraries that we use, and patterns that we follow, but many services require slightly different behavior. A service owner (or primary) is free to make whatever decision best fits the problem space.

How are these conventions enforced?:
Each service has a primary, who is responsible for ensuring consistency (more here). Before you become a primary you'll have spent a significant amount of time on the team; thus you'll be equipped with the knowledge required to apply conventions appropriately.

How are these conventions documented?:
They are documented strictly by the code.

What's involved in supporting these conventions?:
When you first join the team you pair exclusively for around 4 weeks. During those 4 weeks you drive the entire time. By pairing and being the primary driver, you're forced to become comfortable working within the various codebases, and you have complete access to someone who knows the history of the overall project. Following those 4 weeks you're free to request a pair whenever you want. You're unlikely to become a service owner until you've been on the team for at least 6 months.

Are there common libraries that help with supporting these conventions?:
A few, but not many to be honest. Logging, testing, and some common functions - that's about it.

How do you plan to monitor your microservices?:
We use proprietary monitoring software managed by another team. We also have front line support that is on call and knows who to contact when they encounter issues they cannot solve on their own.

How do you plan to trace the interactions between different microservices in a production environment?:
We log as much as is reasonable. User generated interactions are infrequent enough that every single one is logged. Other interactions, such as snapshots of large pieces of data, or data that is updated 10 or more times per second, are generally not logged. We have a test environment where we can bring the system up and mirror prod, so interactions we miss in prod can (likely) be reproduced in dev if necessary. A few of our systems are also experimenting with a logging system that logs every interaction shape, and a sample per shape. For example, if the message {foo: {bar:4, baz:45, cat:["dog", "elephant"]}} is sent, the message and the shape {foo: {bar:Int, baz:Int, cat:[String]}} will be logged. If that shape previously existed, neither the shape nor the message will be logged. In practice, logging what you can and having an environment to reproduce what you can't log is all you need 95% of the time.
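
A minimal sketch of that shape-sampling idea might look like the following. All names are hypothetical; the real system is proprietary, and details like nested-array handling would vary.

```typescript
// Remember which message shapes have already been logged.
const seenShapes = new Set<string>();

// Derive a "shape": field names plus value types, recursively.
function shapeOf(value: unknown): string {
  if (Array.isArray(value)) {
    // Assume homogeneous arrays, as in the [String] example above.
    return `[${value.length > 0 ? shapeOf(value[0]) : "?"}]`;
  }
  if (value !== null && typeof value === "object") {
    const fields = Object.entries(value as Record<string, unknown>)
      .map(([key, v]) => `${key}:${shapeOf(v)}`)
      .join(", ");
    return `{${fields}}`;
  }
  if (typeof value === "number") return Number.isInteger(value) ? "Int" : "Float";
  if (typeof value === "string") return "String";
  return typeof value;
}

// Log the shape and one sample message, but only the first time
// that shape is seen; repeats of a known shape log nothing.
function logIfNewShape(msg: object): void {
  const shape = shapeOf(msg);
  if (seenShapes.has(shape)) return;
  seenShapes.add(shape);
  console.log(shape, JSON.stringify(msg));
}
```

The appeal of this approach is volume: high-frequency messages almost always repeat a handful of shapes, so you keep one representative sample per shape instead of millions of near-identical log lines.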

What constitutes a production-ready microservice in your environment?:
Something that provides any value whatsoever, and has been hardened enough that it should create zero production issues under normal working circumstances.

What does the smallest possible deployable microservice look like in your environment?:
A single webpage that displays (readonly) data it's collecting from various other services.

I'm happy to share more of my experiences with a microservice architecture, if people find that helpful. If you'd like elaboration on a question above or you'd like an additional question answered, please leave a comment.