Are You A/B Testing Your Designs? Quantifying Design Success

By Laura Klein (Principal, Users Know)
The other day, I posted something I strongly believe on Twitter. A few people disagreed. I’d like to address the arguments, and I’d love to hear feedback and counter-arguments in the comments where you have more than 140 characters to tell me I’m wrong.

My original tweet was, “I don’t trust designers who don’t want their designs a/b tested. They’re not interested in knowing if they were wrong.”

Here are some of the real responses that I got on Twitter with my longer form response.

“There’s a difference between A/B testing (public) and internally deciding. Design is also a matter of taste.”

I agree. There is a big difference between A/B testing in public and internally deciding. That’s why I’m such a huge fan of A/B testing. You can debate this stuff for weeks, and often it’s a huge waste of time.

When you’re debating design internally, what you should be asking is “which of these designs will be better for the business and users.”
A/B testing tells you conclusively which side is right. Debate over!

Ok, there’s the small exception of short term vs. long term effects, which is addressed later, but in general, it’s more definitive than the opinion of the people in the room.

With regard to the “matter of taste,” that’s both true and false. Sure, different people like different designs. What you’re saying by refusing to A/B test your designs is that your taste as a designer should always trump that of the majority of your users. As long as you like your design, you don’t care whether users agree with you.

If you want your design aesthetic to override that of your users, you should be an artist. I love art. I even, very occasionally, buy some of it.

But I pay for products all the time, and I tend to buy products that I think are well designed, not necessarily ones where the designer thought they were well designed.

“If Apple had done A/B tests for the iPod in 2001 with a user-replaceable battery, that version would’ve likely won—initially.”

Honestly, it still might win. Is taking your iPod to the Apple store when the battery dies really a feature? No! It’s a design tradeoff. They couldn’t create something with the other design elements they wanted that still had a replaceable battery. That’s fine. ?

But all other things about the iPod being totally equal, wouldn’t you buy the one where you could replace the battery yourself? I would. The key there is the phrase “totally equal.”

“Seeing far into the future of technology is not something consumers are particularly great at.”

I feel like the guy who made this argument was confusing A/B testing with bad qualitative testing or just asking users what they would like to see in a product.

This isn’t what A/B testing does. A/B testing measures actual user behavior right now. If I make this change, will they give me more money? It has literally nothing to do with asking users to figure out the future of technology.

“A/B testing has value but shouldn’t be litmus test for designer or a design”

Really? What should be the litmus test for a designer or a design if not, “does this change or set of changes actually improve the key metrics of my company”?

In the end, isn’t that the litmus test for everybody in a company? Are you contributing to the profitability of the business in some way?

If you have some better way of figuring out if your design changes are actually improving real metrics, I’d love to hear about it. We can make THAT the litmus test for design.

“Data is valuable but must be interpreted. Doesn’t “prove” wrongness or rightness. Designer still has judgment.”

I agree with the first sentence. Data certainly must be interpreted. I even agree that certain design changes may hurt certain metrics, and that can be ok if they’re improving other metrics or are shown to improve things in the long run.

But the only way to know if your overall design is actually making things better for your users is by scientifically testing it against a control.

If your overall design changes aren’t improving key metrics, where’s the judgement there? If you release something that is meant to increase the number of signups and it decreases the number of signups, I think that pretty effectively “proves wrongness.”

The great thing about A/B testing is that you know when this happens.

“Is it the designers fault, surely more appropriate to an IA? After all the IA should dictate the feel/flow.”

First off, I don’t work for companies that are big enough to draw a distinction between the two, but I’m sure there’s enough blame to go around.

Secondly, I think that everybody in an organization has the responsibility to improve key metrics. If you think that your work shouldn’t increase revenue, retention, or other numbers you want higher, why should you be employed?

Design of all kinds is important and can have a huge impact on company profitability. That impact can and should be measured. You don’t get a pass just because you’re not changing flow.

“A/B tests are a snapshot of current variables. They don’t embody nor convey a bigger strategy or long-term vision.”

Also, “That’s only an absolute truth you can rely on if you A/B test for the entire lifespan of the product, which defeats the point.”

These are excellent points, and they are a drawback of A/B testing. It’s sometimes tough to tell what the long term effects of a particular design change are going to be from A/B testing. Also, A/B testing doesn’t easily account for design changes that are a part of a larger design strategy.

In other words, sometimes you’re going to make changes that cause problems with your metrics in the short term, because you strongly believe that it’s going to improve things long term.

However, I believe that you address this by recognizing the potential for problems and designing a better test, not by refusing to A/B test at all.

Just because this particular tool isn’t perfect doesn’t mean we get to fall back on “trust the designers implicitly and never make them check their work.” That doesn’t work out so well sometimes either.

There’s one really good argument that I didn’t get, although some of the above tweets touched on it. Sometimes changes that individually test well don’t test well as a whole.

This is a really serious problem with A/B testing because you can wind up with Frankenstein-style interfaces. Each individual decision wins, but the combination is a giant mess.

Again, you don’t address this by not A/B testing. You address it by designing better tests and making sure that all of your combined decisions are still improving things.

Look, if I’m hiring for a company that wants to make money (and most of them do), I want my designers to understand how their changes actually affect my bottom line.

No matter how great a designer thinks his or her design is, if it hurts my revenue and retention or other key metrics, it’s a bad design for my company and my users.

Saying you’re against having your designs A/B tested sounds like you’re saying that you just don’t care whether what you’re changing works for users and the company. As a designer, you’re welcome to do that, but I’m not going to work with you.

This post was originally posted at Users Know.

About the guest blogger: Laura Klein is a Principal at Users Know, helping you get to know your users and create better products. Her goal is to help lean startups and other small companies improve their connection to their users and design better products, working directly with startups as a member of the team, not only to design a great product, but also to help you learn how to involve your users in the design process. Follow her on Twitter at @lauraklein.