Design is decisions. How do you make those decisions? There are many competing schools of thought. I’ll break them down and give you my experience with them all.
A/B or Multivariate Testing
Google famously tested 41 shades of blue to choose the color of its links. Avinash Kaushik (bless his heart) told me on my first day at Intuit, “You don’t know anything! Only testing can tell you the truth.” I took courses there on Design of Experiments (DoE), which explained the statistical and scientific models behind this method. This kind of testing cannot be done in 60 seconds; it takes time to set up the proper environment.
The good side of testing is that it gives clear answers. The results are scientific: black-and-white winners and losers. It’s easy to make decisions when you have data that is unequivocal. Multivariate tests can even tell you which combination of several page elements performs best. What is not to like about clear, unambiguous results and conclusions?
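A multivariate test typically compares every combination of the variants you care about (a "full factorial" design), which is also why it is so hungry for traffic. Here is a minimal Python sketch of that enumeration; the element names and variants are hypothetical placeholders, not taken from any real test.

```python
from itertools import product

# Hypothetical page elements and their variants, for illustration only.
elements = {
    "headline": ["Headline A", "Headline B", "Headline C"],
    "button_color": ["green", "orange"],
    "hero_image": ["photo", "illustration"],
}

def full_factorial(elements):
    """Return every combination (one variant per element) as a dict."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]

variants = full_factorial(elements)
print(len(variants))  # 3 * 2 * 2 = 12 page versions
for v in variants[:2]:
    print(v)
```

Every element you add multiplies the number of versions, so the same traffic gets split into ever-thinner slices.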
Unfortunately, there is a downside. Testing relies on statistical confidence: you need a large enough sample to rule out the possibility that your results are just randomness. For example, flip a coin two or three times. You may get a result that suggests heads comes up much more often. That conclusion isn’t valid; it’s random noise. Flip the coin 100 times, however, and you will see something much closer to a 50/50 split. The higher the volume, the more confident you can be in the results.
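The coin example can be made concrete with the usual normal-approximation margin of error for a proportion, which shrinks with the square root of the sample size. This is a rough Python sketch that assumes a fair coin and a 95% confidence level (z = 1.96); the approximation is crude for tiny samples like three flips, which is rather the point.

```python
import math
import random

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for an observed proportion.
    Uses the normal approximation, which is only rough for very small n."""
    return z * math.sqrt(p * (1 - p) / n)

def flip_coins(n, seed=None):
    """Simulate n fair coin flips and return the fraction that came up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

for n in (3, 100, 10_000):
    print(f"n={n:>6}: heads={flip_coins(n, seed=1):.2f}, "
          f"margin of error ±{margin_of_error(n):.3f}")
```

With 3 flips the margin is about ±0.57, so almost any split is "consistent" with a fair coin; at 100 flips it drops to about ±0.10, and at 10,000 to about ±0.01.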
But what if you want to test the design of a new web application that has no users? Without volume in your audience, there is very little DoE testing you can do. Testing is simply not feasible in many situations. I can’t imagine testing an enterprise application where alternate pathways are difficult to build and the users number in the hundreds, not millions.
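To see why an audience of hundreds is a problem, you can estimate how many users each variant needs before a test can reliably detect a given improvement. The Python sketch below uses the standard normal-approximation formula for comparing two conversion rates, assuming a two-sided test at alpha = 0.05 with 80% power; the 5% and 6% conversion rates are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate users needed in EACH arm of a two-sided A/B test
    comparing two conversion rates (normal-approximation formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical example: trying to detect a lift from 5% to 6% conversion.
n = sample_size_per_variant(0.05, 0.06)
print(n)  # a little over 8,000 users per variant
```

Small lifts on small audiences are essentially undetectable: only a very large effect would show up with a few hundred users.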
An additional problem with this kind of testing is that it lends itself to incremental improvements rather than revolutionary ideas. I designed many experiments at Intuit for its websites, including TurboTax.com, QuickBooks.com, Quicken.com and Intuit.com. When I wanted to test a big idea, the entire page usually changed, which made it difficult to tell which parts of the change helped and which hurt. Smaller changes to existing designs were easier to test, track and understand. I still got to test some big ideas, but I didn’t get to iterate on them. That left me stuck on Local Max Island: polishing the best version of the current design instead of finding a better design altogether.
In the end, I think testing works wonderfully in situations that support it, such as incremental changes to existing designs for audiences large enough to generate statistical confidence. However, it stinks for other situations, like revolutionary changes or new designs for audiences that don’t exist yet.
In the next few parts I will explore other methods of making design decisions.
Whatya think?