sa·cred cow
An idea, custom, or institution held, esp. unreasonably, to be above criticism.
In UX circles, testing is above reproach. It is a self-evident good. Anyone who doesn’t test is a moron. Test early, test often, test even more, test all the time.
Killing a sacred cow is a time-honored practice. It gives you the chance to say, “What if we didn’t do it the way we have always done it?” It is freedom to make new choices and to examine current truths. In my opinion, this sacred cow is ripe for a discussion that few designers want to have.
Testing is not always a good idea and in fact can sometimes stifle innovation or great design.
– Glen Lipka (May 31, 2013)
Don’t get me wrong. I think testing has its place. I just think designers and executives often use it inappropriately or in a counterproductive manner. Here are some use cases and anecdotes showing where testing works and where it doesn’t.
A/B Testing
Done right: The best kind of testing is when you can do it with high volumes. An example of this is when I worked at Intuit with Avinash Kaushik. He showed me how this kind of testing really worked. Example: A change would be designed for the shopping cart. A hypothesis would be written stating the intended effect. (Such as: It will decrease shopping cart drop-off.) Then, it would be shown to ~10% of the audience for a long enough period to gain statistical confidence (see the sketch at the end of this section). The results would show which was better, the “control” (the original) or the new design. The winner would become the new champion. I did this exact thing and earned Intuit an additional $13MM in 2006. Yay!
Done wrong: Quicken.com wanted something tested. I saw it and thought the change was pretty bad. So I made an additional test to run alongside it. The test with my changes made Quicken.com $250K more for the year. The marketing executive refused to make it the new control. Why? Because she hadn’t seen it before it was tested. It was an ego thing. Testing failed to convince the human being to accept the result. Why bother testing in the first place if human egos negate the results?
Done REALLY wrong: Quicken (after that episode) hired Razorfish to redo the entire site. I told them I would do it for free and do a better job. Apparently, they didn’t believe me. Anyway, they paid $750K to Razorfish and had a site. I looked at it and said, “This is going to fail.” (I had very good reasons to back up that claim). They tested the site and it failed miserably. Purchases fell 60%. Millions of dollars were lost. Did they take the site down? Nope. They kept it up.
One might say it was a better global maximum. I might believe that, except that they took the new site down completely within 18 months and redid it yet again. Testing failed to protect the company from losing millions of dollars because of human beings. Dumb human beings.
Summary: Large volume testing is the best case for testing in general. It goes wrong when decision-makers refuse to accept the findings.
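For the curious, here is a minimal sketch of what “statistical confidence” means in a setup like the one above. It is a plain two-proportion z-test on made-up conversion numbers, not Intuit’s actual tooling; real A/B platforms layer on much more (sample-size planning, sequential monitoring, guardrail metrics).

```python
# Minimal sketch: a two-proportion z-test for an A/B test on a shopping cart.
# All traffic and conversion numbers below are made up for illustration.
from math import sqrt, erf

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) comparing conversion rates of A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                # pooled rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided
    return z, p_value

# Control ("champion") vs. new design ("challenger")
z, p = two_proportion_ztest(conv_a=1150, n_a=50000,   # control
                            conv_b=1290, n_b=50000)   # new design
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05 -> call a new champion
```

The point is simply that the champion/challenger call comes from the data, which only works if the humans downstream agree to honor it.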
Focus Group Testing
Done right: In the beginning of developing a new product/service, you often can gain insight into the personalities of your target audience. We did this at Marketo. We had a round table of marketers and asked them all kinds of interesting questions. I learned more about them in those little focus group sessions than I thought possible. It helped me design a great solution for them. It helped me sharpen the focus of my empathy for them.
Done wrong: Famous story. Sony in the 1980s ran a focus group and asked all kinds of questions about black vs. yellow music equipment like Walkmans (Walkmen?) and boom boxes. The overwhelming answer was that yellow was better. At the end, they said, “Go outside and pick out a thank-you gift from us. Anything you’d like. Have a great day.” Outside were black and yellow electronics. What color did they choose? All black. People aren’t always the best judge of their past/present/future preferences. Predictably Irrational is a great book on the subject.
Done REALLY wrong: Seinfeld, All in the Family, and other shows tested HORRIBLY in focus groups. They tested among the worst in the history of TV testing. However, they ended up being some of the best shows ever. How many shows never even made it on the air because no one stood up for them against the testing? Testing probably crushed many a would-be popular show while clunkers somehow made it on the air easily.
Summary: Use focus groups for exploration, not as proof that something is good or bad.
Usability Testing
Done right: Sometimes it is confusing why users are having trouble with a design. Is it the layout? The wording? The mental models? How can you tell? Usability testing helps the designers see exactly what is happening. There is a technique called “Hallway Usability” or “Ghetto Usability” which basically applies the usability testing process in a very, very informal way. In essence, you grab someone and make them use the system while you watch. I use this technique all of the time, very informally. I grab people who are users and ones who aren’t. I do it early and often in the process. I don’t keep track of any task completion rates or any other stats. It is to help understand the design in the hands of others. It is to help with empathy.
Done wrong: At a software company I worked at a while back, they did formal usability testing. The results came in and, to my eye, were very inconclusive. There were many contradictory findings. The product managers and executives all argued over the results and what to do. Rather than design the best solution, they created a Frankenstein monster out of parts of multiple designs. It was a mess and much worse than the original design. They refused to believe some of the findings (saying the sample was too small), yet adopted others even though they came from a single person’s experience. They basically took what they wanted and discarded the rest. It was impossible to design anything great after the testing because the human beings involved did not understand what the testing was for.
Done REALLY wrong: At Intuit (sorry for the multiple references) an executive told me that he wanted “Unexpected Wow!” I took the challenge and designed a great interface. It used jQuery, even though it was an early, early beta (2006). They did usability testing, of course. Intuit has two-way glass, video cameras, the works. Task completion was high, and one of the users exclaimed, “Wow! That was unexpected! I like it!” Mission accomplished, right? Wrong. The executive said he didn’t like it. Why bother testing it in the first place? It’s horribly frustrating as a designer to make something great, have it proven effective, and then watch an executive throw it all away.
Summary: Grab random people and stick a mouse in their hand. I even had great luck with remote people by using GotoMeeting or Join.me and making them the presenter. Watching their mouse was pretty helpful. Don’t write down the results. Don’t make it formal. Keep it simple. Just explore their usage. See where they get stuck. See where it goes right. Learn, iterate. Don’t get executives involved.
Other Testing Techniques
Ethnography: Sometimes known as “Follow Me Homes”, ethnographic research aims to inform design by showing the specific setting and details of the environment the user is in. Nothing wrong with this, but it takes a lot of time. I suggest it at the beginning of a new product/service, but don’t overdo it. It can be a time suck and have diminishing returns.
Eye Tracking: Some really cool technology out there to do real-time eye movement replay. It’s fun and can lead to some insights, but unless you have a high volume site, it is probably a waste of time/money.
Multivariate Testing: I took several courses on DOE (Design of Experiments) and learned way more about this than I ever wanted to. It is a way to mix and match different elements to find the best “recipe”. This is great for e-commerce sites, but it is really complicated and not very helpful for business applications and services. I couldn’t believe how complex it is to run and analyze this sort of testing. Experts exist, but this is not for the faint of heart.
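To give a feel for why this gets complicated fast, here is a toy sketch that simply enumerates the full-factorial “recipes” for three hypothetical page elements (the element names and options are invented). Even three elements yield 18 combinations, which is exactly why DOE techniques such as fractional factorial designs exist: they test a carefully chosen subset instead of every recipe.

```python
# Toy sketch of why multivariate testing blows up: enumerate every "recipe"
# for a handful of page elements. Element names and options are invented.
from itertools import product

elements = {
    "headline":   ["Save time", "Save money", "Get started free"],
    "button":     ["green", "orange"],
    "hero_image": ["screenshot", "customer photo", "illustration"],
}

recipes = list(product(*elements.values()))
print(f"{len(recipes)} recipes to test")   # 3 * 2 * 3 = 18 combinations
for recipe in recipes[:3]:                 # peek at the first few
    print(dict(zip(elements.keys(), recipe)))
```

Full-blown DOE is about picking a small, balanced subset of those recipes and still being able to attribute the lift to individual elements, which is where the analysis gets hairy.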
How Testing Affects Designers
This is the key reason that I want to kill the sacred cow. I believe testing weakens products and makes designers abdicate their responsibility to make something awesome. When developing a new application online, you need to design a solution that people are going to love. This requires insight, creativity, hard work, and skill. It does not require any kind of formal testing.
Formal testing makes people think they can throw anything at the wall and take the best of the bunch. It diminishes the trust designers receive from executives and co-workers. It reduces the role of designer from original thinker to statistician. It lets executives think everyone has an equal opinion and that testing is all luck. Designers have no special skills in a world dominated by testing.
At Intuit, Avinash told me in our first meeting, “You don’t know anything. Only testing can give the truth.” My reply: “Who gets to decide what is tested?” Who is doing the design work? It turns out that Intuit (in 2006) was a “design by committee” environment. Horrible designs were tested because non-designers ran the process that created the tests. Thank goodness Avinash let me throw my own tests into the mix, but if I hadn’t been aggressive and confident, I would never have produced a single worthwhile test.
Designers should talk and collaborate with lots of people. Design works better when you have varied input from diverse channels. However, designers also need to feel a sense of ownership over their work. They need to be on the hook for some version of success. Otherwise, they will pass the buck to someone else in the organization. A sense of ownership is what is lacking in most of the designers I meet. They are beaten down by others in the organization.
Research and testing have their place, but not at the expense of a designer’s skill and effort. Designers need to make great things, and no amount of testing will yield an iPhone or a Tesla or Marketo. There needs to be vision and inspiration. I don’t see this in most products, sites, or services. However, when I do, it wasn’t testing that made them that way. It was design.
Testing a New Complex Feature
I lead a team of UX designers on a business software product. The goal, most of the time, is to produce a design in minimal time that covers a complex set of requirements and that users also love. These are tall orders. We do our “hallway usability” thing and collaborate with engineers and product managers. We work hard on the design specs and detail every little thing the feature does. Every edge case and system behavior is properly defined. Then we launch the feature.
At this point, we could TEST the feature and polish it to a finer edge. We could sand down any splinters. However, the business always has new features that need design and inspiration. Product managers have the responsibility of defining the roadmap. If they don’t want to prioritize improving existing features, then the design team isn’t going to work on those things.
As a fast-growing company, we are conquering new territory and building out capabilities quickly. Our support team receives input from customers. Additionally, discussion boards and “Ideas” are a great place to get user feedback, not to mention User Group meetings. I hear the issues from these channels loud and clear. I usually agree with them or see a way to improve the UI. Even if “polishing” existing features were prioritized, testing doesn’t seem to offer more detail than these existing channels already provide.
Wrap it up
Testing is a big, giant set of disparate techniques for different purposes. This is probably the longest blog post I have ever written. There are lots of books written on the topic. Ultimately, I think testing is overused and misused on a regular basis. Many people out there assume it to be good without giving it proper thought and attention.
My goal is to get people to think and get designers to design. I am not sure if this did any of that. Maybe I could test the blog post.
Whatya think?