Google Website Optimizer and the iPhone...

What do they have in common? Much like the iPhone, even with the recent opening of Google Website Optimizer, you still have to hack GWO to get maximum value out of your testing. While I am grateful for improved support for factorial analyses and more help content, it would have been so easy to do better.

Here's the problem: Google Website Optimizer restricts your understanding of the effects of your experiment to a single outcome variable, like a conversion.

While getting consensus on a single overall evaluation criterion (OEC) is critical to successful ongoing testing and iteration in a business organization, you also want to use tests to improve the product team's understanding of the customer, the products, the website experience, and their interactions.

Without the ability to go deeper to explain why an experimental condition drove the most conversions, you're simply playing roulette with your pixels, not building a better tuned product team.

So, yes, also like the iPhone, this limitation reduces the need for complex skills like statistical significance testing. (Not exposing a command line on the iPhone eliminates the challenge of Unix.)

There is a solution for those who aren't afraid of the truth... A set of enterprising analyst/coders have reverse engineered the GWO cookie and demonstrated how to port its values back over to Google Analytics. ROI Revolution shows how to extract the GWO condition. While their post integrates it with synthetic page tracker calls, I'd recommend using the "user defined" segmentation values via __utmSetVar (old school urchin.js) or pageTracker._setVar (new school ga.js).
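To make the cookie-parsing step concrete, here is a minimal sketch in Python (useful, for example, when post-processing logged cookie headers); in the page itself the same logic would live in JavaScript just before the pageTracker._setVar call. Both the cookie name `__utmx` and its assumed layout, `<domain hash>.<experiment key>:...:<combination>`, are assumptions rather than documented behavior, and the function name and the `gwo:<key>:<combination>` label format are hypothetical. Check all of them against your own cookies before relying on this.

```python
from http.cookies import SimpleCookie
from typing import Optional


def gwo_segment_label(cookie_header: str) -> Optional[str]:
    """Build a user-defined segment label from a GWO cookie (hypothetical helper).

    ASSUMPTION: the __utmx value looks like
    "<domain_hash>.<experiment_key>:<other fields>:<combination>".
    Verify this layout against real cookies before using it.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "__utmx" not in cookie:
        return None  # visitor is not in a GWO experiment (or the name differs)
    value = cookie["__utmx"].value
    if "." not in value:
        return None
    # Drop the assumed leading domain hash.
    _domain_hash, rest = value.split(".", 1)
    fields = rest.split(":")
    if len(fields) < 2:
        return None
    # Assumed: first field is the experiment key, last is the combination shown.
    experiment_key, combination = fields[0], fields[-1]
    return f"gwo:{experiment_key}:{combination}"


print(gwo_segment_label("__utmx=159991919.00012212121:1:5; __utma=1.2.3"))
```

Storing the combination as a user-defined segment, rather than as synthetic pageviews, keeps your pageview counts clean and lets every Google Analytics report be broken down by test condition.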

Find more GWO power user tricks in my delicious feed under the gwo tag.

Getting Serious About Testing: Learn from the Pros

Last week's SXSW panel, AB Testing: Designer Friend or Foe, left me wishing for a more robust treatment of the experimental design issues around online testing. It was a great panel, and I appreciated the panelists' real-world experience, but aside from Micah, the approach came very much from the design world. That's fine, but issues came up that statistics exists to solve, and the distinction between multivariate and AB testing was glossed over.

In particular, a well-designed multivariate test can test hypotheses about user models; it isn't just a way to play roulette with font colors and sizes.

There is a robust body of knowledge at the intersection of statistics, traditional experimental psychology, and cognitive modeling, one that rests on the shoulders of giants who achieved practical business success through experimentation.

The Experimentation Platform (ExP) team at MSFT, led by Ronny Kohavi, publishes from this position of strength. Their latest paper, "Seven Pitfalls to Avoid when Running Controlled Experiments on the Web," is a solid read for anyone aspiring to work in this space.

AB testing might indeed be a foe to the designer when done without appropriate expert support, at least for more aggressive evaluations.

Here's a recap of the Seven Pitfalls:

  1. Avoiding experiments because computing the success metric is hard.
  2. Attempting to run experiments without the prerequisites: representative & sufficient traffic, appropriate instrumentation, and agreed-upon metrics.
  3. Hubris: Over-optimization without collecting data along the way.
  4. Bad math: inappropriately deployed confidence intervals, % change, and interaction effects.
  5. Use of composite metrics when power is insufficient. An example, not in the paper, is using checkout completion to evaluate a product page change when add-to-cart % would be more sensitive.
  6. Uneven sampling: bad balance between control and test distributions.
  7. Lack of robot detection.
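Two of these pitfalls are easy to check mechanically. The sketch below uses only the Python standard library to show a chi-square sample ratio check for pitfall 6 and a normal-approximation (Wald) confidence interval for the difference in conversion rates, the kind of calculation pitfall 4 warns about getting wrong. The function names and counts are hypothetical, and this is a textbook approach rather than the paper's exact procedure.

```python
import math
from statistics import NormalDist


def srm_p_value(n_control: int, n_treatment: int, expected_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a sample ratio mismatch (pitfall 6).

    `expected_share` is the fraction of traffic you intended to send to control.
    """
    total = n_control + n_treatment
    expected_control = total * expected_share
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # With one degree of freedom, chi2 is a squared standard normal,
    # so P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


def diff_confidence_interval(conv_c: int, n_c: int, conv_t: int, n_t: int,
                             confidence: float = 0.95) -> tuple[float, float]:
    """Wald interval for the absolute difference in conversion rate (treatment - control)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    diff = p_t - p_c
    return diff - z * se, diff + z * se


# Hypothetical counts: an intended 50/50 split that came back lopsided.
print(srm_p_value(5000, 5500))  # very small: investigate the split before trusting results
print(diff_confidence_interval(500, 10000, 600, 10000))
```

The interval is for the absolute difference. A confidence interval for % change needs extra care, since the control rate in the denominator is itself an estimate.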

I've blogged about the guide to practical web experiments before, and it's also highly recommended. It gives an overview of the key issues to handle when setting things up, including sampling, failure versus success evaluation, and common pitfalls like day-of-the-week effects.
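Sampling and day-of-week effects can be roughed out before launch. The sketch below applies the standard normal-approximation sample size formula for comparing two conversion rates, then rounds the run length up to whole weeks so every weekday is represented. The function names and the whole-week rounding are my own illustrative choices, not prescriptions from the guide, and all inputs are hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_arm(base_rate: float, relative_lift: float,
                     alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in each arm to detect `relative_lift` on `base_rate`.

    Normal-approximation formula for a two-sided test of two proportions.
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def weeks_needed(n_per_arm: int, arms: int, daily_visitors: int) -> int:
    """Round the run length up to whole weeks so each weekday appears equally often."""
    days = math.ceil(n_per_arm * arms / daily_visitors)
    return math.ceil(days / 7)


# Hypothetical inputs: 5% checkout rate, hoping to detect a 10% relative lift.
n = visitors_per_arm(0.05, 0.10)
print(n, "visitors per arm;", weeks_needed(n, arms=2, daily_visitors=5000), "week(s)")
```

Running the same calculation for a hypothetical 20% add-to-cart rate at the same 10% relative lift shows why the more sensitive metric in pitfall 5 needs far fewer visitors, assuming the change moves both metrics by the same relative amount.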

For more historical perspective from the design side, see the '05 SXSW panel "How to Inform Design: How to Set Your Pants on Fire" (March 14th, 2005), presented by Nick Finck, Kit Seeborg, and Jeffrey Veen.

SXSW: Driving Design From User Data

I wrote about the crucial conversation at SXSW with Micah Alpern a few weeks ago. The time has come!

In talking this through with Micah, we came back to the crucial insight that the availability of usage artifacts from internet software creates both an opportunity and a challenge for designers. What follows is a reference for our session, which will be a short intro followed mostly by conversation. Subject to conversational flow, we'll be asking the participants to share stories:

  1. What's your favorite HiPPO story? For those of you who haven't encountered the HiPPO meme, it's about basing decisions on something more than the Highest Paid Person's Opinion.
  2. What business or user goal would you like to be informed by metrics?

The talk precis:

Design Metrics: Better Than 'Because I Said So':

Too often designers are put in a position of defending design decisions based on personal preference or an unarticulated sense of expertise. We'll discuss how to use metrics to understand user and business goals, and then how those metrics can be used to evaluate design decisions, make tradeoffs, and shape strategies.

Our goal is to better enable productive conversations with key stakeholders, using metrics as tools to understand and advocate a position.

In the most productive cases, this means designing with measurement toward end goals in mind. In less developed scenarios, there may be some foundations in need of construction.

There are a lot of reasons to test designs with live users. The most pedestrian is business acceptance testing. We'll focus more on using metrics to resolve internal debate, on multivariate testing to learn more about the motivations, mental models, and personas of users, and on value estimation.

We believe the role of the designer is to drive hypotheses about the user, internalize the results, and use them to inform future design.

Of course, testing is not the only tool in the toolbox. You have to choose the right tool for the job. Key dimensions:

  • Quantitative vs. qualitative
  • Small vs. large scale
  • Advanced techniques: sequence modeling, learnability metrics
  • Repeat vs. non-repeat visitors

That said, creative techniques with Greasemonkey or limited-scale prototypes can make testing possible in situations where you might not expect it.

We're scheduled for 11:30 AM in Ballroom E on Monday. Hope to see you there. If you can't make it, stay tuned for a follow-up.

A taxonomy of motivations for website testing (A/B split, multivariate)

The intense competition for business on the internet has created an environment in which user interface experimentation is a critical process that can provide exceptional return on investment.

Here's my take on an inventory of ways to apply testing:

  • Business acceptance testing
  • Value estimation
  • Design choice determination
  • Customer understanding

Business Acceptance Testing

Example: You want to add yet another shortcut on the homepage for a new sub-audience. Use testing to validate you didn't mess up the other functions of the homepage.
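For business acceptance testing the question is "did anything get worse?", which fits a one-sided non-inferiority check on each existing homepage function. The sketch below is a minimal Python version under a normal approximation; the metric names, counts, and the half-point margin are hypothetical, and choosing the margin is a business decision rather than a statistical one.

```python
import math
from statistics import NormalDist


def non_inferior(conv_c: int, n_c: int, conv_t: int, n_t: int,
                 margin: float, confidence: float = 0.95) -> bool:
    """True if treatment is not worse than control by more than `margin` (absolute rate).

    One-sided check: the lower confidence bound of (p_t - p_c) must exceed -margin.
    """
    p_c, p_t = conv_c / n_c, conv_t / n_t
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value
    lower = (p_t - p_c) - z * se
    return lower > -margin


# Hypothetical click-through counts for existing homepage functions:
# metric -> (clicks_control, visitors_control, clicks_treatment, visitors_treatment)
guardrails = {
    "search_box": (1200, 20000, 1190, 20000),
    "top_nav": (3000, 20000, 2700, 20000),
}
for name, counts in guardrails.items():
    print(name, "ok" if non_inferior(*counts, margin=0.005) else "check")
```

Checking many homepage functions at once raises the odds that one flags by chance, so treat a single "check" as a prompt to look closer rather than a verdict.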

Value Estimation

That new search function is going to cost you X thousand dollars. How long will it take to provide a positive return on investment?
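A back-of-the-envelope payback calculation makes this concrete. The sketch below assumes you have measured a relative conversion lift in a test and know roughly what a conversion is worth; the function name `payback_days` and all inputs are hypothetical, and it ignores ongoing costs and discounting.

```python
import math
from typing import Optional


def payback_days(cost: float, daily_visitors: float, baseline_rate: float,
                 relative_lift: float, value_per_conversion: float) -> Optional[int]:
    """Days until incremental conversions cover a one-time `cost`.

    Returns None when the lift produces no incremental value.
    """
    extra_value_per_day = (daily_visitors * baseline_rate * relative_lift
                           * value_per_conversion)
    if extra_value_per_day <= 0:
        return None
    return math.ceil(cost / extra_value_per_day)


# Hypothetical: $30,000 build cost, 10,000 visitors/day, 2% baseline, 5% lift, $50/conversion.
print(payback_days(30_000, 10_000, 0.02, 0.05, 50))
```

With these hypothetical inputs the feature pays for itself in 60 days. Plugging in the low end of the lift's confidence interval, rather than the point estimate, gives a more conservative answer.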

Design Choice Determination

The boss thinks the logo should be purple. You don't.

Customer Understanding

Multivariate methods are really valuable for this use case. Say you wanted to ask the question, "Is copy at the top of our product pages worthwhile? Or should we just drop it and get more products above the fold?"
A multivariate test lets you vary the size & quality of the copy, along with other elements that push the products down the page, and assess both the general impact of having products higher on the page and the general impact of good copy.
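In a 2x2 version of this test, the main effects and the interaction fall out of simple averages over the four cells. The sketch below uses the standard two-level factorial effect estimates; the factor coding (good copy on/off, products high/low) and the cell conversion rates are hypothetical.

```python
def factorial_effects(rates):
    """Main effects and interaction for a 2x2 factorial test.

    `rates` maps (good_copy, products_high) -> conversion rate,
    with each factor coded 0 (off) or 1 (on).
    """
    r = rates
    # Main effect of good copy: average change from turning it on, across both positions.
    copy_effect = ((r[1, 0] + r[1, 1]) - (r[0, 0] + r[0, 1])) / 2
    # Main effect of raising products, averaged across both copy levels.
    position_effect = ((r[0, 1] + r[1, 1]) - (r[0, 0] + r[1, 0])) / 2
    # Interaction: half the difference between the copy effect with products high vs. low.
    interaction = ((r[1, 1] - r[0, 1]) - (r[1, 0] - r[0, 0])) / 2
    return copy_effect, position_effect, interaction


# Hypothetical cell conversion rates, for illustration only.
rates = {(0, 0): 0.040, (1, 0): 0.044, (0, 1): 0.046, (1, 1): 0.050}
copy_effect, position_effect, interaction = factorial_effects(rates)
print(copy_effect, position_effect, interaction)
```

A main effect averages over the other factor, so good copy is credited only with what it adds regardless of product position; a large interaction would say the value of copy depends on where the products sit. Deciding whether any of these effects is real still needs a significance test or confidence intervals.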

Built with BlogCFC, version 5.9. Contact Andy Edmonds or read more at Free IQ or SurfMind. © 2007.