Google Analytics Gems #2: Quantifying Deliberative Conversions

Another little-known gem in Google Analytics is the pair of Time to Purchase and Visits to Purchase reports.

For MyWeddingFavors, where the purchase is an exceptionally meaningful one for our customers, some 40% of our sales happen on subsequent visits:

Careful, though! Looking into Days to Purchase, we see only 25% of sales happen on a different day than the customer's first visit.



So a meaningful share (15%) of our purchases span multiple visits within a single day, while 25% happen on a visit on a later day. Understanding this pattern has serious implications for design, business strategy, and our e-commerce feature set.
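To make the arithmetic concrete, here's a minimal sketch (not our production code) of how the 15% same-day, multi-visit share falls out of the two reports: classify each order by the visit on which it was purchased and by calendar day relative to the first visit. The order records below are hypothetical; GA gives you the two distributions directly via Visits to Purchase and Days to Purchase.

    from datetime import datetime
    from collections import Counter

    # (first_visit_time, purchase_visit_number, purchase_time) per order -- made-up data
    orders = [
        (datetime(2008, 3, 1, 9, 0), 1, datetime(2008, 3, 1, 9, 20)),   # bought on the first visit
        (datetime(2008, 3, 1, 9, 0), 3, datetime(2008, 3, 1, 21, 5)),   # later visit, same day
        (datetime(2008, 3, 1, 9, 0), 2, datetime(2008, 3, 4, 12, 0)),   # later visit, later day
    ]

    buckets = Counter()
    for first_visit, visit_number, purchased in orders:
        if visit_number == 1:
            buckets["first visit"] += 1
        elif purchased.date() == first_visit.date():
            buckets["later visit, same day"] += 1
        else:
            buckets["later visit, later day"] += 1

    for label, count in buckets.items():
        print(f"{label}: {count / len(orders):.0%} of orders")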

Google Analytics Gems #1: Split Test Evaluation with Only New Users

A question on LinkedIn, now closed, asked "what are some hard to find but useful reports in GA?" While clicks to task completion is far from the ultimate metric, Google Analytics does bury some specific data points inconveniently deep in its navigation.

We do a lot of split testing with Google Analytics by defining custom segments and serving each segment a different UI. One issue with understanding the impact of new features, particularly for sites with lots of repeat visitors (e.g. content / blog sites vs ecommerce), is the novelty effect. New features, or even simple changes in layout, can have a short term halo as users notice and engage with the changed content.

Google Analytics does allow you to look at your user defined segments for new and repeat visitors, but it does require a few clicks. Follow along with the picture:


Starting in the visitors submenu (1), the New & Returning report allows you to drill into New users (2). The segment drop down has lots of useful pivots, including "user defined" (3).

Picture #4 shows the results for new users of a split test that moved a mailing list subscribe box from left to right. The magnitude of the effect diminished over time as we tested this. However, by drilling into only new users, we see the original effect size. Looking at all users, the effect is smaller, and looking at returning users, the effect is smaller still.
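As a rough sketch of that comparison, computing the lift of variant B over variant A separately per segment makes the dilution obvious. The counts below are made up for illustration; in GA we read them off the user-defined segment report rather than computing them by hand.

    # segment: (visitors_A, subscribes_A, visitors_B, subscribes_B) -- hypothetical counts
    segments = {
        "new":       (5000, 100, 5000, 135),
        "returning": (5000, 110, 5000, 118),
    }

    for name, (n_a, c_a, n_b, c_b) in segments.items():
        rate_a, rate_b = c_a / n_a, c_b / n_b
        print(f"{name}: A={rate_a:.2%}  B={rate_b:.2%}  lift={(rate_b - rate_a) / rate_a:+.1%}")

    # "All users" blends the two segments, diluting the effect measured on new users.
    n_a, c_a = sum(v[0] for v in segments.values()), sum(v[1] for v in segments.values())
    n_b, c_b = sum(v[2] for v in segments.values()), sum(v[3] for v in segments.values())
    print(f"all: lift={((c_b / n_b) - (c_a / n_a)) / (c_a / n_a):+.1%}")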

Dealing with the halo effect was one of the reservations expressed during "AB Testing: Designer Friend or Foe" at SXSW. Splitting users into new and returning is one of the easiest strategies for seeing through this confound.

Getting Serious About Testing: Learn from the Pros

Last week's SXSW panel on AB Testing: Designer Friend or Foe left me wishing for a more robust treatment of the experimental design issues around online testing. It was a great panel, and I appreciated the real-world experience of the panelists, but aside from Micah, the approach came very much from the design world. This is fine, but issues came up that stats exist to solve, and the distinction between multivariate and AB testing was glossed over.

In particular, when designed well, multivariate testing can be used to test hypotheses about user models, not just as a way to play roulette with font colors and sizes.

There is a robust body of knowledge that lives at the intersection of statistics, traditional experimental psychology, and cognitive modeling, and that rests on the shoulders of giants who turned experimentation into practical business success.

The Experimentation Platform group at MSFT, led by Ronny Kohavi, publishes from this position of strength. Their latest, 7 pitfalls to controlled experiments on the web, is a solid read for those aspiring to live in this space.

AB testing might indeed be a foe to the designer when done without appropriate expert support -- at least for more aggressive evaluations.

Here's a recap of the Seven Pitfalls:

  1. Avoiding experiments because computing the success metric is hard.
  2. Attempting to run experiments without the pre-requisites: representative & sufficient traffic, appropriate instrumentation, agreed upon metrics.
  3. Hubris: Over-optimization without collecting data along the way.
  4. Bad math: inappropriately deployed confidence intervals, % change, and interaction effects (a simple confidence-interval sketch follows this list).
  5. Use of composite metrics when power is insufficient. An example, not in the paper, is the use of checkout completion for a product page change, when add-to-cart % would be more sensitive.
  6. Un-even sampling: bad balance between control and test distributions.
  7. Lack of robot detection.
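On pitfall #4, here's a back-of-the-envelope sketch of the kind of math that keeps you honest: a 95% confidence interval on the difference between control and treatment conversion rates, plus a crude sanity check of the traffic split (pitfall #6). The numbers are hypothetical, and this is just the standard normal approximation for two proportions -- nothing specific to the ExP paper.

    from math import sqrt

    n_c, conv_c = 10000, 520   # control: visitors, conversions (hypothetical)
    n_t, conv_t = 10120, 571   # treatment: visitors, conversions (hypothetical)

    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    lo, hi = diff - 1.96 * se, diff + 1.96 * se

    print(f"lift: {diff / p_c:+.1%}, 95% CI on the difference: [{lo:+.4f}, {hi:+.4f}]")
    print("significant at the 5% level" if not (lo <= 0 <= hi) else "not significant at the 5% level")

    # Pitfall #6 sanity check: does the observed split look like the planned 50/50?
    print(f"planned split 50%, observed {n_t / (n_c + n_t):.1%}")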

I've blogged the guide to practical web experiments before, and it's also highly recommended. It provides an overview of the key issues in setting things up, including sampling, failure versus success evaluation, and common pitfalls like day-of-the-week effects.

For more from the historical SXSW design perspective, see How to Inform Design: How to Set Your Pants on Fire (March 14th, 2005), presented by Nick Finck, Kit Seeborg, and Jeffrey Veen.

Design Metrics Wrapup

What fun! The SXSW conversation format is quite cool, though it really needs a dedicated space as our group size was limited by how far voices carried.

Drop a line in the comments if we promised to follow up on something I haven't posted. This is a work in progress, so check back if there are some empty items when you visit.

References during the chat

Blogs

Resource Lists

Analytics

Usability Training

Books

...

Thanks to everyone who participated, and to Micah for bringing me along.

SXSW: Driving Design From User Data

I wrote about the crucial conversation at SXSW with Micah Alpern a few weeks ago. The time has come!

In talking through this with Micah, we came back to the crucial insight that the artifacts people leave behind when using internet software create both an opportunity and a challenge for designers. What follows is a reference for our conversation, which will consist of a short intro and then mostly discussion. Subject to conversational flow, we'll be asking the participants to share stories:

  1. What's your favorite HiPPO story? For those of you who haven't encountered the meme, HiPPO stands for the Highest Paid Person's Opinion -- the point being to base decisions on something more than that.
  2. What business or user goal would you like to be informed by metrics?

The talk precis:

Design Metrics: Better Than 'Because I Said So':

Too often designers are put in a position of defending design decisions based on personal preference or an unarticulated sense of expertise. We'll discuss how to use metrics to understand user and business goals. Then how these metrics can be used to evaluate design decisions, make tradeoffs, and shape strategies.

Our goal is to better enable productive conversations with key stakeholders, using the tools of metrics to understand and advocate a position.

In the most productive cases, this means designing with measurement toward end goals in mind. In less developed scenarios, there may be some foundations in need of construction.

There are a lot of reasons to test designs with live users. The most pedestrian is business acceptance testing. We'll be more focused on using metrics to resolve internal debate, on multivariate testing to learn more about the motivations, mental models, and personas of users, and on value estimation.

We believe the role of the designer is to drive hypotheses about the user, and to internalize the results and use them to inform future design.

Of course, testing is not the only tool in the toolbox. You have to choose the right tool for the job. Key dimensions:

  • Quantitative vs. qualitative
  • Small vs. large scale
  • Advanced techniques: Sequence modeling, learnability metrics
  • Repeat vs. non-repeat visitor
That said, creative techniques with Greasemonkey or limited-scale prototypes can make testing available in situations where you might not think it possible.

We're scheduled for 11:30 AM in Ballroom E on Monday. Hope to see you there. If you can't make it, stay tuned for a follow-up.

SXSW Coming Up! Design Metrics: Better Than 'Because I Said So'

I'm greatly looking forward to SXSW 08 in a couple of weeks. I'll be doing a "core conversation" with Micah Alpern:

Core Conversation: Design Metrics: Better Than 'Because I Said So': Too often designers are put in a position of defending design decisions based on personal preference or an unarticulated sense of expertise. We'll discuss how to use metrics to understand user and business goals. Then how these metrics can be used to evaluate design decisions, make tradeoffs, and shape strategies.

While design efficacy can be treated as a contributor to overall site success, there are some more subtle metrics which can reveal specific strengths and weaknesses of design. I'll post a recap following the gig.

Looking for Usability Test Participants in Atlanta

We'll be running an eye tracking protocol in our usability lab from Thursday, the 21st through the following Thursday, the 28th.

There's a small cash compensation for an hour of participation at our offices in Norcross. Be sure to tell us you're an AB Testing reader and we'll give you the super gory debrief and show you some of the results from our state of the art Tobii eye tracker.

Our participant registration form is on SurveyMonkey.

Online Video Metrics: How to Deal with Scrubbing?

Over at webmetricsguru.com, Marshall quotes the following key video metrics from Dennis @ Visual Revenue:

9 Essential Online Video Metrics

  • Online video started
  • Online video Pre-roll advertisement started*
  • Online video core content started
  • Online video Post-roll advertisement started*

  • Online video positive consumption action
  • Online video negative consumption action

  • Online video ended
  • Online video played, percentage of total
  • Online video played, seconds
As another blogger points out, things get really interesting when you start to consider embedded videos.

There is a challenge that neither of these authors mentions -- what about user timeline scrubbing? Video complete doesn't mean the same thing if the user fast-forwarded through most of it. Logging total time, % viewed, and complete gives you a bit of insight into this. Consider this range of user behavior:
Description | Total Time Played | % Viewed | Complete
Full view | 12:00 | 100% | Yes
Fast forward to watch a 2 minute segment | 2:18 | 18% | No
Screencast how-to view with pause, play actions while following instructions | 16:00 | 110% | Yes
Quick scan, fast forward, watch, etc. | 5:00 | 24% | Yes

There are a lot of subtleties here: do you double-credit re-watching and allow more than 100% viewed? If so, % no longer means "share of the video actually seen." That ambiguity is a good justification for logging % viewed in addition to total time; otherwise you could simply compute % as total time normalized by video length.
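Purely as a sketch of one answer to that question (total time played counts re-watching; % viewed does not), here's how the three numbers in the table could be derived from player events. The event model is an assumption -- each event carries the playhead position when it fired, and a fresh play event follows every seek -- and this is not the FreeIQ player's actual instrumentation.

    def video_metrics(events, duration):
        """events: (action, playhead_seconds) pairs in the order they fired."""
        total_played = 0.0
        watched = []          # (start, stop) intervals of content actually played
        playing_from = None
        for action, pos in events:
            if action == "play":
                playing_from = pos
            elif action in ("pause", "seek", "ended") and playing_from is not None:
                total_played += pos - playing_from
                watched.append((playing_from, pos))
                playing_from = None
        # merge overlapping intervals so re-watching isn't double counted in % viewed
        watched.sort()
        unique, covered_to = 0.0, 0.0
        for start, stop in watched:
            start = max(start, covered_to)
            if stop > start:
                unique += stop - start
                covered_to = stop
        return {
            "total_time_played": total_played,            # seconds; re-watching counts
            "pct_viewed": unique / duration,              # unique content, capped at 100%
            "complete": any(a == "ended" for a, _ in events),
        }

    # Fast forward to watch the last two minutes of a 12-minute video:
    print(video_metrics([("play", 0), ("seek", 20), ("play", 600), ("pause", 720)], duration=720))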

We've created custom logging in the FreeIQ video player, both for the video embedder (who uses Google Analytics) and for the management of the FreeIQ site. For now we simply log complete, but we're working on an efficient way to capture some of the subtleties here.

From this logging, we computed an average 25% video completion for our Going Natural 2 series videos -- not bad given that these are greater than 20 minutes in length.

Why the Mouse Doesn't Always Keep Up with the Eye

There's a lot of buzz around "mouse tracking" and analytic tools that record mouse position, like the super nifty Robot Replay. It's natural to wonder if mouse tracking might offer some of the value of eye tracking at much lower cost and much greater scale. I've written about the state of understanding of mouse and eye synchronization before; this post looks at a different viewpoint, setting a maximum bound on the potential relationship between the mouse and the eye.

Fitts' law states that the time to move the mouse from one point to another is heavily influenced by both the distance to the target and the size of the target. While it's very easy to over-apply this to site and software design, it is a solid truism of human-computer interaction.
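For the curious, the commonly cited Shannon formulation makes the distance/size relationship concrete: movement time grows with log2(D/W + 1), where D is the distance to the target and W is its width. The constants a and b depend on the device and the user; the values in this sketch are purely illustrative.

    from math import log2

    def movement_time(distance, width, a=0.1, b=0.15):
        """Predicted pointing time in seconds; a and b are illustrative constants."""
        return a + b * log2(distance / width + 1)

    print(f"{movement_time(400, 40):.2f}s for a 40px target 400px away")
    print(f"{movement_time(800, 40):.2f}s for the same target twice as far away")
    print(f"{movement_time(400, 20):.2f}s for a half-size target 400px away")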

This relationship between time and distance doesn't hold in the same way for eye movements. Given that, it's surprising that research from Google shows an increased duration of "sweeps" -- leftward motions, like carriage returns, that home the eye to the next line of text.

Beymer, D., Russell, D. M., and Orton, P. Z. An Eye Tracking Study of How Pictures Influence Online Reading. INTERACT Conference, Rio de Janeiro, Brazil, September 2007 (PDF, 515 KB).
While the researchers do see time increase with distance, looking at the data more closely shows this is due to a large number of correction saccades, not an increase in the duration of the first saccade.

To illustrate this point another way, when the font size changes from 10 to 14 point, the average distance of saccades increases. So you read very similarly at different font sizes -- the larger size neither speeds nor slows your saccades, despite changing the total distance covered.

This finding is also supported by Sibert, Jacob, et al. (2000), Evaluation and Analysis of Eye Gaze Interaction: eye movement time is largely independent of distance.

However, for visual items that are not well separated by white space, like typical successive lines of text in the Beymer study, the accuracy of saccades does decrease as the distinctiveness of the target decreases.

While operating system designers, and perhaps browser designers -- anyone building heavily used software with toolbars -- do need to pay attention to button size, the key to good design for vision is not distance. It's an elusive property of contrast, shape, color, and even typographical semantics.

To return to the title theme of this post, the mouse simply can't keep up in many cases. The eye is capable of moving more rapidly than the hand can move the mouse. Hence, Sibert et al., and many other researchers, have been able to get user efficiency gains from gaze-based selection. The challenge, of course, is distinguishing between intentional selection and simple inspection (aka the Midas Touch problem).

So, while mouse tracking can indicate where the user's attention is focused, and it certainly is a great way to visualize user activity, the mouse is simply slower than the eye and destined to reveal less of the user's behavior than eye tracking.

Have you authored a CLICK HERE link lately?

Tsk tsk. "Click here" is one of the most extensively used bad interface elements on the web. Descriptive hyperlinks are far more effective -- a drum that industry gurus have been beating for some time.

There's some new data from the Internet-Based Research group at UWash.

Spyridakis, J. H., Mobrand, K. A., Cuddihy, E., and Wei, C. Y. Using Structural Cues to Guide Readers on the Internet. Information Design Journal (in press).

Users got more out of the content in the Sem/Org link configuration than in any other; Sem was the runner-up.

This illustrates the value of descriptive, informative links... not simply mechanical expressions like "next" or "click here".

One productive way to think about this is to imagine that the links the user clicks on are part of a conversation. It's quite boring to talk about the mechanics of using your web browser. If you want to impact the user in a significant way, whether it be to inform or sell, writing descriptive and informative links is the way to go.

If you're not convinced, check out some other views.


Built with BlogCFC, version 5.9. Contact Andy Edmonds or read more at Free IQ or SurfMind. © 2007.