How many of you have run usability tests that look like this: individual, one-hour sessions, in which the participant is performing one or more tasks from a scenario that you and your team have come up with, on a prototype, using bogus or imaginary data. It’s a hypothetical situation for the user; sometimes they’re even role-playing.

Anyone? That’s what I thought. Me too. I just did it a couple of weeks ago.

But that model of usability testing is broken. Why? Because one of the first things we found out in that study was that the task we were asking people to do – making some basic financial estimates based on goals for retirement – involved more than the person in the room with me.

For the husbands, the task involved their wives because the guys didn’t actually know what the numbers were for the household expenses. For the women, it was their children, because they wanted to talk to them about medical expenses and plans for assisted living. For younger people it was their parents or grandparents, because they wanted to learn from them how they’d managed to save enough to help them through school and retire, too.

There’s a conversation there. There’s a support network there. And that’s what’s broken about usability testing. It always has been.

I first started thinking about this when Google launched Buzz. Buzz automatically generated online social networks from Gmail users’ most frequent contacts. Google employees – 20,000 of them – had been using Buzz inside the garden walls for a year. A nice, big sample. The problem became evident almost immediately, however, when Buzz was let into the wild. One of the most famous cases is a blogger who calls herself Harriet. She wrote about how one of her most frequent correspondents in Gmail was her boyfriend. Another was her abusive ex-husband. Now they were publicly connected, and this made her very, very unhappy. In fact, the post was titled “Fuck You, Google.”

There might have been no harm done in the retirement planning study. But there might have been. Would the 31-year-old who broke down crying in the session because her mother was in late-stage ALS have had a better experience if we’d tested in her context, where she could work with her closest advisor – her dad? Might it have been a calming process, where she felt in control and became engaged in envisioning her independent future because someone she trusted could give her perspective that I could not? Maybe.

As for Buzz, Harriet certainly wasn’t pleased, and she was left with a mess to clean up. How to unconnect two people who were now connected?

When companies started doing usability testing with regularity in the 1980s, it was about finding design problems that frustrated or hindered users in what now look like fairly simple UIs. It was one person, one machine, with the human doing a task that was usually work-based. That’s why it was called computer-human interaction.

But today, technology large and small is fully integrated into people’s lives in a much more ephemeral, less compartmentalized way. It is rare to sit next to a corded phone, holding the handset, only talking and listening.

When you look at what the social web is, there are some characteristics that I think we’re not taking into account very well in doing usability tests:

– It’s about relationships among people
– In context
– Conducted fluidly, across time and space, continuously

I also think that people who are new to usability testing and user research are not going to do a good job of testing social interaction design, because a study of social interaction cannot be constructed in a very formal or controlled way. Measuring what’s happening is much more complex. And scope and scale make a difference. Testing Buzz with 20,000 Googlers for a year wasn’t enough; it took letting it out to a million people who hadn’t drunk the Kool-Aid to find the *real* problems, the real frustrations, the real hindrances that truly affect uptake and adoption.

The nature of online is social

Let’s back up and talk about a key definition. What I mean by “social” is anything that someone does that changes the behavior of someone else.

This is how I can say that being online *is* social. Email is social. Publishing a flat HTML document is social. Putting something on a calendar is social. Everything is social. Social isn’t the special secret sauce that you pour on top of an experience. Social is already there. Choosing a bank is social. Planning a vacation is social. Buying an appliance is social. I SMSd a series of photos to my boyfriend the other day, of me in different eyeglass frames because I couldn’t decide by myself. This was *not* a computer-centered, or an app-centered interaction. This was a decision being made by two people, a conversation, mediated by fully integrated technology in fluid activities in different contexts for two people. It was social.

Social isn’t sauce. It’s sustenance. It’s already there, and we’re not seeing it. So we’re not researching it, and we’re definitely not testing for it.

We have to stop thinking about human-computer interaction. That model by default is too limiting. Look around. It’s really about human relationships and interactions mediated by technology. Technology is supporting the communication, not driving it. Ask any parent who has used FaceTime or Skype to have a video chat with their baby for the first time.

Scale is the game changer

Discount usability testing is great for some things, but what we’re really studying when doing user research and usability testing for the social web is social behavior. And that takes scale. That takes connections. That takes observing people’s real networks and understanding what makes those work, what makes those friends, family, colleagues, neighbors, acquaintances, associates, clients, vendors, pen pals, drinking buddies, partners for life, or friends with benefits.

Those are rich, life-framing relationships that affect how someone interacts with a user interface, and most of us are not even scratching the surface of them when we “micro-test” a commenting feature on an online invitation web site.

“Task” doesn’t mean what you think it means

For the retirement planning tool, I did a little interview to start the session that I hoped would set some context for the participant to do the behavior that I wanted to observe. But it was woefully inadequate. Don’t get me wrong, the client wasn’t unhappy; they thought it was a cool technique. But as soon as I learned who the participant went to for financial advice, where was I? Putting the participant in a situation where they had to pretend. They did, and they did a fair job of it. But it was lacking.

But tasks are the wrong unit. What we’re asking people to do in usability tests is like attending a cocktail party while grocery shopping. Even with an interview, even with careful recruiting, it’s incongruous. There are very few discrete tasks in life. Instead there are activities that lead people to goals. Multiple activities and goals might be intermixed on the way to achieving any one of them. In the meantime, the technology is completely integrated, ambient, almost incidental. Like asking your bf which eyeglass frames look nerdy-sexy, versus just nerdy.

The activity of interest isn’t computer based. Look at retirement planning. It’s *retirement planning*! That’s not the task. The activity is planning for the future, a future in which you have no real idea of what is going to happen, but you have hopes, aspirations.

Using Buzz is not a task. It’s not an activity. People who use Buzz don’t have the goal of connecting to other people, not in that deliberate way. They’re saying, hey, I’ve read something interesting you might be interested in, too. The task isn’t “sharing.” It’s putting themselves out in the world hoping that people they care about will notice. How do you make that scenario in a usability test?

Satisfaction may now equal user control

The ISO measures of usability are efficiency, effectiveness, and satisfaction. What is effective about having TweetDeck open all day long while you’re also writing a report, drafting emails, taking part in conference calls, attending virtual seminars, going back to writing a report, calling your mother?

When the efficiency measure went into the ISO definition, most people were measuring time on task. But if you don’t have a discrete task, how do you measure time?

Satisfaction may be the most important thing in the end for the social web, and that may be the degree to which the user feels she has control of the activities she’s doing while she’s using your tool. How much is the UI forcing her to learn the tool, versus quickly integrating it into her life?

Measuring success in the social web often defies what we’ve been taught to count for data. How do you measure engagement in the social web? Is it about time on the site? I could lurk on Facebook all day. Am I engaged? Is it about minutes spent pursuing and perusing content? Is it about how likely someone is to recommend something to someone else? I wrote my first product review, EVER, last week, for a pair of jeans on the Lands’ End web site. Am I engaged with the site? I would say no.

We have to look hard at the goodness of conventional metrics. They’re not translating to anything meaningful, I don’t think, because we’ve been thinking about all this all wrong – or not enough. What is goodness to a user? Control of her life. Control of her identity. Control of her information.

Users are continuously designing your UI

What does Task mean, what does Success mean, how do you measure new features that users create with your UI on the fly? Twitter has hashtags and direct messages. Users created those. Facebook is continuously being hacked for fresh activities. Look at commenting systems on blog posts or articles. Spammers, yes, but people are also talking to one another, arguing, flirting, solving problems, telling their own stories. No matter what you build, and what your intentions were in designing it, users are going to hijack it to make it useful to them. How do you test for that?

Ideas from smart people

I had all these questions and more when I met with a bunch of smart people who have been working in researching the social web. Out of that discussion came some great stories about what people had tried that worked, and what had not worked so well.

For creating task scenarios for usability tests, getting participants to tell stories of specific interactions helped. Doing long interviews helped learn context, scope, priorities, connections. Getting people to talk about their online profiles and explain relationships helped set the scene for activities. Getting them to use their own log-ins with their real relationships helped everyone know whether the outcomes were useful, usable, and desirable. Whether the outcomes were satisfying and even enriching.

Some of the people in this informal workshop also offered these ideas:

  • Screen sharing with someone outside the test lab or test situation
  • Making video diaries and then reviewing them retrospectively
  • Developing and testing at the same time, with users
  • Including friends or other connections in the same test session, setting up multi-user sessions
  • Sampling the experience in the same way that Flow was discovered: prompting people by SMS at select or random moments to report their behavior
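
For anyone who wants to try the experience-sampling idea in the last bullet, here is a minimal sketch of how the prompt moments might be picked. Everything in it is an assumption made for illustration, not something the workshop group specified: the function name `sample_prompt_times`, the 9-to-9 window, five prompts a day, and the one-hour minimum gap. Sending the SMS itself is left out.

```python
import random
from datetime import datetime, timedelta


def sample_prompt_times(day_start, day_end, prompts, min_gap_minutes, rng):
    """Pick `prompts` random moments between day_start and day_end.

    Hypothetical helper: moments come back sorted and at least
    `min_gap_minutes` apart, so a participant is not pinged twice in a row.
    """
    if prompts < 1:
        raise ValueError("need at least one prompt")
    gap = timedelta(minutes=min_gap_minutes)
    # Time left over once the mandatory gaps between prompts are reserved.
    slack = (day_end - day_start) - gap * (prompts - 1)
    if slack < timedelta(0):
        raise ValueError("window too short for that many prompts")
    # Draw sorted random offsets inside the slack, then shift the i-th
    # offset later by i gaps so neighbours stay at least one gap apart.
    offsets = sorted(rng.uniform(0, slack.total_seconds()) for _ in range(prompts))
    return [
        day_start + timedelta(seconds=offset) + gap * i
        for i, offset in enumerate(offsets)
    ]


if __name__ == "__main__":
    # Assumed study day: 9 a.m. to 9 p.m., five prompts, an hour apart at least.
    start = datetime(2024, 1, 1, 9, 0)
    end = datetime(2024, 1, 1, 21, 0)
    for moment in sample_prompt_times(start, end, 5, 60, random.Random()):
        print(moment.strftime("%H:%M"), "What are you doing right now, and with whom?")
```

Randomizing the moments is the point of the technique: people report what they are actually in the middle of, rather than what they remember or expect to be doing.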

There’s also bodystorming, critical incident analysis, co-designing paper or other prototypes. A few things seemed clear through that discussion. To make user research and usability testing useful to designers, we have to rethink how we’re doing it. It’s got to reflect reality a bit better, which means it takes more from social science and behavioral science than from psychology. It takes more time. It takes more people. It takes a wider view of task, success, and engagement. And we’re just beginning to figure all that out.

Rethink research and testing

Everything is social. Scale is the game changer. Tasks aren’t what you think they are. User satisfaction may be about control. Users are continuously designing your UI. I invite you to work with me on rethinking how we’re doing user research and usability testing for what’s really happening in the world: fluid, context-dependent relationships mediated by technology.

I want to thank Brynn Evans, Chris Messina, Nate Bolt, Ben Gross, Erin Malone, and Jared Spool for spending the better part of a day talking with me about their experiences in researching social. These musings come from that cooperative, ahem, social effort.