Archives for posts with tag: usability testing

She wrote to me to ask if she could give me some feedback about the protocol for a usability test. “Absolutely,” I emailed back, “I’d love that.”

By this point, we’d had 20 sessions with individual users, conducted by 5 different researchers. Contrary to what I’d said, I was not in love with the idea of getting feedback at that moment, but I decided I needed to be a grown-up about it. Maybe there really was something wrong and we’d need to start over.

That would have been pretty disappointing – starting over – because we had piloted the hell out of this protocol. Even my mother could do it and get us the data we needed. I was deeply curious about what the feedback would be, but it would be a couple of days before the concerned researcher and I could talk.

This was a protocol for a usability test of county election websites. It was just before the November 2012 Presidential election, and Cyd Harrell and I wanted to seize that moment to learn where voters looked for answers to their questions about elections, and whether they were successful finding useful, clear answers that they could act on. There was a tight window to conduct this research in. We had wanted to do as many individual remote moderated usability test sessions between the end of September and Election Day as we could manage. We needed help.

Fortunately, we had 300 new friends from a Kickstarter campaign and a roster of UX researchers collected over the years who we could call on. Amazingly, 30 people volunteered to help us. But not all were known to us, and many told us that they had not done this kind of thing before. There was no way we were going to turn away free labor. And it seemed important to include as many people in the research as possible. How were we going to train a bunch of (generous, awesome) strangers who were remote from us to do what we needed done?

Clearly, we needed to leave no room for error. So, even though the study participants would be exploring as they tried to find answers to their questions on county election websites, this would not be an exploratory study. Cyd and I agreed that we needed to design support into the research design. (We also agreed that we wouldn’t allow anyone who didn’t do the training to conduct sessions.)

Focus on the research question

Everything in a study should be done in the service of the research question. But it’s easy to lose sight of the Big Question when you’re planning logistics. So, in the same way that I’m constantly asking every team I work with, “What do you want the user’s experience to be?”, Cyd and I kept asking ourselves, “Does what we’re planning help us answer our research questions?” We had two questions:

  • What questions do voters have about elections and voting?
  • How well do county election department websites answer voters’ questions?

We developed an instrument for our volunteer researchers that combined a script with data collection in a SurveyMonkey form. (SurveyMonkey, as you might have guessed from the name, is a tool for setting up and conducting surveys.) SurveyMonkey wasn’t meant to do what we were making it do, so there were some things about the instrument that were clunky. But pilot testing the instrument helped smooth out the wording in the scripted parts, the prompts for the data collection, and the order of the questions and tasks.

Pilot test and then pilot test again

I wrote the instrument and did a dry run. Cyd tried it out. We made some changes. Then Cyd asked her mom to try it. We made some more changes. Whitney Quesenbery joined us, tried the instrument, and gave us feedback. We got one of the volunteers to try it out. We made even more changes. After about 6 pilot tests of the instrument, we decided it was ready for our volunteer researchers to be trained on.


To train our volunteers, we scripted a walkthrough of the instrument, what to do and what to say, and then delivered the training through GoToMeeting in 45 minutes. We held several of these sessions over a few evenings (a couple of them quite late Eastern Time to make it possible for people in Pacific Time to attend after regular working hours). The training demonstrated the instrument and gave volunteers a chance to ask questions. Anyone who wanted to conduct sessions with participants was required to attend training. Of the 30 people who originally volunteered, 16 attended training and ended up conducting sessions.

Snowball rehearsals, pairing pros with proto-pros

There were a few volunteers who looked like pros on paper, but none of the core team knew them and their skills. So Cyd or Whitney or I paired with these folks for a session, and if they did well, they were allowed to do sessions on their own. There were still a few people who weren’t user researchers at all, but who were interested in learning or who had other kinds of interviewing experience. We paired these people, who we called proto-pros, with approved pros to take notes or to do the data collecting while the pro conducted the interview. Some of them graduated to doing sessions on their own, too.

And this, my friends, is how we got 41 30-minute sessions done over a few weeks.

Office hours

Cyd and I also made ourselves available online to answer questions or address issues through Campfire, a closed group chat tool from 37Signals. We invited all the volunteers to Campfire, and sent out notices by email of when we’d be holding office hours there. A few volunteers did indeed have questions, which we could then clarify right in Campfire and then send out answers by email. Nothing came up that meant changing the script.

Check the data

Every now and then I wanted to know how many sessions we’d done, but I also wanted to make sure that the data was good and clean. Because the instrument was set up in SurveyMonkey, I could go look at the data as it came in. I could tell who had collected it by the participant numbers assigned, which used the researcher’s initials along with the date. This way, if I had a question or something didn’t seem right, I could go back to the researcher to correct the data. Fortunately, we never needed to do that.

A solid script focused the session

So many interesting things happened as we observed people trying to find answers to their questions on their own county election websites. We learned about what the most-asked questions and how people asked them. We heard what people had to say about whether and why they had or had not been to the site before. And we learned whether people found the answers to their questions.

We did not track where they went on the way to finding answers or giving up. And that is what the earnest volunteer researcher had wanted to talk about. “Can’t we add some fields to the data collector to put in notes about this? People get so lost! Some of these sites are train wrecks!” she wanted to know. My answer: “It’s fascinating, isn’t it? But, no.” What would we do with that data over 40 or more sessions? It wasn’t as if we were going to send feedback to every site. I reminded her of the research questions. “We want to know whether voters find answers, not how they find answers on the sites,” I said. “But if you want to take notes about how it happens, those notes could be helpful to understanding the rest of the data. And I can’t tell you how much the whole team appreciates you doing this. We don’t want you to take on extra work! We’ve already asked so much of you.”

Whew. No need to start over. “You’ll get to do more sessions within the time you have available,” I said, “If you just stick to the script.”


We’re seeking people who think they’re going to “retire” – whatever that means – within the next 5 years to tell us about their experiences with preparing for the next phase financially and otherwise.

It’s a 2-hour conversation. We’ll bring cookies. (Or fruit, if you’ve had too many cookies over the holidays).

We put “retire” in quotes because we’re wondering what it really means these days. And that’s the conversation.

I’m going to ask questions about what your big worries are in preparing, what you think the event of retiring will be like, and what happens afterward. We will talk about money, but no specifics.

Times and dates available – BOSTON:

  • January 11 evening
  • January 12 afternoon or evening
  • January 13 late morning
  • January 26 afternoon

Besides the cookies, we’ll also pay the participant $150.

Ideally, we’d also like to have the person there who the participant makes all their big decisions with. That person will also get cookies and $150.

I will come to their house, if they’re okay with that.

We don’t care how old they are or what they’re retiring from. Timing is more important. And if they’ve “retired” more than once, we’d love to hear about that.

Interested? Know someone who is? Contact Sandy Olson or Dana Chisnell.


There’s a usability testing revival going on. I don’t know if you know that.

This new testing is leaner, faster, smarter, more collaborative, and covers more ground in less time. How does that happen? Everyone on the team is empowered to go do usability testing themselves. This isn’t science, it’s sensible design research. At it’s essence, usability testing is a simple thing: something to test, somewhere that makes sense, with someone who would be a real user.

But not everyone has time to get a Ph.D. in Human Computer Interaction or cognitive or behavioral psychology. Most of the teams I work with don’t even have time to attend a 2-day workshop or read a 400-page manual. These people are brave and experimental, anyway. Why not give them a tiny, sweet tool to guide them, and just let them have at it? Let us not hold them back.

Introducing the
Usability Testing Pocket Guide


11 simple steps to ensure users can use your designs

This 32-page, 3.5 x 5-inch book includes steps and tips, along with a quick checklist to help you know whether what you’re doing will work.

The covers are printed on 100% recycled chipboard. The internal pages are vegetable-based inks on 100% recycled papers. The Field Guides are printed by Scout Books and designed by Oxide Design Co.

These lovelies are designed for designers, developers, engineers, product managers, marketers, and executives to learn useful techniques within minutes. The prescriptions within come from masters of the craft, who have been doing and teaching usability testing for as long as the world has known about the method.

Printed copies will be available for sale in January 2013.

Here’s a view inside:

Oxide_USW-Usability-Testing-Guide_03-2_Page_05   Oxide_USW-Usability-Testing-Guide_03-2_Page_06

I’ve seen it dozens of times. The team meets after observing people use their design, and they’re excited and energized by what they saw and heard during the sessions. They’re all charged up about fixing the design. Everyone comes in with ideas, certain they have the right solution to the remedy frustrations users had. Then what happens?

On a super collaborative team everyone is in the design together, just with different skills. Splendid! Everyone was involved in the design of the usability test, they all watched most of the sessions, they participated in debriefs between sessions. They took detailed, copious notes. And now the “what ifs” begin:

What if we just changed the color of the icon? What if we made the type bigger? What if we moved the icon to the other side of the screen? Or a couple of pixels? What if?

How do you know you’re solving the right problem? Well, the team thinks they’re on the right track because they paid close attention to what participants said and did. But teams often leave that data behind when they’re trying to decide what to do. This is not ideal.

Getting beyond “what if”

On a super collaborative team, everyone is rewarded for doing the right thing for the user, which in turn, is the right thing for the business. Everyone is excited about learning about the goodness (or badness) of the design by watching users use it. But a lot of teams get stuck in the step after observation. They’re anxious to get to design direction. Who can blame them? That’s where the “what ifs” and power plays happen. Some teams get stuck and others try random things because they’re missing one crucial step: going back to the evidence for the design change.

Observing with an open mind

Observations tell you what happened. That is, you heard participants say things and you saw them do things — many, many interesting, sometimes baffling things. Good things, and bad things. Some of those things backed up your theories about how the design would work. Some of the observations blew your theories out of the water. And that’s what we do usability testing to see: In a low risk situation like a small, closed test, what will it be like when our design is out in the wild.

Brainstorming the why

The next natural step is to make inferences. These are guesses or judgments about why the things you observed happened. We all do this. It’s usually what the banter is all about in the observation room.

“Why” is why we do this usability testing thing. You can’t get to why from surveys or focus groups. But even in direct observation, with empirical evidence, why is sometimes difficult to ferret out. A lot of times the participants just say it. “That’s not what I was looking for.” “I didn’t expect it to work that way.” “I wouldn’t have approached it that way.” “That’s not where I’d start.” You get the idea.

But they don’t always tell you the right thing. You have to watch. Where did they start? What wrong turns did they take? Where did they stop? What happened in the 3 minutes before they succeed or failed? What happened in the 3 minutes after?

It’s important to get judgments and guesses out into the fresh air and sunshine by brainstorming them within the team. When teams make the guessing of the why an explicit act that they do in a room together, they test the boundaries of their observations. It’s also easy to see when different people on the team saw things similarly and where they saw them differently.

Weighing the evidence

And so we come to the crucial step, the one that most teams skip over, and why they end up in the “what ifs” and opinion wars: analysis. I’m not talking about group therapy, though some teams I’ve worked with could use some. Rather, the team now looks at the strength of the data to support design decisions. Without this step, it is far too easy to choose the wrong inference to direct the design decisions. You’re working from the gut, and the gut can be wrong.

Analysis doesn’t have to be difficult or time-consuming. It doesn’t even have to involved spreadsheets.* And it doesn’t have to be lonely. The team can do it together. The key is examining the weight of the evidence for the most likely inferences.

Take all those brainstormed inferences. Throw them in to a hat. Draw one out and start looking at data you have that supports that being the reason for the frustration or failure. Is there a lot? A little? Any? Everyone in the room should be poring through their notes. What happened in the sessions? How much? How many participants had a problem? What kinds of participants had the problem? What were they trying to do and how did they describe it?

Answering questions like these, among the team, gets us to understanding how likely is it that this particular inference is the cause of the frustration. After a few minutes of this, it is not uncommon for the team to collectively have an “ah ha!” moment. Breakthrough comes as the team eliminates some inferences because they’re weak, and keeps others because they are strong. Taking the strong inferences together, along with the data that shows what happened and why snaps the design direction right into focus.

Eliminating frustration is a process of elimination

The team comes to the design direction meeting knowing what the priority issues were. Everyone has at least one explanation for the gap between what the design does and what the participant tried to do. Narrowing those guesses to what is the most likely root cause based on the weight of the evidence – in an explicit, open, and conscious act –takes the “what ifs” out of the next version of a design, and shares the design decisions across the team.

* Though 95% of data analysis does. Sorry.

It was a spectacularly beautiful Saturday in San Francisco. Exactly the perfect day to do some field usability testing. But this was no ordinary field usability test. Sure, there’d been plenty of planning and organizing ahead of time. And there would be data analysis afterward. What made this test different from most usability tests?

  • 16 people gathered to make 6 research teams
  • Most of the people on the teams had never met
  • Some of the research teams had people who had never taken part in usability
    testing before
  • The teams were going to intercept people on the street, at libraries, in farmers’
  • markets

Ever heard of Improv Everywhere? This was the UX equivalent. Researchers just appeared out of the crowd to ask people to try out a couple of designs and then talk about their experiences. Most of the interactions with participants were about 20 minutes long. That’s it. But by the time the sun was over the yardarm (time for cocktails, that is), we had data on two designs from 40 participants. The day was amazingly energizing.

How the day worked
The timeline for the day looked something like this:

Coordinator checks all the packets of materials and supplies

Coordinator meets up with all the researchers for a briefing

Teams head to their assigned locations, discuss who should lead, take notes, and intercept

Most teams reach their locations, check in with contacts (if there are contacts), set up

Intercept the first participants and start gathering data

Break when needed

Finish up collecting data, head back to the meeting spot

Teams start arriving at the meeting spot with data organized in packets

Everybody debriefs about their experiences, observations

Researchers head home, energized about what they’ve learned

Researchers upload audio and video recordings to an online storage space

On average, teams came back with data from 6 or 7 participants. Not bad for a 3-hour stretch of doing sessions.

The role of the coordinator
I was excited about the possibilities, about getting a chance to work with some old friends, and to expose a whole bunch of people to a set of design problems they had not been aware of before. If you have thought about getting everyone on your team to do usability testing and user research, but have been afraid of what might happen if you’re not with them, conducting a study by flash mob will certainly test your resolve. It will be a
lesson in letting go.

There was no way I could join a team for this study. I was too busy coordinating. And I wanted to be available in case there was some kind of emergency. (In fact, one team left the briefing without copies of the thing they were testing. So I jumped in a car to deliver to them.)

Though you might think that the 3-or-so hours of data collection might be dull and boring for the coordinator, there were all kinds of things for me to do: resolve issues with locations, answer questions about possible participants, reconfigure teams when people had to leave early. Cell phones were probably the most important tool of the day.

I had to believe that the planning and organizing I had done up front would work for people who were not me. And I had to trust that all the wonderful people who showed up to be the flash mob were as keen on making this work as I was. (They were.)

Keys to making flash mob testing work
I am still astonished that a bunch of people would show up on a Saturday morning to conduct a usability study in the street without much preparation. If your team is half as excited about the designs you are working on as this team was, taking a field trip to do a flash mob usability test should be a great experience. That is the most important ingredient to making a flash mob test work: people to do research who are engaged with the project, and enthusiastic about getting feedback from users.

Contrary to what you might think, coordinating a “flash” test doesn’t happen out of thin air, or a bunch of friends declaring, “Let’s put on a show!” Here are 10 things that made the day work really well to give us quick and dirty data:

1.    Organize up front
2.    Streamline data collection
3.    Test the data collection forms
4.    Minimize scripting
5.    Brief everyone on test goals, dos and don’ts
6.    Practice intercepting
7.    Do an inventory check before spreading out
8.    Be flexible
9.    Check in
10.    Reconvene the same day

Organize up front

Starting about 3 or 4 weeks ahead of time, pick the research questions, put together what needs to be tested, create the necessary materials, choose a date and locations, and recruit researchers.

Introduce all the researchers ahead of time, by email. Make the materials available to everyone to review or at least peek at as soon as possible. Nudge everyone to look at the stuff ahead of time, just to prepare.

Put together everything you could possibly need on The Day in a kit. I used a small roll-aboard suitcase to hold everything. Here’s my list:

  • Pens (lots of them)
  • Clipboards, one for each team
  • Flip cameras (people took them but did most of the recording on their phones)
  • Scripts (half a page)
  • Data collecting forms (the other half of the page)
  • Printouts of the designs, or device-accessible prototypes to test
  • Lists of names and phone numbers for researchers and me
  • Lists of locations, including addresses, contact names, parking locations, and public transit routes
  • Signs to post at locations about the study
  • Masking tape
  • Badges for each team member – either company IDs, or nice printed pages with the first names and “Researcher” printed large
  • A large, empty envelope

About 10 days ahead, I chose a lead for each of the teams (these were all people who I knew were experienced user researchers) and talked with them. I put all the stuff listed above in a large, durable envelope with the team lead’s name on it.

Streamline data collection

The sessions were going to be short, and the note-taking awkward because of doing this research in ad hoc places, so I wanted to make data collection as easy as possible. Working from a form I borrowed from Whitney Quesenbery, I made something that I hoped would be quick and easy to fill in and easy for me to understand what the data meant later.

Data collector for our flash mob usability test

The data collection form was the main thing I spent time on in the briefing before everyone went off to collect data. There are things I will emphasize more, next time, but overall, this worked pretty well. One note: It is quite difficult to collect qualitative data in the wild by writing things down. Better to audio record.

Test the data collection forms

While the form was reasonably successful, there were some parts of it that didn’t work that well. Though a version of the form had been used in other studies before, I didn’t ask enough questions about the success or failure of the open text (qualitative data) part of the form. I wanted that data desperately, but it came back pretty messy. Testing the data collection form with someone else would have told me what questions researchers would have about that (meta, no?), and I could have done something else. Next time.

Minimize scripting

Maximize participant time by dedicating as much time to the session as possible to their interacting with the design. That means that the moderator does nothing to introduce the session, instead relying on an informed consent form that one of the team members can administer to the next participant while the current one is finishing up.

The other tip here is to write out the exact wording for the session (with primary and follow up questions), and threaten the researchers with being flogged with a wet noodle if they don’t follow the script.

Brief everyone on test goals, dos and don’ts

All the researchers and I met up at 10am and had a stand-up meeting in which I thanked everyone profusely for joining me in the study. And then I talked about and took questions on:

  • The main thing I wanted to get out of each session. (There was one key concept that we wanted to know whether people understood from the design.)
  • How to use the data collection forms. (We walked through every field.)
  • How to use the script. (“You must follow the script.”)
  • How to intercept people, inviting them to participate. (More on this below.)
  • Rules about recordings. (Only hands and voices, no faces.)
  • When to check in with me. (When you arrive at your location; at the top of each hour, when you’re on the way back.)
  • When and where to meet when they were done.

I also handed cash that the researchers could use for transit or parking or lunch, or just keep.

Practice intercepting people

Intercepting people to participate is the hardest part. You walk up to a stranger on the street asking them for a favor. This might not be bad in your town. But in San Francisco, there’s no shortage of competition. Homeless people, political parities registering voters, hucksters, buskers, and kids working for Greenpeace all wanting attention from passers-by. And there you are, trying to do a research study. So, how to get some attention without freaking people out? A few things that worked well:

  • Put the youngest and/or best-looking person on the task.
  • Smile and make eye contact.
  • Using cute pets to attract people. Two researchers who own golden retrievers brought their lovely dogs with them, which was a nice icebreaker.
  • Start off with what you’re not: “I’m not selling anything, and I don’t work for Greenpeace. I’m doing a research study.”
  • Start by asking for what you want: “Would you have a few minutes to help us make ballots easier to use?”
  • Take turns – it can be exhausting enduring rejection.

Do an inventory check before spreading out

Before the researchers went off to their assigned locations, I asked each team to check that they had everything they needed, which apparently was not thorough enough for one of my teams. Next time, I will ask each team to empty out the contents of the packet and check the contents. I’ll use the list of things I wanted to include in each team’s packet and my agenda items for the briefing to ask the teams to look for each item.

Be flexible

Even with lots of planning and organizing, things happen that you couldn’t have anticipated. Researchers don’t show up, or their schedules have shifted. Locations turn out to not be so perfect. Give teams permission to do whatever they think is the right thing to get the data – short of breaking the law.

Check in

Teams checked in when they found their location, between sessions, and when they were on their way back to the meeting spot. I wanted to know that they weren’t lost, that everything was okay, and that they were finding people to take part. Asking teams to check in also gave them permission to ask me questions or help them make decisions so they could get the best data, or tell me what they were doing that was different from the plan. Basically, it was one giant exercise in The Doctrine of No Surprise.

Reconvene the same day

I needed to get the data from the research teams at some point. Why not meet up again and share experiences? Turns out that the stories from each team were important to all the other teams, and extremely helpful to me. They talked about the participants they’d had and the issues participants ran into with the designs we were testing. They also talked about their experiences with testing this way, which they all seemed to love. Afterward, I got emails from at least half the group volunteering to do it again. They had all had an adventure, met a lot of new people, got some practice with skills, and helped the world be a become a better place through design.

Wilder than testing in the wild, but trust that it will work

On that Saturday in San Francisco the amazing happened: 16 people who were strangers to one another came together to learn from 40 users about how well a design worked for them. The researchers came out from behind their monitors and out of their labs to gather data in the wild. The planning and organizing that I did ahead of time let it feel like a flash mob event to the researchers, and it gave them room to improvise as long as they collected valid data. And it worked. (See the results.)

P.S. I did not originate this approach to usability testing. As far as I know, the first person to do it was Whitney Quesenbery in New York City in the autumn of 2010.