
She wrote to me to ask if she could give me some feedback about the protocol for a usability test. “Absolutely,” I emailed back, “I’d love that.”

By this point, we’d had 20 sessions with individual users, conducted by 5 different researchers. Contrary to what I’d said, I was not in love with the idea of getting feedback at that moment, but I decided I needed to be a grown-up about it. Maybe there really was something wrong and we’d need to start over.

That would have been pretty disappointing – starting over – because we had piloted the hell out of this protocol. Even my mother could do it and get us the data we needed. I was deeply curious about what the feedback would be, but it would be a couple of days before the concerned researcher and I could talk.

This was a protocol for a usability test of county election websites. It was just before the November 2012 Presidential election, and Cyd Harrell and I wanted to seize that moment to learn where voters looked for answers to their questions about elections, and whether they were successful in finding useful, clear answers that they could act on. There was a tight window to conduct this research in. We wanted to do as many individual remote, moderated usability test sessions between the end of September and Election Day as we could manage. We needed help.

Fortunately, we had 300 new friends from a Kickstarter campaign and a roster of UX researchers collected over the years who we could call on. Amazingly, 30 people volunteered to help us. But not all were known to us, and many told us that they had not done this kind of thing before. There was no way we were going to turn away free labor. And it seemed important to include as many people in the research as possible. How were we going to train a bunch of (generous, awesome) strangers who were remote from us to do what we needed done?

Clearly, we needed to leave no room for error. So, even though the study participants would be exploring as they tried to find answers to their questions on county election websites, this would not be an exploratory study. Cyd and I agreed that we needed to build support into the research design itself. (We also agreed that we wouldn’t allow anyone who didn’t do the training to conduct sessions.)

Focus on the research question

Everything in a study should be done in the service of the research question. But it’s easy to lose sight of the Big Question when you’re planning logistics. So, in the same way that I’m constantly asking every team I work with, “What do you want the user’s experience to be?”, Cyd and I kept asking ourselves, “Does what we’re planning help us answer our research questions?” We had two questions:

  • What questions do voters have about elections and voting?
  • How well do county election department websites answer voters’ questions?

We developed an instrument for our volunteer researchers that combined a script with data collection in a SurveyMonkey form. (SurveyMonkey, as you might have guessed from the name, is a tool for setting up and conducting surveys.) SurveyMonkey wasn’t meant to do what we were making it do, so there were some things about the instrument that were clunky. But pilot testing the instrument helped smooth out the wording in the scripted parts, the prompts for the data collection, and the order of the questions and tasks.
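
If it helps to picture the instrument, here’s a rough sketch, in Python rather than SurveyMonkey, of the kind of record each session boiled down to. The field names are hypothetical stand-ins I’m using for illustration, not the actual form fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VoterQuestion:
    """One question the participant wanted answered, and how it went."""
    question_text: str                      # in the participant's own words
    answer_found: Optional[bool] = None     # None if the participant gave up mid-search
    clear_enough_to_act_on: Optional[bool] = None

@dataclass
class SessionRecord:
    """One remote usability test session, as a volunteer researcher logged it."""
    participant_id: str         # researcher's initials plus the date (exact format made up here)
    researcher_initials: str
    county_site_url: str
    visited_site_before: Optional[bool] = None
    questions: List[VoterQuestion] = field(default_factory=list)
    notes: str = ""             # anything else worth remembering from the session
```

Nothing fancy; the real instrument was just a survey form whose scripted prompts and data-collection questions mapped to fields roughly like these.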

Pilot test and then pilot test again

I wrote the instrument and did a dry run. Cyd tried it out. We made some changes. Then Cyd asked her mom to try it. We made some more changes. Whitney Quesenbery joined us, tried the instrument, and gave us feedback. We got one of the volunteers to try it out. We made even more changes. After about 6 pilot tests of the instrument, we decided it was ready for our volunteer researchers to be trained on.

Walkthroughs

To train our volunteers, we scripted a walkthrough of the instrument, what to do and what to say, and then delivered the training through GoToMeeting in 45 minutes. We held several of these sessions over a few evenings (a couple of them quite late Eastern Time to make it possible for people in Pacific Time to attend after regular working hours). The training demonstrated the instrument and gave volunteers a chance to ask questions. Anyone who wanted to conduct sessions with participants was required to attend training. Of the 30 people who originally volunteered, 16 attended training and ended up conducting sessions.

Snowball rehearsals, pairing pros with proto-pros

There were a few volunteers who looked like pros on paper, but none of the core team knew them and their skills. So Cyd or Whitney or I paired with these folks for a session, and if they did well, they were allowed to do sessions on their own. There were still a few people who weren’t user researchers at all, but who were interested in learning or who had other kinds of interviewing experience. We paired these people, who we called proto-pros, with approved pros to take notes or to do the data collecting while the pro conducted the interview. Some of them graduated to doing sessions on their own, too.

And this, my friends, is how we got forty-one 30-minute sessions done over a few weeks.

Office hours

Cyd and I also made ourselves available online to answer questions or address issues through Campfire, a closed group chat tool from 37Signals. We invited all the volunteers to Campfire and sent out notices by email of when we’d be holding office hours there. A few volunteers did have questions, which we clarified right in Campfire and then sent answers out to everyone by email. Nothing came up that meant changing the script.

Check the data

Every now and then I wanted to know how many sessions we’d done, but I also wanted to make sure that the data was good and clean. Because the instrument was set up in SurveyMonkey, I could go look at the data as it came in. I could tell who had collected it by the participant numbers assigned, which used the researcher’s initials along with the date. This way, if I had a question or something didn’t seem right, I could go back to the researcher to correct the data. Fortunately, we never needed to do that.
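
To make the idea concrete, here’s a minimal sketch of the kind of check I mean, assuming a CSV export from the survey tool. The file name and column names are hypothetical; the only thing taken from our real setup is the initials-plus-date idea in the participant IDs (and even that exact format is invented here).

```python
import csv
from collections import Counter

def check_sessions(path: str) -> None:
    """Tally sessions per researcher and flag rows that need a follow-up."""
    per_researcher = Counter()
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pid = row.get("participant_id", "").strip()
            # IDs carried the researcher's initials plus the date,
            # e.g. something shaped like "CH-20121015-01" (format made up)
            initials = pid.split("-")[0] if pid else "UNKNOWN"
            per_researcher[initials] += 1
            if not pid or not row.get("answer_found", "").strip():
                problems.append(pid or "<no id>")

    print(f"{sum(per_researcher.values())} sessions so far")
    for initials, count in per_researcher.most_common():
        print(f"  {initials}: {count}")
    if problems:
        print(f"{len(problems)} rows to take back to the researchers: {problems}")

if __name__ == "__main__":
    check_sessions("election_sessions_export.csv")   # hypothetical export file
```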

A solid script focused the session

So many interesting things happened as we observed people trying to find answers to their questions on their own county election websites. We learned what the most-asked questions were and how people asked them. We heard what people had to say about whether and why they had or had not been to the site before. And we learned whether people found the answers to their questions.

We did not track where they went on the way to finding answers or giving up. And that is what the earnest volunteer researcher had wanted to talk about. “Can’t we add some fields to the data collector to put in notes about this? People get so lost! Some of these sites are train wrecks!” she wanted to know. My answer: “It’s fascinating, isn’t it? But, no.” What would we do with that data over 40 or more sessions? It wasn’t as if we were going to send feedback to every site. I reminded her of the research questions. “We want to know whether voters find answers, not how they find answers on the sites,” I said. “But if you want to take notes about how it happens, those notes could be helpful to understanding the rest of the data. And I can’t tell you how much the whole team appreciates you doing this. We don’t want you to take on extra work! We’ve already asked so much of you.”

Whew. No need to start over. “You’ll get to do more sessions within the time you have available,” I said, “if you just stick to the script.”

In the fall of 2012, I seized the opportunity to do some research I’d wanted to do for a long time. Millions of users would be available and motivated to take part. But I needed to figure out how to do a very large study in a short time. By large, I’m talking about reviewing hundreds of websites. How could we make that happen within a couple of months?

Do election officials and voters talk about elections the same way?

I had BIG questions. What were local governments offering on their websites, and how did they talk about it? And what questions did voters have? Finally, if voters went to local government websites, were they able to find out what they needed to know?

Brain trust

To get this going, I enlisted a couple of colleagues and advisors. Cyd Harrell is a genius when it comes to research methods (among other things). Ethan Newby sees the world in probabilities and confidence intervals. Jared Spool came up with the cleverest twist, which kept us from evaluating the sites with techniques we were prone to use just out of habit. Great team, but I knew we weren’t enough to do everything that needed doing.

Two phases of research: What first, then whether

We settled on splitting the research into 2 steps. First, we’d go look at a bunch of county election websites to see what was on them. We decided to do this by simply cataloging the words in links, headings, and graphics on a big pile of election sites. Next, we’d do some remote, moderated usability test sessions, asking voters what questions they had and then observing as they looked for satisfactory answers on their local county websites.

Cataloging the sites would tell us what counties thought was important enough to put on the home pages of their election websites. It also would reveal the words used in the information architecture. Would the labels match voters’ mental models?

Conducting the usability test would tell us what voters cared about, giving us a simple mental model. Having voters try to find answers on websites close to them would tell us whether there was a gap between how election officials talk about elections and how voters think about elections. If there was a gap, we could get a rough measure of how wide the gap might be.

When we had the catalog and the usability test data, we could look at what was on the sites and where it appeared against how easily and successfully voters found answers. (At some point, I’ll write about the usability test because there were fun challenges in that phase, too. Here I want to focus on the cataloging.)

Scoping the sample

Though most of us only think of elections when it’s time to vote for president every four years, there are actually elections going on all the time. Right now, at this very moment, there’s an election going on somewhere in the US. And, contrary to what you might think, most elections are run at the county or town level. There are a lot of counties, boroughs, and parishes in the US. And then there’s Wisconsin and New England, where elections are almost exclusively run by towns. There are about 3,057 counties or county equivalents. If you count all the towns and other jurisdictions that put on elections in the US and its territories and protectorates, there are over 8,000 voting jurisdictions. Most of them have websites.

We decided to focus on counties or equivalents, which brings us back to roughly 3,000 to choose from. The question then was how to narrow the sample to be big enough to give us reliable statistics, but small enough to gather the data within a reasonable time.

So our UX stats guy, Ethan, gave us some guidance: 200 counties seemed like a reasonable number to start with. Cyd created selection criteria based on US Census data. In the first pass, we selected counties based on population size (highest and lowest), population density (highest and lowest), and diversity (majority white or majority non-white). We also looked across geographic regions. When we reviewed which counties showed up under which criteria, we saw several duplicates. For example, Maricopa County, Arizona, is highly populated, densely populated, and majority non-white. When we removed the duplicates, we had 175 counties left.
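
Here’s a toy sketch of that selection logic, with made-up census-style records and thresholds. The real pass worked from US Census data across roughly 3,000 counties; the point is just that one county can qualify under several criteria and should only be counted once.

```python
# Made-up records for illustration; the real selection used US Census data.
counties = [
    {"name": "Maricopa County, AZ", "population": 3_880_000, "density": 420, "pct_nonwhite": 0.53, "region": "West"},
    {"name": "Loving County, TX",   "population": 82,        "density": 0.1, "pct_nonwhite": 0.25, "region": "South"},
    # ... roughly 3,000 more in the real data set
]

def extremes(records, key, n, largest=True):
    """Take the n counties with the highest (or lowest) value for a criterion."""
    return sorted(records, key=lambda c: c[key], reverse=largest)[:n]

selected = (
    extremes(counties, "population", 25)                    # most populous
    + extremes(counties, "population", 25, largest=False)   # least populous
    + extremes(counties, "density", 25)                     # densest
    + extremes(counties, "density", 25, largest=False)      # sparsest
    + [c for c in counties if c["pct_nonwhite"] > 0.5]      # majority non-white
)

# A county like Maricopa shows up under several criteria; count it once.
unique = {c["name"]: c for c in selected}.values()
print(len(unique), "counties after removing duplicates")
```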

The next step was to determine whether they all had websites. Here we had one of our first insights: Counties with populations somewhere between 7,000 and 10,000 are less likely to have websites about elections than counties that are larger. We eliminated counties that either didn’t have websites or had a one-pager with the clerk’s name and phone number. This brought our sample down to 147 websites to catalog. Insanely, 147 seemed so much more reasonable than 200.

One more constraint we faced was timing. Election websites change all the time, because, well, there are elections going on all the time. Because we wanted to do this before the 2012 Presidential election in November, we had to start cataloging sites in about August. But with just a few people on the team, how would we ever manage that and conduct usability test sessions?

Crowd-sourced research FTW

With 147 websites to catalog, if we could get helpers to do 5 websites each, we’d need about 30 co-researchers. Could we find people to give us a couple of hours in exchange for nothing but our undying gratitude?

I came to appreciate social networks in a whole new way. I’ve always been a big believer in networking, even before the Web gave us all these new tools. The scary part was asking friends and strangers for this kind of favor.

Fortunately, I had 320 new friends from a Kickstarter campaign I had conducted earlier in the year to raise funds to publish a series of little books called Field Guides To Ensuring Voter Intent. Even though people had already backed the project financially, many of them told me that they wanted to do more, to be directly involved. Twitter and Facebook seemed like options for sources of co-researchers, too. I asked, and they came. Altogether, 17 people cataloged websites.

Now we had a new problem: We didn’t know the skills of our co-researchers, and we didn’t want to turn anyone away. That would just be ungrateful.

A good data collector, some pilot testing, and a little briefing

Being design researchers, we all wanted to evaluate the websites as we were reviewing and cataloging them. But how do you deal with all those subjective judgements? What heuristics could we apply? We didn’t have the data to base heuristics on. And though Cyd, Ethan, Jared, and I have been working on website usability since the dawn of time, these election websites are a particular breed: not like e-commerce sites, and not exactly like information-rich sites either. Heuristic evaluation was out of the question. As Jared suggested — and here’s the twist — let the data speak for itself rather than evaluating the information architecture or the design. After we got over the idea of evaluating, the question was how to proceed. Without judgement, what did we have?

Simple data collection. It seemed clear that the way to do the cataloging was to put the words into a spreadsheet. The format of the spreadsheet would be important. Cyd set up a basic template that looked amazingly like a website layout. It had regions that reflected different areas of a website: banner, left column, center area, right column, footer. She added color coding, instructions, and examples.

I wrote up a separate sheet with step-by-step instructions and file naming conventions. It also listed the simple set of codes to mark the words collected. And then we tested the hell out of it. Cyd’s mom was one of our first co-researchers. She had excellent questions about what to do with what. We incorporated her feedback in the spreadsheet and the instructions, and tried the process and instruments out with a few other people. After 5 or 6 pilots, when we thought we’d smoothed out the kinks, we invited our co-researchers to briefing sessions through GoToMeeting, and gave assignments.

To our delight, the data that came back was really clean and consistent. And there were more than 8,000 data items to analyze.
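
For a sense of what that analysis could look like, here’s a hedged sketch that counts how many sites used a given term in each page region, assuming the individual catalog spreadsheets were merged into one CSV. The file name, column names, and search terms are all hypothetical examples, not our actual analysis code.

```python
import csv
from collections import Counter

PAGE_REGIONS = {"banner", "left column", "center", "right column", "footer"}

def sites_using_term(path: str, term: str) -> Counter:
    """Count how many sites used `term` in each region of the page."""
    counts = Counter()
    seen = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            region = row["region"].strip().lower()
            if region in PAGE_REGIONS and term.lower() in row["label_text"].lower():
                key = (row["site"], region)   # count each site once per region
                if key not in seen:
                    seen.add(key)
                    counts[region] += 1
    return counts

if __name__ == "__main__":
    for term in ("polling place", "register", "absentee", "ballot"):
        print(term, dict(sites_using_term("catalog_merged.csv", term)))
```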

Lessons learned: focus, prepare, pilot, trust

It’s so easy in user research to just say, Hey, we’ll put it in front of people and ask a couple of questions, and we’ll be good.  I’ve been a loud voice for a long time crying, Just do it! Just put your design in front of users and watch. This is good for some kinds of exploratory, formative research where you’re early in a design.

But there’s a place, too, for a specific, tightly bounded, narrowly scoped, thoroughly designed research study. We wanted to answer specific questions at scale. This takes a different kind of preparation from a formative study. Getting the data collection right was key to the success of the project.

To get the data collecting right, we had to take out as much judgement as possible for 2 reasons:

• we wanted the data to be consistently gathered

• we had people whose skills we didn’t know collecting the data

Though the findings from the study are fascinating (at least to me), what makes me proud of this project was how we invited other people in. It was not easy letting go. But I just couldn’t do it all. I couldn’t even have got it done with the help of Cyd and Ethan. Setting up training helped. Setting up office hours helped. Giving specific direction helped. And now 17 people own parts of this project, which means 17 people can tell at least a little part of the story of these websites. That’s what I want out of user research. I can’t wait to do something like this with a client team full of product managers, marketers, and developers.

If you’d like to see some stats on the 8,000+ data items we collected, check out the slide deck that Ethan Newby created that lays out when, where, and how often key words that might help voters answer their questions appeared on 147 county election websites in November 2012.

It was a spectacularly beautiful Saturday in San Francisco. Exactly the perfect day to do some field usability testing. But this was no ordinary field usability test. Sure, there’d been plenty of planning and organizing ahead of time. And there would be data analysis afterward. What made this test different from most usability tests?

  • 16 people gathered to make 6 research teams
  • Most of the people on the teams had never met
  • Some of the research teams had people who had never taken part in usability testing before
  • The teams were going to intercept people on the street, at libraries, and in farmers’ markets

Ever heard of Improv Everywhere? This was the UX equivalent. Researchers just appeared out of the crowd to ask people to try out a couple of designs and then talk about their experiences. Most of the interactions with participants were about 20 minutes long. That’s it. But by the time the sun was over the yardarm (time for cocktails, that is), we had data on two designs from 40 participants. The day was amazingly energizing.

How the day worked
The timeline for the day looked something like this:

8:00
Coordinator checks all the packets of materials and supplies

10:00
Coordinator meets up with all the researchers for a briefing

10:30
Teams head to their assigned locations; discuss who should lead, who should take notes, and who should intercept

11:00
Most teams reach their locations, check in with contacts (if there are contacts), set up

11:15-ish
Intercept the first participants and start gathering data

Break when needed

14:00
Finish up collecting data, head back to the meeting spot

14:30
Teams start arriving at the meeting spot with data organized in packets

15:00-17:00
Everybody debriefs about their experiences, observations

17:00
Researchers head home, energized about what they’ve learned

Later
Researchers upload audio and video recordings to an online storage space

On average, teams came back with data from 6 or 7 participants. Not bad for a 3-hour stretch of doing sessions.

The role of the coordinator
I was excited about the possibilities, about getting a chance to work with some old friends, and about exposing a whole bunch of people to a set of design problems they had not been aware of before. If you have thought about getting everyone on your team to do usability testing and user research, but have been afraid of what might happen if you’re not with them, conducting a study by flash mob will certainly test your resolve. It will be a lesson in letting go.

There was no way I could join a team for this study. I was too busy coordinating. And I wanted to be available in case there was some kind of emergency. (In fact, one team left the briefing without copies of the thing they were testing. So I jumped in a car to deliver the copies to them.)

Though you might think that the 3-or-so hours of data collection would be dull and boring for the coordinator, there were all kinds of things for me to do: resolve issues with locations, answer questions about possible participants, reconfigure teams when people had to leave early. Cell phones were probably the most important tool of the day.

I had to believe that the planning and organizing I had done up front would work for people who were not me. And I had to trust that all the wonderful people who showed up to be the flash mob were as keen on making this work as I was. (They were.)

Keys to making flash mob testing work
I am still astonished that a bunch of people would show up on a Saturday morning to conduct a usability study in the street without much preparation. If your team is half as excited about the designs you are working on as this team was, taking a field trip to do a flash mob usability test should be a great experience. That is the most important ingredient to making a flash mob test work: people to do research who are engaged with the project, and enthusiastic about getting feedback from users.

Contrary to what you might think, coordinating a “flash” test doesn’t happen out of thin air or because a bunch of friends declared, “Let’s put on a show!” Here are 10 things that made the day work really well to give us quick and dirty data:

1.    Organize up front
2.    Streamline data collection
3.    Test the data collection forms
4.    Minimize scripting
5.    Brief everyone on test goals, dos and don’ts
6.    Practice intercepting
7.    Do an inventory check before spreading out
8.    Be flexible
9.    Check in
10.    Reconvene the same day

Organize up front

Starting about 3 or 4 weeks ahead of time, pick the research questions, put together what needs to be tested, create the necessary materials, choose a date and locations, and recruit researchers.

Introduce all the researchers ahead of time, by email. Make the materials available to everyone to review or at least peek at as soon as possible. Nudge everyone to look at the stuff ahead of time, just to prepare.

Put together everything you could possibly need on The Day in a kit. I used a small roll-aboard suitcase to hold everything. Here’s my list:

  • Pens (lots of them)
  • Clipboards, one for each team
  • Flip cameras (people took them but did most of the recording on their phones)
  • Scripts (half a page)
  • Data collecting forms (the other half of the page)
  • Printouts of the designs, or device-accessible prototypes to test
  • Lists of names and phone numbers for researchers and me
  • Lists of locations, including addresses, contact names, parking locations, and public transit routes
  • Signs to post at locations about the study
  • Masking tape
  • Badges for each team member – either company IDs, or nice printed pages with their first names and “Researcher” printed large
  • A large, empty envelope

About 10 days ahead, I chose a lead for each of the teams (these were all people who I knew were experienced user researchers) and talked with them. I put all the stuff listed above in a large, durable envelope with the team lead’s name on it.

Streamline data collection

The sessions were going to be short, and the note-taking awkward because we’d be doing this research in ad hoc places, so I wanted to make data collection as easy as possible. Working from a form I borrowed from Whitney Quesenbery, I made something that I hoped would be quick and easy to fill in, and easy for me to interpret later.

Data collector for our flash mob usability test

The data collection form was the main thing I spent time on in the briefing before everyone went off to collect data. There are things I will emphasize more next time, but overall, this worked pretty well. One note: It is quite difficult to collect qualitative data in the wild by writing things down. Better to audio record.
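
To give a feel for what “streamlined” meant, here’s a hypothetical sketch of a half-page form like this one, written out as a field list. These aren’t the actual fields from the form I adapted from Whitney; they’re stand-ins for the mostly circle-or-check items plus the one open-ended field.

```python
# Hypothetical sketch of a half-page intercept form: mostly circle-or-check
# fields a note-taker can mark in seconds, plus one open-ended field (which
# turned out to be the hardest part to get back clean).
FORM_FIELDS = [
    ("participant_number",      "pre-printed on the form"),
    ("location",                "pre-printed on the form"),
    ("design_version",          "circle one: A / B"),
    ("understood_key_concept",  "circle one: yes / partly / no"),
    ("completed_task",          "circle one: yes / with help / no"),
    ("quotes_and_observations", "open text; better captured as audio"),
]

for name, how in FORM_FIELDS:
    print(f"{name:<26} {how}")
```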

Test the data collection forms

While the form was reasonably successful, there were some parts of it that didn’t work that well. Though a version of the form had been used in other studies before, I didn’t ask enough questions about the success or failure of the open text (qualitative data) part of the form. I wanted that data desperately, but it came back pretty messy. Testing the data collection form with someone else would have told me what questions researchers would have about that (meta, no?), and I could have done something else. Next time.

Minimize scripting

Maximize participant time by dedicating as much of the session as possible to the participant interacting with the design. That means that the moderator does nothing to introduce the session, instead relying on an informed consent form that one of the team members can administer to the next participant while the current one is finishing up.

The other tip here is to write out the exact wording for the session (with primary and follow up questions), and threaten the researchers with being flogged with a wet noodle if they don’t follow the script.

Brief everyone on test goals, dos and don’ts

All the researchers and I met up at 10am and had a stand-up meeting in which I thanked everyone profusely for joining me in the study. And then I talked about and took questions on:

  • The main thing I wanted to get out of each session. (There was one key concept that we wanted to know whether people understood from the design.)
  • How to use the data collection forms. (We walked through every field.)
  • How to use the script. (“You must follow the script.”)
  • How to intercept people, inviting them to participate. (More on this below.)
  • Rules about recordings. (Only hands and voices, no faces.)
  • When to check in with me. (When you arrive at your location; at the top of each hour; when you’re on the way back.)
  • When and where to meet when they were done.

I also handed out cash that the researchers could use for transit, parking, or lunch, or just keep.

Practice intercepting people

Intercepting people to participate is the hardest part. You walk up to a stranger on the street and ask them for a favor. This might not be bad in your town. But in San Francisco, there’s no shortage of competition. Homeless people, political parties registering voters, hucksters, buskers, and kids working for Greenpeace all want attention from passers-by. And there you are, trying to do a research study. So, how to get some attention without freaking people out? A few things that worked well:

  • Put the youngest and/or best-looking person on the task.
  • Smile and make eye contact.
  • Use cute pets to attract people. Two researchers who own golden retrievers brought their lovely dogs with them, which was a nice icebreaker.
  • Start off with what you’re not: “I’m not selling anything, and I don’t work for Greenpeace. I’m doing a research study.”
  • Start by asking for what you want: “Would you have a few minutes to help us make ballots easier to use?”
  • Take turns – it can be exhausting enduring rejection.

Do an inventory check before spreading out

Before the researchers went off to their assigned locations, I asked each team to check that they had everything they needed, which apparently was not thorough enough for one of my teams. Next time, I will ask each team to empty out the packet and check its contents against the list of things I meant to include and against my agenda items for the briefing.

Be flexible

Even with lots of planning and organizing, things happen that you couldn’t have anticipated. Researchers don’t show up, or their schedules shift. Locations turn out not to be so perfect. Give teams permission to do whatever they think is the right thing to get the data – short of breaking the law.

Check in

Teams checked in when they found their location, between sessions, and when they were on their way back to the meeting spot. I wanted to know that they weren’t lost, that everything was okay, and that they were finding people to take part. Asking teams to check in also gave them permission to ask me questions or help them make decisions so they could get the best data, or tell me what they were doing that was different from the plan. Basically, it was one giant exercise in The Doctrine of No Surprise.

Reconvene the same day

I needed to get the data from the research teams at some point. Why not meet up again and share experiences? Turns out that the stories from each team were important to all the other teams, and extremely helpful to me. They talked about the participants they’d had and the issues participants ran into with the designs we were testing. They also talked about their experiences with testing this way, which they all seemed to love. Afterward, I got emails from at least half the group volunteering to do it again. They had all had an adventure, met a lot of new people, got some practice with skills, and helped the world become a better place through design.

Wilder than testing in the wild, but trust that it will work

On that Saturday in San Francisco the amazing happened: 16 people who were strangers to one another came together to learn from 40 users about how well a design worked for them. The researchers came out from behind their monitors and out of their labs to gather data in the wild. The planning and organizing that I did ahead of time let it feel like a flash mob event to the researchers, and it gave them room to improvise as long as they collected valid data. And it worked. (See the results.)

P.S. I did not originate this approach to usability testing. As far as I know, the first person to do it was Whitney Quesenbery in New York City in the autumn of 2010.

For many of us, usability testing is a necessary evil. For others, it’s too much work, or it’s too disruptive to the development process. As you might expect, I have issues with all that. It’s unfortunate that some teams don’t see the value in observing people use their designs. Done well, it can be an amazing event in the life of a design. Even done very informally, it can still turn up useful insights that help a team make informed design decisions. But I probably don’t have to tell you that.
Usability testing can be enormously elevating for teams at all stages of UX maturity. In fact, there probably isn’t nearly enough of it being done. Even enlightened teams that know about and do usability tests probably aren’t doing it often enough. There seems to be a correlation between successful user experiences and how often and how much time the designers and developers spend observing users. (hat tip Jared Spool)
Observing people using early designs can be energizing as designers and developers get a chance to see reactions to ideas. I’ve seen teams walk away with insights from observing people use their designs that they couldn’t have got any other way – and then make better designs than they’ve ever made. Close to launch, it is exciting – yes, exciting – to see a design perform as useful, usable, and desirable.
Lately, I’ve been negative about usability testing and our failure of imagination in bringing the method up to date. But there’s a lot of good in any basic usability test. In fact, I went looking for the worth, the value, the alluring in usability testing a few weeks ago when I asked on Quora, “What’s the sexiest thing about usability testing?”
Some of the answers surprised me. Some of the answers were more about what people love about usability testing than what makes it seductive. But let’s go with seductive. People who find usability testing hot say it’s about data that can end the opinion wars, revelations and surprises, and getting perspective about real use, motivations, and context of use.  Okay. We’re nerds.
The kiss of data
We always learn from users. Of course, we could just ask. But observing is so much more interesting. People do unpredictable things; they create workarounds, hacks, and alternative paths to make tools fit for their use.
This is the best case I can think of for watching rather than asking. From this observing, we get data. Juicy, luscious data like verbal protocols, task success rates, and physical behavior. This package makes it much easier to make good design decisions because we now have evidence on which to base theories about what should work better. There’s nothing like having hard evidence for going with a design direction – or changing direction.
Voyeuristic revelations
When designers, developers, and stakeholders of all persuasions get to observe people using a design – especially the first time – there’s often an “ah ha!” moment. (That’s the clean version.) Observers exclaim, “Wow, that was amazing!” when they see something surprising, both the good and the bad. The reaction that follows a completed usability study often is, “Damn. I wish we’d done this years ago. It would have saved us a ton of rework!” After watching one over-qualified participant struggle with a design recently, I heard a client say, “If that guy can’t do it, we’re in serious trouble.” That’s powerful.
When participants are surprised, that’s when the real fun begins. Not everyone likes surprises in their user interfaces, especially if they’re not the delightful Easter egg kind. While a team hopes not to hear, “I feel lost and abandoned,” you’ve got to wonder how bad it’s been when a participant squeals, “Oh, my gosh! This is so much better! When can I have it?!” Those eureka moments can reveal what to do to improve a design or an experience.
Relationship dynamics
One of the magical things about observing users working with a design is that suddenly, disputes within the team melt away. Chances are, the disputing parties were both wrong because neither (unless they have a ton of experience already observing these kinds of users in this domain doing this task) could accurately predict how the user would behave and perform.
Now, even with observations from watching just one user, there’s data on which to base design decisions. Data trumps gut. Data outweighs feelings. Data can put to rest those endless, circular discussions where inevitably, the person with the biggest paycheck or the most important title wins. The opinion wars come to an end.
When the whole team is involved in deciding what to test and observing sessions, everyone can share in making and carrying out agreed design decisions. Whenever a question comes up where no one knows but everyone has an opinion, the answer in a team doing usability testing is, “Let’s do some user research on that,” or “Let’s find out what users do.”
For the love of users
It’s so easy to get caught up in the business goals and issues with the underlying technology of a design. It’s so easy to stay in the safe bubble of the office, cranking out code, designs, plans, and reports. It’s easy to lose touch with users.
Teams that spend a couple of hours observing their users every few weeks keep that connection. They fall in love with their users. They relish the chance to see for themselves why people do the things they do with designs.
Getting out of their own heads, a successful team uses usability testing to get perspective, learn about users’ contexts, and remember the people and their stories. For these teams, usability testing is inspiring. And that’s hot.
What’s sexy about usability testing?
Observing people use a design can be revelatory. It turns up the volume on design by helping teams make informed design decisions. What’s sexy about usability testing? Data for evidence-based design. Ending opinion wars. Knowing users from observations and surprises. Getting perspective and knowledge of context of use.
The UX equivalent of a romantic dinner or a walk on the beach? Perhaps not, even for a geek girl like me. But it can be exciting, fun, funny, encouraging, and empowering. Just what you want from a relationship. That’s pretty seductive, if you ask me.

For most teams, the moderator of user research sessions is the main researcher. Depending on the comfort level of the team, the moderator might be a different person from session to session in the same study. (I often will moderate the first few sessions of a study and then hand the moderating over to the first person on the design team who feels ready to take over.)
To make that work, it’s a good practice to create some kind of checklist for the sessions, just to make sure that the team’s priorities are addressed. For a field study or a formative usability test, a checklist might be all a team needs. But if the team is working on sussing out nuanced behaviors or solving subtle problems, we might want a bit more structure.
A couple of the teams I work with ensure that everything is lined up and that *anyone* on the team could conduct the sessions by creating detailed scripts that include stage directions.
Whether the team is switching up moderators or it’s the same person conducting all the sessions, creating a script for the session that includes logistics is a good idea because it helps you:
  • think through all the logistics, ideally, together with the team
  • make sure the sessions are conducted consistently, from one to the next
  • back up the main researcher in case something drastic happens — someone else could easily fill in
Logistics rehearsal
When you walk through, step by step, what’s supposed to happen during a session, it helps everyone visualize the steps, pacing, and who should be doing what. My client teams use the stage directions in the script as a check to make sure everything needed to reach the objectives of the sessions gets covered. It’s also a good way to review what tools, data, and props you might need.
Estimate timing
Teams often ask me about timing. When they get through a draft of a script that includes stage directions, they quickly get a solid feel for what is going to take how long. From this they can assign timing estimates and make decisions about whether they want participants to keep going on a task after the estimated time is reached or redirect them to the next task.
Mapping out location flow
It’s easy to overlook the physical or geographic flow – what a director would call blocking – of a session. Where does the participant start the session? In a waiting room, at her desk, or somewhere else? Will you change locations within a room or building during the session? How do you get from one place to the next?
Consistency and rigor
Including stage directions in a script for a user research session can help reviewer-stakeholders understand what to expect. More importantly, the stage directions act as reminders to the moderator so she’s doing the same things with and saying the same things to every participant in the study. This means nothing gets left out deliberately and nothing gets added that wasn’t agreed on ahead of time. (For example, the team could identify some area to observe for and put a prompt in the script for the moderator to ask follow-up questions that are not specifically scripted, depending on what the participant does.)
Insurance
Any really good project manager is going to have a Plan B. With a script that includes detailed stage directions, anyone who has been involved in the planning of a study should be able to pick up the script and moderate a session. The people I worked with at Tec-Ed called this “the bus test” (as in, If you get hit by a bus we still have to do this work).
Some teams I work with want to spread out and run simultaneous sessions. The stage directions can help ensure consistency across moderators. (Rehearse and refine if you’re really going to do this.)
Finally, when it comes time to write the report about the insights the team gained, the script — with its stage directions — can help answer the questions that often come up about why things were done the way they were or why the data says what it says.
Stage it

Each person in a session is an actor, whether participant or observer. The moderator is the director. If the script for a study documents not only the words to say but also includes instructions for all the actors in the session as well as the director, everyone involved will give a great performance.