
It’s a 1-hour remote session using GoToMeeting either Thursday the 17th or Friday the 18th. We can conduct sessions as early as 5am Pacific Time (8am EDT). You’ll be paid $75!

We’re interested in learning about your online shopping activities, i.e. where, what, when & why. During the session you’ll give feedback on the process of shopping and buying products online (creating an account, setting email preferences, browsing available deals and products).

If you’re interested, or know someone who is, please contact Sandy Olson ASAP at usabilityworks@att.net or call her at 510.420.8624.

Because this research study is being done remotely, participants can be anywhere in the U.S.

UXers: send us your friends and family members! 

She wrote to me to ask if she could give me some feedback about the protocol for a usability test. “Absolutely,” I emailed back, “I’d love that.”

By this point, we’d had 20 sessions with individual users, conducted by 5 different researchers. Contrary to what I’d said, I was not in love with the idea of getting feedback at that moment, but I decided I needed to be a grown-up about it. Maybe there really was something wrong and we’d need to start over.

That would have been pretty disappointing – starting over – because we had piloted the hell out of this protocol. Even my mother could do it and get us the data we needed. I was deeply curious about what the feedback would be, but it would be a couple of days before the concerned researcher and I could talk.

This was a protocol for a usability test of county election websites. It was just before the November 2012 Presidential election, and Cyd Harrell and I wanted to seize that moment to learn where voters looked for answers to their questions about elections, and whether they were successful in finding useful, clear answers that they could act on. The window to conduct this research was tight. We wanted to do as many individual remote moderated usability test sessions between the end of September and Election Day as we could manage. We needed help.

Fortunately, we had 300 new friends from a Kickstarter campaign and a roster of UX researchers collected over the years who we could call on. Amazingly, 30 people volunteered to help us. But not all were known to us, and many told us that they had not done this kind of thing before. There was no way we were going to turn away free labor. And it seemed important to include as many people in the research as possible. How were we going to train a bunch of (generous, awesome) strangers who were remote from us to do what we needed done?

Clearly, we needed to leave no room for error. So, even though the study participants would be exploring as they tried to find answers to their questions on county election websites, this would not be an exploratory study. Cyd and I agreed that we needed to build support into the research design. (We also agreed that we wouldn’t allow anyone who didn’t do the training to conduct sessions.)

Focus on the research question

Everything in a study should be done in the service of the research question. But it’s easy to lose sight of the Big Question when you’re planning logistics. So, in the same way that I’m constantly asking every team I work with, “What do you want the user’s experience to be?”, Cyd and I kept asking ourselves, “Does what we’re planning help us answer our research questions?” We had two questions:

  • What questions do voters have about elections and voting?
  • How well do county election department websites answer voters’ questions?

We developed an instrument for our volunteer researchers that combined a script with data collection in a SurveyMonkey form. (SurveyMonkey, as you might have guessed from the name, is a tool for setting up and conducting surveys.) SurveyMonkey wasn’t meant to do what we were making it do, so there were some things about the instrument that were clunky. But pilot testing the instrument helped smooth out the wording in the scripted parts, the prompts for the data collection, and the order of the questions and tasks.

Pilot test and then pilot test again

I wrote the instrument and did a dry run. Cyd tried it out. We made some changes. Then Cyd asked her mom to try it. We made some more changes. Whitney Quesenbery joined us, tried the instrument, and gave us feedback. We got one of the volunteers to try it out. We made even more changes. After about 6 pilot tests of the instrument, we decided it was ready for our volunteer researchers to be trained on.

Walkthroughs

To train our volunteers, we scripted a walkthrough of the instrument, what to do and what to say, and then delivered the training through GoToMeeting in 45 minutes. We held several of these sessions over a few evenings (a couple of them quite late Eastern Time to make it possible for people in Pacific Time to attend after regular working hours). The training demonstrated the instrument and gave volunteers a chance to ask questions. Anyone who wanted to conduct sessions with participants was required to attend training. Of the 30 people who originally volunteered, 16 attended training and ended up conducting sessions.

Snowball rehearsals, pairing pros with proto-pros

There were a few volunteers who looked like pros on paper, but none of the core team knew them or their skills. So Cyd or Whitney or I paired with these folks for a session, and if they did well, they were allowed to do sessions on their own. There were also a few people who weren’t user researchers at all, but who were interested in learning or who had other kinds of interviewing experience. We paired these people, who we called proto-pros, with approved pros to take notes or to do the data collecting while the pro conducted the interview. Some of them graduated to doing sessions on their own, too.

And this, my friends, is how we got 41 thirty-minute sessions done over a few weeks.

Office hours

Cyd and I also made ourselves available online to answer questions or address issues through Campfire, a closed group chat tool from 37Signals. We invited all the volunteers to Campfire and sent out notices by email of when we’d be holding office hours there. A few volunteers did indeed have questions, which we could clarify right in Campfire and then send answers out to everyone by email. Nothing came up that meant changing the script.

Check the data

Every now and then I wanted to know how many sessions we’d done, but I also wanted to make sure that the data was good and clean. Because the instrument was set up in SurveyMonkey, I could go look at the data as it came in. I could tell who had collected it by the participant numbers assigned, which used the researcher’s initials along with the date. This way, if I had a question or something didn’t seem right, I could go back to the researcher to correct the data. Fortunately, we never needed to do that.
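
If you want to do the same kind of spot-check on your own study, a little sketch like the one below is enough; the file name, column name, and ID format here are stand-ins for whatever your export and naming convention actually look like.

    import csv
    import re
    from collections import Counter

    # Stand-in ID format: researcher initials, a date, and a session number, e.g. "DC-20121015-01".
    ID_PATTERN = re.compile(r"^(?P<initials>[A-Z]{2,3})-(?P<date>\d{8})-\d{2}$")

    def check_export(path):
        sessions_per_researcher = Counter()
        ids_to_follow_up = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                match = ID_PATTERN.match(row.get("participant_id", ""))
                if match:
                    sessions_per_researcher[match.group("initials")] += 1
                else:
                    ids_to_follow_up.append(row.get("participant_id", "<missing>"))
        return sessions_per_researcher, ids_to_follow_up

    counts, follow_ups = check_export("sessions_export.csv")   # hypothetical survey-tool export
    print("Sessions so far:", sum(counts.values()))
    print("Per researcher:", dict(counts))
    print("IDs to ask about:", follow_ups)

A couple of minutes with a report like this tells you who to go back to, and about which sessions.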

A solid script focused the session

So many interesting things happened as we observed people trying to find answers to their questions on their own county election websites. We learned what the most-asked questions were and how people asked them. We heard what people had to say about whether and why they had or had not been to the site before. And we learned whether people found the answers to their questions.

We did not track where they went on the way to finding answers or giving up. And that is what the earnest volunteer researcher had wanted to talk about. “Can’t we add some fields to the data collector to put in notes about this? People get so lost! Some of these sites are train wrecks!” she wanted to know. My answer: “It’s fascinating, isn’t it? But, no.” What would we do with that data over 40 or more sessions? It wasn’t as if we were going to send feedback to every site. I reminded her of the research questions. “We want to know whether voters find answers, not how they find answers on the sites,” I said. “But if you want to take notes about how it happens, those notes could be helpful to understanding the rest of the data. And I can’t tell you how much the whole team appreciates you doing this. We don’t want you to take on extra work! We’ve already asked so much of you.”

Whew. No need to start over. “You’ll get to do more sessions within the time you have available,” I said, “if you just stick to the script.”

In the fall of 2012, I seized the opportunity to do some research I’d wanted to do for a long time. Millions of users would be available and motivated to take part. But I needed to figure out how to do a very large study in a short time. By large, I’m talking about reviewing hundreds of websites. How could we make that happen within a couple of months?

Do election officials and voters talk about elections the same way?

I had BIG questions. What were local governments offering on their websites, and how did they talk about it? And, what questions did voters have?  Finally, if voters went to local government websites, were they able to find out what they needed to know?

Brain trust

To get this going, I enlisted a couple of colleagues and advisors. Cyd Harrell is a genius when it comes to research method (among other things). Ethan Newby sees the world in probabilities and confidence intervals. Jared Spool came up with the cleverest twist, one that kept us from evaluating with techniques we were prone to use just out of habit. Great team, but I knew we weren’t enough to do everything that needed doing.

Two phases of research: What first, then whether

We settled on splitting the research into 2 steps. First, we’d go look at a bunch of county election websites to see what was on them. We decided to do this by simply cataloging the words in links, headings, and graphics on a big pile of election sites. Next, we’d do some remote, moderated usability test sessions asking voters what questions they had and then observe as they looked for satisfactory answers on their local county websites.

Cataloging the sites would tell us what counties thought was important enough to put on the home pages of their election websites. It also would reveal the words used in the information architecture. Would the labels match voters’ mental models?

Conducting the usability test would tell us what voters cared about, giving us a simple mental model. Having voters try to find answers on websites close to them would tell us whether there was a gap between how election officials talk about elections and how voters think about elections. If there was a gap, we could get a rough measure of how wide the gap might be.

When we had the catalog and the usability test data, we could look at what was on the sites and where it appeared against how easily and successfully voters found answers. (At some point, I’ll write about the usability test because there were fun challenges in that phase, too. Here I want to focus on the cataloging.)

Scoping the sample

Though most of us only think of elections when it’s time to vote for president every four years, there are actually elections going on all the time. Right now, at this very moment, there’s an election going on somewhere in the US. And, contrary to what you might think, most elections are run at the county or town level. There are a lot of counties, boroughs, and parishes in the US. And then there are Wisconsin and New England, where elections are almost exclusively run by towns. There are about 3,057 counties or county equivalents. If you count all the towns and other jurisdictions that put on elections in the US and its territories and protectorates, there are over 8,000 voting jurisdictions. Most of them have websites.

We decided to focus on counties or equivalents, which brings us back to roughly 3,000 to choose from. The question then was how to narrow the sample to be big enough to give us reliable statistics, but small enough to gather the data within a reasonable time.

So, our UX stats guy, Ethan, gave us some guidance: 200 counties seemed like a reasonable number to start with. Cyd created selection criteria based on US Census data. In the first pass, we selected counties based on population size (highest and lowest), population density (highest and lowest), and diversity (majority white or majority non-white). We also looked across geographic regions. When we reviewed which counties showed up under which criteria, we saw that there were several duplicates. For example, Maricopa County, Arizona, is highly populated, densely populated, and majority non-white. When we removed the duplicates, we had 175 counties left.
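
For the curious, that first pass boils down to something like the sketch below. The column names, thresholds, and counts are stand-ins rather than the actual criteria we used; the point is just the shape of the selection: take the extremes on each dimension, check the diversity and regional mix, then drop the duplicates.

    import pandas as pd

    # Hypothetical Census extract: one row per county, with stand-in column names.
    counties = pd.read_csv("census_counties.csv")   # fips, name, state, region,
                                                    # population, density, pct_white

    picks = []
    for col in ("population", "density"):
        picks.append(counties.nlargest(20, col))    # highest on this dimension
        picks.append(counties.nsmallest(20, col))   # lowest on this dimension
    picks.append(counties[counties["pct_white"] < 50].sample(20, random_state=1))   # majority non-white
    picks.append(counties[counties["pct_white"] >= 50].sample(20, random_state=1))  # majority white

    selection = pd.concat(picks).drop_duplicates(subset="fips")
    print(len(selection), "counties after removing duplicates")
    print(selection.groupby("region").size())       # eyeball the geographic spread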

The next step was to determine whether they all had websites. Here we had one of our first insights: Counties with populations somewhere between 7,000 and 10,000 are less likely to have websites about elections than counties that are larger. We eliminated counties that either didn’t have websites or had a one-pager with the clerk’s name and phone number. This brought our sample down to 147 websites to catalog. Insanely, 147 seemed so much more reasonable than 200.

One more constraint we faced was timing. Election websites change all the time, because, well, there are elections going on all the time. Because we wanted to do this before the 2012 Presidential election in November, we had to start cataloging sites in about August. But with just a few people on the team, how would we ever manage that and conduct usability test sessions?

Crowd-sourced research FTW

With 147 websites to catalog, if we could get helpers to do 5 websites each, we’d need about 30 co-researchers. Could we find people to give us a couple of hours in exchange for nothing but our undying gratitude?

I came to appreciate social networks in a whole new way. I’ve always been a big believer in networking, even before the Web gave us all these new tools. The scary part was asking friends and strangers for this kind of favor.

Fortunately, I had 320 new friends from a Kickstarter campaign I had conducted earlier in the year to raise funds to publish a series of little books called Field Guides To Ensuring Voter Intent. Even though people had already backed the project financially, many of them told me that they wanted to do more, to be directly involved. Twitter and Facebook seemed like good sources of co-researchers, too. I asked, and they came. Altogether, 17 people cataloged websites.

Now we had a new problem: We didn’t know the skills of our co-researchers, and we didn’t want to turn anyone away. That would just be ungrateful.

A good data collector, some pilot testing, and a little briefing

Being design researchers, we all wanted to evaluate the websites as we were reviewing and cataloging them. But how do you deal with all those subjective judgments? What heuristics could we apply? We didn’t have the data to base heuristics on. And though Cyd, Ethan, Jared, and I have been working on website usability since the dawn of time, election websites are peculiar: not quite like e-commerce sites, and not exactly like information-rich sites, either. Heuristic evaluation was out of the question. Jared suggested — and here’s the twist — that we let the data speak for itself rather than evaluate the information architecture or the design. After we got over the idea of evaluating, the question was how to proceed. Without judgment, what did we have?

Simple data collection. It seemed clear that the way to do the cataloging was to put the words into a spreadsheet. The format of the spreadsheet would be important. Cyd set up a basic template that looked amazingly like a website layout. It had regions that reflected the different areas of a website: banner, left column, center area, right column, footer. She added color coding, instructions, and examples.
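
In spirit, each cataloged word or phrase boils down to a small, flat record, something like this sketch (the field names and codes are illustrative, not the actual ones from our instructions):

    from dataclasses import dataclass

    # Illustrative record for one word or phrase captured from a county election home page.
    @dataclass
    class CatalogItem:
        county_fips: str   # which county's site it came from
        region: str        # "banner", "left", "center", "right", or "footer"
        element: str       # code for the source element, e.g. "link", "heading", "graphic"
        text: str          # the words exactly as they appeared on the page

    item = CatalogItem(county_fips="04013", region="left", element="link", text="Where do I vote?")
    print(item)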

I wrote up a separate sheet with step-by-step instructions and file naming conventions. It also listed the simple set of codes to mark the words collected. And then we tested the hell out of it. Cyd’s mom was one of our first co-researchers. She had excellent questions about what to do with what. We incorporated her feedback in the spreadsheet and the instructions, and tried the process and instruments out with a few other people. After 5 or 6 pilots, when we thought we’d smoothed out the kinks, we invited our co-researchers to briefing sessions through GoToMeeting, and gave assignments.

To our delight, the data that came back was really clean and consistent. And there were more than 8,000 data items to analyze.
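
With the catalog flattened into records like the sketch above, even a few lines of analysis start to answer the cataloging question: which words appear, how often, and where on the page. This is only a sketch, and the file, column names, and list of terms are stand-ins:

    import pandas as pd

    catalog = pd.read_csv("catalog_items.csv")   # hypothetical combined export of all 147 sites

    # Which page regions carry terms voters might actually look for?
    voter_terms = ["register", "polling place", "absentee", "ballot"]
    has_term = catalog["text"].str.contains("|".join(voter_terms), case=False, na=False)
    print(catalog[has_term].groupby("region").size().sort_values(ascending=False))

    # How many of the sites use each term at least once?
    for term in voter_terms:
        hits = catalog[catalog["text"].str.contains(term, case=False, na=False)]
        print(term, "appears on", hits["county_fips"].nunique(), "sites")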

Lessons learned: focus, prepare, pilot, trust

It’s so easy in user research to just say, Hey, we’ll put it in front of people and ask a couple of questions, and we’ll be good.  I’ve been a loud voice for a long time crying, Just do it! Just put your design in front of users and watch. This is good for some kinds of exploratory, formative research where you’re early in a design.

But there’s a place, too, for a specific, tightly bounded, narrowly scoped, and thoroughly designed research study. We wanted to answer specific questions at scale. This takes a different kind of preparation from a formative study. Getting the data collection right was key to the success of the project.

To get the data collecting right, we had to take out as much judgment as possible, for 2 reasons:

• we wanted the data to be consistently gathered

• we had people whose skills we didn’t know collecting the data

Though the findings from the study are fascinating (at least to me), what makes me proud of this project was how we invited other people in. It was not easy letting go. But I just couldn’t do it all. I couldn’t even have gotten it done with just the help of Cyd and Ethan. Setting up training helped. Setting up office hours helped. Giving specific direction helped. And now 17 people own parts of this project, which means 17 people can tell at least a little part of the story of these websites. That’s what I want out of user research. I can’t wait to do something like this with a client team full of product managers, marketers, and developers.

If you’d like to see some stats on the 8,000+ data items we collected, check out the slide deck that Ethan Newby created that lays out when, where, and how often key words that might help voters answer their questions appeared on 147 county election websites in November 2012.

I’ve seen it dozens of times. The team meets after observing people use their design, and they’re excited and energized by what they saw and heard during the sessions. They’re all charged up about fixing the design. Everyone comes in with ideas, certain they have the right solution to remedy the frustrations users had. Then what happens?

On a super collaborative team, everyone is in the design together, just with different skills. Splendid! Everyone was involved in the design of the usability test, they all watched most of the sessions, and they participated in debriefs between sessions. They took detailed, copious notes. And now the “what ifs” begin:

What if we just changed the color of the icon? What if we made the type bigger? What if we moved the icon to the other side of the screen? Or a couple of pixels? What if?

How do you know you’re solving the right problem? Well, the team thinks they’re on the right track because they paid close attention to what participants said and did. But teams often leave that data behind when they’re trying to decide what to do. This is not ideal.

Getting beyond “what if”

On a super collaborative team, everyone is rewarded for doing the right thing for the user, which in turn, is the right thing for the business. Everyone is excited about learning about the goodness (or badness) of the design by watching users use it. But a lot of teams get stuck in the step after observation. They’re anxious to get to design direction. Who can blame them? That’s where the “what ifs” and power plays happen. Some teams get stuck and others try random things because they’re missing one crucial step: going back to the evidence for the design change.

Observing with an open mind

Observations tell you what happened. That is, you heard participants say things and you saw them do things — many, many interesting, sometimes baffling things. Good things, and bad things. Some of those things backed up your theories about how the design would work. Some of the observations blew your theories out of the water. And that’s what we do usability testing to see: in a low-risk situation like a small, closed test, what it will be like when our design is out in the wild.

Brainstorming the why

The next natural step is to make inferences. These are guesses or judgments about why the things you observed happened. We all do this. It’s usually what the banter is all about in the observation room.

“Why” is why we do this usability testing thing. You can’t get to why from surveys or focus groups. But even in direct observation, with empirical evidence, why is sometimes difficult to ferret out. A lot of times the participants just say it. “That’s not what I was looking for.” “I didn’t expect it to work that way.” “I wouldn’t have approached it that way.” “That’s not where I’d start.” You get the idea.

But they don’t always tell you the right thing. You have to watch. Where did they start? What wrong turns did they take? Where did they stop? What happened in the 3 minutes before they succeeded or failed? What happened in the 3 minutes after?

It’s important to get judgments and guesses out into the fresh air and sunshine by brainstorming them within the team. When teams make the guessing of the why an explicit act that they do in a room together, they test the boundaries of their observations. It’s also easy to see when different people on the team saw things similarly and where they saw them differently.

Weighing the evidence

And so we come to the crucial step, the one that most teams skip over, and why they end up in the “what ifs” and opinion wars: analysis. I’m not talking about group therapy, though some teams I’ve worked with could use some. Rather, the team now looks at the strength of the data to support design decisions. Without this step, it is far too easy to choose the wrong inference to direct the design decisions. You’re working from the gut, and the gut can be wrong.

Analysis doesn’t have to be difficult or time-consuming. It doesn’t even have to involve spreadsheets.* And it doesn’t have to be lonely. The team can do it together. The key is examining the weight of the evidence for the most likely inferences.

Take all those brainstormed inferences. Throw them into a hat. Draw one out and start looking at the data you have that supports it as the reason for the frustration or failure. Is there a lot? A little? Any? Everyone in the room should be poring through their notes. What happened in the sessions? How much? How many participants had a problem? What kinds of participants had the problem? What were they trying to do and how did they describe it?

Answering questions like these, among the team, gets us to understanding how likely it is that this particular inference is the cause of the frustration. After a few minutes of this, it is not uncommon for the team to collectively have an “aha!” moment. Breakthrough comes as the team eliminates some inferences because they’re weak, and keeps others because they are strong. Taking the strong inferences together, along with the data that shows what happened and why, snaps the design direction right into focus.
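
If your observation notes are already tagged, even a tiny tally can keep this conversation honest. Here’s a sketch with made-up notes and made-up inferences; the point is only the counting, not the content:

    from collections import defaultdict

    # Made-up observations: (participant, inference the note seems to support)
    observations = [
        ("P1", "label doesn't match users' vocabulary"),
        ("P2", "label doesn't match users' vocabulary"),
        ("P2", "icon is too easy to miss"),
        ("P3", "label doesn't match users' vocabulary"),
        ("P4", "icon is too easy to miss"),
        ("P5", "label doesn't match users' vocabulary"),
    ]

    total_participants = len({participant for participant, _ in observations})
    support = defaultdict(set)
    for participant, inference in observations:
        support[inference].add(participant)

    for inference, participants in sorted(support.items(), key=lambda kv: -len(kv[1])):
        print(f"{inference}: {len(participants)} of {total_participants} participants")

Inferences that only one or two notes support tend to fall away quickly once counts like these are in front of everyone.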

Eliminating frustration is a process of elimination

The team comes to the design direction meeting knowing what the priority issues were. Everyone has at least one explanation for the gap between what the design does and what the participant tried to do. Narrowing those guesses to the most likely root cause based on the weight of the evidence – in an explicit, open, and conscious act – takes the “what ifs” out of the next version of a design, and shares the design decisions across the team.

* Though 95% of data analysis does. Sorry.

Sports teams drill endlessly. They walk through plays, they run plays, they practice plays in scrimmages. They tweak and prompt in between drills and practice. And when the game happens, the ball just knows where to go.

This seems like such an obvious thing, but we researchers often pooh-pooh dry runs and rehearsals. In big studies, it is common to run large pilot studies to get the kinks out of an experiment design before running the experiment with a large number of participants.

But I’ve been getting the feeling that we general research practitioners are afraid of rehearsals. One researcher I know told me that he doesn’t do dry runs or pilot sessions because he fears that makes it look to his team like he doesn’t know what he is doing. Well, guess what. The first “real” session ends up being your rehearsal, whether you like it or not. Because you actually don’t know exactly what you’re doing — yet. If it goes well, you were lucky and you have good, valid, reliable data. But if it didn’t go well, you just wasted a lot of time and probably some money.

The other thing I hear is that researchers are pressured for time. In an Agile team, for example, everyone feels like they just have to keep moving forward all the time. This is an application development methodology in desperate want of thinking time, of just practicing craft. The person doing the research this week has finite time. The participants are only available at certain times. The window for considering the findings closes soon. So why waste it rehearsing what you want to do in the research session?

Conducting dry runs, practice sessions, pilots, and rehearsals — call them whatever works in your team — gives you the superpower of confidence. That confidence gives you focus and relaxation in the session so you can open your mind and perception to what is happening with the user rather than focusing on managing the session. And who doesn’t want the superpower of control? Or of deep insight? These things don’t come without preparation, practice, and poking at improving the protocol.

You can’t get that deep insight in situ if you’re worried about things like how to transfer the control of the mouse to someone else in a remote session. Or whether the observers are going to say something embarrassing at just the wrong time. Or how you’re going to ask that one really important question without leading or priming the participant.

The way to get to be one with the experience of observing the user’s experience is to practice the protocol ahead of time.

There are 4 levels of rehearsal that I use. I usually do all of them for every study or usability test.

  • Script read-through. You’ve written the script, probably, but have you actually read it? Read it aloud to yourself. Read it aloud to your team. Get feedback about whether you’re describing the session accurately in the introduction for the participant. Tweak interview questions so they feel natural. Make sure that the task scenarios cover all the issues you want to explore. Draft follow-up questions.
  • Dry run with a confederate. Pretending is not a good thing in a real session. But having someone act as your participant while you go through the script or protocol or checklist can give you initial feedback about whether the things you’re saying and asking are understandable. It’s the first indication of whether you’ll get the data you are looking for.
  • Rehearsal with a team member. Do a full rehearsal on all the parts. First, do a technical rehearsal. Does the prototype work? Do you know what you’re doing in the recording software? Does the camera on the mobile sled hold together? If there will be remote observers, make sure whatever feed you want to use will work for them by going through every step. When everything feels comfortable on the technical side, get a team member to be the participant and go through every word of the script. If you run into something that doesn’t seem to be working, change it in the script right now.
  • Pilot session with a real participant. This looks a lot like the rehearsal with the team member except for 3 things. First, the participant is not a team member, but a user or customer who was purposely selected for this session. Second, you will have refined the script after your experience of running a session using it with a team member. Third, you will now have been through the script at least 3 other times before this, so you should be comfortable with what the team is trying to learn and with the best way to ask about it. How many times have you run a usability test only to get to the 5th session and hear in your mind, ‘huh, now I know what this test is about’? It happens.

All this rehearsal? As the moderator of a research session, you’re not the star — the participant is. But if you aren’t comfortable with what you’re doing and how you’re doing it, the participant won’t be comfortable and relaxed, either. And you won’t get the most out of the session. After you get into the habit of rehearsing, when game time comes you can concentrate on what is happening with the participant. The rehearsal steps become ways to test the test, rather than testing you.

There’s a lot of truth to “practice makes perfect.” When it comes to conducting user research sessions, that practice can make all the difference in getting valid data and useful insights. As Yogi Berra said, “In theory, there’s no difference between theory and practice. In practice, there is.”