
In the fall of 2012, I seized the opportunity to do some research I’ve wanted to do for a long time. Millions of users would be available and motivated to take part. But I needed to figure out how to do a very large study in a short time. By large, I’m talking about reviewing hundreds of websites. How could we make that happen within a couple of months?

Do election officials and voters talk about elections the same way?

I had BIG questions. What were local governments offering on their websites, and how did they talk about it? And what questions did voters have? Finally, if voters went to local government websites, were they able to find out what they needed to know?

Brain trust

To get this going, I enlisted a couple of colleagues and advisors. Cyd Harrell is a genius when it comes to research methods (among other things). Ethan Newby sees the world in probabilities and confidence intervals. Jared Spool came up with the cleverest twist, which kept us from falling back on evaluation techniques we were prone to use just out of habit. Great team, but I knew we weren’t enough to do everything that needed doing.

Two phases of research: What first, then whether

We settled on splitting the research into 2 steps. First, we’d go look at a bunch of county election websites to see what was on them. We decided to do this by simply cataloging the words in links, headings, and graphics on a big pile of election sites. Next, we’d do some remote, moderated usability test sessions asking voters what questions they had and then observe as they looked for satisfactory answers on their local county websites.

Cataloging the sites would tell us what counties thought was important enough to put on the home pages of their election websites. It also would reveal the words used in the information architecture. Would the labels match voters’ mental models?

Conducting the usability test would tell us what voters cared about, giving us a simple mental model. Having voters try to find answers on websites close to them would tell us whether there was a gap between how election officials talk about elections and how voters think about elections. If there was a gap, we could get a rough measure of how wide the gap might be.

When we had the catalog and the usability test data, we could look at what was on the sites and where it appeared against how easily and successfully voters found answers. (At some point, I’ll write about the usability test because there were fun challenges in that phase, too. Here I want to focus on the cataloging.)

Scoping the sample

Though most of us only think of elections when it’s time to vote for president every four years, there are actually elections going on all the time. Right now, at this very moment, there’s an election going on somewhere in the US. And, contrary to what you might think, most elections are run at the county or town level. There are a lot of counties, boroughs, and parishes in the US. And then there are Wisconsin and New England, where elections are almost exclusively run by towns. There are about 3,057 counties or equivalents. If you count all the towns and other jurisdictions that put on elections in the US and its territories and protectorates, there are over 8,000 voting jurisdictions. Most of them have websites.

We decided to focus on counties or equivalents, which brings us back to roughly 3,000 to choose from. The question then was how to narrow the sample to be big enough to give us reliable statistics, but small enough to gather the data within a reasonable time.

So our UX stats guy, Ethan, gave us some guidance: 200 counties seemed like a reasonable number to start with. Cyd created selection criteria based on US Census data. In the first pass, we selected counties based on population size (highest and lowest), population density (highest and lowest), and diversity (majority white or majority non-white). We also looked across geographic regions. When we reviewed which counties showed up under which criteria, we saw that there were several duplicates. For example, Maricopa County, Arizona, is highly populated, densely populated, and mostly racial minorities. When we removed the duplicates, we had 175 counties left.
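The first pass can be pictured as a small selection routine: take the extremes on each criterion, then drop counties that qualify more than once. This is a minimal sketch under stated assumptions, not the team’s actual procedure. The `County` fields, the hypothetical `select_sample` function, and its `n_per_end` parameter are illustrative; the diversity criterion simply takes the first counties in each group, and geographic region is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class County:
    name: str
    state: str
    population: int
    density: float       # people per square mile
    pct_nonwhite: float  # share of residents who are non-white, 0.0 to 1.0


def select_sample(counties, n_per_end):
    """Hypothetical first pass: extremes on each criterion, then de-duplicate.

    Returns (all picks, unique picks) so you can see how many duplicates fell out.
    """
    if n_per_end < 1:
        raise ValueError("n_per_end must be at least 1")
    picks = []
    # Population size and density: take the lowest and highest n_per_end counties.
    for sort_key in (lambda c: c.population, lambda c: c.density):
        ranked = sorted(counties, key=sort_key)
        picks += ranked[:n_per_end] + ranked[-n_per_end:]
    # Diversity: take n_per_end counties from each side of the majority line.
    # (Simplification: just the first ones in input order.)
    nonwhite = [c for c in counties if c.pct_nonwhite > 0.5]
    white = [c for c in counties if c.pct_nonwhite <= 0.5]
    picks += nonwhite[:n_per_end] + white[:n_per_end]
    # A county can show up under several criteria; keep only its first appearance.
    seen = set()
    unique = []
    for c in picks:
        ident = (c.name, c.state)
        if ident not in seen:
            seen.add(ident)
            unique.append(c)
    return picks, unique
```

With made-up data, a county that is both the most populous and majority non-white gets picked twice but kept once. That is the same kind of overlap that shrank the real list from 200 to 175.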

The next step was to determine whether they all had websites. Here we had one of our first insights: Counties with populations somewhere between 7,000 and 10,000 are less likely to have websites about elections than counties that are larger. We eliminated counties that either didn’t have websites or had a one-pager with the clerk’s name and phone number. This brought our sample down to 147 websites to catalog. Insanely, 147 seemed so much more reasonable than 200.

One more constraint we faced was timing. Election websites change all the time because, well, there are elections going on all the time. Because we wanted to do this before the 2012 Presidential election in November, we had to start cataloging sites around August. But with just a few people on the team, how would we ever manage that and conduct usability test sessions?

Crowd-sourced research FTW

With 147 websites to catalog, if we could get helpers to do 5 websites each, we’d need about 30 co-researchers. Could we find people to give us a couple of hours in exchange for nothing but our undying gratitude?

I came to appreciate social networks in a whole new way. I’ve always been a big believer in networking, even before the Web gave us all these new tools. The scary part was asking friends and strangers for this kind of favor.

Fortunately, I had 320 new friends from a Kickstarter campaign I had conducted earlier in the year to raise funds to publish a series of little books called Field Guides To Ensuring Voter Intent. Even though people had already backed the project financially, many of them told me that they wanted to do more, to be directly involved. Twitter and Facebook looked like good sources of co-researchers, too. I asked, and they came. Altogether, 17 people cataloged websites.

Now we had a new problem: We didn’t know the skills of our co-researchers, and we didn’t want to turn anyone away. That would just be ungrateful.

A good data collector, some pilot testing, and a little briefing

Being design researchers, we all wanted to evaluate the websites as we were reviewing and cataloging them. But how do you deal with all those subjective judgements? What heuristics could we apply? We didn’t have the data to base heuristics on. And though Cyd, Ethan, Jared, and I have been working on website usability since the dawn of time, these election websites are their own breed: not like e-commerce sites, and not exactly like information-rich sites either. Heuristic evaluation was out of the question. Here’s the twist Jared suggested: let the data speak for itself rather than evaluating the information architecture or the design. After we got over the idea of evaluating, the question was how to proceed. Without judgement, what did we have?

Simple data collection. It seemed clear that the way to do the cataloging was to put the words into a spreadsheet. The format of the spreadsheet would be important. Cyd set up a basic template that looked amazingly like a website layout. It had different regions that reflected different areas of a website: banner, left column, center area, right column, footer. She added color coding, instructions, and examples.
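One way to see why a fixed template keeps data consistent is to treat each spreadsheet row as a record that must name one of the five regions. This is a hypothetical sketch, not Cyd’s actual template: the `Region` values mirror the regions listed above, while `CatalogItem`, `parse_row`, and the field names are assumptions made for illustration.

```python
from enum import Enum
from typing import NamedTuple


class Region(Enum):
    # The five page regions laid out in the catalog template.
    BANNER = "banner"
    LEFT_COLUMN = "left column"
    CENTER_AREA = "center area"
    RIGHT_COLUMN = "right column"
    FOOTER = "footer"


class CatalogItem(NamedTuple):
    county: str
    region: Region
    words: str  # link, heading, or graphic text as it appears on the page


def parse_row(county, region_label, words):
    """Turn one spreadsheet row into a CatalogItem, rejecting anything off-template."""
    try:
        region = Region(region_label.strip().lower())
    except ValueError:
        raise ValueError(f"{county}: unknown region {region_label!r}") from None
    text = words.strip()
    if not text:
        raise ValueError(f"{county}: empty words cell in {region.value}")
    return CatalogItem(county, region, text)


item = parse_row("Maricopa County, AZ", "Left Column", "  Register to Vote ")
print(item.region, item.words)
```

The point of rejecting rows early is the same as the point of the template: a co-researcher can’t invent a sixth region or leave a cell blank without someone noticing.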

I wrote up a separate sheet with step-by-step instructions and file naming conventions. It also listed the simple set of codes to mark the words collected. And then we tested the hell out of it. Cyd’s mom was one of our first co-researchers. She had excellent questions about what to do with what. We incorporated her feedback in the spreadsheet and the instructions, and tried the process and instruments out with a few other people. After 5 or 6 pilots, when we thought we’d smoothed out the kinks, we invited our co-researchers to briefing sessions through GoToMeeting, and gave assignments.

To our delight, the data that came back was really clean and consistent. And there were more than 8,000 data items to analyze.

Lessons learned: focus, prepare, pilot, trust

It’s so easy in user research to just say, Hey, we’ll put it in front of people and ask a couple of questions, and we’ll be good. I’ve been a loud voice for a long time crying, Just do it! Just put your design in front of users and watch. This is good for some kinds of exploratory, formative research where you’re early in a design.

But there’s a place, too, for a tightly bounded, narrowly scoped, thoroughly designed research study. We wanted to answer specific questions at scale. That takes a different kind of preparation from a formative study. Getting the data collection right was key to the success of the project.

To get the data collection right, we had to take out as much judgement as possible for 2 reasons:

• we wanted the data to be consistently gathered

• we had people whose skills we didn’t know collecting the data

Though the findings from the study are fascinating (at least to me), what makes me proud of this project is how we invited other people in. It was not easy letting go. But I just couldn’t do it all. I couldn’t even have gotten it done with the help of Cyd and Ethan. Setting up training helped. Setting up office hours helped. Giving specific direction helped. And now 17 people own parts of this project, which means 17 people can tell at least a little part of the story of these websites. That’s what I want out of user research. I can’t wait to do something like this with a client team full of product managers, marketers, and developers.

If you’d like to see some stats on the 8,000+ data items we collected, check out the slide deck that Ethan Newby created that lays out when, where, and how often key words that might help voters answer their questions appeared on 147 county election websites in November 2012.
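Once the words are in rows, counting where and how often a key word turns up is a straightforward tally. The sketch below is an assumption about how such counts could be produced, not a description of Ethan’s analysis: `keyword_report` is a hypothetical function, it covers only *where* and *how often* (not *when*), and it uses plain case-insensitive substring matching.

```python
from collections import Counter, defaultdict


def keyword_report(items, keywords):
    """Tally where and how often each keyword appears across cataloged sites.

    items: iterable of (county, region, words) tuples.
    Returns {keyword: ({region: count}, number_of_sites)}.
    """
    by_region = {k: Counter() for k in keywords}
    sites = defaultdict(set)
    for county, region, words in items:
        text = words.lower()
        for k in keywords:
            if k.lower() in text:  # case-insensitive substring match
                by_region[k][region] += 1
                sites[k].add(county)
    return {k: (dict(by_region[k]), len(sites[k])) for k in keywords}


items = [
    ("County A", "left column", "Register to Vote"),
    ("County A", "center area", "Where do I vote?"),
    ("County B", "left column", "Voter Registration"),
    ("County B", "banner", "Elections Office"),
]
print(keyword_report(items, ["register", "vote"]))
```

Substring matching is crude in both directions: "vote" matches "Voter Registration", while "register" misses "Registration". A real analysis would need to decide which variants count as the same answer to a voter’s question.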


How many of you have run usability tests that look like this: individual, one-hour sessions in which the participant performs one or more tasks from a scenario that you and your team came up with, on a prototype, using bogus or imaginary data? It’s a hypothetical situation for the user; sometimes they’re even role-playing.

Anyone? That’s what I thought. Me too. I just did it a couple of weeks ago.

But that model of usability testing is broken. Why? Because one of the first things we found out is that the task we were asking people to do – doing some basic financial estimates based on goals for retirement – involved more than the person in the room with me.

For the husbands, the task involved their wives because the guys didn’t actually know what the numbers were for the household expenses. For the women, it was their children, because they wanted to talk to them about medical expenses and plans for assisted living. For younger people it was their parents or grandparents, because they wanted to learn from them how they’d managed to save enough to help them through school and retire, too.

There’s a conversation there. There’s a support network there. And that’s what’s broken about usability testing. It always has been.

I first started thinking about this when Google launched Buzz. Buzz used Gmail users’ contacts to automatically generate online social networks connecting users to their most frequent contacts. Google employees, 20,000 of them, had been using Buzz inside the garden walls for a year. A nice, big sample. The problems became evident almost immediately, however, once Buzz was let into the wild. One of the most famous cases involved a blogger who calls herself Harriet. She wrote about how one of her most frequent correspondents in Gmail was her boyfriend. Another was her abusive ex-husband. Now they were publicly connected, and this made her very, very unhappy. In fact, the post was titled, Fuck You, Google.

There might have been no harm done in the retirement planning study. But there might have been. Would the 31-year-old who broke down crying in the session because her mother was in late-stage ALS have had a better experience if we’d tested in her context, where she could work with her closest advisor, her dad? Might it have been a calming process, where she felt in control and became engaged in envisioning her independent future because someone she trusted could give her perspective that I could not? Maybe.

As for Buzz, Harriet certainly wasn’t pleased, and she was left with a mess to clean up. How do you disconnect two people who are now publicly connected?

When companies started doing usability testing regularly in the 1980s, it was about finding design problems in what now look like fairly simple UIs that frustrated or hindered users. It was one person, one machine, as the human did a usually work-based task. That’s why it was called computer-human interaction.

But today, technology large and small is fully integrated into people’s lives in a much more ephemeral, less compartmentalized way. It is rare now to sit next to a corded phone, holding the handset, doing nothing but talking and listening.

When you look at what the social web is, there are some characteristics that I think we’re not taking into account very well in doing usability tests:

– It’s about relationships among people
– In context
– Conducted fluidly and continuously, across time and space

I also think that people who are new to usability testing and user research are not going to do a good job of testing social interaction design, because constructing a study cannot be very formal or controlled. Measuring what’s happening is much more complex. And scope and scale make a difference. Testing Buzz with 20,000 Googlers for a year wasn’t enough; it took letting it out to a million people who hadn’t drunk the Kool-Aid to find the *real* problems, the real frustrations, the real hindrances that truly affect uptake and adoption.

The nature of online is social
Let’s back up and talk about a key definition. What I mean by “social” is anything that someone does that changes the behavior of someone else.

This is how I can say that being online *is* social. Email is social. Publishing a flat HTML document is social. Putting something on a calendar is social. Everything is social. Social isn’t the special secret sauce that you pour on top of an experience. Social is already there. Choosing a bank is social. Planning a vacation is social. Buying an appliance is social. The other day I texted my boyfriend a series of photos of me in different eyeglass frames, because I couldn’t decide by myself. This was *not* a computer-centered or an app-centered interaction. This was a decision being made by two people, a conversation, mediated by fully integrated technology in fluid activities in different contexts for two people. It was social.

Social isn’t sauce. It’s sustenance. It’s already there, and we’re not seeing it. So we’re not researching it, and we’re definitely not testing for it.

We have to stop thinking about human-computer interaction. That model by default is too limiting. Look around. It’s really about human relationships and interactions mediated by technology. Technology is supporting the communication, not driving it. Ask any parent who has used FaceTime or Skype to have a video chat with their baby for the first time.

Scale is the game changer

Discount usability testing is great for some things, but what we’re really studying when doing user research and usability testing for the social web is social behavior. And that takes scale. That takes connections. That takes observing people’s real networks and understanding what makes those work, what makes those friends, family, colleagues, neighbors, acquaintances, associates, clients, vendors, pen pals, drinking buddies, partners for life, or friends with benefits.

Those are rich, life-framing relationships that affect how someone interacts with a user interface, and most of us are not even scratching the surface of them when we “micro-test” a commenting feature on an online invitation web site.

“Task” doesn’t mean what you think it means

For the retirement planning tool, I did a little interview to start the session that I hoped would set some context for the participant to do the behavior that I wanted to observe. But it was woefully inadequate. Don’t get me wrong, the client wasn’t unhappy; they thought it was a cool technique. But as soon as I learned who the participant went to for financial advice, where was I? Putting the participant in a situation where they had to pretend. They did, and they did a fair job of it. But it was lacking.

But tasks are the wrong unit. What we’re asking people to do in usability tests is like attending a cocktail party while grocery shopping. Even with an interview, even with careful recruiting, it’s incongruous. There are very few discrete tasks in life. Instead, there are activities that lead people to goals. Multiple activities and goals might be intermixed on the way to achieving any one of them. In the meantime, the technology is completely integrated, ambient, almost incidental. Like asking your boyfriend which eyeglass frames look nerdy-sexy versus just nerdy.

The activity of interest isn’t computer based. Look at retirement planning. It’s *retirement planning*! That’s not the task. The activity is planning for the future, a future in which you have no real idea of what is going to happen, but you have hopes, aspirations.

Using Buzz is not a task. It’s not an activity. People who use Buzz don’t have the goal of connecting to other people, not in that deliberate way. They’re saying, hey, I’ve read something interesting you might be interested in, too. The task isn’t “sharing.” It’s putting themselves out in the world hoping that people they care about will notice. How do you make that scenario in a usability test?

Satisfaction may now equal user control

The ISO measures of usability are efficiency, effectiveness, and satisfaction. What is effective about having TweetDeck open all day long while you’re also writing a report, drafting emails, taking part in conference calls, attending virtual seminars, going back to writing a report, and calling your mother?

When the efficiency measure went into the ISO definition, most people were measuring time on task. But if you don’t have a discrete task, how do you measure time?

Satisfaction may be the most important thing in the end for the social web, and that may be the degree to which the user feels she has control of the activities she’s doing while she’s using your tool. How much is the UI forcing her to learn the tool, versus quickly integrating it into her life?

Measuring success in the social web often defies what we’ve been taught to count as data. How do you measure engagement in the social web? Is it about time on the site? I could lurk on or Facebook all day. Am I engaged? Is it about minutes spent pursuing and perusing content? Is it about how likely someone is to recommend something to someone else? I wrote my first product review EVER last week, for a pair of jeans on the Lands’ End web site. Am I engaged with the site? I would say no.

We have to look hard at the goodness of conventional metrics. They’re not translating to anything meaningful, I don’t think, because we’ve been thinking about all this all wrong, or not enough. What is goodness to a user? Control of her life. Control of her identity. Control of her information.

Users are continuously designing your UI

What does Task mean, what does Success mean, how do you measure new features that users create with your UI on the fly? Twitter has hashtags and direct messages. Users created those. Facebook is continuously being hacked for fresh activities. Look at commenting systems on blog posts or articles. Spammers, yes, but people are also talking to one another, arguing, flirting, solving problems, telling their own stories. No matter what you build, and what your intentions were in designing it, users are going to hijack it to make it useful to them. How do you test for that?

Ideas from smart people
I had all these questions and more when I met with a bunch of smart people who have been researching the social web. Out of that discussion came some great stories about what people had tried that worked, and what had not worked so well.

For creating task scenarios for usability tests, getting participants to tell stories of specific interactions helped. Doing long interviews helped learn context, scope, priorities, connections. Getting people to talk about their online profiles and explain relationships helped set the scene for activities. Getting them to use their own log-ins with their real relationships helped everyone know whether the outcomes were useful, usable, and desirable. Whether the outcomes were satisfying and even enriching.

Some of the people in this informal workshop also offered these ideas:

  • Screen sharing with someone outside the test lab or test situation
  • Making video diaries and then reviewing them retrospectively
  • Developing and testing at the same time, with users
  • Including friends or other connections in the same test session, setting up multi-user sessions
  • Sampling the experience in the same way that Flow was discovered: prompting people by SMS at select or random moments and asking them to report their behavior
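The scheduling side of experience sampling is easy to prototype. This sketch picks random prompt times within a waking-hours window, with a minimum gap so prompts don’t bunch up. The function name and its parameters are hypothetical, and actually sending the SMS is left out.

```python
import random
from datetime import datetime, timedelta


def sample_prompt_times(day_start, day_end, n_prompts, min_gap_minutes, rng=None):
    """Pick n_prompts random moments in [day_start, day_end], at least min_gap_minutes apart.

    Hypothetical helper for scheduling experience-sampling prompts.
    """
    rng = rng or random.Random()
    window = int((day_end - day_start).total_seconds() // 60)
    slack = window - min_gap_minutes * (n_prompts - 1)
    if n_prompts < 1 or slack < 0:
        raise ValueError("window is too short for that many prompts at that gap")
    # Draw sorted offsets within the slack, then push each one out by i gaps,
    # which guarantees consecutive prompts are at least min_gap_minutes apart.
    offsets = sorted(rng.randint(0, slack) for _ in range(n_prompts))
    return [
        day_start + timedelta(minutes=offset + i * min_gap_minutes)
        for i, offset in enumerate(offsets)
    ]


# Example: five prompts between 9 a.m. and 9 p.m., at least an hour apart.
start = datetime(2012, 10, 1, 9, 0)
for t in sample_prompt_times(start, start + timedelta(hours=12), 5, 60, random.Random(42)):
    print(t.strftime("%H:%M"))
```

Spreading sorted random offsets by a fixed gap keeps the prompts unpredictable to the participant while still leaving room between them to actually do something worth reporting.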

There’s also bodystorming, critical incident analysis, co-designing paper or other prototypes. A few things seemed clear through that discussion. To make user research and usability testing useful to designers, we have to rethink how we’re doing it. It’s got to reflect reality a bit better, which means it takes more from social science and behavioral science than from psychology. It takes more time. It takes more people. It takes a wider view of task, success, and engagement. And we’re just beginning to figure all that out.

Rethink research and testing

Everything is social. Scale is the game changer. Tasks aren’t what you think they are. User satisfaction may be about control. Users are continuously designing your UI. I invite you to work with me on rethinking how we’re doing user research and usability testing for what’s really happening in the world: fluid, context-dependent, relationships mediated by technology.

I want to thank Brynn Evans, Chris Messina, Nate Bolt, Ben Gross, Erin Malone, and Jared Spool for spending the better part of a day talking with me about their experiences in researching social. These musings come from that cooperative, ahem, social effort. 

For most teams, the moderator of user research sessions is the main researcher. Depending on the comfort level of the team, the moderator might be a different person from session to session in the same study. (I often will moderate the first few sessions of a study and then hand the moderating over to the first person on the design team who feels ready to take over.)
To make that work, it’s a good practice to create some kind of checklist for the sessions, just to make sure that the team’s priorities are addressed. For a field study or a formative usability test, a checklist might be all a team needs. But if the team is working on sussing out nuanced behaviors or solving subtle problems, we might want a bit more structure.
A couple of the teams I work with ensure that everything is lined up and that *anyone* on the team could conduct the sessions by creating detailed scripts that include stage direction. Here are a couple of samples:
Whether the team is switching up moderators or the same person is conducting all the sessions, creating a script that includes logistics is a good idea. It helps the team:
  • think through all the logistics, ideally together
  • make sure the sessions are conducted consistently, from one to the next
  • back up the main researcher in case something drastic happens, so someone else could easily fill in
Logistics rehearsal
When you walk through, step by step, what’s supposed to happen during a session, it helps everyone visualize the steps, pacing, and who should be doing what. My client teams use the stage direction in the script as a check to make sure everything is being covered to reach the objectives of the sessions. It’s also a good way to review what tools, data, and props you might need.
Estimate timing
Teams often ask me about timing. When they get through a draft of a script that includes stage directions, they quickly get a solid feel for how long each part is going to take. From this they can assign timing estimates and decide whether participants should keep going on a task after the estimated time is reached or be redirected to the next task.
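A script with stage directions and timing estimates is, in effect, a list of steps with durations, so a running clock falls out of it almost for free. This is a hypothetical sketch: `ScriptStep`, its fields, and `running_schedule` are illustrative names, and the `if_over` field simply records the team’s keep-going-or-redirect decision for each step.

```python
from dataclasses import dataclass


@dataclass
class ScriptStep:
    who: str                   # who acts: "moderator", "participant", "observer", ...
    direction: str             # stage direction or words to say
    minutes: int               # the team's timing estimate for this step
    if_over: str = "redirect"  # or "keep going": what to do when the estimate runs out


def running_schedule(steps, session_minutes):
    """Return [(start_minute, step), ...] and the minutes left in the session.

    A negative remainder means the script, as estimated, runs over time.
    """
    clock = 0
    schedule = []
    for step in steps:
        schedule.append((clock, step))
        clock += step.minutes
    return schedule, session_minutes - clock


steps = [
    ScriptStep("moderator", "Meet participant in the lobby; walk to the room", 5),
    ScriptStep("moderator", "Warm-up interview", 10),
    ScriptStep("participant", "Task 1", 30, if_over="keep going"),
    ScriptStep("moderator", "Wrap-up questions", 10),
]
schedule, remaining = running_schedule(steps, 60)
for start, step in schedule:
    print(f"{start:>3} min  {step.who:<12} {step.direction}  (if over: {step.if_over})")
print("minutes to spare:", remaining)
```

Writing down the start minute of each step makes it obvious, before the first participant arrives, whether the script fits the session or something has to give.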
Mapping out location flow
It’s easy to overlook the physical or geographic flow – what a director would call blocking – of a session. Where does the participant start the session? In a waiting room, at her desk, or somewhere else? Will you change locations within a room or building during the session? How do you get from one place to the next?
Consistency and rigor
Including stage directions in a script for a user research session can help reviewer-stakeholders understand what to expect. More importantly, the stage directions act as reminders to the moderator so she’s doing the same things with, and saying the same things to, every participant in the study. This means nothing gets left out accidentally and nothing gets added that wasn’t agreed on ahead of time. (For example, the team could identify some area to observe for and put a prompt in the script for the moderator to ask follow-up questions that are not specifically scripted, depending on what the participant does.)
Any really good project manager is going to have a Plan B. With a script that includes detailed stage directions, anyone who has been involved in the planning of a study should be able to pick up the script and moderate a session. The people I worked with at Tec-Ed called this “the bus test” (as in, If you get hit by a bus we still have to do this work).
Some teams I work with want to spread out and run simultaneous sessions. The stage directions can help ensure consistency across moderators. (Rehearse and refine if you’re really going to do this.)
Finally, when it comes time to write the report about the insights the team gained, the script, with its stage directions, can help answer the questions that often come up about why things were done the way they were done or why the data says what it says.
Stage it

Each person in a session is an actor, whether participant or observer. The moderator is the director. If the script for a study includes instructions for all the actors and for the director, in addition to the words to say, everyone involved will give a great performance.

When I say “usability test,” you might think of something that looks like a psych experiment, without the electrodes (although I’m sure those are coming as teams think that measuring biometrics will help them understand users’ experiences). Anyway, you probably visualize a lab of some kind, with a user in one room and a researcher in another, watching either through glass or on a monitor.

It can be like that, but it doesn’t have to. In fact, I’d argue that for early designs it shouldn’t be like that at all. Instead, usability testing should be done wherever and whenever users normally do the tasks they’re trying to do with a design.

Usability testing: A great tool
It’s only one technique in the toolbox, but in doing usability testing, teams get crisp, detailed snapshots about user behavior and performance. As a bonus, gathering data from users through observing them do tasks can resolve conflict within a design team or assist in decision-making. The whole point is to inform the design decisions that teams are making already.

Lighten up the usability testing methodology
Most teams I know start out thinking that they’re going to have a hard time fitting usability testing into their development process. All they want is to try out early ideas, concepts and designs or prototypes with users. But reduced to its essence, usability testing is simple:

  • Develop a test plan and design
  • Find participants
  • Gather the data by conducting sessions
  • Debrief with the team

That test plan/design? It can be a series of lists or a table. It doesn’t have to be a long exposition. As long as the result is something that everyone on the team understands and can agree to, you have written enough. After that, improvising is encouraged.

The individual sessions should be short and focused on only one or two narrow issues to explore.

But why bother to do such a quick, informal test?
First, doing any sort of usability test is good for getting input from users. The act of doing it gets the team one step closer to supporting usable design. Next, usability testing can be a great vehicle for getting the whole team excited about gathering user data. There is nothing like seeing a user use your design without intervention.

Most of the value in doing testing – let’s say about 70% – comes from just watching someone use a design. Another valuable aspect is the team working together to prepare for a usability test. That is, thinking about what Big Question they want answered and how to answer it. When those two acts align, having the team discuss together what happened in the sessions just comes naturally.

When not to do testing in the wild: Hard problems or validation
This technique is great for proving concepts or exploring issues in formative designs. It is not the right tool if the team is facing subtle, nuanced, or difficult questions to answer. In those cases, it’s best to go with more rigor and a test design that puts controls on the many possible variables.

Why? Well, in a quick, ad hoc test in the wild, the sample of participants may be too small. If you have seized a particular opportunity (say, with a seatmate on an airplane or a bus, as I have been known to do – yeah, you really don’t want me to sit next to you on a cross-country flight), a sample of one may not be enough to instill confidence with the rest of the team.

It might also happen, because the team is still forming ideas, that sessions aren’t conducted consistently from one to the next. That isn’t necessarily bad. It just means that it can be difficult to draw meaningful inferences about what the usability problems are and how to remedy them.

If the team is okay with all that and ready to say, “let’s just do it!” to usability testing in the wild, then you can just do more sessions.

So, there are tradeoffs
What might a team have to consider in doing quick, ad hoc tests in the wild rather than a larger, more formal usability test? If you’re in the right spot in a design, for me doing usability testing in the wild is a total win:

  • You have some data, rather than no data (because running a larger, formal test is daunting or anti-Agile).
  • The team gets a lot of energy out of seeing people use the design, rather than arguing among themselves in the bubble of the conference room.
  • Quick, ad hoc testing in the wild snugs nicely into nearly any development schedule; a team doesn’t have to carve out a lot of time and stop work to go do testing.
  • It can be very inexpensive (or even free) to go to where users are to do a few sessions, quickly.

Usability testing at its essence: something, someone, and somewhere
Just a design, a person who is like the user, and an appropriate place – these are all a team needs to gather data to inform their early designs. I’ve seen teams whip together a test plan and design in an hour and then send a couple of team members to go round up participants in a public place (cafes, trade shows, sporting events, lobbies, food courts). Two other team members conduct 15- to 20-minute sessions. After a few short sessions, the team debriefs about what they saw and heard, which makes it simple to agree on a design direction.

It’s about seizing opportunity
There’s huge value in observing people use a design that is early in its formation. Because it’s so cheap and so quick, there’s little risk in drawing the wrong inferences from the observations: a team can compensate for any shortcomings of the informal format by doing more testing, either more sessions or another round of testing as a follow-up. See a space or time and use it. It only takes four simple steps.

“Let’s check this against the Nielsen guidelines for intranets,” she said. We were three quarters of the way through completing wireframes for a redesign. We had spent 4 months doing user research, card sorting, prototyping, iterating, and testing (a lot). At the time, going back to the Nielsen Norman Group guidelines seemed like a really good idea. “Okay,” I said. “I’m all for reviewing designs from different angles.”

There are 614 guidelines.

This was not a practical way to check whether the team had gone in the right design direction.

Are you designing or inspecting?
They are not interchangeable, guidelines and heuristics, but many UXers treat them that way. It’s common to hear someone saying that they’re doing a heuristic evaluation against X guidelines. But it doesn’t quite work like that.

Designing is an act of creation, whether you’re doing research, drawing on graph paper, or coding CSS. Inspecting is an act of checking, of examining, often with some measure in mind.

Guidelines are statements of direction. They’re about looking to the future and what you want to incorporate in the design. Guidelines are aspirational, like these:

  • Add, update, and remove content frequently.
  • Provide persistent navigation controls.
  • Index all intranet pages.
  • Provide org charts that can be viewed onscreen as well as printed.*

Heuristics challenge a design with questions. The purpose of heuristics is to provide a way to “test” a design in the absence of data by making an inspection. Heuristics are about enforcement, like these:

Visibility of system status
The system should always keep users informed about what is going on…
Match between system and the real world
The system should speak the users’ language….
User control and freedom
The system should provide a clearly marked “emergency exit” to leave the unwanted state … **

Creating or diagnosing?
Heuristics are often cast as pass/fail tests. Does the UI comply or not? While you could use the guidelines to evaluate web site designs, they were developed as tools for designing. They present things to think about as teams make decisions.

Both guidelines and heuristics are typically broad and open to interpretation. They’re built to apply to nearly any interface. But they come into play at different points in a design project. Guidelines are things to think about in reaching a design; they are considerations, and they can interact with one another in interesting ways. Heuristics are usually diagnostic and generally don’t interact.

Don’t design by guidelines alone
For example, on the intranet project, we looked at guidelines about the home page. One directive says to put the most important new information on the home page, and the next one says to include key features and company news on the home page. A third says to include tools with information that changes every day. But earlier in the list of guidelines, we see a directive to be “judicious about having a designated ‘quick links’ area.” Some guidelines may feel complementary; others may seem to cancel each other out. Taken together, they add up to a set of complex decisions just about the home page.

And on our intranet project, it was too late to attend to every guideline. The decisions had been made, based on stakeholder input, business requirements, and technology constraints, as well as user requirements. Though we were thoughtful and thorough in designing, anyone scoring our site against the guidelines might not give us good marks.

Don’t evaluate by heuristics alone
Likewise, even with a heuristic as seemingly straightforward as “be consistent,” there’s a case for conducting usability tests with real users. For example, on the intranet I was working on, one group in the client company was adamant about having a limited set of page templates, with different sections of the site meeting strict requirements for color, look, and feel. That design would pass a consistency check. But in usability testing, participants couldn’t tell where they were in the site when they moved from section to section.

Guidance versus enforcement
What are you looking for at this point in your design project? In the intranet project, we were much closer to an evaluative mode than a creation mode (though we did continue to iterate). We needed something to help us measure how far we had come. Going back to the guidelines was not the checkpoint we were looking for.

We sallied forth. Instead, the client design team decided to create “heuristics” out of items on the user and business requirements lists generated at the beginning of the project, bringing the project full circle in a thoughtful cycle of research, design, and evaluation.

I don’t know whether the intranet we designed meets all of the guidelines. But users tell us and show us every day that it is easier, faster, and better than the old intranet. For now, that’s enough of a heuristic.

* From “Intranet Usability: Design Guidelines from Studies with Intranet Users” by Kara Pernice Coyne, Amy Schade, and Jakob Nielsen

** From Jakob Nielsen’s 10 heuristics, see



:: :: :: :: :: :: :: :: :: ::

Note: I’m moving!
After 20 years in the San Francisco Bay Area, I’m bugging out. As of September 1, I will be operating out of my new office and home in Andover, Massachusetts. I’m excited about this move. It’s big!

You can still find me at, email me at, on Twitter as danachis, and on the phone at 415.519.1148.