How to run a heuristic evaluation

Lukcha · 9 June 2016 22:14

Originally published at: http://uxmastery.com/run-heuristic-evaluation/
Being a user experience designer often means juggling tensions: creative tension, stakeholder tension, and the constant tension between design expertise and user-centred design methods. In the early 1990s, this third tension sparked a lot of high-level thinking by people wanting to define, once and for all, what the principles and best practices for interface design should be.

The process we use for heuristic evaluations today can be traced directly back to that period. Rolf Molich and Jakob Nielsen published their foundational ‘Improving a human-computer dialogue’ in 1990, building on work by people like David Cheriton (1976), Nielsen’s future consulting partner Donald Norman (1983), and Ben Shneiderman (1987), amongst others.

At the same conference where Molich and Nielsen presented their heuristic evaluation method, Clayton Lewis, Peter G Polson and their colleagues at the Institute of Cognitive Science presented a walkthrough methodology tackling the same problems. Usability practitioners recognised the benefits of combining these two methods, and although the naming rights are a bit grey, this combination is what most UXers think of when talking about doing a heuristic evaluation today.

The good and the bad

There are some good reasons why you might use this technique:

It can be a quick and inexpensive way to generate feedback for designers
It can be used pretty early in the design process
It can give a more comprehensive (death by a thousand cuts!) assessment of the system than usability testing
Assigning the correct heuristic can suggest a good place to start for corrective measures
You can use it together with other usability methods

But there are also some fundamental problems with the technique, so it is critical that you understand its limitations and dangers too:

For it to be done well, the evaluators should ideally be double-experts: usability experts as well as experts in the subject domain of the project (e.g. finance, education, insurance)
You need to use more than one evaluator, and this is often forgotten. A single expert working in isolation may only pick up 20% of the usability issues. Even ten experts may only surface 85%. A good compromise between effectiveness and practicality is to use 3-5 evaluators, which gives you a hit rate of around 60%.
Finding people suitable as evaluators can be difficult, and nowadays they may be more expensive than running a usability test with 5 participants.
Using heuristics to identify usability issues is a relatively black and white approach. It will identify more minor issues than usability testing, but it will also have plenty of 'false positive' issues that aren't really problems at all.
Placing all of your trust in the heuristics may not be well founded. For example, I'm not sure that Nielsen's heuristics have been formally validated. Does anyone know?
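
As a rough illustration of why adding evaluators gives diminishing returns, here is a minimal sketch. It assumes the simple model used in Nielsen and Landauer's work, in which every evaluator independently finds the same fraction of the issues. The 20% single-evaluator rate is the figure quoted above, and the function name is made up for this example. Real evaluators overlap and vary in skill, so treat the output as the shape of the curve, not a prediction.

```python
def proportion_found(evaluators: int, single_rate: float = 0.20) -> float:
    """Share of issues found by at least one of `evaluators` evaluators.

    Assumes each evaluator independently finds `single_rate` of the issues,
    which is a simplification of how real evaluators behave.
    """
    if evaluators < 0:
        raise ValueError("evaluators must be zero or more")
    # An issue is missed only if every evaluator misses it.
    return 1 - (1 - single_rate) ** evaluators


for n in (1, 3, 5, 10):
    print(f"{n:>2} evaluators: {proportion_found(n):.0%}")
```

Under these assumptions, 3-5 evaluators land at roughly half to two-thirds of the issues, and ten evaluators still fall short of finding everything, which is broadly in line with the figures above.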

The argument goes: "If you're such an expert and have the experience you claim, how come you can't just give us the answers for the right design?" Well, if you have sufficient experience and expertise in usability, you'll also be well aware that users are notoriously unpredictable. It's common to experience 'aha!' moments in usability tests that show something we never expected, even when the design seems to conform to key heuristics perfectly.

Heuristic evaluations are no substitute for usability testing, but they can help improve the potential of usability tests if they’re used in conjunction; running a heuristic evaluation before beginning a round of usability tests will reduce the number and severity of design errors discovered by users, helping minimise problems and distractions during the testing.

Getting your hands on some suitable heuristics

Although Nielsen is probably the best known, there are many other sets of heuristics, and some of them are considered better. Here's a list of all the ones I know of. Any that I've missed?

Jakob Nielsen’s Heuristics for User Interface Design (1994 version)
Ben Shneiderman’s Eight Golden Rules of Interface Design, from his book ‘Designing the user interface’ (1987)
Jill Gerhardt-Powals’ 10 Cognitive Engineering Principles
Connell & Hammond's 30 Usability Principles
Bastien and Scapin's 18 Ergonomic Criteria
Tognazzini, B. (2003). First principles of interaction design.
Lidwell, W., Holden, K., and Butler, J. (2003). Universal principles of design. Rockport Publishers.
Constantine, L. and Lockwood, L. (1999). Software for use. Addison-Wesley.
Cooper, A. and Reimann, R. (2003). About face 2.0: The essentials of interaction design. John Wiley & Sons.

If you really want to get stuck in, check out Smith & Mosier's 944 guidelines for the design of user interfaces (from 1986).

Susan Weinschenk and Dean Barker amalgamated the usability guidelines from multiple sources (including Nielsen’s, Apple’s and Microsoft’s) and did a massive card sort, resulting in twenty heuristics of their own.

I use Jill Gerhardt-Powals’ heuristics as they take a more holistic approach to evaluating the system, which I find easier to get my head around.

Running your own heuristic evaluation

Ingredients:

3-5 experts
a set of heuristics
a list of tasks
a system to test, or screenshots or prototypes
a standard form for recording your notes

Before you start:

You need to know who your users are, and what their goals are. Your existing research may set the scene with scenarios, personas or story mapping, so use those to get up to speed. Do more research if you don't have a good idea of who you're designing for.
You also need to define the tasks that your users need to accomplish to achieve their goals, and how these work with the design vision you want to evaluate.
There will be lots of different tasks, so I find it useful to rank them and focus on the most important. You can achieve this by listing them all (there may be hundreds of tasks) and getting lots of users to choose just their top 5. This is a statistical, quantitative method, so the more the merrier—two hundred user respondents is a ballpark. It's also critical that you use the right language and terminology for your tasks, as users are only going to be skim reading to try and pick out their favourite tasks. Get your handful of stakeholders to do the same activity too, and then compare the results. Hopefully, you have a handful of tasks that users suggest are clearly ahead of the pack. If these match with what your stakeholders say, add these tasks directly to the priority list. Some other tasks considered important by users and stakeholders may need more discussion!
Work out what the best evaluation method is, given the project and the tasks you're investigating. Heuristic evaluation may not be the right answer.

Method:

Decide on a set of heuristics to use. If you don't have a good reason not to, I would recommend Nielsen's or Gerhardt-Powals'.
Select your team of 3-5 evaluators. Ideally they have both usability experience and domain knowledge related to your project, and might be found in your design team or list of professional contacts. You should give them all the same training on the principles and process, and ensure they're interpreting the heuristics properly.
Set up a system of severity codes (critical issue, serious issue, minor issue, good practice), or traffic light scheme (red, orange, yellow, green), and make sure the evaluators understand them and can use them consistently.
Conduct the walkthrough itself. Step through each agreed task in the user's shoes (their terminology, priorities and likely choices).
Look for and identify problems based on the set of heuristics.
Note where the problem is (the page/screen, location on the page), how bad it is (rating scale), and push on to the next issue.
I like to go back and note the options for fixing the issues separately afterwards, as switching between 'seek mode' and 'fix mode' is pretty distracting.
After you're done, collate and analyse results from the multiple experts. This is where you realise the consistency between evaluators was important. Otherwise you're trying to reconcile too many levels and interpretations.
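
The recording and collating steps above can be sketched in code. This is one possible shape for the standard recording form, not a prescribed one: the field names, the grouping rule (same location and same heuristic counts as the same issue) and the sample findings are assumptions for illustration. The severity labels are the ones suggested in the method.

```python
from collections import defaultdict
from dataclasses import dataclass

# Severity codes from the method above, worst first.
SEVERITY_ORDER = ["critical", "serious", "minor", "good practice"]


@dataclass(frozen=True)
class Finding:
    evaluator: str
    task: str
    location: str   # page or screen, and where on it
    heuristic: str  # which heuristic the problem relates to
    severity: str   # one of SEVERITY_ORDER
    note: str


def collate(findings):
    """Merge findings from several evaluators, worst issues first."""
    groups = defaultdict(list)
    for finding in findings:
        if finding.severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {finding.severity!r}")
        groups[(finding.location, finding.heuristic)].append(finding)

    rows = []
    for (location, heuristic), items in groups.items():
        rows.append({
            "location": location,
            "heuristic": heuristic,
            # Take the worst rating any evaluator gave the issue.
            "severity": min((f.severity for f in items), key=SEVERITY_ORDER.index),
            "evaluators": sorted({f.evaluator for f in items}),
        })
    # Worst severity first, then issues spotted by more evaluators.
    rows.sort(key=lambda r: (SEVERITY_ORDER.index(r["severity"]), -len(r["evaluators"])))
    return rows


# Hypothetical findings from three evaluators.
findings = [
    Finding("A", "pay a bill", "payment form", "error prevention", "serious",
            "No confirmation before submitting"),
    Finding("B", "pay a bill", "payment form", "error prevention", "critical",
            "Amount field accepts letters"),
    Finding("C", "check balance", "dashboard", "consistency", "minor",
            "Two labels for the same account"),
]

for row in collate(findings):
    print(row)
```

Rejecting unknown severity labels up front is a small guard for the consistency problem mentioned above: if evaluators invent their own levels, the results cannot be reconciled later.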

Writing up a report

The whole point is to surface the key issues and start a discussion about how to improve. There's enough detail that you can't avoid a certain amount of documentation, but it’s also pretty important that you deliver the findings in person. You’ve systematically deconstructed and criticised something that your team and stakeholders will likely be protective of, so being there to show your loyalty, to clarify and explain gently, and to justify your findings is pretty key.

The report itself is often presented as a slide deck or a written document. Some really slick presentations include captured video of the screens, showing mouse movement and interactions, with audio describing the issues and highlights on particular problems.

However you choose to put it together, make sure you cover the following:

Don't put more work into the report than the system you're ultimately designing. That would be silly.
Include a statement of which set of heuristics you have used, and why they are an appropriate guideline to trust. A lot of your credibility will be resting on the shoulders of the person who developed the heuristics, so make sure you trust them.
Note the professional backgrounds of the experts used as evaluators, with some proof of why they can be considered experts, such as a CV or resume.
Include a simple summary of the aggregated key findings, including which things need time and resources to address. This is for the executives who just need the punchline.
List and describe the issues you discovered, in order of severity.
With each issue, make recommendations for what it would take to adjust the system to follow good practice.
Relate the issues and recommendations to visuals for easy reference. Annotated screenshots are a good option.
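
If your collated issues already live in a spreadsheet or script, the skeleton of the report can be generated from them. The sketch below is only one way to do it: the dictionary keys, the sample issues and the screenshot file names are invented for this example. It produces a short summary for the executives, followed by the issues in order of severity with a recommendation and a visual reference for each.

```python
from collections import Counter

# Severity codes, worst first.
SEVERITY_ORDER = ["critical", "serious", "minor", "good practice"]


def render_report(issues):
    """Return a plain-text report: summary counts, then issues by severity."""
    counts = Counter(issue["severity"] for issue in issues)
    lines = ["Summary"]
    for level in SEVERITY_ORDER:
        lines.append(f"  {level}: {counts.get(level, 0)}")

    lines.append("Issues")
    ordered = sorted(issues, key=lambda issue: SEVERITY_ORDER.index(issue["severity"]))
    for number, issue in enumerate(ordered, start=1):
        lines.append(f"  {number}. [{issue['severity']}] {issue['location']}: {issue['problem']}")
        lines.append(f"     Recommendation: {issue['recommendation']}")
        lines.append(f"     Screenshot: {issue['screenshot']}")
    return "\n".join(lines)


# Hypothetical issues with made-up screenshot names.
issues = [
    {"severity": "minor", "location": "dashboard",
     "problem": "Two labels for the same account",
     "recommendation": "Use one label throughout",
     "screenshot": "dashboard-01.png"},
    {"severity": "critical", "location": "payment form",
     "problem": "Amount field accepts letters",
     "recommendation": "Validate the amount as the user types",
     "screenshot": "payment-03.png"},
]

print(render_report(issues))
```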

What comes next?

Heuristic evaluations only give you an indication of where likely issues are. They suggest some options for heading towards fixes, but you still haven't put the design in front of real users, so you won't know how much you're missing out on. Your next step is to take a stab at solving the biggest problems, and then arrange your next set of usability tests. You'll find that you have both some assumptions that you can base design hypotheses on, and some ideas of where to focus your testing. From there, you can be guided by the users themselves. Just promise me you won't stop with only a heuristic evaluation!