HCI Track-A Evaluating UIs (Analytical) Week40

Evaluation in Human-Computer Interaction (HCI)

Introduction

This week, we delve into the critical topic of evaluation in Human-Computer Interaction (HCI). Evaluation is fundamental to the design process, and we will explore various forms, focusing specifically on analytical evaluation. Additionally, we’ll touch upon why evaluation is essential in the first place.


Why Do We Evaluate?

Determine Success

Regardless of whether you see yourself as an engineer, developer, scientist, or researcher, a common goal is to determine whether your theories, ideas, or designs actually work. Evaluation helps us:

  • Assess if we’ve fulfilled the project’s requirements and promises.
  • Understand when a project is complete.

Part of the Design Process

Evaluation isn’t just a final step to satisfy stakeholders; it’s an integral part of the design process:

  • Formative Evaluation: Conducted before or during the project to understand users, their needs, and how to build tools effectively.
  • Summative Evaluation: Conducted at the end of a design cycle to assess how well the system performs against certain criteria.

Separates Amateurs from Professionals

Professionals validate and evaluate their work to ensure it meets high standards:

  • Avoids costly errors that can have significant financial or even life-threatening consequences.
  • Involves users early on to increase adoption and create better solutions.

Types of Evaluation


Formative vs. Summative Evaluation

  • Formative Evaluation:
    • Conducted before or during the design process.
    • Aims to improve the software by finding problems and informing design choices.
    • Example: Understanding user needs, preferences, and contexts.
  • Summative Evaluation:
    • Conducted after the system is built.
    • Measures the system’s performance against predefined benchmarks or certifications.
    • Example: Testing if the software meets accessibility standards.

Analytical vs. Empirical Evaluation

  • Analytical Evaluation:
    • Does not involve users.
    • Based on theoretical models, rules of thumb, principles, or heuristics.
    • Useful for early detection of usability issues.
  • Empirical Evaluation:
    • Involves users interacting with the system.
    • Based on observations, experiments, and user feedback.
    • Provides insights into real-world use and user satisfaction.

Laboratory vs. Field Studies

  • Laboratory Studies:
    • Controlled environment.
    • High degree of control over variables.
    • May lack real-world context.
  • Field Studies:
    • Conducted in the user’s natural environment.
    • Provides realistic insights.
    • Less control over external factors.

Choosing the Right Evaluation Method

No One-Size-Fits-All

There is no single evaluation method that works for every situation. The choice depends on:

  • Stage of the project.
  • Resources available (time, money, expertise).
  • Specific goals of the evaluation (e.g., usability, performance, adoption).

Factors to Consider When Assessing an Evaluation

  • Validity: Does the evaluation actually measure what it is supposed to measure?
    • Some problems cannot be found by every type of evaluation.
  • Reliability: Would different evaluators reach the same results?
    • If not, the findings suffer from the “evaluator effect”.
  • Usefulness: Can the results be used for their intended purpose?
    • Not every evaluation yields results that can actually inform the design.

Matching Evaluation to Claims

As per Tamara Munzner’s nested model:

  • Algorithmic Claims: Use performance evaluation without users.
  • Design Claims: Require user studies to assess effectiveness.

Analytical Evaluation Methods

Purpose: Assess an interactive system based on questions, rules of thumb, or models of performance.
Components:

  • Process for performing evaluation
  • Resources to be used in the process

Heuristic Evaluation

  • Definition:
    Assessing an interface against a set of usability principles (heuristics).
  • Process:
    1. List the steps to conduct a typical operation.
    2. Focus on the interactions with the system.
    3. Apply rules of thumb and guidelines to assess, recording each violation found (a sketch of recording findings follows this list).
  • Usability Heuristics Examples:
    • Visibility of system status.
    • Match between system and real world.
    • User control and freedom.
    • Consistency and standards.
  • Pros:
    • Cheap, fast, and easy.
  • Cons:
    • Error prone, not exhaustive, not user-centered.
  • Specialized Heuristic Evaluation
    • Approach: Run an evaluation with only expert participants.
    • Hybrid of empirical and analytical evaluation
    • Experts provide specialized knowledge
    • Higher chance of finding deep problems
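
To make the output actionable, evaluators usually note each violation together with the heuristic it breaks and a severity rating. Below is a minimal sketch of one way to record such findings; the heuristic names follow Nielsen's set, the 0–4 severity scale is the commonly used one, and the example findings themselves are purely illustrative.

```python
# Minimal sketch of recording heuristic-evaluation findings.
from dataclasses import dataclass

SEVERITY = {0: "not a problem", 1: "cosmetic", 2: "minor", 3: "major", 4: "catastrophic"}

@dataclass
class Finding:
    heuristic: str     # e.g. "Visibility of system status"
    location: str      # where in the UI the problem was observed
    description: str
    severity: int      # 0-4, rated after all findings are collected

# Illustrative findings for a hypothetical email client.
findings = [
    Finding("Visibility of system status", "Send button",
            "No feedback while the message is being uploaded", 3),
    Finding("Consistency and standards", "Attachment icon",
            "Icon differs from the one used elsewhere in the app", 1),
]

# Sort the report so the most severe problems are addressed first.
for f in sorted(findings, key=lambda f: f.severity, reverse=True):
    print(f"[{SEVERITY[f.severity]}] {f.heuristic}: {f.description} ({f.location})")
```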

Exercise: Sending an Email

  • Task: Write down the steps to send an email using your favorite app.
  • Evaluation: Identify any violations of usability heuristics.
  • Findings: Most email apps had minor nuisances but no critical issues.

Human Error Identification (HEI)

  • Focus: Predict potential user errors to improve error prevention.
  • Process:
    • Perform a task analysis to understand possible user actions.
    • Create a state matrix to map legal, illegal, and unavailable transitions between system states.
  • Error Types:
    • Slips: Correct intention, wrong action.
    • Mistakes: Wrong intention.
    • Lapses: Memory or attention failures.


State Matrix in HEI

  • Step 1: Enumerate all states of the UI
  • Step 2: Create a matrix showing possible transitions between states
  • Step 3: Label each cell in the matrix (a code sketch of such a matrix follows this list):
    • (1) Legal transition
    • (-1) Illegal transition (erroneous; should not be done by user)
    • (0) Transition not available
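
The matrix itself can be encoded as a simple table of labels and then queried for transitions the user might erroneously attempt. The sketch below uses a hypothetical three-state media player; the states and cell labels are illustrative assumptions, not an analysis of a real product.

```python
# Minimal sketch of an HEI state matrix for a hypothetical media-player UI.
LEGAL, ILLEGAL, UNAVAILABLE = 1, -1, 0   # cell labels from Step 3

STATES = ["Stopped", "Playing", "Paused"]

# matrix[src][dst] answers: "is the transition from src to dst legal?"
matrix = {
    "Stopped": {"Stopped": UNAVAILABLE, "Playing": LEGAL,       "Paused": ILLEGAL},
    "Playing": {"Stopped": LEGAL,       "Playing": UNAVAILABLE, "Paused": LEGAL},
    "Paused":  {"Stopped": LEGAL,       "Playing": LEGAL,       "Paused": UNAVAILABLE},
}

def risky_transitions(m):
    """Transitions the user could erroneously attempt (cells labelled -1)."""
    return [(src, dst) for src in STATES for dst in STATES if m[src][dst] == ILLEGAL]

print(risky_transitions(matrix))  # -> [('Stopped', 'Paused')]
```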

Cognitive Walkthrough

Simulating how people think

  • Definition: An analytical evaluation method based on mental simulation of user thinking
    • Goal: Expose problems impairing ease-of-use and learnability, especially for novice users
  • Key Characteristics:
    • Systematic, step-by-step inspection of an artifact
    • Evaluator simulates user’s mental processes during interaction
    • Focus on guessing and exploring how to use an interface
  • Inputs Required:
    1. User interface
    2. Task scenario
    3. Assumptions about users and contexts of use
    4. Sequence of actions to complete tasks (from Task Analysis)
  • Effectiveness: Can predict around 50% of learnability-related problems (based on studies)
  • Purpose: Evaluate a system’s learnability, especially for new users.
  • Step-by-step Process:
    1. Define user goals and tasks.
    2. Step through the tasks as the user would.
    3. Ask key questions at each step:
      1. Will the user try to achieve the right effect?
      2. Will they notice that the correct action is available?
      3. Will they associate the correct action with the effect?
      4. If the correct action is performed, will the user see that progress is being made toward the goal?
  • Consequence: design for “guessability”
  • Example: Evaluating a patient portal where users might be unsure which action to take due to ambiguous options (a minimal walkthrough-recording sketch follows this list).
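
A walkthrough can be recorded as a simple table of actions and answers to the four questions above. The sketch below assumes a hypothetical "reset password" task; the actions, answers, and notes are illustrative only.

```python
# Minimal sketch of recording a cognitive walkthrough for a hypothetical task.
QUESTIONS = [
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the effect?",
    "If the correct action is taken, will the user see progress being made?",
]

# One entry per step in the action sequence (from the task analysis).
walkthrough = [
    {"action": "Click 'Forgot password?' link", "answers": [True, False, True, True],
     "notes": "Link is below the fold on small screens."},
    {"action": "Enter registered email address", "answers": [True, True, True, True],
     "notes": ""},
]

for step in walkthrough:
    failures = [q for q, ok in zip(QUESTIONS, step["answers"]) if not ok]
    if failures:
        print(f"Potential learnability problem at: {step['action']}")
        for q in failures:
            print(f"  - Failed: {q}")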

Comparison: HE vs. HEI vs. CW

In brief: Heuristic Evaluation gives quick, broad coverage of general usability issues; Human Error Identification concentrates on predicting and preventing user errors; Cognitive Walkthrough probes learnability, especially for novice users.

Keystroke-Level Model (KLM)


  • Definition: Predicts the time it takes for an experienced user to perform a task without errors.
  • Components:
    • Physical actions: Keystrokes, mouse movements.
    • Mental operations: Thinking time.
    • System response time.
  • Process:
    • Break down tasks into primitive operations.
    • Assign standard times to each operation.
    • Sum the times to predict total task time.
  • Example: Calculating the time to perform a ‘Find and Replace’ operation in a text editor (a worked sketch follows this list).
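
As a worked example, the sketch below sums operator times for a hypothetical ‘Find and Replace’ sequence. The times are the commonly cited Card, Moran and Newell estimates (the keystroke value varies with typing skill), and the action sequence and negligible system response time are illustrative assumptions.

```python
# Minimal sketch of a Keystroke-Level Model estimate.
OPERATOR_TIMES = {
    "K": 0.28,  # keystroke or button press (average typist)
    "P": 1.10,  # point with the mouse to a target
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental act of preparation
    "R": 0.00,  # system response time (assumed negligible here)
}

def klm_estimate(sequence: str) -> float:
    """Sum standard operator times for a sequence like 'M H P K ...'."""
    return sum(OPERATOR_TIMES[op] for op in sequence.split())

# Open the dialog, type a 4-letter search term and replacement, confirm.
find_and_replace = (
    "M H P K "      # decide, reach for mouse, point at Edit menu, click
    "P K "          # point at 'Find and Replace...', click
    "H M K K K K "  # hands back to keyboard, recall term, type 4 keys
    "K "            # Tab to the 'replace' field
    "M K K K K "    # recall replacement, type 4 keys
    "H P K"         # reach for mouse, point at 'Replace All', click
)
print(f"Predicted expert time: {klm_estimate(find_and_replace):.2f} s")
```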

GOMS Model

  • Definition: A hierarchical model that stands for Goals, Operators, Methods, and Selection rules.
  • Purpose: Analyzes the user’s cognitive structure to predict task performance.
  • Components:
    • Goals: What the user wants to achieve.
    • Operators: Actions available to the user.
    • Methods: Procedures to accomplish goals.
    • Selection Rules: How users choose between methods (see the encoding sketch after this list).
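
The sketch below shows one way such a description could be encoded for a hypothetical "delete a word" goal; the methods and the selection rule are illustrative assumptions, not a standard GOMS notation.

```python
# Minimal sketch of a GOMS-style description for a hypothetical editing goal.
goal = {
    "name": "Delete a word",
    "methods": {
        "mouse-method": ["point at word", "double-click to select", "press Delete"],
        "keyboard-method": ["move caret to word", "press Ctrl+Backspace"],
    },
}

def select_method(hands_on_keyboard: bool) -> str:
    """Selection rule: stay on the keyboard if the hands are already there."""
    return "keyboard-method" if hands_on_keyboard else "mouse-method"

chosen = select_method(hands_on_keyboard=True)
print(chosen, "->", goal["methods"][chosen])
```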

Automated Usability Evaluation

  • Aim: Use software tools to evaluate usability aspects automatically.
  • Advantages:
    • Reduces the need for human evaluators.
    • Can quickly assess certain usability metrics.
  • Limitations:
    • Cannot fully replicate human judgment.
    • Best used as a complement to other methods.

Examples of Tools:

  • Aalto Interface Metrics (AIM): Evaluates web pages for accessibility, aesthetics, and color perception.
  • W3C Accessibility Tools: Check compliance with web accessibility standards (a sketch of one such check follows this list).
  • Color Blindness Simulators: Shows how interfaces look to users with color vision deficiencies.
  • UX Sense: An advanced tool that uses machine learning to analyze user interactions and sentiments.
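
As an example of the kind of check such tools automate, the sketch below computes the WCAG 2.x contrast ratio between a text and background colour and compares it against the 4.5:1 threshold for normal body text; the specific colour pair is illustrative.

```python
# Minimal sketch of an automated WCAG contrast-ratio check.
def _channel(c: float) -> float:
    """Linearise one sRGB channel given in the range 0..1 (WCAG definition)."""
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_channel(v / 255) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

ratio = contrast_ratio((119, 119, 119), (255, 255, 255))  # grey text on white
print(f"{ratio:.2f}:1 - {'passes' if ratio >= 4.5 else 'fails'} WCAG AA for body text")
```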

Which Analytical Evaluation to Use?

The right method depends on what you want to learn: broad usability issues (Heuristic Evaluation), likely user errors (HEI), learnability for newcomers (Cognitive Walkthrough), expert task-completion time (KLM), or specific measurable properties such as accessibility (automated tools).


The Importance of Multiple Evaluation Methods

  • No Single Method Suffices: Each method has strengths and limitations.
  • Combining Methods: Using multiple approaches provides a more comprehensive evaluation.
  • Context Matters: The choice of method depends on specific project needs and goals.

Summary

  • Why Evaluate in HCI?
    • Ensure usability and user satisfaction
    • Identify and address potential issues early
    • Improve overall design and user experience
    • Validate design decisions
    • Reduce development costs and time
  • Key analytical evaluation methods
    • Heuristic Evaluation: quick, broad coverage
    • HEI: focus on errors
    • Cognitive Walkthrough: learnability for novice users
    • Keystroke-Level Model (KLM): task execution time for expert users
    • Automated Usability Evaluation

Examples in Practice

Halo 4 Playtesting

  • Context: Video game development requires extensive evaluation.
  • Approach: Thousands of hours of playtesting to balance gameplay and identify usability issues.
  • Outcome: Improved game design and user experience.

KidsTeam

  • Initiative by Allison Druin at the University of Maryland.
  • Concept: A team of 8- to 12-year-olds collaborates with researchers to evaluate technology.
  • Methodology: Cooperative Inquiry—research is conducted with people, not on people.
  • Impact: Provides unique insights, especially in designing for children.

Elderly Users

  • Rationale: Designing for elderly users can improve usability for all.
  • Concept: The Curb-Cut Effect—features designed for accessibility benefit a wider audience.
  • Approach: Involve elderly users in evaluation to identify issues related to motor skills, vision, and cognitive load.

Conclusion

Evaluation is a crucial component of HCI that should be integrated throughout the design process. By understanding and applying various evaluation methods—analytical and empirical—we can create systems that are not only functional but also user-friendly and accessible. Remember, no single evaluation method is sufficient; choosing the right tool for the task is essential for effective evaluation.

This post is licensed under CC BY 4.0 by the author.