Testing Graphical User Interfaces (information science)

INTRODUCTION

In recent years, an emerging trend in software products has been toward the use of graphical user interfaces (GUIs). More user-friendly than traditional, text-based interfaces, GUIs serve as the front-end for a large portion of today’s software applications. Technologies like Ajax are helping to spread familiar GUI interaction styles to Web applications. With the rise of ubiquitous computing, users are interacting with GUIs in a widening range of situations—not just with their PCs, but with their dishwashers and cars. Critical applications, such as banking systems, are moving to GUIs as well. Thus, quality assurance for GUI-based software is growing more important every day.

With GUIs, users enjoy many degrees of freedom in the way they interact with the software. While this benefits users, it challenges testers. Because users may interact with a GUI in a variety of unexpected ways, it is difficult to ensure that the software meets its functional requirements (correctness) and non-functional requirements (e.g., usability) for all possible interactions. The difficulties are compounded by the frequent intersection of GUIs with other emerging technologies, including component-based and service-oriented architectures. New trends in software development, such as rapid development cycles, globally distributed developers, and open-source projects, make the quality assurance process ever more challenging.

This chapter describes the state of the art in testing GUI-based software. Traditionally, GUI testing has been performed manually or semimanually, with the aid of capture-replay tools. Since this process may be too slow and ineffective to meet the demands of today’s developers and users, recent research in GUI testing has pushed toward automation. Model-based approaches are being used to generate and execute test cases, implement test oracles, and perform regression testing of GUIs automatically. This chapter shows how research to date has addressed the difficulties of testing GUIs in today’s rapidly evolving technological world, and it points to the many challenges that lie ahead.

BACKGROUND

A GUI provides a visual front-end through which a user can interact with a software application. Although there are various models for GUI design, the one most commonly used in practice and in software-testing research, and hence the model assumed in this chapter, is the WIMP model with windows, icons, menus, and pointing devices (Nielsen, 1993). The GUI is made up of widgets (such as buttons, text boxes, and labels) that the user can manipulate to send input to the underlying software and that the software can, in turn, manipulate to send output to the user. Each widget has a set of properties (for example, “font”, “width”, “enabled”), each of which has some value (for example, “Helvetica”, “100”, “true”) (Yuan & Memon, 2007).

Widgets are contained in windows, which may be either modal or modeless. A modal window blocks the user’s interaction with other windows while it is active, whereas a modeless window imposes no such restrictions. A window’s state at any particular time is the set of all triples (w, p, v) such that w is a widget in the window, p is a property of w, and v is the value of p. The GUI state then consists of the state of all windows in the GUI (Yuan & Memon, 2007).
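The definition of window state as a set of (w, p, v) triples can be written down directly. The following Python fragment is a minimal sketch under stated assumptions: the dictionary layout, the function names, and the example window are all hypothetical, and property values are assumed to be hashable so that they can be stored in a set.

```python
from typing import Any, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, Any]  # (widget, property, value)


def window_state(widgets: Dict[str, Dict[str, Any]]) -> FrozenSet[Triple]:
    """Return the set of all (w, p, v) triples for one window."""
    return frozenset(
        (w, p, v) for w, props in widgets.items() for p, v in props.items()
    )


def gui_state(windows: Dict[str, Dict[str, Dict[str, Any]]]) -> Dict[str, FrozenSet[Triple]]:
    """The GUI state is the state of every window, keyed here by window name."""
    return {name: window_state(widgets) for name, widgets in windows.items()}


# Hypothetical preferences window with two widgets.
preferences = {
    "ok_button": {"enabled": True, "width": 100},
    "title_label": {"font": "Helvetica"},
}
state = window_state(preferences)
full_state = gui_state({"Preferences": preferences})
```

Because each window state is a set, two captures of the same window can be compared with ordinary set equality or set difference.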

As the user interacts with the GUI, the state of both the GUI and the underlying software can change. When the user performs an event on the GUI—such as clicking a button or typing in a text box—a piece of application code called an event handler is executed. The event is the basic unit of interaction with a GUI. To accomplish a task, a user typically must perform multiple events in sequence. Hence, a GUI test case consists of a sequence of events (Yuan & Memon, 2007).

Several tools and techniques are available to aid testing of GUI-based applications, varying greatly in the level of automation they provide. Ignoring the GUI altogether, test harnesses like JUnit can interact directly with the underlying software much like the GUI would. However, this may require major changes to the GUI’s architecture, and, at any rate, it leaves an important part of the end-user software untested.

JUnit has been extended in tools such as JFCUnit, Pounder, and Jemmy Module to interact with the application under test through its GUI. With these tools, test cases must be written manually. Alternatively, a tester can generate test cases by recording sequences of events, which the tester manually performs on the GUI, using a capture-replay tool. Some capture-replay tools—for example, CAPBAK and TestWorks—record events in terms of mouse coordinates, while others—for example, WinRunner, Abbot, and Rational Robot—record the GUI widgets associated with events. The latter are more robust in the face of superficial changes to the GUI layout (Memon & Xie, 2005).

All of the tools and techniques mentioned so far automate the execution of test cases but still require substantial effort on the part of the tester to generate test cases, define the test oracle, and modify the test suite as the application under test evolves. Tools like the visual test-development environment created by Ostrand, Anodide, Foster, and Goradia (1998) streamline the testing process but do not depart from the conceptualization of GUI testing as a fundamentally manual process. Similarly, while Kasik and George (1996) have shown how genetic algorithms can be used to augment a test suite, they leave much work to the tester. Fortunately, new techniques based on various types of models of the GUI are shifting much of the burden of the testing process from humans to machines.

The most popular type of GUI model, the state-machine model, makes it possible to generate test cases—or perform model-checking, a related activity—automatically (Belli, 2001; Berstel, Reghizzi, Roussel, & Pietro, 2005; Dwyer, Carr, & Hines, 1997; Holzmann & Smith, 1999; Shehady & Siewiorek, 1997; White & Almezen, 2000). But techniques based on state-machine models have serious drawbacks. These techniques require that the model be created manually, that a formal specification be written, or that the source code be annotated—in any case, a potentially laborious task susceptible to human error. Further, since the effectiveness of the test cases generated from the state-machine model depends on the model creator’s definition of “state”, two testers testing the same application may get quite different results (Yuan & Memon, 2007). Techniques for generating test cases from UML diagrams suffer from similar weaknesses (Vieira, Leduc, Hasling, Subramanyan, & Kazmeier, 2006).

Rather than modeling a GUI in terms of states, others have modeled it in terms of events. Memon, Pollack, and Soffa (2001) have used automated planning to generate test cases that consist of sequences of events chosen to accomplish tasks specified by the tester. In this approach, model creation requires substantial human effort: although the events in the model are identified automatically, their preconditions and effects must be defined manually. More recently, techniques have used event-based models to further reduce the amount of effort required in the testing process while improving its effectiveness. These are described in the next section.

GUI TESTING WITH EVENT-FLOW MODELS

Events are central to the dynamic structure of a GUI-based application. A user accomplishes tasks via the GUI by performing sequences of events. Thus, the execution of the application occurs as the execution of a sequence of event handlers, each of which may depend on and may also affect the state of the application. Users may interact with the application in unexpected ways, so the event handlers may be executed in unexpected orderings. In these respects, GUI-based applications differ from traditional, batch-style software (e.g., compilers), which receives some input, processes it, produces some output, and terminates. Traditional testing techniques like code-based coverage criteria that were designed for such software may not work as well for much differently-structured GUI-based applications, so new techniques have been developed to address GUIs’ event-driven nature (Memon, 2002).

The previous section showed how GUI-testing tools and techniques have evolved to be faster and more effective. Notable advances have been achieved through model-based testing, using various types of models. In recent years, one type of model has proved particularly successful: the event-flow graph.

Event-Flow Graph

In an event-flow graph (EFG), a GUI is represented by a graph whose vertices represent events and whose edges represent the follows relationship. Event e1 is said to follow event e2 if e1 can be executed immediately after e2, with no events intervening. Test cases can be generated rapidly and automatically by traversing the EFG, and coverage criteria can be defined in terms of the EFG. Variations of the EFG have been used to further improve the cost-effectiveness of GUI testing. Each of these topics will be elaborated upon after the process of creating an EFG is explained (Xie & Memon, 2006).

An EFG can be reverse engineered semi-automatically from a GUI in a process called GUI ripping. A single GUI window is ripped by identifying and recording properties of all of the widgets it contains, then executing any events available in the window that open new windows. This can be accomplished by running the GUI with reflection to access the currently open windows and inspect their widgets. Widgets likely to open new windows can be identified based on conventions in GUI design: clicking on a widget whose caption ends in “…” typically opens a window. As new windows are discovered, each is ripped until no more new windows are found. Since the GUI may not be ripped perfectly—indeed, the GUI itself may contain defects that show up in the resulting GUI model—the tester must examine and possibly edit the model. The GUI model provides the information necessary to determine the follows relationships for all of the events and, hence, to construct the EFG (Memon, Banerjee & Nagarajan, 2003).
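The ripping loop described above can be sketched in Python as a breadth-first traversal over windows. This is an illustration only, not the published ripper: a real tool inspects a running application through reflection, whereas here a hypothetical dictionary (`gui`) stands in for the application, and the `opens` key simulates what would happen when a widget's event is executed.

```python
from collections import deque


def rip(gui, main_window):
    """Record the widgets of every window reachable via captions ending in '...'."""
    model = {}
    queue = deque([main_window])
    while queue:
        window = queue.popleft()
        if window in model:
            continue  # already ripped
        widgets = gui[window]
        model[window] = [w["caption"] for w in widgets]
        for w in widgets:
            # Design convention: a trailing ellipsis suggests the widget opens a window.
            if w["caption"].endswith(("...", "…")):
                opened = w.get("opens")  # simulated result of executing the event
                if opened is not None and opened not in model:
                    queue.append(opened)
    return model


# Hypothetical application: "About" opens a window but breaks the caption convention.
gui = {
    "Main": [
        {"caption": "Open...", "opens": "OpenDialog"},
        {"caption": "Copy"},
        {"caption": "About", "opens": "AboutDialog"},
    ],
    "OpenDialog": [{"caption": "OK"}, {"caption": "Cancel"}],
    "AboutDialog": [{"caption": "Close"}],
}
model = rip(gui, "Main")
```

In this example the heuristic finds "OpenDialog" but misses "AboutDialog", which illustrates why the tester must examine and possibly edit the ripped model.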

Any path through the EFG starting at an event that is available when the application under test is launched can serve as a test case. However, the number of possible test cases grows exponentially with the length of the path. Short test cases can find some faults, but many faults can only be detected with longer event sequences (Yuan & Memon, 2007). Thus, one challenge of GUI testing is to identify which of the longer sequences are most likely to add value to testing. Several variations of the EFG have been created to address this challenge.
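To make the growth in the number of test cases concrete, the sketch below stores an EFG as an adjacency mapping and enumerates every event sequence of a given length that starts at an initially available event. The representation, the function name, and the small menu-based example are assumptions chosen for illustration.

```python
def sequences_of_length(efg, initial_events, n):
    """Enumerate all paths of exactly n events that start at an initial event."""
    if n < 1:
        return []
    paths = [[e] for e in sorted(initial_events)]
    for _ in range(n - 1):
        # Extend every path by each event that may immediately follow its last event.
        paths = [p + [nxt] for p in paths for nxt in sorted(efg.get(p[-1], ()))]
    return paths


# Hypothetical EFG: an edge e1 -> e2 means e2 can be performed immediately after e1.
efg = {
    "File": {"Save", "Print"},
    "Edit": {"Copy", "Paste"},
    "Save": {"File", "Edit"},
    "Print": {"File", "Edit"},
    "Copy": {"File", "Edit"},
    "Paste": {"File", "Edit"},
}
initial = {"File", "Edit"}
counts = [len(sequences_of_length(efg, initial, n)) for n in range(1, 5)]
```

Every vertex in this toy graph has two successors, so the number of candidate test cases doubles with each additional event. Real EFGs have far higher out-degree, which is why exhaustive enumeration is practical only for short sequences.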

Variations of the Event-Flow Graph

An important observation about the EFG is that it contains many events that are unlikely to be defective. The handler for an event that opens a menu, for example, is almost always located in library code and does not interact with the application code in any way. This observation leads to a variation of the EFG called the event interaction graph (EIG) that achieves marked size reduction without sacrificing fault-detection potential by omitting events that need not be tested as rigorously as the rest. The vertices in the EIG represent system-interaction events—events that either close windows or perform actions without opening or closing any windows or menus. Examples of system-interaction events include clicking the “OK” button in a preferences window and using the “copy” event to copy objects to the clipboard. Edges in the EIG represent the interacts-with relationship. A system-interaction event e1 is said to interact with a system-interaction event e2 if there is a path from e1 to e2 in the EFG that contains no system-interaction events other than e1 and e2. Test cases can be generated by traversing the EIG to get sequences of system-interaction events. To make the test cases executable on the GUI, they must be mapped onto the EFG to fill in the necessary nonsystem-interaction events before and between system-interaction events (Memon & Xie, 2005).
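The interacts-with relationship can be computed from an EFG by searching forward from each system-interaction event through events that are not system-interaction events. The following Python sketch assumes the EFG is an adjacency mapping and that the set of system-interaction events has already been identified. The function name and the example events (including treating a wizard's "Next" button as a system-interaction event) are hypothetical.

```python
def build_eig(efg, system_events):
    """Derive EIG edges: e1 -> e2 if some EFG path from e1 to e2 has no
    system-interaction events other than its two endpoints."""
    eig = {e: set() for e in system_events}
    for start in system_events:
        stack = list(efg.get(start, ()))
        seen = set()
        while stack:
            event = stack.pop()
            if event in system_events:
                eig[start].add(event)  # the path ends here; do not search past it
            elif event not in seen:
                seen.add(event)
                stack.extend(efg.get(event, ()))
    return eig


# Hypothetical EFG: "File" opens a menu, "Wizard..." opens a window.
efg = {
    "File": {"Save", "Wizard..."},
    "Save": {"File"},
    "Wizard...": {"Next"},
    "Next": {"Finish"},
    "Finish": {"File"},
}
eig = build_eig(efg, {"Save", "Next", "Finish"})
```

Here the EIG has three vertices instead of five, and a test case such as Finish followed by Next must later be expanded with "File" and "Wizard..." to be executable on the GUI.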

Even with its substantial size reduction compared to the EFG, the EIG can be unwieldy for large applications. Testing all length-n event sequences in the EIG is only feasible for the smallest values of n, often just 2. But many of the longer event sequences can safely be skipped during testing: namely, sequences in which the event handlers do not affect each other. If the handlers of events e1 and e2 do not interact, then executing e1 and e2 in sequence will not reveal any faults that could not be revealed by executing each of e1 and e2 by itself. The event semantic interaction graph (ESIG) is a subgraph of the EIG that omits edges between unrelated events. In the ESIG, an edge from event e1 to e2 means that there is an event semantic interaction relationship between e1 and e2, or, in other words, executing e1 followed by e2 results in a GUI state that is in some sense different from the state that would have resulted if e1 and e2 had been executed in isolation. An example of an event semantic interaction in a word processor occurs between the events “select all” and “delete”: performing delete just after select all modifies the text on the screen differently than executing either select all or delete does. But select all would not be expected to semantically interact with, say, “change page orientation”. The event semantic interaction relationship is defined in terms of dynamic GUI state, rather than static source-code properties such as variables shared by event handlers, because pervasive design elements in GUI software (such as multiple languages, callbacks for event handlers, multi-threading, and the use of libraries) limit the applicability of static analysis. The ESIG is constructed automatically by running an initial test suite that covers all edges in the EIG and analyzing the states into which each test case drives the GUI. As with the EIG, test cases can be generated from the ESIG by traversing the graph and filling in any necessary nonsystem-interaction events (Yuan & Memon, 2007).
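The select-all/delete example can be reproduced with a small simulation. The Python sketch below uses one simplified reading of "in some sense different": two events are taken to interact if the state after running them in sequence differs from the start state overlaid with the changes each event makes on its own. This criterion, the toy editor state, and all function names are assumptions made for illustration. The published definition of the relationship is more detailed.

```python
def select_all(s):
    return {**s, "selection": s["text"]}


def delete(s):
    if not s["selection"]:
        return dict(s)  # nothing selected, so nothing changes
    return {**s, "text": s["text"].replace(s["selection"], "", 1), "selection": ""}


def landscape(s):
    return {**s, "orientation": "landscape"}


def changes(before, after):
    """Properties whose values differ between two states."""
    return {k: v for k, v in after.items() if before.get(k) != v}


def semantically_interact(s0, e1, e2):
    """Simplified check: does running e1 then e2 differ from the isolated effects combined?"""
    expected = {**s0, **changes(s0, e1(s0)), **changes(s0, e2(s0))}
    actual = e2(e1(s0))
    return actual != expected


# Hypothetical initial state of a tiny word processor.
s0 = {"text": "hello", "selection": "", "orientation": "portrait"}
delete_interacts = semantically_interact(s0, select_all, delete)
orientation_interacts = semantically_interact(s0, select_all, landscape)
```

In this simulation, delete does nothing in isolation but empties the document after select all, so the pair is flagged, while select all and the orientation change are not.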

Another variation of the EFG, the probabilistic event flow graph (PEFG), has been used to focus testing on the event sequences that users are most likely to perform. The PEFG consists of an EFG in which paths are annotated with probabilities indicating the likelihood that a user would follow that path. These probabilities come from usage profiles (also called operational profiles or session data), which are event sequences captured automatically as users interact with the program. Although the usage profiles could simply be replayed on the application (as in capture-replay tools), test cases generated by traversing high-probability paths in the PEFG offer some advantages. First, because the PEFG represents a composite of different usage patterns, some high-probability paths, while not executed by any individual user during usage-profile collection, are likely to be executed by future users. Testing these paths, then, can reveal faults that future users would be likely to encounter. Second, generating test cases from the PEFG is more flexible in the face of changes to the GUI than replaying users’ interactions verbatim (Brooks & Memon, 2007).
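The first advantage, that a composite of usage profiles can assign high probability to a path no single user followed, can be illustrated with a short Python sketch. It assumes, for simplicity, that probabilities are attached to individual edges and estimated as relative transition frequencies in the recorded sessions. The source states only that paths are annotated with probabilities, so this estimation scheme, like the session data and function names, is hypothetical.

```python
from collections import Counter, defaultdict


def edge_probabilities(sessions):
    """Estimate P(e2 | e1) as the relative frequency of e2 directly after e1."""
    counts = defaultdict(Counter)
    for session in sessions:
        for e1, e2 in zip(session, session[1:]):
            counts[e1][e2] += 1
    return {
        e1: {e2: c / sum(nxt.values()) for e2, c in nxt.items()}
        for e1, nxt in counts.items()
    }


def path_probability(probs, path):
    """Multiply edge probabilities along a path; unseen edges contribute zero."""
    p = 1.0
    for e1, e2 in zip(path, path[1:]):
        p *= probs.get(e1, {}).get(e2, 0.0)
    return p


# Hypothetical usage profiles captured from three user sessions.
sessions = [
    ["Open", "Edit", "Save"],
    ["Open", "Edit", "Print"],
    ["New", "Edit", "Save"],
]
probs = edge_probabilities(sessions)
unseen_path = ["New", "Edit", "Print"]
p_unseen = path_probability(probs, unseen_path)
```

No recorded session contains New, Edit, Print, yet the composite model gives that path a probability of one third, so a generator that favors likely paths would still select it.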

Running Test Cases

Test cases generated from any of these models—the EFG, the EIG, the ESIG, or the PEFG—can be executed automatically. Much like in GUI ripping, the application under test is run under reflection, enabling the widgets specified in the test cases to be identified and manipulated to perform events. The resulting output can be recorded and checked (run through the oracle procedure) automatically (Xie & Memon, 2007).
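As a rough illustration of reflective test execution, the Python sketch below replays a test case by looking up each event's handler by name with the built-in `getattr` and recording the application state after every event. The `TinyEditor` class, the `on_` naming convention, and `run_test_case` are hypothetical stand-ins. An actual replayer would locate real widgets in a running GUI.

```python
class TinyEditor:
    """Hypothetical application under test."""

    def __init__(self):
        self.text = "hello"
        self.selection = ""
        self.clipboard = ""

    def on_select_all(self):
        self.selection = self.text

    def on_copy(self):
        self.clipboard = self.selection

    def on_delete(self):
        if self.selection:
            self.text = self.text.replace(self.selection, "", 1)
            self.selection = ""


def run_test_case(app, events):
    """Perform each event in order and capture the state after every step."""
    states = []
    for event in events:
        handler = getattr(app, "on_" + event)  # reflective lookup by name
        handler()
        states.append(dict(vars(app)))  # snapshot for a later oracle check
    return states


states = run_test_case(TinyEditor(), ["select_all", "copy", "delete"])
```

The list of snapshots is the raw material for the oracle: it can be compared with expected states after every event or only at the end.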

As mentioned previously, the way GUIs communicate information to users departs from the traditional batch style. GUI output is complex: rather than a number or text string, it consists of the property values of all of the widgets the user can see, including their structure and position relative to one another. Each event the user performs can drive the GUI into a new state, producing new output in the form of changes to the GUI. Even if the final GUI state resulting from an event sequence is correct, those at intermediate steps may not be. Recording and checking the entire GUI state after each intermediate event, while effective at detecting faults, costs time and space. The cost of applying the test oracle can be reduced by collecting less oracle information or relaxing the oracle procedure, at the price of reduced fault detection. The amount of oracle information—property values of widgets in the GUI—that must be collected and checked each time the GUI state is captured can be reduced by omitting widgets outside the currently active window or even outside the most recently manipulated widget. The oracle procedure—which compares the actual oracle information to the expected behavior—may be called less frequently—for example, only after the last event in the test case. Although less stringent oracles may be cheaper, thorough oracles can be more cost-effective. Moreover, stricter oracles can make up for a shortage of longer test cases (Xie & Memon, 2007).
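The two cost-reduction options, collecting less oracle information and invoking the oracle procedure less often, can be expressed as parameters of a single comparison routine. The Python sketch below is an assumption-laden illustration: states are modeled as sets of (window, widget, property, value) tuples, the expected and actual state lists are assumed to have equal length, and the function names are hypothetical.

```python
def restrict(state, window=None, widget=None):
    """Keep only the facts for the given window and/or widget (None means no limit)."""
    return {
        f for f in state
        if (window is None or f[0] == window) and (widget is None or f[1] == widget)
    }


def oracle_passes(expected, actual, window=None, widget=None, after_each_event=True):
    """Compare expected and actual states at every step, or only at the last one."""
    n = len(expected)
    steps = range(n) if after_each_event else range(max(n - 1, 0), n)
    return all(
        restrict(expected[i], window, widget) == restrict(actual[i], window, widget)
        for i in steps
    )


# Hypothetical two-event test case: the intermediate status text is wrong.
expected = [
    {("Main", "status", "text", "Saving")},
    {("Main", "status", "text", "Done")},
]
actual = [
    {("Main", "status", "text", "")},
    {("Main", "status", "text", "Done")},
]
thorough = oracle_passes(expected, actual)
final_only = oracle_passes(expected, actual, after_each_event=False)
```

The thorough oracle reports the faulty intermediate state, while the cheaper last-event-only oracle lets it pass, which is the trade-off between cost and fault detection described above.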

Another factor that affects the cost-effectiveness of GUI testing is the coverage criterion (or test-adequacy criterion) that defines when the GUI has been tested enough. Coverage criteria have been defined in terms of the event-flow models described above—for example, “all events”, “all EIG edges”, or “all length-n event sequences in the EFG”. Event-based coverage criteria are more closely tied than code-based coverage criteria to the way the application is used and the way its components interact. Model-based approaches to GUI testing not only provide a way to define coverage criteria, but, by enabling many testing tasks to be automated, also make it feasible to satisfy those criteria (Memon, 2002; Memon, Soffa & Pollack, 2001).
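Event-based coverage criteria such as "all events" or "all edges" reduce to simple set computations once the model is available. The following Python sketch measures how much of a small EFG a test suite covers. The adjacency-mapping representation, function names, and example data are assumptions for illustration.

```python
def event_coverage(efg, suite):
    """Fraction of EFG events performed by at least one test case."""
    all_events = set(efg) | {e for nxt in efg.values() for e in nxt}
    if not all_events:
        return 1.0
    covered = {e for test in suite for e in test} & all_events
    return len(covered) / len(all_events)


def edge_coverage(efg, suite):
    """Fraction of EFG edges exercised by consecutive events in some test case."""
    all_edges = {(a, b) for a, nxt in efg.items() for b in nxt}
    if not all_edges:
        return 1.0
    covered = {(a, b) for test in suite for a, b in zip(test, test[1:])} & all_edges
    return len(covered) / len(all_edges)


# Hypothetical EFG with three events and four edges.
efg = {"File": {"Save", "Print"}, "Save": {"File"}, "Print": {"File"}}
suite = [["File", "Save", "File"]]
events_covered = event_coverage(efg, suite)
edges_covered = edge_coverage(efg, suite)
```

The same functions apply unchanged to an EIG, since it is also a directed graph over events.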

Regression Testing

Of course, a testing technique does not just need to be successful the first time around—it must be cost-effective throughout the life-cycle of the software, across many iterations of regression testing. This is especially critical given the rise of rapid development cycles and multiple, geographically-distributed developers and maintainers. Enabling testers to rapidly construct a GUI model and generate test cases from it, event-flow models support the creation of a new, disposable test suite for each revision of the software.

In addition, testers may want to create a more permanent set of test cases for regression testing. Although minor changes to the GUI tend to break test cases generated from the EFG, test cases generated from the EIG have proved to be more robust, withstanding changes like moving events from one window to another and changing the structure of menus (Memon & Xie, 2005; Xie & Memon, 2006).

FUTURE TRENDS

As GUI testing is integrated into development processes, we expect that different styles of GUI testing will be adapted to different parts of the processes. Xie and Memon (2006) have proposed three concentric loops of testing that vary in speed and thoroughness. For rapid, fully automated testing, which may be performed each time a code revision is committed, a few EIG edges are covered and the oracle procedure simply checks for crashes; this technique is called crash testing. Over multiple iterations of crash testing, different EIG edges are covered, so that eventually all are tested. In smoke testing—a slower, more comprehensive form of testing that may be performed in conjunction with a nightly build—test cases covering all EFG vertices or edges are generated and used to reference test the current version of the software against an earlier version. The most labor-intensive forms of GUI validation—such as manually-created test cases and state-model-based techniques—may be reserved for release testing.
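The crash-testing loop can be sketched as two small pieces: a rule for choosing which few EIG edges to cover on a given iteration, and an oracle that only asks whether the application crashed. The round-robin selection below is an assumption, since the source does not say how edges are chosen per iteration, and the function names and edge list are hypothetical.

```python
def crash_test_batch(edges, iteration, batch_size):
    """Pick a small batch of EIG edges, rotating so that all are eventually covered."""
    ordered = sorted(edges)
    if not ordered:
        return []
    start = (iteration * batch_size) % len(ordered)
    size = min(batch_size, len(ordered))
    return [ordered[(start + i) % len(ordered)] for i in range(size)]


def crashes(run_test, test_case):
    """Crash oracle: the only check is whether running the test raises an exception."""
    try:
        run_test(test_case)
    except Exception:
        return True
    return False


# Hypothetical EIG edges; each commit covers two of them.
edges = [("Copy", "Paste"), ("Paste", "Save"), ("Save", "Copy"),
         ("Save", "Print"), ("Print", "Save")]
covered = set()
for iteration in range(3):
    covered.update(crash_test_batch(edges, iteration, batch_size=2))
```

With five edges and a batch of two, three consecutive commits are enough to exercise every edge at least once.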

GUIs belong to the larger class of event-driven (or reactive) software, a class that challenges testers in many ways. The GUI-testing techniques described here may in the future be generalized to other kinds of event-driven software. This may include component-based applications, object- and service-oriented applications, network protocols, and Web applications, as well as non-WIMP GUIs. The event-flow models of GUI usage will have to be extended to handle complications like event timing, multiple users or input streams, and relaxed assumptions about turn-taking between users and computers (Memon, 2004; Nielsen, 1993).

CONCLUSION

As GUIs become more pervasive and more is demanded of them, the importance of their quality assurance is growing. The need to incorporate GUI testing into testing processes throughout the software life-cycle is becoming apparent. Over the past several years, advances in model-based GUI testing have made this more cost-effective by providing test-adequacy criteria and by automating test-case generation, test execution, test oracles, and regression testing. In the future, GUI testing may commonly be woven into the testing process, with different levels of testing designed to meet goals at different time scales. Lessons learned from GUI testing will likely be applied to the testing of other kinds of event-driven software.

KEY TERMS

Graphical User Interface (GUI): A visual front-end through which a user can interact with a software application.

GUI State: The collection of states of all windows in the GUI, where a window’s state is the set of all triples (w, p, v) such that w is a widget in the window, p is a property of w, and v is the value of p.

Event: The basic unit of input to a GUI, triggered by such user actions as clicking a button or typing in a text box.

GUI Test Case: A sequence of events to be performed on the GUI.

Event Handler: A piece of application code that executes in response to an event.

System-Interaction Event: An event that either closes a window or performs some action without opening or closing any windows or menus.

Event-Flow Graph (EFG): A graph representation of a GUI in which vertices represent events and an edge from event e1 to event e2 signifies that e2 can be performed immediately after e1.

Event-Interaction Graph (EIG): A graph representation of a GUI in which vertices represent system-interaction events and an edge from event e1 to event e2 signifies that there is a path from e1 to e2 in the EFG that contains no system-interaction events other than e1 and e2.

Event Semantic Interaction Graph (ESIG): A graph representation of a GUI in which vertices represent system-interaction events and an edge from event e1 to event e2 signifies that performing e1 followed by e2 results in a GUI state that is qualitatively different from the state that would have resulted had e1 and e2 been performed in isolation.

Probabilistic Event-Flow Graph (PEFG): A graph representation of a GUI that consists of an EFG whose paths are annotated with probabilities of traversal by users.