In the context of education, accountability refers to the concept that schools are responsible for ensuring that students meet agreed-upon standards of academic achievement. While governmental entities claim accountability is essential for the allocation of resources and the evaluation of policies and budgets, the term has taken on several distinct meanings. The principal dilemma was articulated by Susan Fuhrman and Richard Elmore (2004):

It is evident that what policy makers and the informed public think performance-based accountability is, differs considerably from what it actually is. In political discourse, it is common to hear both opponents and advocates speak as if test results were the metric of success in performance-based accountability … [but] the idea of equating student learning with test performance is suspect, both in terms of the technical characteristics of tests and the incentive effects of testing on instruction. (p. 275)

David Figlio and Cecilia Rouse (2006) suggest that accountability in the United States involves two distinct alternatives. The first is the use of test-based performance indicators, followed by sanctions for low-performing schools. The second uses market forces to reward some schools and punish others as parents and students make personal resource-allocation decisions through the use of vouchers. Similar market-based accountability occurs through the exercise of enrollment choice in charter schools, magnet schools, or open-enrollment systems (Finn et al. 2000; Coulson et al. 2006). This article reviews accountability before recent changes in federal legislation and provides alternatives for consideration.


Until 2001, compliance-based accountability was the primary mechanism by which school systems and many other recipients of governmental funds were held accountable. If procedures were followed and rules were enforced, then the entity was considered sufficiently accountable. Despite evidence that considerable wasteful and counterproductive effort is spent on generating strategic plans and submitting proof of school improvement (Schmoker 2004; Reeves 2006), compliance-based accountability remains a dominant force throughout the United States and in many other national and provincial school systems.


The prevailing example of results-based accountability is the No Child Left Behind (NCLB) Act of 2001, legislation that is scheduled for reauthorization in 2007. The essence of NCLB is a focus on results as defined by state test scores. While the law makes the National Assessment of Educational Progress (NAEP) a calibration device (Reeves 2001), it also allows each state to establish its own academic standards and its own assessment procedures. Thus, two states using the NAEP can show, respectively, 20 percent and 80 percent of students scoring at a proficient level, but show precisely the reverse when the "results" under consideration are scores generated by state-created tests. Moreover, results-based accountability emphasizes the effects of education without providing insight into the causes. Wealthy schools have better results, but it does not necessarily follow that those results stem exclusively from better teaching, leadership, and policy, any more than poorer results in poor schools stem exclusively from inferior teaching, leadership, and policy (Rothstein 2004).


To supplement test scores rather than rely on them exclusively, expanded accountability systems have been employed in many school systems (see Reeves 2002, 2004a, 2004b). One such approach, holistic accountability, is based on three tiers of indicators: system-wide indicators, including test scores; school-based indicators, including professional practices of teachers and educational leaders; and school narratives, providing qualitative context for quantitative data. One of the best examples of a governmental entity systematically examining both student achievement data and professional practices is provided by Alberta Learning, the system used in the Canadian province of Alberta (unlike in the United States, educational governance in Canada is decentralized). Alberta’s rigorous standards, consistent tests, holistic accountability system, public reporting, and long-term improvements in achievement suggest that accountability policies can be effective and constructive governance tools.


A growing number of schools are using value-added accountability, in which progress is measured by comparing students’ present performance to their performance in previous years (Sanders 1998). As of November 2006, ten states had been authorized to experiment with this system. Value-added accountability has the advantage of showing more meaningful comparisons and focusing on growth in achievement, thus encouraging low-performing schools and challenging high-performing schools. However, value-added models are complex and in some cases proprietary. Moreover, any test that can show progress will, of necessity, include items below grade level and above grade level. Such tests are not consistent with the prevailing NCLB requirement that state tests reflect grade-level academic standards. This inevitably leads to tradeoffs. A test that addresses multiple grade levels in order to allow students to "show progress" will require more items in order to maintain its reliability, but more test items can subject students (and teachers) to "test fatigue," which itself can impair the validity of the test. If, on the other hand, a fifth-grade student takes a test with only fifth-grade items on it, immense progress (say, from a second-grade to a fourth-grade reading level) could be substantially overlooked.

Emerging models using item response theory are being experimentally implemented in some districts, notably by the Northwest Evaluation Association. Item response theory (IRT) is the study of test and item scores based on assumptions concerning the mathematical relationship between abilities (or other hypothesized traits) and item responses (Baker 2001). While the mathematics of IRT can be complex, the practical application in the realm of educational accountability and assessment is straightforward. Without IRT, every student would take the same test. Using IRT, each student would take a test uniquely suited to his or her abilities.
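The relationship between ability and item responses, and the way it lets a test adapt to each student, can be illustrated with a minimal sketch. It assumes the Rasch (one-parameter logistic) model, which is only one of several IRT models, and it updates the ability estimate with a simple shrinking step rather than the statistical estimation a real testing program would use. The function names, the item bank, and the two simulated students are hypothetical and are not drawn from any actual assessment.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model: probability that a student answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def next_item(estimate, item_bank, used):
    """Choose the unused item whose difficulty is closest to the estimate."""
    candidates = [i for i in range(len(item_bank)) if i not in used]
    return min(candidates, key=lambda i: abs(item_bank[i] - estimate))

def run_adaptive_test(true_ability, item_bank, n_items=5, step=1.0):
    """Administer n_items adaptively to a simulated student.

    Simplifying assumption: the student answers correctly whenever the
    Rasch probability is at least 0.5, so the run is deterministic.
    """
    estimate, used, administered = 0.0, set(), []
    for _ in range(n_items):
        idx = next_item(estimate, item_bank, used)
        used.add(idx)
        difficulty = item_bank[idx]
        correct = p_correct(true_ability, difficulty) >= 0.5
        administered.append((difficulty, correct))
        # Move the estimate up after a right answer, down after a wrong one,
        # halving the step each time so the estimate settles.
        estimate += step if correct else -step
        step /= 2
    return estimate, administered

# Hypothetical item difficulties on an arbitrary scale centered at 0.
ITEM_BANK = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

strong_estimate, strong_items = run_adaptive_test(1.2, ITEM_BANK)
weak_estimate, weak_items = run_adaptive_test(-1.2, ITEM_BANK)
print("stronger student:", strong_estimate, strong_items)
print("weaker student:  ", weak_estimate, weak_items)
```

Both simulated students start on the same middle-difficulty item, after which the sequences diverge: the stronger student is routed to harder items and the weaker student to easier ones.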

For example, if students A and B are taking a test of Grade 4 reading, student A might get the first question right, while student B gets the first question wrong. In a traditionally constructed test, both students would continue to take the same test, with student A doing well (in fact, failing to be challenged) while student B might become increasingly frustrated, perhaps to the point of enduring test fatigue and giving up on the exam. Using IRT, however, student A would proceed to a more difficult question, while the next question given to student B would be easier.
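Separately from adaptive testing, the basic value-added comparison described earlier (present performance versus performance in a previous year) can be sketched as a simple gain score. This is only an illustration: actual value-added models are far more elaborate statistical models, and the sketch assumes that both years' scores sit on a common scale. The school data, the score scale, and the proficiency cut score of 210 are all hypothetical.

```python
from statistics import mean

def gain_scores(prior, current):
    """Per-student gain: this year's score minus last year's score.

    Only students with scores in both years are included.
    """
    return {s: current[s] - prior[s] for s in prior if s in current}

def school_summary(prior, current, proficiency_cut):
    """Contrast a growth measure with a status (percent proficient) measure."""
    gains = gain_scores(prior, current)
    n_proficient = sum(current[s] >= proficiency_cut for s in gains)
    return {
        "mean_gain": mean(gains.values()),
        "percent_proficient": 100 * n_proficient / len(gains),
    }

# Hypothetical scale scores for two small schools.
low_prior = {"a": 180, "b": 185, "c": 190}
low_current = {"a": 195, "b": 199, "c": 206}
high_prior = {"d": 230, "e": 235, "f": 240}
high_current = {"d": 231, "e": 236, "f": 242}

low_school = school_summary(low_prior, low_current, proficiency_cut=210)
high_school = school_summary(high_prior, high_current, proficiency_cut=210)
print("low-scoring school: ", low_school)
print("high-scoring school:", high_school)
```

In this made-up data, the low-scoring school has no proficient students but a large average gain, while the high-scoring school is entirely proficient but shows little growth, which is the contrast a growth-based measure is meant to surface.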


As educational accountability policies are revised in future years, leaders and policymakers can learn from the successes, errors, and unintended consequences of previous policies. In a wide variety of fields, opinions hold sway over evidence, and as Jeffrey Pfeffer and Robert Sutton (2006) warn, leaders are deluded by "dangerous half-truths" and "total nonsense." Many faulty educational practices, often incorporated into detailed long-term strategic plans, continue even though the evidence does not support them (Childress et al. 2006; Reeves 2006). While market-based accountability surely leads to definitive rewards and sanctions, the invisible hand of the market does not shed light on how accountability policies can be employed to attain their central aim: the improvement of school performance and student achievement. To advance this cause, three essential aspects of accountability need to be kept in mind:

1. The purpose of accountability is to improve performance. It is not merely a reporting vehicle used to rate, rank, and sort students, teachers, schools, and states. Therefore, of necessity, an effective accountability system must include not only results, but also inferences about how to improve results. An accountability system that includes only student test scores without a measurement of teaching and leadership practices is like a healthcare accountability system that counts death rates, but does not ask how patients died.

2. Accountability requires coherent data. To allow for hypothesis testing, data must be distributed and warehoused in a way that makes it accessible and usable. While national educational standards remain politically impossible, there should be national standards for accountability systems that would permit meaningful comparison of the data generated by them.

3. The smaller the unit of analysis, the more meaningful inferences from the data will be. The evidence on the impact of classroom teachers on student learning is overwhelming (Darling-Hammond and Sykes 1999). Goodlad (1990, 1994) also makes a persuasive case that the individual school leader can have a profound impact on student achievement. However, when considered on a larger scale, the relationship between policy and results is less clear. Attempts to track district-level "progress" are bedeviled by countless confounding variables. Even school-level accountability—the focus of present law—can lead to a label of success or failure for an entire school based on the performance of one group of students in one grade in one subject.

