Criterion-Referenced Test

Criterion-referenced tests and assessments are designed to measure student performance against a fixed set of predetermined criteria or learning standards, i.e., concise, written descriptions of what students are expected to know and be able to do at a specific stage of their education. In elementary and secondary education, criterion-referenced tests are used to evaluate whether students have learned a specific body of knowledge or acquired a specific skill set, for example the curriculum taught in a course, academic program, or content area.

If students perform at or above the established expectations (for example, by answering a certain percentage of questions correctly), they will pass the test, meet the expected standards, or be deemed “proficient.” On a criterion-referenced test, every student taking the exam could theoretically fail if they don’t meet the expected standard; alternatively, every student could earn the highest possible score. In fact, it is not only possible, but desirable, for every student to pass the test or earn a perfect score. Criterion-referenced tests have been compared to driver’s-license exams, which require would-be drivers to achieve a minimum passing score to earn a license.

Criterion-Referenced vs. Norm-Referenced Tests
Norm-referenced tests are designed to rank test takers on a “bell curve,” or a distribution of scores that resembles, when graphed, the outline of a bell: a small percentage of students performing poorly, most performing near the average, and a small percentage performing well. To produce a bell curve each time, test questions are carefully designed to accentuate performance differences among test takers, not to determine if students have achieved specified learning standards, learned required material, or acquired specific skills. Unlike norm-referenced tests, criterion-referenced tests measure performance against a fixed set of criteria.
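The difference between the two approaches can be illustrated with a short sketch. The student names, the scores, the 80 percent cut score, and both function names are hypothetical and chosen only for illustration; real norm-referenced tests use far more elaborate scaling than the simple percentile rank shown here.

```python
from bisect import bisect_left


def criterion_referenced(scores, cut_score):
    """Pass/fail for each student against a fixed cut score (percent correct)."""
    return {name: pct >= cut_score for name, pct in scores.items()}


def percentile_ranks(scores):
    """Percent of test takers who scored strictly below each student."""
    ordered = sorted(scores.values())
    total = len(ordered)
    return {
        name: 100 * bisect_left(ordered, pct) / total
        for name, pct in scores.items()
    }


# Hypothetical class in which every student scored above 80 percent.
scores = {"Ana": 92, "Ben": 88, "Cal": 85, "Dee": 81}

print(criterion_referenced(scores, cut_score=80))
# {'Ana': True, 'Ben': True, 'Cal': True, 'Dee': True}

print(percentile_ranks(scores))
# {'Ana': 75.0, 'Ben': 50.0, 'Cal': 25.0, 'Dee': 0.0}
```

Under the fixed criterion every student passes, while the ranking still places one student at the bottom even though that student met the standard.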

Criterion-referenced tests may include multiple-choice questions, true-false questions, “open-ended” questions (e.g., questions that ask students to write a short response or an essay), or a combination of question types. Individual teachers may design the tests for use in a specific course, or the tests may be created by teams of experts for large companies that have contracts with state departments of education. Criterion-referenced tests may be high-stakes tests, i.e., tests that are used to make important decisions about students, educators, schools, or districts, or they may be “low-stakes tests” used to measure the academic achievement of individual students, identify learning problems, or inform instructional adjustments.

Well-known examples of criterion-referenced tests include Advanced Placement exams and the National Assessment of Educational Progress, which are both standardized tests administered to students throughout the United States. When testing companies develop criterion-referenced standardized tests for large-scale use, they usually have committees of experts determine the testing criteria and passing scores, or the number of questions students will need to answer correctly to pass the test. Scores on these tests are typically expressed as a percentage.

It should be noted that passing scores, or “cut-off scores,” on criterion-referenced tests are judgment calls made by either individuals or groups. It’s theoretically possible, for example, that a given test-development committee, if it had been made up of different individuals with different backgrounds and viewpoints, would have determined different passing scores for a certain test. One group might determine that a minimum passing score is 70 percent correct answers, while another group might establish the cut-off score at 75 percent correct. For a related discussion, see proficiency.
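A minimal sketch of how the choice of cut-off score changes outcomes for the same set of results: the 70 and 75 percent cut-offs come from the example above, while the student names, their scores, and the `passers` helper are hypothetical.

```python
def passers(scores, cut_score):
    """Names of students whose percent-correct score meets the cut-off."""
    return sorted(name for name, pct in scores.items() if pct >= cut_score)


# Hypothetical percent-correct results on one test.
scores = {"Ana": 78, "Ben": 72, "Cal": 74, "Dee": 69}

committee_a = passers(scores, cut_score=70)
committee_b = passers(scores, cut_score=75)

print(committee_a)  # ['Ana', 'Ben', 'Cal']
print(committee_b)  # ['Ana']
```

The students' performance is identical in both cases; only the committee's judgment about where to draw the line differs.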

Criterion-referenced tests created by individual teachers are also very common in American public schools. For example, a history teacher may devise a test to evaluate understanding and retention of a unit on World War II. The criteria in this case might include the causes and timeline of the war, the nations that were involved, the dates and circumstances of major battles, and the names and roles of certain leaders. The teacher may design a test to evaluate student understanding of the criteria and determine a minimum passing score.

While criterion-referenced test scores are often expressed as percentages, and many have minimum passing scores, the test results may also be scored or reported in alternative ways. For example, results may be grouped into broad achievement categories—such as “below basic,” “basic,” “proficient,” and “advanced”—or reported on a 1–5 numerical scale, with the numbers representing different levels of achievement. As with minimum passing scores, proficiency levels are judgment calls made by individuals or groups that may choose to modify proficiency levels by raising or lowering them.
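Reporting by achievement category can be sketched as a lookup against a list of thresholds. The category names below come from the example above, but the threshold values (50, 70, and 90 percent) and the function name `achievement_level` are assumptions made for illustration, not values used by any particular test.

```python
from bisect import bisect_right

LEVELS = ["below basic", "basic", "proficient", "advanced"]

# Hypothetical minimum percent-correct scores for "basic",
# "proficient", and "advanced", in ascending order.
DEFAULT_CUTS = [50, 70, 90]


def achievement_level(pct, cuts=DEFAULT_CUTS):
    """Map a percent-correct score to an achievement category."""
    return LEVELS[bisect_right(cuts, pct)]


print(achievement_level(49))   # below basic
print(achievement_level(70))   # proficient
print(achievement_level(95))   # advanced

# Raising the "proficient" threshold reclassifies the same score.
print(achievement_level(70, cuts=[50, 75, 90]))  # basic
```

The last call shows the effect of raising a proficiency level: a score of 70 percent that counted as “proficient” under the first set of thresholds counts only as “basic” under the second.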

The following are a few representative examples of how criterion-referenced tests and scores may be used:

Reform

Criterion-referenced tests are the most widely used type of test in American public education. All the large-scale standardized tests used to measure public-school performance, hold schools accountable for improving student learning results, and comply with state or federal policies—such as the No Child Left Behind Act—are criterion-referenced tests, including the assessments being developed to measure student achievement of the Common Core State Standards. Criterion-referenced tests are used for these purposes because the goal is to determine whether educators and schools are successfully teaching students what they are expected to learn.

Criterion-referenced tests are also used by educators and schools practicing proficiency-based learning, a term that refers to systems of instruction, assessment, grading, and academic reporting that are based on students demonstrating mastery of the knowledge and skills they are expected to learn before they progress to the next lesson, get promoted to the next grade level, or receive a diploma. In most cases, proficiency-based systems use state learning standards to determine academic expectations and define “proficiency” in a given course, content area, or grade level. Criterion-referenced tests are one method used to measure academic progress and achievement in relation to standards.

Following a wide variety of state and federal policies aimed at improving school and teacher performance, criterion-referenced standardized tests have become an increasingly prominent part of public schooling in the United States. When focused on reforming schools and improving student achievement, these tests are used in a few primary ways:

Debate

The widespread use of high-stakes standardized tests in the United States has made criterion-referenced tests an object of criticism and debate. While many educators believe that criterion-referenced tests are a fair and useful way to evaluate student, teacher, and school performance, others argue that the overuse, and potential misuse, of the tests could have negative consequences that outweigh their benefits.

The following are a few representative arguments typically made by proponents of criterion-referenced testing:

The following are representative arguments typically made by critics of criterion-referenced testing:


The Glossary of Education Reform by Great Schools Partnership is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.