Kathleen McKinney, Cross Chair in the Scholarship of Teaching and Learning
and Professor of Sociology
Illinois State University
This material draws heavily from G.S. Hanna and W.E. Cashin. (1988). Improving college grading. IDEA Paper No. 19. Center for Faculty Evaluation and Development, Kansas State Division of Continuing Education.
Most university professors use one of two mathematical methods for grading, both for individual assignments and for the overall course.
Some faculty members assign grades using a set standard, usually described as earning 90% or 92% of the possible points or accumulating X number of points. Students seem to appreciate these set standards because they know what they have to achieve to earn a particular grade. But, experts have asked, what does this really tell students? Does saying you need 450 out of 500 points in this class tell students what is important, what level of learning they should achieve, or what skills to master? This method, then, is not really criterion-based grading, because the criterion is merely an arbitrary percentage or number of points rather than specific skills, behaviors, or clearly defined content that has been mastered. Criterion-based grading requires specific and clear statements of the relevant content domain and domain referencing. Experts also argue that in most college courses the material varies in difficulty, as do the test questions and assignments, so a fixed-percentage system is not logical for those courses.
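The arithmetic behind a set standard is simple, which is part of the critique. The following is a minimal sketch, not a recommended scale: the 90% cutoff for an A comes from the discussion above, while the lower cutoffs and the function name are assumptions added for illustration.

```python
# Hypothetical fixed scale. Only the 90% cutoff for an A comes from the text;
# the B, C, and D cutoffs are assumptions for illustration.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def fixed_standard_grade(points_earned, points_possible):
    """Return a letter grade from a fixed percentage scale."""
    for minimum_percent, letter in CUTOFFS:
        # Integer comparison avoids floating-point rounding at the boundary.
        if points_earned * 100 >= minimum_percent * points_possible:
            return letter
    return "F"


print(fixed_standard_grade(450, 500))  # A
print(fixed_standard_grade(449, 500))  # B
```

A single point separates the A from the B here, yet nothing in the calculation says which skills or content that point represents.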
Other instructors compute grades by attempting to fit the class's grade distribution to a normal curve, or by specifying that to receive an A, for example, a student must be in the top 21% of the class. On the one hand, this makes sense because it is a form of norm referencing, which is logical to use when the content domain is not clearly defined. However, fitting grades to a normal curve generally does not make sense in small classes or with student populations that do not mirror large, general populations of students. In addition, such standards do not really tell students what they need to know or how hard they need to work unless they know how well the others in their class are likely to perform. Finally, this method forces students to compete with each other for grades: only a limited number or percentage of students can receive an A, regardless of the quality of their work.
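A short sketch can make the last two objections concrete. It assumes one simple reading of the "top 21%" rule (round the number of available A grades down, and break ties by list order); the function name and all scores are made up for illustration.

```python
def top_share_grades(scores, a_share=0.21):
    """Label each student 'A' if ranked in the top a_share of the class."""
    n_a = int(len(scores) * a_share)  # number of A grades available, rounded down
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    winners = set(ranked[:n_a])  # ties are broken by list order (a simplification)
    return ["A" if i in winners else "not A" for i in range(len(scores))]


# Made-up scores: the same 90 earns different grades in different classes.
strong_class = [98, 97, 96, 95, 94, 93, 92, 91, 90, 89]
weaker_class = [90, 85, 70, 65, 60, 55, 50, 45, 40, 35]
print(top_share_grades(strong_class)[8])  # not A
print(top_share_grades(weaker_class)[0])  # A

# In a class of four, 21% rounds down to zero available A grades.
print(top_share_grades([99, 98, 97, 96]))
```

Under this rule a score of 90 is "not A" among strong classmates and an A among weaker ones, and a class of four cannot earn any A at all, whatever the quality of the work.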
Is there an alternative to these two methods? Some suggest anchoring grades. An anchor is a variable that correlates with performance in the class and, thus, helps one evaluate the status of the class. Examples include the following: the distribution of SAT Math scores for students in a math class, to help determine the grade distribution in that class; the distribution of scores on a common final exam in a subject area, to help determine the grade distribution in a section of the course next semester; or the grades on writing assignments in several sections of a course last semester, to help set grades in the same course this semester. Anchoring, however, has some practical problems: most faculty are likely to be less familiar with this technique than with the other two, and the data needed for anchoring may be unavailable.
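One possible way to use a common final exam as an anchor is sketched below. This is an illustrative interpretation, not a procedure taken from Hanna and Cashin: it assumes the A cutoff is set at the score that marked the top 25% of an earlier reference group, and that the same cutoff is then applied to the current section. The function names, the 25% share, and all scores are hypothetical.

```python
def anchored_cutoff(reference_scores, top_share):
    """Lowest score that fell within the top `top_share` of the reference group."""
    ranked = sorted(reference_scores, reverse=True)
    n_top = max(1, int(len(ranked) * top_share))
    return ranked[n_top - 1]


def share_at_or_above(section_scores, cutoff):
    """Fraction of the current section scoring at or above the anchored cutoff."""
    return sum(score >= cutoff for score in section_scores) / len(section_scores)


# Made-up common-final scores from earlier sections (the reference group).
reference = [95, 91, 88, 85, 82, 80, 78, 75, 72, 70,
             68, 65, 62, 60, 58, 55, 52, 50, 45, 40]
cutoff = anchored_cutoff(reference, 0.25)
print(cutoff)  # 82

# Made-up scores for this semester's small section on the same exam.
section = [96, 90, 84, 79, 70]
print(share_at_or_above(section, cutoff))  # 0.6
```

Because the cutoff comes from the larger reference group rather than from the five students in the section, an unusually strong small section can earn more A grades than a curve fitted to the section alone would allow.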
As the discussion above implies, it is important when grading to take into account whether the course or task emphasizes well-defined and specific content, less well-defined content, or skills. With skills, one can more readily set real criteria to master (not just 90%). With content, the content domain and its difficulty must be clearly defined in order to use criterion-based grading. If they are not, anchoring may be the better choice.
Hanna and Cashin (1988) suggest the following criteria for evaluating college grading methods:
obtain relevant norm referencing (such as anchors based on information from past or present students in the course)
avoid the instability of small samples (grades should be able to reflect actual achievement in a given course even if it is unusually high or low; small samples will not fit a normal curve)
avoid psychological evils or fixed-sum games (do not prevent cooperative efforts in learning through artificial competition, and grading should acknowledge that there is no set maximum amount of learning that can be achieved)
provide a sense of efficacy (students need some control over their learning and their grades; they should know what the grades mean)
be defined and interpretable (a grade should communicate useful information, and the definition of grades should be consistent across instructors and sections)