Center for Teaching, Learning & Technology

Opscan Evaluation: User’s Guide

Opscan Reports

The Purpose of Test Statistics

The purpose of our scanning services is to provide you with quick scoring of your tests. To help you in analyzing the quality of your tests, we provide a variety of statistics and results to measure individual questions and overall class results. The statistics provide a basis for assessing the reliability and validity of a test and improving the quality of future classroom tests.

The following are examples and explanations of the statistics that appear on Opscan results.

Map between Job Request Form and Printouts

This table lists the output Opscan will produce for each option on the job request form.

Option | Electronic Files | Printouts
------ | ---------------- | ---------
Scores only | Excel spreadsheet, scores listed alphabetically | Scores listed alphabetically; scores listed by ID
Standard output | | Scores listed alphabetically; scores listed by ID; frequency table; histogram
Scores for Blackboard | Excel spreadsheet, scores listed alphabetically with ULIDs |
Long form item analysis | | Long form item analysis printout; difficulty index distribution; discrimination index distribution
Short form item analysis | Short form item analysis in Excel spreadsheet | Short form item analysis printout; difficulty index distribution; discrimination index distribution
Student test responses | | Student test response error report
Individual student feedback | Individual student feedback reports in Word document | Individual student feedback printouts, one sheet per student
Grade Book results | | Grade Book printout
Tabulation | | SPSS output of tabulated results

The Scores and Standard Output Results

If you chose “Scores only” or “Standard output,” then we created two printouts of scores: one alphabetized by last name and another sorted by identification number.  The percentage correct, percentile rank, T-scores, mean, and standard deviation are also printed on the lists of scores.

Student Scores, Listed by Name

The percentage correct is a conversion of the student’s score to a 100-point scale. For example, a student who scored 35 on a 44-point test got 79.5% of the questions correct. The percentile rank indicates the student’s place in the entire class. In the sample report, that same score for Student07 means that 92.5% of the students scored at or below his score.

A T-score is a standardized score that allows you to compare test scores for tests with different scales and for different classes. T-scores are rescaled so that the test mean is 50 and the standard deviation is 10. The T-score therefore provides an index of how far a particular score lies from the average: a T-score of 60, for example, is one standard deviation above the mean.
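The two conversions described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the use of the population standard deviation is an assumption, since the guide does not say which form Opscan uses.

```python
from statistics import mean, pstdev

def percent_correct(score, max_points):
    """Convert a raw score to a 100-point scale."""
    return 100.0 * score / max_points

def t_scores(scores):
    """Rescale raw scores so the class mean maps to 50 and one SD to 10 points."""
    m = mean(scores)
    sd = pstdev(scores)  # assumption: population SD; the report may use the sample SD
    if sd == 0:
        # Everyone earned the same score, so every T-score sits at the mean.
        return [50.0 for _ in scores]
    return [50.0 + 10.0 * (s - m) / sd for s in scores]

print(round(percent_correct(35, 44), 1))  # 79.5
```

Because T-scores always have a mean of 50 and a standard deviation of 10, a T-score from one test can be compared directly with a T-score from another test that used a different point scale.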

The electronic version of this report is very similar.

Excel version of Student Scores, Listed by Name

The Frequency Distribution Table

Choosing standard output means you will also receive a frequency distribution table, which shows how many students earned each score on the test.

Sample of Frequency Distribution

The printout includes the following columns.

Score:

The scores are listed in descending order, from the highest score earned on the test to the lowest.

Frequency:

This indicates the number of students who received that exact score. In this example, exactly 3 students got 34 points on the test.

Cum Frequency:

Cumulative frequency is the number of students who performed at or below a given score. In this example, 18 students scored 34 points or below on the test.

Percentile:

This shows the percentage grade for each score. A student who received a 35 on the test got 79.55% of the questions correct.

Percentile Rank:

Each score is given a rank that indicates the percentage of students who fall below this point on the score distribution. A student who received a 27 on the test was in the 50th percentile. In cases where more than one student or where no student obtains a particular score, the distance between scores is taken into account.
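As a rough illustration of how the first four columns relate to each other, the sketch below builds them from a list of raw scores. The function name and the sample scores are hypothetical. The percentile rank column is left out on purpose, because the guide does not spell out how the distance between scores is taken into account.

```python
from collections import Counter

def frequency_table(scores, max_points):
    """Return rows of (score, frequency, cumulative frequency, percent correct),
    listed from the highest score to the lowest."""
    counts = Counter(scores)
    rows = []
    at_or_below = 0
    # Accumulate from the lowest score upward, then reverse for display.
    for score in sorted(counts):
        at_or_below += counts[score]
        pct = round(100.0 * score / max_points, 2)
        rows.append((score, counts[score], at_or_below, pct))
    return rows[::-1]

# Hypothetical class of five students on a 44-point test.
for row in frequency_table([35, 34, 34, 34, 27], 44):
    print(row)
# (35, 1, 5, 79.55)
# (34, 3, 4, 77.27)
# (27, 1, 1, 61.36)
```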

The Histogram

The standard output report also includes a histogram, a graphical representation of a frequency distribution. The weighted scores are along the x-axis, and the frequency (how many students received each score on the test) is shown on the y-axis.

In some cases student scores will approximate the shape of a normal distribution, or a bell curve. For a variety of reasons (including small class size) this doesn’t always happen. A histogram, therefore, is more likely—but not guaranteed—to be bell shaped for classes in which there are 30 or more students.

Sample Histogram

The Scores for Blackboard Output

In order for us to include university login identifications (ULIDs) in the report we generate for you, students must enter their University ID (UID) on the Opscan sheet in the field labeled "Social Security Number" or "University ID". Social Security Numbers and specially generated classroom identification numbers will not work.

When you submit the job to Opscan, make sure to check the box that says "Scores for Blackboard" on the job request form along with any other reports you want. (If you want us to email the file, you must set up a password with us to encrypt the Excel file and thus protect students' UIDs.) In an email message, you will receive an Excel file that lists the students' names, UIDs, ULIDs, and scores. The column of ULIDs is titled “User ID” because Blackboard uses that term. A sample of such a file is shown here. If our scanner was unable to read a UID or unable to match it to a student, then no ULID is inserted.

Sample of Scores for Blackboard Excel File

To prepare the file for use with Blackboard

  1. At the top of Column D, change the word "Score" to the name you wish to give this test in Blackboard. If necessary, manually type in any missing ULIDs in the User ID column.
  2. Delete the Student Name column and University ID column. Keep the columns of scores and ULIDs (User IDs). Deleting the names in the Excel file ensures that the preferred name for a student remains in the Blackboard system and isn't overwritten by the name in the Excel file.
  3. Once you have reviewed the results and believe they are correct, save the file as a CSV (comma-delimited) file.
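If you would rather script the three preparation steps than do them by hand in Excel, the idea can be sketched as follows. Everything here is an assumption made for illustration: the function name is hypothetical, the column headers ("Student Name", "University ID", "User ID", "Score") are inferred from the description above, and the sample rows are invented. Check the headers in your own file before relying on it.

```python
import csv
import io

def prepare_for_blackboard(rows, test_name):
    """Keep only the User ID and score columns and rename the score column.

    `rows` is a list of dicts keyed by the assumed column headers:
    "Student Name", "University ID", "User ID", "Score".
    Returns the CSV text and the names of students whose ULID is missing.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["User ID", test_name])
    missing = []
    for row in rows:
        ulid = (row.get("User ID") or "").strip()
        if not ulid:
            # A missing ULID must be typed in by hand before importing.
            missing.append(row.get("Student Name", ""))
            continue
        writer.writerow([ulid, row["Score"]])
    return out.getvalue(), missing

# Invented sample rows, for illustration only.
rows = [
    {"Student Name": "Student01", "University ID": "000000001", "User ID": "stu01", "Score": 35},
    {"Student Name": "Student02", "University ID": "", "User ID": "", "Score": 27},
]
csv_text, missing = prepare_for_blackboard(rows, "Exam 1")
print(csv_text)
print(missing)  # ['Student02']
```

The sketch reports students without a ULID instead of writing them to the file, so that you can look up and fill in those ULIDs before importing.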

To import the file into Blackboard CE 6

  1. Make sure you are in the Teach tab.
  2. Click Grade Book on the left-hand side.
  3. Click the button “Import from Spreadsheet.”
  4. Click the Browse button to select the CSV file you created earlier.
  5. Click Upload.
  6. In the next screen, you will be asked to match the columns of your spreadsheet to columns in the gradebook. User ID should already be matched. If the label of the column with the scores exists in the gradebook, that is where they will go. Otherwise, you will be prompted on what to do with the new column.
  7. Once the scores are imported, you will need to turn this column into a grade column by choosing Grade Book options > Column Settings. By default, the Grade Column setting will be No; if you click on the link that says “No” under your column label, it will change the status to Yes and the grades will now be seen.

The Item Analysis

An item analysis takes each item, or question, on the test and gives you a variety of statistics regarding the answers chosen by the students. The item analysis allows you to evaluate a question and decide whether to use it on future tests. An item analysis is available in either short or long form. Both forms produce an item analysis report, a difficulty index distribution, and a discrimination index distribution.

The short form item analysis report gives information for the overall class and covers the information listed below. The short form item analysis report also offers one statistic not found on the long form: point-biserial correlation.

Sample Short Form Item Analysis

Item:

This refers to the question number on the answer sheet, which corresponds to the question number on the test.

Correct:

The correct response as entered on the key is shown in this column. When multiple correct responses are marked on the key, up to three answers will be displayed. Thus, if A, B, C, and D are correct, only A, B, and C will be displayed.

Frequency/Percentage:

For each possible response to the item, the frequency and percentage are printed. The frequency is the number of people who chose each response, while the percentage represents the percentage of the total group who chose each response. The “Omit” heading represents the number of students who did not answer the question.

Validity:

Validity is also known as the discrimination index. The validity for an individual item (i.e., question) compares two groups of students: those who scored in the top part of the class and those who scored in the bottom part. The value is calculated by subtracting the fraction of the bottom group who answered the question correctly from the fraction of the top group who answered it correctly.

The range of possible values is –1.0 to 1.0. If an item carries a high validity, it means that overall, high-scoring individuals (i.e., those with high scores on the total test) answered the item correctly while low-scoring individuals tended to miss the question. Therefore, a question with high validity has a high correlation with the total test score. If one considers the total test score to be a better indicator of a student’s knowledge than any single item, then the higher the relationship between the item and the total test, the more valid the item.

There are a number of factors to consider when examining an item’s validity. In contrast to standardized entrance exams, classroom tests often contain some items that discriminate poorly. For example, it may be an instructor’s intention to begin a test with several easy items in order to put students at ease or to establish a baseline. In cases where everyone answers a question correctly, the item validity is zero. However, it may be desirable to keep the item anyway.

A high negative validity indicates that something is definitely wrong: either the item itself is flawed, for example because of an ambiguous distracter, or the item has been keyed incorrectly. In the case of a zero or very low negative validity (e.g., -.10), the item may be very easy, or so difficult that even a few good students get it wrong. It may also be due to random guessing.
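The subtraction that produces the validity value can be sketched as follows. The function name is hypothetical. The 27% group size follows the “Up” and “Lo” columns described for the long form report; how Opscan rounds the group size and breaks ties between equal scores is not stated in this guide, so those details are assumptions.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-group proportion correct minus lower-group proportion correct.

    total_scores[i] is student i's total test score; item_correct[i] is True
    if that student answered this item correctly.
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))  # assumption: rounding rule
    # Student indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    p_upper = sum(item_correct[i] for i in upper) / group_size
    p_lower = sum(item_correct[i] for i in lower) / group_size
    return p_upper - p_lower

totals = list(range(1, 11))  # ten hypothetical students
print(discrimination_index(totals, [t >= 8 for t in totals]))  # 1.0
print(discrimination_index(totals, [True] * 10))               # 0.0
```

The second call shows why an item that everyone answers correctly has a validity of zero: both groups have the same proportion correct, so the difference vanishes.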

Difficulty:

This is the proportion of the entire class who answered the question correctly. This feature is currently unavailable with questions that have multiple correct answers, but the difficulty can be easily figured by adding up the percentages of those students who answered correctly.
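For questions with more than one keyed answer, the manual calculation amounts to counting every student whose response is among the correct answers. A minimal sketch, with a hypothetical function name and invented responses:

```python
def item_difficulty(responses, correct_answers):
    """Proportion of the class whose response is among the keyed answers.

    `responses` holds one answer per student (None for an omitted item);
    `correct_answers` is a set, so items with several keyed answers also work.
    """
    n_correct = sum(1 for r in responses if r in correct_answers)
    return n_correct / len(responses)

responses = ["A", "B", "A", "C", None]
print(item_difficulty(responses, {"A"}))       # 0.4
print(item_difficulty(responses, {"A", "B"}))  # 0.6
```

Omitted answers stay in the denominator, which matches the description of difficulty as a proportion of the entire class.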

PBCorr (short form only):

Point-biserial correlation coefficient is another measure of the relationship between the score on the item and the score on the test. The value of this statistic ranges from -1.00 to 1.00.

A high positive value indicates that those who answered the item correctly also received higher scores on the test than those who answered the item incorrectly. A high negative value indicates that those who answered the item correctly received low scores on the test and those who answered the item incorrectly did well on the test. A near zero value indicates that there is little relationship between the score on the item and the score on the test. It is desirable to retain items with a high positive correlation coefficient and to eliminate those with near zero or negative values. As a rough guide, it is suggested that items with large negative or near-zero correlations be eliminated or substantially revised and those with low positive correlations be studied to determine how improvement might be accomplished.
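For readers who want to reproduce the statistic, the textbook formula for the point-biserial correlation is sketched below. The function name is hypothetical, and two details are assumptions because the guide does not state them: the total score here includes the item being analyzed, and the population standard deviation is used.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(total_scores, item_correct):
    """Point-biserial correlation between one item (right/wrong) and total score."""
    right = [s for s, ok in zip(total_scores, item_correct) if ok]
    wrong = [s for s, ok in zip(total_scores, item_correct) if not ok]
    sd = pstdev(total_scores)
    if not right or not wrong or sd == 0:
        # The statistic is undefined when all students answer alike or all
        # scores are equal; this sketch returns 0.0 in that case.
        return 0.0
    p = len(right) / len(total_scores)
    return (mean(right) - mean(wrong)) / sd * sqrt(p * (1 - p))

# Four hypothetical students; the two highest scorers got the item right.
print(round(point_biserial([1, 2, 3, 4], [False, False, True, True]), 3))  # 0.894
```

The formula gives the same value as an ordinary Pearson correlation between the 0/1 item score and the total score.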

n:

This gives the total number of students who answered the question.

The electronic version of this report is virtually identical.

Sample Item Analysis Excel File

The long-form item analysis gives essentially the same information as the short form analysis, but it omits the point-biserial correlation coefficient in exchange for more information about each question. A different layout is used as well.

Sample Long Form Item Analysis

For completeness, here is an explanation of the columns in the report.

Item:

This refers to the question number on the answer sheet, which should correspond to the question number on the test. Under the item heading is the list of possible answers. In addition, the “Omit” heading represents the number of students who did not answer the question.

Frequency:

Frequency is the number of people who chose a particular answer.

%:

This symbol stands for the percentage of the total group who chose each response.

Correct:

The correct response as entered on the key is shown in this column. When multiple correct responses are marked on the key, up to three answers will be displayed. Thus, if A, B, C, and D are correct, only A, B, and C will be displayed.

Difficulty:

This is the proportion of the entire class who answered the question correctly. This feature is currently unavailable with questions that have multiple correct answers, but the difficulty can be easily figured by adding up the percentages of those students who answered correctly.

Validity:

Validity is also known as the discrimination index and is calculated by subtracting the fraction of the lower scoring individuals who answered correctly from the fraction of upper scoring individuals who answered correctly. The range of possible values is –1.0 to 1.0.

If an item carries a high validity, it means that overall, high-scoring individuals (i.e., those with high scores on the total test) answered the item correctly while low-scoring individuals tended to miss the question. Therefore, a question with high validity has a high correlation with the total test score. If one considers the total test score to be a better indicator of a student’s knowledge than any single item, then the higher the relationship between the item and the total test, the more valid the item.

There are a number of factors to consider when examining an item’s validity. In contrast to standardized entrance exams, classroom tests often contain some items that discriminate poorly. For example, it may be an instructor’s intention to begin a test with several easy items in order to put students at ease or to establish a baseline. In cases where everyone answers a question correctly, the item validity is zero. However, it may be desirable to keep the item anyway.

A high negative validity indicates that something is definitely wrong: either the item itself is flawed, for example because of an ambiguous distracter, or the item has been keyed incorrectly. In the case of a zero or very low negative validity (e.g., -.10), the item may be very easy (a difficulty close to 1.0), or so difficult that even a few good students get it wrong. It may also be due to random guessing.

Up:

“Up” refers to those students who scored in the upper 27% of the class distribution of test scores.

Mid:

“Mid” refers to those students who scored in the middle 46% of the class.

Lo:

“Lo” refers to those students who scored in the lower 27% of the class.

Total:

The total number of students who chose each alternative. It should match the Frequency column.

n:

This gives the total number of students who answered the question.

An electronic version of the long form item analysis is not available.

A summary of test statistics is located at the bottom of any page of the item analysis report (short-form or long-form) and includes the following information:

Sample of Item Analysis Footer

Filename:

The name of the file at Opscan that stores the information. This is rarely useful for your purposes but will help us should a problem arise and we need to look at the data again.

Number of Students:

The number of tests scanned and used to generate the statistics.

Lo/Hi Score:

This shows the lowest and highest scores achieved by any students on the test.

Test Mean:

The test mean is the average of test scores; i.e., it is the sum of the test scores divided by the number of students who took the exam.

Standard Deviation:

Standard deviation gives you a measure of the spread of scores around the test mean. Roughly speaking, it reflects the typical distance of a score from the mean.

Standard Error of Measurement:

Standard error is a way of determining how well the test was able to reflect the knowledge and ability of the students. For a large class (more than 30), the standard error of measurement can be interpreted as follows: If a person obtains a test score of 50 and the standard error is 3, there is a 68% chance that his or her true score lies between 47 and 53 (50 ± 3) and there is a 95% chance that his or her true score lies between 44 and 56 (50 ± 6). The larger the standard error, the greater the chance that a student’s obtained score does not reflect his or her true ability.
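The interpretation above can be reproduced with a short calculation. The formula used here, SEM = SD × sqrt(1 − reliability), is the standard one from classical test theory; this guide does not state which formula Opscan applies, so treat that as an assumption. The function names are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(std_dev, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return std_dev * sqrt(1.0 - reliability)

def true_score_bands(observed, sem):
    """Approximate 68% and 95% bands: observed score +/- 1 and 2 SEM."""
    return {
        "68%": (observed - sem, observed + sem),
        "95%": (observed - 2 * sem, observed + 2 * sem),
    }

print(true_score_bands(50, 3))
# {'68%': (47, 53), '95%': (44, 56)}
```

Under this formula, a test with a standard deviation of 10 and a reliability of 0.91 has a standard error of about 3, which is the situation in the example above.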

Reliability Coefficient (KR20):

Reliability describes the extent to which the test scores can be depended on to provide an actual measurement of the students’ abilities and knowledge. The Kuder-Richardson formula (KR20) is one such coefficient that measures reliability. The reliability coefficient ranges from 0.0 to 1.0. The closer the coefficient is to 0, the less of a relationship exists between the test scores and the students’ true abilities.* In other words, a score close to 0 means the test scores are random and don’t accurately reflect the student’s knowledge. The closer the coefficient is to 1, the more the obtained score reflects the student’s actual knowledge.

In determining acceptable levels of reliability, several factors must be considered:

  • Test length. Long tests are more reliable than short tests.
  • Item difficulty. Very easy and/or very difficult items reduce reliability.
  • Range of talent. Classes containing students with wider ranges in ability (and tests constructed to reflect the range) will result in higher test reliability coefficients.
  • Similarity of item content. Tests that are constructed with items that measure the same content will have higher reliability than those that measure different content areas.

* Gilbert Sax, Principles of Educational Measurement and Evaluation (Belmont, CA: Wadsworth, 1974), 174.
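For reference, the KR20 coefficient itself can be computed from a table of right/wrong answers. The sketch below uses the standard formula, KR20 = (k / (k − 1)) × (1 − Σ p·q / variance of total scores), where k is the number of items, p is the proportion answering an item correctly, and q = 1 − p. The function name and sample data are hypothetical, and the use of the population variance is an assumption about the convention.

```python
from statistics import pvariance

def kr20(item_matrix):
    """Kuder-Richardson formula 20 for right/wrong (1/0) items.

    item_matrix[s][i] is 1 if student s answered item i correctly, else 0.
    """
    n_students = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    var_total = pvariance(totals)  # assumption: population variance
    if k < 2 or var_total == 0:
        raise ValueError("KR20 needs at least two items and some score spread")
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_matrix) / n_students
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Four hypothetical students, three items.
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(answers), 2))  # 0.75
```

The formula also reflects the factors listed above: more items (a larger k) and a wider spread of total scores both push the coefficient upward, while very easy or very difficult items add little to that spread.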

The Difficulty Index Distribution

The difficulty index distribution is a printout displaying the range of difficulty values over the entire test and is included in both the long-form and short-form analyses. Recall that the difficulty of a question is the proportion of the entire class who answered the question correctly. The questions are grouped together based on their difficulty values to help you analyze how students handled your test.

Sample Difficulty Index Distribution

The columns refer to the following information:

Number:

The number of items (i.e. questions) being discussed in the given row.

Percent:

The percentage of items that fall in this row.

Interval:

The difficulty index interval being considered. While the long-form and short-form analyses return the difficulty index as a decimal fraction between 0.0 and 1.0, here the difficulty is converted into a percentage, and the items are grouped together in intervals of 5%. So a decimal range of .75 to .79 in the analysis printouts is written here as “75 to 79.” The intervals are listed from highest to lowest.

Item Numbers:

Under this heading all the items that have a difficulty value falling in the given interval are listed. For example, items with difficulty values of .76, .78, and .75 would be grouped together in the interval “75 to 79.”
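The grouping can be sketched as follows. The function name and sample difficulties are hypothetical, and the handling of a difficulty of exactly 1.00 (given its own "100" row here) is an assumption, since the guide does not show how the printout labels that case.

```python
def difficulty_distribution(difficulties, width=5):
    """Group item numbers into percentage intervals such as "75 to 79".

    `difficulties` maps item number -> difficulty as a decimal fraction.
    Returns rows of (number of items, percent of items, interval, item numbers),
    highest interval first.
    """
    groups = {}
    for item, d in difficulties.items():
        pct = round(d * 100)       # .76 -> 76
        low = pct // width * width  # 76 -> 75
        groups.setdefault(low, []).append(item)
    total = len(difficulties)
    rows = []
    for low in sorted(groups, reverse=True):
        items = sorted(groups[low])
        # Assumption: a difficulty of 1.00 is shown in its own "100" row.
        label = "100" if low >= 100 else f"{low} to {low + width - 1}"
        rows.append((len(items), round(100.0 * len(items) / total, 1), label, items))
    return rows

for row in difficulty_distribution({1: 0.76, 2: 0.78, 3: 0.75, 4: 0.52}):
    print(row)
# (3, 75.0, '75 to 79', [1, 2, 3])
# (1, 25.0, '50 to 54', [4])
```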

Standardized normative tests such as the ACT and GRE require a difficulty level for each item of approximately .40 to .70. It is virtually impossible and often undesirable for classroom tests to adhere strictly to this requirement. For example, a few easy items, especially at the beginning of a test, often help students who suffer from test anxiety. A few very difficult items to determine the test “ceiling” may also be desirable.

An electronic version of the difficulty index distribution is not available.

The Discrimination Index Distribution

The discrimination index distribution is a printout included in both the Long Form and Short Form Item Analysis options. It displays the range of validity values over the entire test.

Sample Discrimination Index Distribution

Number:

The number of items being discussed in the given row.

Percent:

The percentage of items that fall in this row.

Interval:

The validity interval being considered. The intervals are listed from highest to lowest.

Item Numbers:

Under this heading all the questions that have validity values falling in the given interval are listed. For example, questions with validities of .67, .69, and .65 would be grouped together in the interval “0.65 to 0.69.”

Although it is always desirable to have positive coefficients, it is possible to produce negative ones. This happens when more low-scoring than high-scoring students answer an item correctly.

An electronic version of the discrimination index distribution is not available.

Student Test Responses Report

The report on student test responses is designed for the instructor. It shows all the responses given by the students for the test in a compact report. The example below has been modified to fit in this document but should demonstrate the report and its characteristics.

Sample Student Test Responses

For each student the following information is given:

Student Name:

The student's name is listed as it was given on the test sheet.

Student ID:

The student's identification number is listed.

Score:

The score, as weighted by you, is listed.

Underneath this information are the student’s responses to the test. The responses are printed in groups of ten. Any correct answer is shown as a dash (-), while an incorrect answer is shown as the response the student chose. For example, under Student10, the first sequence (D-A-D--B--) shows his answers to questions 1 through 10. Question 1 he answered incorrectly; his response was D. Question 2 he answered correctly, so no answer is given. Question 3 was answered with the response A, which was also incorrect. Question 4 was answered correctly. And so forth. The next set of responses (-A--------) lists his answers for questions 11 through 20.

If a student marks two or more answers for a question, then for that question an asterisk (*) is listed. Any question that was not answered shows up as a blank ( ).
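The notation can be read mechanically, as this small sketch shows. The function name and the wording of the status labels are hypothetical; the symbols themselves (dash, asterisk, blank, letter) are the ones described above.

```python
def decode_responses(groups, first_question=1):
    """Expand printed response groups into {question number: status}.

    "-" means correct, "*" means multiple marks, a blank means omitted,
    and any other character is the incorrect answer the student chose.
    """
    decoded = {}
    question = first_question
    for group in groups:
        for mark in group:
            if mark == "-":
                decoded[question] = "correct"
            elif mark == "*":
                decoded[question] = "multiple marks"
            elif mark == " ":
                decoded[question] = "omitted"
            else:
                decoded[question] = f"incorrect ({mark})"
            question += 1
    return decoded

# The two groups shown for Student10 in the example above.
answers = decode_responses(["D-A-D--B--", "-A--------"])
print(answers[1], "|", answers[2], "|", answers[12])
# incorrect (D) | correct | incorrect (A)
```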

An electronic version of the student test responses report is not available.

Individual Student Feedback

Individual student feedback is a report that allows you to give each student a sheet of paper that lists the score, the answers chosen by the student, and the correct answers for all questions. It is designed to be a graded version of the test.

Sample Individual Student Feedback

This example only shows a portion of the report. For any item where the student gave the incorrect answer, the response they chose is marked with a dollar sign ($).

The electronic version of this document is available as a Word document and includes all the students from the class.
