Center for Teaching, Learning & Technology

Opscan Evaluation

 

What is the purpose of the test statistics?

The aim of testing is to match each student with a score that reflects 1) the amount learned relative to a norm group—the current class or previous classes, or 2) an absolute proportion of amount learned based on a very well constructed mastery test. For either style of testing, the only method that will provide information on the quality of the test is an item-by-item examination. Careful study of the statistical analysis output provides a basis for assessing the reliability and validity of a test and improving the quality of future classroom tests.

The following are examples and explanations of the statistics that appear on Opscan results.


What is printed on the list of scores?

There are two printouts of scores: one alphabetized by last name and another sorted by University ID. The percentage correct, percentile rank, T-scores, mean, and standard deviation are also printed on the lists of scores.
Sample List of Scores, sorted by Name

What is a T-score?

A T-score is a standardized score that lets you compare results from tests with different scales and from different classes. T-scores are rescaled so that the class mean is 50 and the standard deviation is 10, so a T-score shows how far a particular score lies from the average. When scores are normally distributed, approximately 68% of students have T-scores between 40 and 60. The T-score is closely related to the Z-score, which is scaled to a mean of 0 and a standard deviation of 1.

The relation between a Z-score and a T-score is as follows: If μ is the mean of the test scores and σ is the standard deviation, then the Z-score z of an individual test score x is calculated as

$$z = \frac{x - \mu}{\sigma}$$

and then the T-score t is calculated as

$$t = 50 + 10z.$$

On the standard output, this score is listed in the last column with the student’s actual score and percentage correct.
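
As a rough illustration of the conversion described above, the following Python sketch turns a list of raw scores into T-scores. The function name `t_scores` and the sample scores are made up for this example, and the use of the population standard deviation is an assumption; the text does not say which form Opscan uses.

```python
from statistics import mean, pstdev

def t_scores(raw_scores):
    """Convert raw scores to T-scores (class mean 50, standard deviation 10)."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population SD: an assumption, not confirmed by the text
    if sigma == 0:
        raise ValueError("all scores are identical, so Z-scores are undefined")
    # z = (x - mu) / sigma, then t = 50 + 10 * z
    return [50 + 10 * (x - mu) / sigma for x in raw_scores]

scores = [30, 35, 40, 45, 50]  # hypothetical raw scores
for raw, t in zip(scores, t_scores(scores)):
    print(raw, round(t, 1))
```

A student whose raw score equals the class mean always receives a T-score of 50, whatever the scale of the test.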


What is printed on the frequency distribution?

The frequency distribution is a table showing how many students earned each score on the test. The printout includes the following columns.
Sample Frequency Distribution

Score:
The scores are listed in descending order, beginning with the highest score earned on the test to the lowest score.
Frequency:
This indicates the number of students who received that exact score. In this example, exactly 3 students got 37 points on the test.
Cum Frequency:
Cumulative frequency is the number of students who performed at or below a given score. In this example, 22 students scored 37 points or below on the test.
Percentile:
This shows the percentage of questions answered correctly for each score (a percentage grade, not a percentile rank). A student who received a 45 on the test got 90% of the questions correct.
Percentile Rank:
Each score is given a rank that indicates the percentage of students who fall below this point on the score distribution. A student who received a 37 on the test was in the 50th percentile. In cases where more than one student or where no student obtains a particular score, the distance between scores is taken into account.
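
The columns above can be sketched in Python as follows. This is only an illustration: the function name `frequency_table`, the `max_points` parameter, and the sample scores are invented here. The percentile rank uses the common midpoint convention (students below the score plus half of those at the score), which is an assumption about how "the distance between scores is taken into account"; Opscan's exact formula is not given in the text.

```python
from collections import Counter

def frequency_table(scores, max_points):
    """Build one row per distinct score, highest score first."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    below = 0  # number of students scoring strictly below the current score
    for score in sorted(counts):  # ascending, so cumulative counts can accumulate
        freq = counts[score]
        cum = below + freq
        rows.append({
            "score": score,
            "frequency": freq,
            "cum_frequency": cum,                  # students at or below this score
            "percent": 100 * score / max_points,   # percentage grade
            # Midpoint convention (assumed): below + half of those at the score
            "percentile_rank": 100 * (below + 0.5 * freq) / n,
        })
        below = cum
    rows.reverse()  # the printout lists scores in descending order
    return rows

for row in frequency_table([45, 37, 37, 37, 30], max_points=50):
    print(row)
```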


What does the histogram represent?

A histogram is a graphical representation of a frequency distribution. The histogram below has the approximate shape of a normal distribution, or bell curve. The x-axis represents the weighted scores and the y-axis indicates the frequency (how many students received each score on the test).

In most cases student scores will not form a perfect bell curve for a variety of reasons (including small class size). A histogram, therefore, is more useful for classes in which there are 30 or more students.
Sample Histogram
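
For readers who want to reproduce a rough histogram from a list of scores, here is a minimal text-only sketch in Python. The function name `text_histogram`, the bin width of 5, and the sample scores are assumptions made for illustration. The bars run horizontally, so the axes are rotated relative to the printout described above.

```python
from collections import Counter

def text_histogram(scores, bin_width=5):
    """Return one text line per score interval; each '#' is one student."""
    # Map every score to the lower bound of its interval, then count.
    bins = Counter(score // bin_width * bin_width for score in scores)
    lines = []
    for low in range(min(bins), max(bins) + bin_width, bin_width):
        high = low + bin_width - 1
        lines.append(f"{low:>3}-{high:<3} {'#' * bins[low]}")  # empty bins print no bar
    return lines

for line in text_histogram([31, 33, 36, 37, 37, 38, 41, 42, 46]):
    print(line)
```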


What is an item analysis?

An item analysis takes each item, or question, on the test and gives you a variety of statistics regarding the answers chosen by the students. The item analysis allows you to evaluate a question and decide whether to use it on future tests.

An item-by-item analysis of tests is available in either short or long form.


What information is included in the short-form analysis?

The short form item analysis gives information for the overall class and covers the information listed below. The short form item analysis also offers one statistic not found on the long form: point-biserial correlation.
Sample Short Form Analysis

Item:
This refers to the question number on the answer sheet, which should correspond to the question number on the test.
Correct:
The correct response as entered on the key is shown in this column. When multiple correct responses are marked on the key, up to three answers will be displayed. Thus, if A, B, C, and D are all correct, only A, B, and C will be displayed.
Frequency/Percentage:
For each possible response to the item, the frequency and percentage are printed. The frequency is the number of people who chose each response, while the percentage is the share of the total group who chose each response. The "Omit" heading represents the number of students who did not answer the question.
Validity:
Validity is also known as the discrimination index and is calculated by subtracting the fraction of the lower scoring individuals who answered correctly from the fraction of upper scoring individuals who answered correctly. The range of possible values is –1.0 to 1.0.

If an item carries a high validity, it means that overall, high scoring individuals (i.e., those with high scores on the total test) answered the item correctly while low scoring individuals tended to miss the question. Therefore, a question with high validity has a high correlation with the total test score. If one considers the total test score to be a better indicator of a student’s knowledge, then the higher the relationship between the item and the total test, the more valid the item.

There are a number of factors to consider when examining an item’s validity. In contrast to standardized entrance exams, classroom tests often contain some items that discriminate poorly. For example, it may be an instructor’s intention to begin a test with several easy items in order to put students at ease or to establish a baseline. In cases where everyone answers a question correctly, the item validity is zero. However, it may be desirable to keep the item anyway.

A high negative validity indicates that something is definitely wrong: either the item itself is flawed (for example, it has an ambiguous distracter) or the item has been keyed incorrectly. A validity of zero or a small negative value (e.g., -.10) may mean that the item is very easy (a difficulty close to 1.0), or that it is very difficult, so that even a few good students get it wrong. It may also be due to random guessing.
Difficulty:
This is the proportion of the entire class who answered the question correctly. This statistic is currently unavailable for questions that have multiple correct answers, but you can compute it by adding up the percentages of students who chose each correct response.
PBCorr:
The point-biserial correlation coefficient is another measure of the relationship between the score on the item and the score on the test. Its value ranges from -1.00 to 1.00. A high positive value indicates that those who answered the item correctly also received higher scores on the test than those who answered incorrectly. A high negative value indicates the reverse: those who answered the item correctly received low scores on the test, and those who answered incorrectly did well. A value near zero indicates little relationship between the score on the item and the score on the test. As a rough guide, retain items with high positive correlations, study items with low positive correlations to determine how they might be improved, and eliminate or substantially revise items with negative or near-zero correlations.
n:
This gives the total number of students who answered the question.
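
The Difficulty and PBCorr columns described above can be approximated with the short Python sketch below. The function names and sample data are hypothetical. The sketch uses the textbook point-biserial formula with the population standard deviation and with the item included in the total score; whether Opscan makes the same choices is not stated in the text.

```python
from math import sqrt
from statistics import mean, pstdev

def difficulty(item_correct):
    """Proportion of students who answered the item correctly (1 = right, 0 = wrong)."""
    return mean(item_correct)

def point_biserial(item_correct, total_scores):
    """Correlation between a right/wrong item and the total test score."""
    p = mean(item_correct)
    sd = pstdev(total_scores)
    if p in (0, 1) or sd == 0:
        return None  # undefined when everyone agrees on the item or on the total
    mean_right = mean(t for c, t in zip(item_correct, total_scores) if c)
    mean_wrong = mean(t for c, t in zip(item_correct, total_scores) if not c)
    return (mean_right - mean_wrong) / sd * sqrt(p * (1 - p))

item = [1, 1, 1, 0, 0, 0]       # hypothetical: 1 means the student got the item right
totals = [10, 9, 8, 5, 4, 3]    # hypothetical total test scores, same student order
print(difficulty(item), point_biserial(item, totals))
```

In this made-up data the three highest scorers answered correctly, so the coefficient is strongly positive; flipping the item responses gives the same magnitude with a negative sign.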


What information is included in the long-form analysis?

The long-form item analysis gives information for the overall class and covers the information listed below. The long-form analysis shows almost all the same information as the short-form analysis, but with a different layout and with some additional information.
Item from Long Form Analysis

Item:
This refers to the question number on the answer sheet, which should correspond to the question number on the test.
Frequency:
Frequency is the number of people who chose a particular answer.
%:
This symbol stands for the percentage of the total group who chose each response.
Correct:
The correct response as entered on the key is shown in this column. When multiple correct responses are marked on the key, up to three answers will be displayed. Thus, if A, B, C, and D are all correct, only A, B, and C will be displayed.
Difficulty:
This is the proportion of the entire class who answered the question correctly. This statistic is currently unavailable for questions that have multiple correct answers, but you can compute it by adding up the percentages of students who chose each correct response.
Validity:
Validity is also known as the discrimination index and is calculated by subtracting the fraction of the lower scoring individuals who answered correctly from the fraction of upper scoring individuals who answered correctly. The range of possible values is –1.0 to 1.0.

If an item carries a high validity, it means that overall, high scoring individuals (i.e., those with high scores on the total test) answered the item correctly while low scoring individuals tended to miss the question. Therefore, a question with high validity has a high correlation with the total test score. If one considers the total test score to be a better indicator of a student’s knowledge, then the higher the relationship between the item and the total test, the more valid the item.

There are a number of factors to consider when examining an item’s validity. In contrast to standardized entrance exams, classroom tests often contain some items that discriminate poorly. For example, it may be an instructor’s intention to begin a test with several easy items in order to put students at ease or to establish a baseline. In cases where everyone answers a question correctly, the item validity is zero. However, it may be desirable to keep the item anyway.

A high negative validity indicates that something is definitely wrong: either the item itself is flawed (for example, it has an ambiguous distracter) or the item has been keyed incorrectly. A validity of zero or a small negative value (e.g., -.10) may mean that the item is very easy (a difficulty close to 1.0), or that it is very difficult, so that even a few good students get it wrong. It may also be due to random guessing.
Up:
"Up" refers to those students who scored in the upper 27% of the class distribution of test scores.
Mid:
"Mid" refers to those students who scored in the middle 46% of the class.
Lo:
"Lo" refers to those students who scored in the lower 27% of the class.
Total:
The total number of students who chose each alternative. It should match the Frequency column.
n:
This gives the total number of students who answered the question.
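
To make the Validity, Up, and Lo entries concrete, here is a minimal Python sketch that splits the class into upper and lower 27% groups and subtracts the fractions answering correctly. The function name, the rounding of the group size, and the handling of tied scores are all assumptions made for illustration; the text does not describe how Opscan resolves them.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Fraction correct in the upper group minus fraction correct in the lower group."""
    n = len(total_scores)
    k = max(1, round(n * fraction))  # group size; the rounding rule is an assumption
    # Student indices ordered from highest to lowest total score (ties keep input order).
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower

totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # hypothetical total scores
item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]      # hypothetical: top half answered correctly
print(discrimination_index(item, totals))
```

With these made-up data the result is 1.0. An item that everyone answers correctly yields 0.0, which matches the remark above that such items have zero validity.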


What information is included in the difficulty index?

The difficulty index is a printout included in both the Long Form and Short Form Item Analysis options that displays the range of difficulty values over the entire test. The questions are grouped together based on their difficulty values to help you analyze how your test was handled.
Sample Difficulty Index

Number:
The number of items (i.e. questions) being discussed in the given row.
Percent:
The percentage of items that fall in this row.
Interval:
The difficulty index interval being considered. While the long-form and short-form analyses report the difficulty as a decimal fraction between 0.0 and 1.0, here the difficulty is converted into a percentage, and the questions are grouped together in intervals of five percentage points. So a decimal range of .75 to .79 in the analysis printouts is written here as "75 to 79." The intervals are listed from the highest values down to the lowest.
Item Numbers:
Under this heading all the items that have a difficulty value falling in the given interval are listed. For example, items with difficulty values of .76, .78, and .75 would be grouped together in the interval "75 to 79."

Standardized normative tests such as the ACT and GRE require a difficulty level for each item of approximately .40 to .70. It is virtually impossible and often undesirable for classroom tests to adhere strictly to this requirement. For example, a few easy items, especially at the beginning of a test, often help students who suffer from test anxiety. A few very difficult items to determine the test "ceiling" may also be desirable.
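
The grouping shown on the difficulty index can be sketched in Python as follows. The function name `group_by_difficulty` and the sample values are hypothetical. How the printout labels a difficulty of exactly 1.00 is not stated in the text, so this sketch folds it into a top interval labeled "95 to 100" as an assumption.

```python
def group_by_difficulty(difficulties, width=5):
    """Group item numbers by difficulty; return rows of (Number, Percent, Interval, Item Numbers)."""
    groups = {}
    for item, p in sorted(difficulties.items()):
        pct = round(p * 100)  # convert the decimal fraction to a whole percentage
        low = min(pct // width * width, 100 - width)  # 100 joins the top interval (assumed)
        groups.setdefault(low, []).append(item)
    total = len(difficulties)
    rows = []
    for low in sorted(groups, reverse=True):  # highest interval first
        items = groups[low]
        high = 100 if low == 100 - width else low + width - 1
        rows.append((len(items), 100 * len(items) / total, f"{low} to {high}", items))
    return rows

sample = {1: 0.76, 2: 0.78, 3: 0.75, 4: 0.40}  # hypothetical item difficulties
for row in group_by_difficulty(sample):
    print(row)
```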


What information is included in the discrimination index?

The discrimination index is a printout included in both the Long Form and Short Form Item Analysis options that displays the range of point-biserial correlation coefficient values over the entire test. These coefficient values are shown within the short form item analysis.
Sample Discrimination Index

Number:
The number of question items being discussed in the given row.
Percent:
The percentage of questions that fall in this row.
Interval:
The point-biserial correlation coefficient index interval being considered. The index is organized into higher-ranking intervals down to lower-ranking intervals.
Item Numbers:
Under this heading all the questions that have a point-biserial correlation coefficient value falling in the given interval are listed. For example, questions with correlation coefficients of .67, .69, and .65 would be grouped together in the interval "0.65 to 0.69."

Although it is always desirable to have positive discrimination indices, it is possible to produce negative ones. This happens when more low-scoring than high-scoring students answer an item correctly. Usually these items should be discarded or at least modified before they are used again. Several aspects of an item should be examined carefully when negative item validity occurs:
  • Is there more than one correct alternative?
  • Is there something ambiguous in the item or alternative that leads the better scoring students to an incorrect response?
  • Are students answering randomly because of not studying or lack of information?
  • Is it a very easy item on which one or two students have erred?
  • Is the item keyed correctly?
Standardized tests use validity coefficients of .40 to 1.0 as the general criterion for keeping items. The standards for classroom tests are less stringent.


What is the information at the bottom of the item analysis?

A summary of test statistics is located at the bottom of any page of the item analysis (short-form or long-form) and includes the following information:
Item Analysis footer

Filename:
The name of the file at Opscan that stores the information. This is rarely useful for your purposes but will help us should a problem arise and we need to look at the data again.
Number of Students:
The number of tests scanned and used to generate the statistics.
Lo/Hi Score:
This shows the lowest and highest scores achieved by any students on the test.
Test Mean:
The test mean is the average of test scores; i.e., it is the sum of the test scores divided by the number of students who took the exam.
Standard Deviation:
Standard deviation measures the spread of scores around the test mean. It can be read as the typical distance of a score from the mean (formally, the square root of the average squared distance).
Standard Error of Measurement:
Standard error is a way of determining how well the test was able to reflect the knowledge and ability of the students. For a large class (more than 30), the standard error of measurement can be interpreted as follows: If a person obtains a test score of 50 and the standard error is 3, there is a 68% chance that his or her true score lies between 47 and 53 (50 ± 3) and there is a 95% chance that his or her true score lies between 44 and 56 (50 ± 6). The larger the standard error, the greater the chance that a student’s obtained score does not reflect his or her true ability.
Reliability Coefficient (KR20):

Reliability describes the extent to which the test scores can be depended on to measure the students’ abilities and knowledge. The Kuder-Richardson Formula 20 (KR20) is one coefficient that measures reliability. The reliability coefficient ranges from 0.0 to 1.0. The closer the coefficient is to 0, the weaker the relationship between the test scores and the students’ true abilities.* In other words, a coefficient close to 0 means that the test scores are essentially random and do not accurately reflect the students’ knowledge. The closer the coefficient is to 1, the more closely an obtained score reflects the student’s actual knowledge.

In determining acceptable levels of reliability, several factors must be considered:

  • Test length. Long tests are more reliable than short tests.
  • Item difficulty. Very easy and/or very difficult items reduce reliability.
  • Range of talent. Classes containing students with wider ranges in ability (and tests constructed to reflect the range) will result in higher test reliability coefficients.
  • Similarity of item content. Tests that are constructed with items that measure the same content will have higher reliability than those that measure different content areas.

* Gilbert Sax, Principles of Educational Measurement and Evaluation (Belmont, CA: Wadsworth, 1974), 174.
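
The reliability coefficient and the standard error of measurement can be illustrated with the Python sketch below. It uses the standard textbook forms: KR20 = k/(k-1) × (1 − Σpq / variance of total scores), and standard error = standard deviation × √(1 − reliability). The function names and the tiny response matrix are invented for this example, and the use of the population variance is an assumption; the text does not give Opscan's exact computation.

```python
from math import sqrt
from statistics import mean, pvariance

def kr20(matrix):
    """KR20 for a matrix with one row per student and one 0/1 entry per item."""
    k = len(matrix[0])                      # number of items
    totals = [sum(row) for row in matrix]   # each student's number right
    var_total = pvariance(totals)           # population variance (assumed)
    if k < 2 or var_total == 0:
        raise ValueError("KR20 needs at least two items and some spread in scores")
    sum_pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in matrix)  # item difficulty
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)

def standard_error_of_measurement(sd, reliability):
    """Standard textbook formula: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical 4 students, 3 items
print(kr20(responses))
print(standard_error_of_measurement(6, 0.75))
```

For instance, a test with a standard deviation of 6 and a reliability of 0.75 has a standard error of 3, the value used in the 50 ± 3 example above.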


What kind of information is on the student test responses report?

The student test responses report lists all the students, their social security numbers, their scores, and a compact printout of their chosen answers on the test.
Sample Student Test Responses

Each box under a student's name represents a group of ten questions on the test. Correct answers are shown only as a dash (-); a letter appears only where the student chose an incorrect answer. For example, for Student01, the first box shows his responses for questions 1 through 10. He got seven correct answers but missed question 3, where he chose response D; question 7, where he chose response E; and question 9, where he chose response B. The next box to the right shows his responses for questions 11 through 20, where he got only five questions correct.
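
The compact layout described above can be mimicked with a few lines of Python. This is only a sketch: the function name `compact_responses` and the answer key are hypothetical, chosen so that the output matches the Student01 example for questions 1 through 10. How the real report displays omitted answers is not covered in the text.

```python
def compact_responses(responses, key, group_size=10):
    """Replace correct answers with '-' and split the result into boxes of ten."""
    marks = [
        "-" if given == correct else given   # only wrong answers stay visible
        for given, correct in zip(responses, key)
    ]
    return [
        "".join(marks[i:i + group_size])
        for i in range(0, len(marks), group_size)
    ]

key = "ABCDEABCDE"        # hypothetical answer key for questions 1-10
student01 = "ABDDEAECBE"  # wrong on question 3 (D), 7 (E), and 9 (B)
print(compact_responses(student01, key))
```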


What kind of information is on the individual student feedback report?

The individual student feedback output is a separate sheet for each student who took the test. The header gives information about the test if the instructor filled in this information on the answer key. Below that are the student's name, University ID number, number of correct items, and weighted score. The weighted score reflects the number of points each question was worth, as determined by the instructor, while the number right is a simple count of correct answers.
Sample Individual Student Feedback

For each item (i.e. question), the correct response is listed along with the student's response. If the student's response was not the correct answer, then a dollar sign ($) is listed next to it. If the student filled in more than one response bubble, an asterisk (*) is shown as their response.
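
The marking rules above, together with the difference between number right and weighted score, can be sketched in Python as follows. The function name `feedback`, the row layout, and the sample data are hypothetical. Treating a multiply-marked item as incorrect (and therefore flagged with a dollar sign) is an assumption; the text only says that an asterisk is shown as the response.

```python
def feedback(key, responses, weights):
    """Return (report rows, number right, weighted score) for one student."""
    rows, number_right, weighted_score = [], 0, 0
    for number, (correct, given, weight) in enumerate(
        zip(key, responses, weights), start=1
    ):
        shown = "*" if len(given) > 1 else given  # more than one bubble filled in
        is_right = shown == correct
        if is_right:
            number_right += 1
            weighted_score += weight              # points the instructor assigned
        flag = "" if is_right else " $"           # '$' marks an incorrect response
        rows.append(f"{number:>3}  {correct}  {shown}{flag}")
    return rows, number_right, weighted_score

rows, right, score = feedback(["A", "B", "C"], ["A", "D", "BC"], [1, 2, 2])
print("\n".join(rows))
print(right, score)
```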
