November 18, 2021
Assessments are one of the primary ways educators can gain insight into student learning. These assessments may take the form of quizzes, tests, midterms, or final exams.
Assessments also provide a key way for teachers to address students’ needs, by showing education or performance gaps. Ideally, assessment data will help educators to figure out which topics students are struggling with and—more importantly—why.
Test item analysis is a way for teachers to gain more insight into their students’ learning. But first, it helps to understand what item analysis involves.
Let’s delve into the specifics of item analysis for teachers and explore its components.
What is test item analysis?
Item analysis is the act of analyzing responses to individual test questions, or items, to make sure their difficulty level is appropriate and that they discriminate well between students of different performance levels. Item analysis also involves looking deeper into other metrics of the test items, as I’ll explain below.
Item analysis is crucial to upholding both the fairness and effectiveness of tests. And while it’s often something teachers do intuitively, formalizing the process and laying out a clear method provides a way to uphold academic integrity and improve assessments.
Why do we need item analysis?
Item analysis helps teachers examine assessments and determine whether they’re a good measure of their students’ learning. For example, if a test is too difficult or too easy for a group of students, then administering it is a waste of time and tells us little about student learning.
The frequent use of item analysis also allows teachers to evaluate assessments and identify where learning gaps may be present. Teachers can then provide the right instruction and support to target and bridge those gaps, as I mentioned earlier.
The 4 components of item analysis
The four components of test item analysis are item difficulty, item discrimination, item distractors, and response frequency. Let’s look at each of these factors and how they help teachers to further understand test quality.
#1: Item difficulty
The first thing we can look at in terms of item analysis is item difficulty. Item difficulty is the percentage of students who answer a given test item correctly. As a rule of thumb, we’re looking for at least 20% of students to answer correctly; if fewer than 20% do, the item is likely too difficult.
At the same time, if more than 80% of students answer an item correctly, that item might be too easy. However, in some situations this is acceptable.
For example, on a mastery test, we can expect many items to be easy, because a majority of students will have mastered the material. Contrast this with a pretest, where we can expect most items to be difficult, because students have not yet been taught the material.
If there’s a test item that no students answer correctly, it sharply reduces the test’s reliability: we learn that the item is far too difficult, but we gain no insight into what the students do know. In contrast, when students answer items correctly, teachers can track how knowledgeable the students are in any given subject.
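To make the arithmetic concrete, here is a minimal sketch of how you might compute item difficulty from scored responses. The data and function names are illustrative, and the cutoffs simply encode the 20% and 80% rules of thumb described above.

```python
# A minimal sketch of computing item difficulty from scored responses.
# Assumes each response is recorded as 1 (correct) or 0 (incorrect).

def item_difficulty(scores):
    """Return the proportion of students who answered the item correctly."""
    return sum(scores) / len(scores)

def flag_difficulty(p, too_hard=0.20, too_easy=0.80):
    """Apply the rule-of-thumb thresholds described above."""
    if p < too_hard:
        return "likely too difficult"
    if p > too_easy:
        return "possibly too easy"
    return "acceptable"

# Example: ten students' scores on one item (hypothetical data)
item_scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
p = item_difficulty(item_scores)
print(f"p = {p:.2f} -> {flag_difficulty(p)}")  # p = 0.70 -> acceptable
```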
#2: Item discrimination
The second component we can examine is item discrimination. In other words, how well does the item discriminate between students who performed higher and lower on a particular test?
Here, we look at how well the students scored on the assessment as a whole and how well the students scored on any given item. Are the students who performed higher on the assessment generally answering the item correctly? Are students who performed lower on the assessment generally answering the item incorrectly?
With item discrimination, you’re comparing students’ answers on a single item against their total test scores. Discrimination examines one question at a time and compares high-scoring students’ answers to those of low-scoring students to see which group answered the item correctly.
The overall point of item discrimination is to confirm that individual exam questions differentiate between the students who understand the material and those who don’t.
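As an illustration, here is a sketch of one common way to quantify this: the discrimination index, which is the difference between the top-scoring group’s and the bottom-scoring group’s success rates on an item. The 27% group cutoff is a common convention rather than a rule, and the data and function names are hypothetical.

```python
# A sketch of a simple discrimination index: the difference between the top
# group's and the bottom group's success rates on one item. Grouping by the
# top and bottom 27% of total scores is a common convention, not a rule.

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """D = (proportion correct in top group) - (proportion correct in bottom group)."""
    n = len(total_scores)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])  # rank by total score
    bottom, top = order[:k], order[-k:]
    p_top = sum(item_scores[i] for i in top) / k
    p_bottom = sum(item_scores[i] for i in bottom) / k
    return p_top - p_bottom

# Hypothetical data: 1/0 scores on this item, plus overall test scores
item = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]
totals = [95, 88, 52, 90, 47, 60, 85, 92, 55, 40]
print(f"D = {discrimination_index(item, totals):.2f}")
```

A D value near zero (or negative) suggests the item isn’t separating stronger from weaker students and deserves a closer look.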
#3: Item distractors
Within item analysis, we usually examine item distractors on assessments with multiple-choice questions. We need to understand whether the incorrect answer choices are plausible enough to “distract” test-takers from the correct answer.
For example, suppose there is a multiple-choice question with four possible answers—but two of the answers are clearly incorrect and are easy for students to eliminate from consideration. So, instead of having a 25% chance of getting the answer right by guessing, students now have a 50/50 chance, given that only two of the four answer choices are plausible.
Bad item distractors are obviously incorrect, making them far less effective for assessing student knowledge than more cleverly disguised options.
Effective item distractors require students to think critically to arrive at the answer. For this reason, effective distractors will usually attract more students with a lower overall score than students who score higher on the test.
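One simple way to check this in practice is to tally each answer choice separately for higher- and lower-scoring students. The sketch below is illustrative; the option labels, scores, and group cutoff are hypothetical.

```python
# A sketch of a basic distractor check: tally which answer choice each group
# picked. An effective distractor should attract more low scorers than high
# scorers; one that attracts neither group may be doing no work at all.

from collections import Counter

def distractor_counts(choices, total_scores, fraction=0.27):
    """Count each option's popularity in the bottom and top scoring groups."""
    n = len(total_scores)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low = Counter(choices[i] for i in order[:k])
    high = Counter(choices[i] for i in order[-k:])
    return low, high

choices = ["C", "B", "B", "C", "B", "A", "C", "C", "D", "B"]  # students' picks
totals = [95, 88, 52, 90, 47, 60, 85, 92, 55, 40]             # overall scores
low, high = distractor_counts(choices, totals)
for option in "ABCD":
    print(f"{option}: low-group {low[option]}, high-group {high[option]}")
```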
#4: Response frequency
Once we’ve looked at item difficulty, item discrimination, and item distractors and addressed any flags they raise, it’s important to look at the final component: response frequency.
For items such as multiple choice, multiple select, or those that have Part A and Part B, it’s crucial to examine which responses students are choosing. If they’re not choosing the correct answer, what are some of the options they’re selecting and why?
Let’s say the correct answer to a particular item is option C, but most of the students are choosing a distractor, option B. We need to look at this specific distractor and try to figure out the common misconception. In other words, why are students choosing that particular response? What makes this response appear to be correct?
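A simple frequency table makes this kind of pattern easy to spot. The sketch below tallies how often each option was chosen and flags any distractor chosen more often than the keyed answer; the responses and answer key are hypothetical.

```python
# A sketch of a response-frequency table for one item. It reports how often
# each option was chosen and flags any distractor that outdraws the keyed
# (correct) answer, which signals a likely shared misconception.

from collections import Counter

def response_frequency(choices, key):
    counts = Counter(choices)
    n = len(choices)
    for option in sorted(counts):
        pct = 100 * counts[option] / n
        label = " (correct)" if option == key else ""
        flag = "  <-- chosen more often than the key" if option != key and counts[option] > counts[key] else ""
        print(f"{option}: {counts[option]} ({pct:.0f}%){label}{flag}")

# Hypothetical responses where distractor B outdraws the correct answer C
choices = ["B", "B", "C", "B", "B", "A", "C", "B", "D", "B"]
response_frequency(choices, key="C")
```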
Looking at response frequency—as well as the other item analysis components listed above—and noting the pattern of student errors can give teachers feedback on how effective a test is and provide support for designing future assessments.
How Renaissance provides high-quality test items for every classroom
Renaissance’s DnA platform includes a high-quality item bank and a collection of pre-built assessments created by experts, who ensure the content is accurately aligned to state standards and yields results that educators can use to drive instruction.
All of our item bank content undergoes a continuous evaluation process that uses psychometric item analysis to ensure test items are performing as expected. In other words, we’ve already done the work for you!
DnA offers more than 80,000 items in core subject areas, along with a wealth of reporting—including item distractor reports to help you identify learning disconnects and guide appropriate feedback.