Introducing methods for standard setting
Tests are made and taken to see how well students understand a certain topic. Developing a test involves many decisions. Together, these decisions form the standard setting process, which determines how well someone has to perform on a test to pass it. The process includes setting performance standards, writing exam questions, and selecting a method for setting a cut-score. The cut-score represents the minimum number of items that must be answered correctly to pass the test. This post explains and compares different methods for setting a cut-score.
There are a few methods for standard setting, and they can be divided into three subgroups: norm-referenced, criterion-referenced, and mixed. In norm-referenced methods, students are compared to each other, while criterion-referenced methods are chosen when students need a certain level of knowledge or skills to pass the test (Ertoprak & Dogan, 2016). Mixed methods combine the two approaches.
The passing percentage method and Cohen’s method are examples of norm-referenced methods. The passing percentage method can be used when a desired percentage of students needs to pass, for example because of selection or a limited number of available places. A reason not to use this method is that the content and quality of the test are not considered when deciding who passes: even if the test were badly made, the same share of students would still pass. Cohen’s method is similar, but it uses the best-performing student as a reference and sets the cut-score at 60-65% of that student’s score. An advantage is that student ability across exams tends to be more stable than panelist ratings; in addition, panelist ratings can be too expensive. On the other hand, student ability across exams might fluctuate too much for this to hold.
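The two norm-referenced rules above can be sketched in a few lines of Python. This is a minimal sketch, not the procedure used for the analysis in this post: the function names and the ten scores are made up for illustration, the passing percentage rule is assumed to pick the highest cut-score that still lets at least the desired share of students pass, and Cohen’s rule follows the description above, with the best-performing student as the reference and a fraction of 60%.

```python
import math

def passing_percentage_cut_score(scores, pass_rate):
    """Highest cut-score at which at least `pass_rate` of the students pass.

    Assumption: a student passes when their score is >= the cut-score.
    """
    ranked = sorted(scores, reverse=True)
    # Number of students that must pass (at least one).
    n_pass = max(math.ceil(pass_rate * len(ranked)), 1)
    # The score of the last student who must pass becomes the cut-score.
    return ranked[n_pass - 1]

def cohen_cut_score(scores, fraction=0.60):
    """Cut-score as a fraction of the best-performing student's score."""
    return fraction * max(scores)

# Hypothetical scores, not the exam data discussed in this post.
scores = [30, 35, 38, 40, 41, 41, 44, 50, 55, 60]
pp_cut = passing_percentage_cut_score(scores, 0.50)
cohen_cut = cohen_cut_score(scores)
print(pp_cut, cohen_cut)
```

Because of ties at the cut-score, slightly more students than the desired share can pass: with these made-up scores, six of the ten students score at or above the passing percentage cut-score.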
The linear transformation and expert panels are examples of criterion-referenced methods. The linear transformation draws a straight line between the guess-score and the maximum number of points that can be obtained. This method can be used when exams do not differ in difficulty, guess-score, or maximum score; if they do differ, it cannot be used. The second method uses expert panels and is also called the Angoff method. Around ten panelists estimate, for every item on the test, the probability that a minimally competent student answers it correctly. With experts setting the cut-score, the criterion reference, quality assurance, and the definition of minimal competence are clear. Additionally, professionals are engaged in the process. Because the process costs a lot of time and money, it might not always be the best method to choose.
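The two criterion-referenced methods above can be sketched as follows. This is a rough sketch under stated assumptions, not the exact procedure used in this post: the function names and all numbers are hypothetical, the grade scale is assumed to run from 1 at the guess-score to 10 at the maximum score with a passing grade of 5.5, and the Angoff cut-score is computed in the usual way, as the sum over items of the mean panelist estimate.

```python
def linear_cut_score(guess_score, max_score, pass_grade=5.5,
                     min_grade=1.0, max_grade=10.0):
    """Score at which a straight line from (guess_score, min_grade)
    to (max_score, max_grade) reaches `pass_grade`.

    Assumption: the grade scale endpoints (1 and 10) are illustrative.
    """
    share = (pass_grade - min_grade) / (max_grade - min_grade)
    return guess_score + share * (max_score - guess_score)

def angoff_cut_score(ratings):
    """Sum over items of the mean estimated probability.

    ratings[p][i] is panelist p's estimate of the probability that a
    minimally competent student answers item i correctly.
    """
    n_panelists = len(ratings)
    n_items = len(ratings[0])
    item_means = [
        sum(panelist[i] for panelist in ratings) / n_panelists
        for i in range(n_items)
    ]
    return sum(item_means)

# Hypothetical 40-point exam with a guess-score of 10.
linear_cut = linear_cut_score(guess_score=10, max_score=40)

# Hypothetical ratings: two panelists, three items.
ratings = [
    [0.50, 0.75, 0.25],
    [0.50, 0.25, 0.75],
]
angoff_cut = angoff_cut_score(ratings)
print(linear_cut, angoff_cut)
```

Neither function looks at the scores students actually obtained, which is what makes both methods criterion-referenced in this sketch.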
Lastly, the Hofstee method is a combination of norm- and criterion-referenced methods. Experts decide on an acceptable passing score by setting the minimum and maximum failure rate and the minimum and maximum passing score. This is done to control for extreme failure rates caused by critical panelists. A reason not to use this method is that the ability of examinees can differ a lot across exams.
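The Hofstee method described above is usually drawn as a graph: a straight line runs from the point (minimum passing score, maximum failure rate) to the point (maximum passing score, minimum failure rate), and the cut-score lies where this line crosses the cumulative failure curve of the observed scores. The following is a minimal sketch of that idea, assuming integer cut-scores and taking the candidate whose observed failure rate is closest to the line. The function name, the scores, and the panel limits are all hypothetical.

```python
def hofstee_cut_score(scores, min_cut, max_cut, min_fail, max_fail):
    """Integer cut-score where the observed failure rate is closest to the
    Hofstee line from (min_cut, max_fail) to (max_cut, min_fail).

    Assumptions: max_cut > min_cut, and a student fails when their
    score is below the cut-score.
    """
    n = len(scores)
    best_cut, best_gap = None, None
    for cut in range(min_cut, max_cut + 1):
        # Share of students who would fail at this cut-score.
        observed_fail = sum(s < cut for s in scores) / n
        # Failure rate on the experts' line at this cut-score.
        line_fail = max_fail + (min_fail - max_fail) * (cut - min_cut) / (max_cut - min_cut)
        gap = abs(observed_fail - line_fail)
        if best_gap is None or gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# Hypothetical scores and panel limits, not the exam data from this post.
scores = [30, 35, 38, 40, 41, 41, 44, 50, 55, 60]
hofstee_cut = hofstee_cut_score(scores, min_cut=35, max_cut=45,
                                min_fail=0.10, max_fail=0.60)
print(hofstee_cut)
```

The experts' limits keep the result inside an acceptable range, while the observed scores decide where in that range the cut-score ends up.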
Analysing methods for setting the passing percentage and cut-score
The analysis is performed with data from a high-stakes mathematics exam to find similarities and differences between the methods discussed above. The outcomes and comparisons are discussed below. A few details of the data: students could obtain 66 points on the exam, the guess-score was 16.5, and 1945 students took the exam.
Across all methods, students had to answer between 40 and 44 items correctly to pass the test. The passing percentage method shows that if 51% of the students should pass, the cut-score should be set at 41. To let 57% of the students pass, the cut-score should be 40. The linear transformation shows that students had to answer 41.5 items correctly to get the passing grade of 5.5. Since it is not possible to obtain this score, the cut-score should be set to 41 or 42. A score of 41 results in a 5.0, while answering 42 items correctly results in a 6.0. These two methods therefore differ considerably in how they decide whether a student passes the test.
When the cut-scores are set with the other methods, there are fewer differences. The graph of the Hofstee method shows an intersection that gives a cut-score of 41; in this case, 49% of the students fail the test. Twelve panelists performed the Angoff method, which resulted in a cut-score of 40. Lastly, Cohen’s method gives a cut-score of 41, with 49.1% of the students failing. So, the Hofstee method and Cohen’s method give similar results, while the Angoff method gives a slightly lower cut-score. If this cut-score of 40 were used in the Hofstee method and Cohen’s method, the percentage of students failing would drop to 43% and 43.2%, respectively.
Changing student ability: what happens to the passing percentage and cut-score?
If the students taking the same test had a lower ability level, the passing percentage and cut-score would change. First, consider the passing percentage method and the linear transformation method. With the passing percentage method, fewer items would need to be answered correctly to pass the test: if around 60% of the students still need to pass and they all score lower on the test, they will have to answer fewer than 40 items correctly. With the linear transformation method, if the highest score obtained by a student is still 66, there would be no difference in the number of items that need to be answered correctly to pass the test. However, if the highest score on that test is no longer 66, the number of items that need to be answered correctly to pass would be lower. So, when the same exam is used in a group in which students’ abilities are lower, the passing percentage can differ depending on the method that is used and the highest score on the test.
Changes in the cut-score also depend on the method that is used to set it. The Hofstee method provides a range in which the cut-score can lie. The cut-score will differ when students perform worse because the cumulative graph changes, not because the experts have set other values for the minimum and maximum failure rate and passing score. If the students score lower on the test, the cut-score will be lower. The cut-score determined by the Angoff method will be different because the probability of students answering the items correctly is considered. So, if the students’ ability is lower, the cut-score will be lower. Lastly, the cut-score set by Cohen’s method can be different since it depends on the highest score in the group, so the same holds as for the linear transformation method. If the best-performing student now scores lower than in the previous group, the cut-score will also be lower. If the best-performing student performs equally well as in the previous group, the cut-score will not change.
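The dependence of the norm-referenced cut-scores on the cohort can be illustrated with a small sketch. All of it is hypothetical: the scores, the function names, the 50% pass rate, and the 60% fraction for Cohen’s method are made up, and Cohen’s method follows the description in this post, with the best-performing student as the reference.

```python
import math

def pass_rate_cut(scores, pass_rate):
    """Highest cut-score at which at least `pass_rate` of the students pass."""
    ranked = sorted(scores, reverse=True)
    n_pass = max(math.ceil(pass_rate * len(ranked)), 1)
    return ranked[n_pass - 1]

def cohen_cut(scores, fraction=0.60):
    """Cut-score as a fraction of the best-performing student's score."""
    return fraction * max(scores)

# Hypothetical cohorts taking the same exam.
first_cohort = [30, 35, 38, 40, 41, 41, 44, 50, 55, 60]
# Every student scores five points lower.
weaker_cohort = [s - 5 for s in first_cohort]
# Everyone scores five points lower except the best-performing student.
same_top_cohort = weaker_cohort[:-1] + [60]

for name, cohort in [("first", first_cohort),
                     ("weaker", weaker_cohort),
                     ("same top", same_top_cohort)]:
    print(name, pass_rate_cut(cohort, 0.50), cohen_cut(cohort))
```

In this sketch the passing percentage cut-score drops as soon as the group as a whole scores lower, while Cohen’s cut-score only drops when the best-performing student scores lower.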
References
Ertoprak, D. G., & Dogan, N. (2016). A research on the classification validity of the decisions made according to norm and criterion-referenced assessment approaches. Anthropologist, 23(3), 612–619. https://doi.org/10.1080/09720073.2014.11891981
Hello Birgit,
I enjoyed reading your blog. You show a good understanding of the topic. A quick question, though: are you sure the cut-score will change for lower-ability students? The Angoff method is a criterion-referenced, item-oriented method. The experts set the cut-score using the item difficulty, not the expected ability of the candidates. In my opinion, the cut-score for low-ability students will remain the same. What do you think?
Hi,
The Angoff method is indeed criterion-referenced and experts set the cut-score using item difficulty, but they also take into consideration the probability of minimally performing students answering an item correctly. Therefore, if a group has a lower ability, the cut-score set by the experts will also be lower. On the other hand, if the experts believe that the minimally performing students in the lower-ability group do not perform worse than the minimally performing students in the first group, then there will not be a difference in the cut-score.
I hope this explanation adds to the comment I made in the post, which might have been missing the last part about the possibility that the cut-score also could stay the same.
I would also like to add my comment on the Angoff method. Like Atayo, I believe that the cut-score in the Angoff method is set by experts considering the minimum competencies of a borderline candidate. This method is applied especially in high-stakes examinations, such as medical examinations, where the experts cannot change the minimum competencies required to pass the exam and get the certificate. So I also believe the cut-score remains unchanged even when the ability of the students is low. Thank you for the good discussion point.
I have another question for open discussion. Which standard setting method do you think gives more control over the passing rate: Cohen’s method or the Hofstee method? Of course, the percentage method gives the most control; my question is only about the Cohen and Hofstee methods.
Chantha
Nice blog, Birgit. I do, however, have a few questions about it. You state that the highest cut-score was equal to 44; for which method?
You also state that, for the linear transformation, the maximum point changes when students' ability changes. But that's not truly correct: the maximum possible number of points stays the same, even when the ability changes.
And when you think a bit ahead: different methods result in different cut-scores, but what does it mean for the performance standard (in terms of the difficulty of the exam) if the cut-scores for the same exam are different?