The quality of a test needs to be assessed before it can
be used to test the knowledge or skills of candidates. The RCEC review system
is an analytical review system developed to evaluate the quality of
educational exams (‘The RCEC review system for the quality of tests and exams’,
n.d.). The system has six criteria, grouped into two aspects: the substantive and
organizational aspect and the psychometric aspect. Purpose and use, test and
examination material, and test administration and security together form the
first aspect. Representativeness, reliability, and standard setting and
maintenance form the second aspect. To measure whether the six criteria are met, questions
are answered with ‘insufficient’, ‘sufficient’, or ‘good’, which gives the
question a score of 1, 2, or 3, respectively. Once the questions for a
criterion have been answered, it is checked whether enough points have been gathered to
say that the criterion is met. In this post, a few questions from the criteria of
the psychometric aspect are answered using data that was provided. The data
contains the analysis of a test and its items. The test was administered in group 7 (grade
5), 199 students participated, and the test consisted of 40 items.
For criterion 3, representativeness, question 3.2
‘Is the degree of difficulty of the items and/or the actions adjusted to the
intended target population?’ was selected. For this question to be answered
with sufficient, 75% to 90% of the items should have a p-value >0.20 and
≤0.80. If the percentage is lower than 75, the question is marked insufficient,
and if the percentage is higher than 90, the question is marked good. In
this data, fewer than 75% of the items have a p-value in that range (>0.20 and
≤0.80). Therefore, this question must be answered with insufficient and gets a
score of 1.
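The rule for question 3.2 can be written out as a small function. This is a minimal sketch in Python and not part of the RCEC materials: the function name and the example p-values are hypothetical, since the p-values of the 40 items are not listed in this post.

```python
def rate_item_difficulty(p_values, low=0.20, high=0.80):
    """Rate question 3.2 from a list of item p-values.

    Returns a (label, score) pair using the thresholds described above.
    """
    if not p_values:
        raise ValueError("at least one item p-value is required")
    # Count the items whose p-value is > low and <= high.
    in_range = sum(1 for p in p_values if low < p <= high)
    share = 100 * in_range / len(p_values)
    if share < 75:
        return "insufficient", 1
    if share > 90:
        return "good", 3
    return "sufficient", 2


# Hypothetical data: 40 items, 28 of them inside the range (70%).
example_p_values = [0.50] * 28 + [0.90] * 12
print(rate_item_difficulty(example_p_values))
```

With the real item analysis, the list of p-values would simply replace the hypothetical one.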
For criterion 4, reliability, questions 4.2 and
4.3 were selected. Question 4.2, ‘Is the reliability of the test correctly
calculated?’, is answered by looking at the number of candidates used for the calculation
of the reliability. At least 200 candidates should be used for the calculation; however,
in this data, only 199 candidates took the test. Therefore, the answer to this
question is insufficient and it gets a score of 1. Had there been 200
candidates, the score would have gone up to sufficient. In addition, because
an objective scoring system was used, as established in question 2.9 (criterion 2, question
9), the score would then go up further to good. So, with just one extra candidate,
the score for this question would go from 1 to 3.
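The jump from 1 to 3 is easier to see when the rule for question 4.2 is written as code. The sketch below only encodes the rule as described in this post; the function and parameter names are hypothetical.

```python
def rate_reliability_calculation(n_candidates, objective_scoring, min_candidates=200):
    """Rate question 4.2 as a (label, score) pair, as described in this post."""
    # Too few candidates: insufficient, regardless of the scoring system.
    if n_candidates < min_candidates:
        return "insufficient", 1
    # Enough candidates and an objective scoring system: good.
    if objective_scoring:
        return "good", 3
    return "sufficient", 2


# The test in this post (199 candidates) versus one extra candidate.
print(rate_reliability_calculation(199, objective_scoring=True))
print(rate_reliability_calculation(200, objective_scoring=True))
```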
The second question in this criterion is 4.3, ‘Is the
reliability sufficient, considering the decisions that have to be based on the
test?’. To answer this question, the reliability coefficient is examined. A
reliability of ≥0.80 and <0.90 is considered sufficient. Lower than
0.80 is insufficient and higher than 0.90 is good. In this data,
coefficient alpha is only 0.68; therefore, this question is also answered with
insufficient and gets a score of 1.
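The thresholds for question 4.3 translate into a short check. This is a sketch with a hypothetical function name; the rule above does not say how a reliability of exactly 0.90 is rated, so treating it as good is an assumption made here.

```python
def rate_reliability(alpha):
    """Rate question 4.3 from a reliability coefficient between 0 and 1."""
    if alpha < 0.80:
        return "insufficient", 1
    if alpha < 0.90:
        return "sufficient", 2
    # Assumption: exactly 0.90 is rated the same as values above 0.90.
    return "good", 3


# Coefficient alpha reported for this test.
print(rate_reliability(0.68))
```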
For criterion 5, standard setting and maintenance,
questions 5.1, 5.2a, and 5.2c were selected. Question 5.1 is ‘Are norms/
standards/cut-off scores provided?’. The question is whether at least one of these is
given. The data shows that the Angoff method was used and that a cut-off
score has been set. So, this question can be marked as good and gets a score of
3.
The second question is 5.2, ‘Has the standard setting
been carried out correctly?’, which is divided into three sub-questions. However,
only sub-questions a and c are discussed here.
Sub-question a is ‘Has the
standard setting method been carried out correctly?’. To answer this question, the professional
consideration or argumentation that supports the choice of the cut-off score
needs to be examined. The Angoff method was used to set the cut-off score and
seems to have been carried out correctly; however, the reasoning and support of the
experts are missing. Therefore, this question is answered with sufficient and gets
a score of 2.
Sub-question c is ‘Is there
sufficient agreement between the qualified experts?’. Sufficient agreement is
between 0.60 and 0.80. In this data, the agreement between the qualified
experts is 0.89, which is above that range, so this question can be answered with good and
thus gets a score of 3.
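The agreement rule for sub-question c can be sketched in the same way. The function name is hypothetical, and two points are assumptions rather than something stated above: that agreement below 0.60 is rated insufficient, and that the boundary values 0.60 and 0.80 both count as sufficient.

```python
def rate_expert_agreement(agreement):
    """Rate sub-question 5.2c from an agreement coefficient between 0 and 1."""
    # Assumption: anything below the sufficient range is insufficient.
    if agreement < 0.60:
        return "insufficient", 1
    # Assumption: 0.60 and 0.80 themselves fall inside the sufficient range.
    if agreement <= 0.80:
        return "sufficient", 2
    return "good", 3


# Agreement between the qualified experts in this data.
print(rate_expert_agreement(0.89))
```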
In summary, the review system has strict rules for carrying out the
quality evaluation. However, applying them is not always as straightforward as
it might seem. Criterion 4 is an example: a single extra candidate would change
the score for question 4.2 from 1 to 3. Most importantly, no
conclusion can be drawn from answering only a few questions, since all questions must
be answered to produce a reliable evaluation of the quality of the test.
Reference
The RCEC review system for the quality of tests and exams. (n.d.). Retrieved 18 May 2020, from
https://www.rcec.nl/en/review-system/
Hi Birgit,
I liked reading your blog. I especially liked that you nuanced the conclusion of question 4.2 and your conclusion at the end. I agree with all of your conclusions and I think that you showed a great understanding of this topic. I believe that you could improve your blog by using headings for your paragraphs and by providing the reader with the exact values for the items within and outside the difficulty range of 0.20 to 0.80 and the cut-off score for the test. Overall, well done!
Kind regards,
Annelies
From dr. Arnold Brouwer: Thank you for your contribution. I think it is a well written presentation including a strong analysis. Well done!
Hi Birgit,
I think that through your blog you show a good understanding of quality assessment using the RCEC system. You have mentioned every aspect without elaborating on it in depth, which I think is a good quality, because you are able to mention only the essential points. Overall, I therefore really liked your blog! If I have to name an area for improvement, it is maybe the layout of your blog, because I think adding some sub-headings, for instance, would help the reader to go through the text more easily.
Silke
Hi Birgit,
I agree with the other comments: you showed a good understanding of quality assessment using the RCEC system. Furthermore, you used APA well and added a short summary, which I liked!
I also agree with Silke, adding sub-headings would improve the readability of your blog.
Greetings,
Sjanne
Hi Birgit,
I enjoyed reading your blog. Very precise.