Development and assessment of tests for education: Assessing quality of items and tests

The quality of a test needs to be assessed before the test can be used to measure the knowledge or skills of candidates. The RCEC review system is an analytical review system developed to evaluate the quality of educational exams (‘The RCEC review system for the quality of tests and exams’, n.d.). The system has six criteria, which are grouped into two aspects: the substantive and organizational aspect, and the psychometric aspect. The first aspect consists of purpose and use; test and examination material; and test administration and security. The second aspect consists of representativeness; reliability; and standard setting and maintenance. To determine whether the six criteria are met, each question is answered with ‘insufficient’, ‘sufficient’, or ‘good’, which gives the question a score of 1, 2, or 3, respectively. After all questions for a criterion have been answered, it is checked whether enough points have been gathered to consider that criterion met.

In this post, a few questions from the criteria of the psychometric aspect are answered using the data that was provided. These data contain the analysis of the test and its items. The test was administered to group 7 (grade 5): 199 students participated, and the test consisted of 40 items.
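The scoring scheme described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration and not part of the RCEC system itself: the function names are made up, and `min_points` stands in for the minimum number of points per criterion, which this post does not specify.

```python
# Hypothetical sketch of the scoring scheme: answers -> points -> criterion check.

# Points attached to each possible answer, as described in this post.
RATING_POINTS = {"insufficient": 1, "sufficient": 2, "good": 3}


def criterion_points(ratings: list[str]) -> int:
    """Sum the points of all answered questions for one criterion."""
    return sum(RATING_POINTS[rating] for rating in ratings)


def criterion_met(ratings: list[str], min_points: int) -> bool:
    """Return True if the criterion gathers at least `min_points` points.

    `min_points` is a hypothetical placeholder: the actual minimum per
    criterion is defined by the RCEC review system and is not given here.
    """
    return criterion_points(ratings) >= min_points


# Illustrative ratings only, not a real set of RCEC questions.
example = ["insufficient", "good", "sufficient"]
print(criterion_points(example))             # 6
print(criterion_met(example, min_points=5))  # True
```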

For criterion 3, representativeness, question 3.2, ‘Is the degree of difficulty of the items and/or the actions adjusted to the intended target population?’, was selected. The difficulty of an item is expressed by its p-value, the proportion of candidates who answered the item correctly. To answer this question with ‘sufficient’, 75% to 90% of the items should have a p-value above 0.20 and at most 0.80. If the percentage is lower than 75%, the question is marked ‘insufficient’; if it is higher than 90%, the question is marked ‘good’. In these data, fewer than 75% of the items have a p-value within this range. Therefore, this question must be answered with ‘insufficient’ and receives a score of 1.
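As a minimal sketch of this check, assuming a 0/1 scored response matrix with one row per candidate, the p-values and the rating for question 3.2 could be computed as follows. The function names and the toy data are hypothetical and are not taken from the analysed test.

```python
def p_values(responses: list[list[int]]) -> list[float]:
    """Proportion correct per item.

    `responses` holds one row per candidate and one 0/1 score per item.
    """
    n_candidates = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[item] for row in responses) / n_candidates
        for item in range(n_items)
    ]


def rate_difficulty(pvals: list[float]) -> str:
    """Rate question 3.2 from the share of items with 0.20 < p <= 0.80."""
    in_range = sum(1 for p in pvals if 0.20 < p <= 0.80)
    percentage = 100 * in_range / len(pvals)
    if percentage < 75:
        return "insufficient"
    if percentage <= 90:
        return "sufficient"
    return "good"


# Toy data (hypothetical): 5 candidates x 4 items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
]
pvals = p_values(responses)
print(pvals)                   # [1.0, 0.6, 0.2, 0.8]
print(rate_difficulty(pvals))  # insufficient
```

In the toy data only two of the four items fall within the range (the item with p = 0.20 is excluded because the lower bound is exclusive), so 50% of the items qualify and the rating is ‘insufficient’.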

For criterion 4, reliability, questions 4.2 and 4.3 were selected. Question 4.2, ‘Is the reliability of the test correctly calculated?’, is answered based on the number of candidates used to calculate the reliability. At least 200 candidates should be used for the calculation; however, only 199 candidates took this test. Therefore, this question is answered with ‘insufficient’ and receives a score of 1. Had there been 200 candidates, the rating would have risen to ‘sufficient’. Moreover, an objective scoring system was established in question 2.9 (criterion 2, question 9), which would raise the rating further to ‘good’. So, with just one extra candidate, the score for this question would go from 1 to 3.
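The reasoning for question 4.2 can be summarised as a small decision rule. This sketch encodes only the two conditions mentioned in this post (the number of candidates and objective scoring); the function name is hypothetical, and the full RCEC question may take further conditions into account.

```python
def rate_reliability_calculation(n_candidates: int, objective_scoring: bool) -> str:
    """Rating for question 4.2, simplified to the conditions in this post.

    Fewer than 200 candidates -> insufficient; at least 200 -> sufficient,
    upgraded to good when scoring is objective (see question 2.9).
    """
    if n_candidates < 200:
        return "insufficient"
    return "good" if objective_scoring else "sufficient"


print(rate_reliability_calculation(199, objective_scoring=True))  # insufficient
print(rate_reliability_calculation(200, objective_scoring=True))  # good
```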

The second question in this criterion is 4.3, ‘Is the reliability sufficient, considering the decisions that have to be based on the test?’. To answer this question, the reliability coefficient is examined. A reliability of at least 0.80 but below 0.90 is considered sufficient; below 0.80 is insufficient, and 0.90 or higher is good. In these data, coefficient alpha is only 0.68, so this question is also answered with ‘insufficient’ and receives a score of 1.
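Coefficient alpha (Cronbach's alpha) is computed from the item variances and the variance of the candidates' total scores. The following is a minimal sketch using the standard formula and a made-up score matrix of four candidates and three items; it is not the software that produced the 0.68 reported for this test, and the function names are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Coefficient alpha for a candidates x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variances are used throughout; the ratio is the same with
    sample variances, as long as one kind is used consistently.
    """
    k = len(responses[0])
    items = list(zip(*responses))             # one tuple of scores per item
    totals = [sum(row) for row in responses]  # total score per candidate
    item_var_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))


def rate_reliability(alpha: float) -> str:
    """Rating for question 4.3 using the thresholds in this post."""
    if alpha < 0.80:
        return "insufficient"
    if alpha < 0.90:
        return "sufficient"
    return "good"


# Toy data (hypothetical): 4 candidates x 3 items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(toy))     # 0.75
print(rate_reliability(0.68))  # insufficient (the alpha reported for this test)
```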

For criterion 5, standard setting and maintenance, questions 5.1, 5.2a, and 5.2c were selected. Question 5.1 is ‘Are norms/standards/cut-off scores provided?’. This question simply asks whether at least one of these is provided. The data show that the Angoff method was used and that a cut-off score has been set. So, this question can be marked as ‘good’ and receives a score of 3.

The second question, 5.2, ‘Has the standard setting been carried out correctly?’, is divided into three sub-questions, of which only sub-questions a and c are discussed here.

Sub-question a is ‘Has the standard setting method been carried out correctly?’. To answer this question, the professional judgement or argumentation supporting the choice of cut-off score needs to be considered. The Angoff method was used to set the cut-off score and seems to have been carried out correctly; however, the experts' reasoning and supporting argumentation are missing. Therefore, this question is answered with ‘sufficient’ and receives a score of 2.
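The post does not describe how the Angoff procedure was carried out. In its usual form, each expert estimates, for every item, the probability that a borderline (minimally competent) candidate answers it correctly, and the cut-off score is the average over experts of their summed estimates. A minimal sketch of that computation, with hypothetical ratings from three experts on four items (the function name is also hypothetical):

```python
from statistics import mean


def angoff_cut_score(ratings: list[list[float]]) -> float:
    """Cut-off score from Angoff ratings.

    `ratings[j][i]` is expert j's estimate of the probability that a
    borderline candidate answers item i correctly. Each expert's estimates
    are summed into an expected raw score, and the cut-off score is the
    mean of these expert totals.
    """
    expert_totals = [sum(expert) for expert in ratings]
    return mean(expert_totals)


# Hypothetical ratings: 3 experts x 4 items.
ratings = [
    [0.5, 0.75, 0.5, 0.75],   # expert 1, total 2.5
    [0.25, 0.5, 0.5, 0.75],   # expert 2, total 2.0
    [0.75, 0.75, 0.5, 1.0],   # expert 3, total 3.0
]
print(angoff_cut_score(ratings))  # 2.5
```

With 40 items, as in this test, the same computation yields a cut-off score on the 0 to 40 raw-score scale.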

Sub-question c is ‘Is there sufficient agreement between the qualified experts?’. Agreement between 0.60 and 0.80 is considered sufficient. In these data, the agreement between the qualified experts is 0.89 (89%), which is above this range, so this question can be answered with ‘good’ and receives a score of 3.
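The rating rule for sub-question c can be sketched as follows. The post only states the ‘sufficient’ band, so treating values below 0.60 as ‘insufficient’ and the band edges as inclusive are assumptions; the post also does not say which agreement statistic was used, so this hypothetical function takes the agreement value as given, expressed as a proportion.

```python
def rate_agreement(agreement: float) -> str:
    """Rating for sub-question 5.2c; `agreement` is a proportion (0 to 1)."""
    if agreement < 0.60:
        return "insufficient"  # assumption: the post gives only the 'sufficient' band
    if agreement <= 0.80:      # assumption: band edges treated as inclusive
        return "sufficient"
    return "good"


print(rate_agreement(89 / 100))  # good (89% agreement, as in this test)
```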

In summary, the review system has strict rules for carrying out the quality evaluation. However, applying them is not always as straightforward as it might seem, as criterion 4 shows: a single missing candidate changed the score of question 4.2 from 3 to 1. Most importantly, no conclusion about the quality of the test can be drawn from answering a few questions, since all questions must be answered to produce a reliable evaluation.

Reference

The RCEC review system for the quality of tests and exams. (n.d.). Retrieved 18 May 2020, from https://www.rcec.nl/en/review-system/

5 comments:

Anonymous, 19 May 2020 at 09:14
Hi Birgit,

I liked reading your blog. I especially liked that you nuanced the conclusion of question 4.2, as well as your conclusion at the end. I agree with all of your conclusions and I think that you showed a great understanding of this topic. I believe that you could improve your blog by using headings for your paragraphs and by providing the reader with the exact numbers of items within and outside the difficulty range of 0.20 to 0.80, and the cut-off score for the test. Overall, well done!

Kind regards,
Annelies
Maaike, 20 May 2020 at 13:53
From Dr. Arnold Brouwer: Thank you for your contribution. I think it is a well-written presentation including a strong analysis. Well done!
Anonymous, 20 May 2020 at 19:02
Hi Birgit,

I think that through your blog you show a good understanding of quality assessment using the RCEC system. You have mentioned every aspect without elaborating on it in depth, which I think is a good quality, because you are able to mention only the essential points. Overall, I therefore really liked your blog! If I have to name an area for improvement, it is maybe the layout of your blog, because I think adding some sub-headings, for instance, will help the reader to go through the text easily.

Silke
Anonymous, 24 May 2020 at 16:31
Hi Birgit,

I agree with the other comments: you showed a good understanding of quality assessment using the RCEC system. Furthermore, you used APA well and added a short summary, which I liked!
I also agree with Silke, adding sub-headings would improve the readability of your blog.

Greetings,
Sjanne
Anny Rey Naizaque, 25 May 2020 at 22:02
Hi Birgit,
I enjoyed reading your blog. Very precise.

Monday 18 May 2020
