Friday 24 April 2020

Analysing data with item response theory


To gain more insight into how a test can measure someone's knowledge, it is useful to analyse the relationship between the questions of a test and the ability of the respondents. This can be done with Item Response Theory (IRT). IRT models the relationship between the items in a test and a latent trait (e.g. someone's ability in maths). There are different models to describe this relationship, and for this assignment three were examined: the Rasch, 2PL (two-parameter logistic), and 3PL (three-parameter logistic) models. The Rasch model only takes the difficulty of the items into account, the 2PL model adds discrimination, and the 3PL model considers difficulty, discrimination, and guessing. To learn how to find a fitting model for data, a Graduate Management Admission Test (GMAT) dataset from ShinyItemAnalysis was used (https://shiny.cs.cas.cz/ShinyItemAnalysis/).
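As a rough illustration of how the three models relate, the Python sketch below (not part of the original analysis) implements their item response functions; the parameter names a (discrimination), b (difficulty) and c (guessing) match the columns of Tables 1-3.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function: probability of a correct answer for a
    respondent with ability theta on an item with discrimination a,
    difficulty b, and guessing parameter c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def irf_2pl(theta, a, b):
    """2PL model: no guessing, so c = 0."""
    return irf_3pl(theta, a, b, c=0.0)

def irf_rasch(theta, b):
    """Rasch model: discrimination fixed at a = 1 and c = 0."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)

# A respondent whose ability equals the item difficulty has a 50% chance
# of answering correctly under the Rasch and 2PL models.
print(irf_rasch(theta=0.0, b=0.0))  # 0.5
```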
On this website, the GMAT2 dataset was first loaded at the Data tab. Next, at the IRT tab, the subtab 'Rasch model' was selected. This page shows the item characteristic curves, item information curves, test information function, a table of estimated parameters, ability estimates, a scatter plot of factor scores against standardized total scores, and a Wright map. The page was inspected to learn about the characteristics of the items, and the same was done for the 2PL and 3PL models. See Tables 1-3 for the estimated item parameters of the three models.
Lastly, the subtab 'Model comparison' was selected to compare the three models in ShinyItemAnalysis. This page shows a table of comparison statistics (see Figure 1) in which the models are compared and the best-fitting model is indicated. For four out of the five comparison statistics, the table points to the 2PL model, so the 2PL model was taken as the model that fits the data best. As said before, difficulty and discrimination are the two parameters of the 2PL model. Discrimination is defined as how well an item is able to differentiate between people whose ability lies above and below the difficulty of the item. Figure 2 shows the item characteristic curves (ICCs) of the 20 items when they are analysed with the 2PL model. An ICC shows the relationship between a respondent's ability and the probability of answering the item correctly. The steeper the curve, the better the item discriminates and the more informative it is.
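To make the link between the slope of an ICC and information concrete, the sketch below uses the Fisher information of the logistic 2PL model, I(theta) = a^2 * P(theta) * (1 - P(theta)), with the estimated parameters of two items from Table 2: Item 18 (a = 1.03, the steepest curve) and Item 9 (a = 0.23, one of the flattest). This is only an illustration of the formula, not output from ShinyItemAnalysis.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

# Estimated parameters taken from Table 2.
steep_item = {"a": 1.03, "b": 0.56}   # Item 18: highest discrimination
flat_item  = {"a": 0.23, "b": -1.32}  # Item 9: lowest discrimination

for theta in (-2, 0, 2):
    print(theta,
          round(info_2pl(theta, **steep_item), 3),
          round(info_2pl(theta, **flat_item), 3))
# The steep item provides far more information near its difficulty (b = 0.56)
# than the flat item does anywhere on the ability scale.
```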

Table 1

Item parameters for the Rasch model

Item      a     SE(a)  b      SE(b)  c     SE(c)
Item 1    1.00  -      -0.11  0.07   0.00  -
Item 2    1.00  -      -0.39  0.07   0.00  -
Item 3    1.00  -      -0.93  0.07   0.00  -
Item 4    1.00  -      -1.31  0.08   0.00  -
Item 5    1.00  -      -1.49  0.08   0.00  -
Item 6    1.00  -      -0.58  0.07   0.00  -
Item 7    1.00  -      -0.65  0.07   0.00  -
Item 8    1.00  -      -0.51  0.07   0.00  -
Item 9    1.00  -      -0.32  0.07   0.00  -
Item 10   1.00  -      -0.15  0.07   0.00  -
Item 11   1.00  -      -0.75  0.07   0.00  -
Item 12   1.00  -      -0.41  0.07   0.00  -
Item 13   1.00  -      -1.26  0.08   0.00  -
Item 14   1.00  -      0.34   0.07   0.00  -
Item 15   1.00  -      0.00   0.07   0.00  -
Item 16   1.00  -      0.29   0.07   0.00  -
Item 17   1.00  -      0.17   0.07   0.00  -
Item 18   1.00  -      0.51   0.07   0.00  -
Item 19   1.00  -      0.09   0.07   0.00  -
Item 20   1.00  -      0.23   0.07   0.00  -


Table 2

Item parameters for the 2PL model

Item      a     SE(a)  b      SE(b)  c     SE(c)
Item 1    0.70  0.11   -0.17  0.10   0.00  -
Item 2    0.82  0.12   -0.51  0.10   0.00  -
Item 3    0.25  0.10   -3.58  1.36   0.00  -
Item 4    0.41  0.11   -3.15  0.80   0.00  -
Item 5    0.67  0.12   -2.29  0.38   0.00  -
Item 6    0.69  0.11   -0.87  0.15   0.00  -
Item 7    0.44  0.10   -1.45  0.33   0.00  -
Item 8    0.49  0.10   -1.02  0.23   0.00  -
Item 9    0.23  0.09   -1.32  0.56   0.00  -
Item 10   0.44  0.09   -0.34  0.17   0.00  -
Item 11   0.49  0.10   -1.52  0.32   0.00  -
Item 12   0.35  0.09   -1.14  0.34   0.00  -
Item 13   0.46  0.11   -2.73  0.62   0.00  -
Item 14   0.76  0.11   0.47   0.11   0.00  -
Item 15   0.47  0.09   0.01   0.14   0.00  -
Item 16   0.75  0.11   0.41   0.11   0.00  -
Item 17   0.29  0.09   0.59   0.28   0.00  -
Item 18   1.03  0.13   0.56   0.09   0.00  -
Item 19   0.74  0.11   0.13   0.10   0.00  -
Item 20   0.32  0.09   0.70   0.28   0.00  -


Table 3

Item parameters for the 3PL model

Item      a     SE(a)  b      SE(b)  c     SE(c)
Item 1    0.86  0.40   0.28   0.85   0.14  2.22
Item 2    0.82  0.12   -0.50  0.13   0.00  10.24
Item 3    0.83  0.87   1.69   0.76   0.62  0.51
Item 4    1.42  1.19   1.01   0.52   0.70  0.37
Item 5    0.66  0.13   -2.30  0.47   0.01  10.46
Item 6    1.17  0.61   0.36   0.67   0.37  0.81
Item 7    0.45  0.15   -1.28  1.76   0.04  10.57
Item 8    0.52  0.17   -0.86  1.50   0.03  11.29
Item 9    0.60  0.62   2.04   0.97   0.44  0.77
Item 10   1.25  0.84   1.37   0.32   0.41  0.40
Item 11   0.80  0.66   0.10   2.00   0.36  1.93
Item 12   0.35  0.10   -1.09  0.70   0.01  10.82
Item 13   0.47  0.12   -2.59  1.31   0.03  10.58
Item 14   1.07  0.46   0.88   0.36   0.15  1.05
Item 15   0.53  1.09   0.39   6.63   0.09  19.26
Item 16   0.74  0.11   0.43   0.14   0.00  10.41
Item 17   0.32  0.11   0.66   1.21   0.02  10.08
Item 18   2.84  1.57   0.95   0.12   0.22  0.31
Item 19   2.11  1.02   0.96   0.17   0.32  0.29
Item 20   2.62  1.92   1.72   0.22   0.40  0.12
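One way to read the guessing parameter c in Table 3 is as the lower asymptote of an item's ICC: even respondents with very low ability answer the item correctly with a probability of at least c. A minimal check (not part of the ShinyItemAnalysis output) with the estimated values of Item 4, which has the highest guessing parameter (c = 0.70):

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Estimated 3PL parameters for Item 4 (Table 3).
a, b, c = 1.42, 1.01, 0.70
for theta in (-3, 0, 3):
    print(theta, round(p_3pl(theta, a, b, c), 2))
# The probability approaches c = 0.70 for low abilities, so even weak
# respondents have roughly a 70% chance of answering Item 4 correctly
# under this model.
```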





Figure 1. Screenshot of the comparison statistics table.



Figure 2. Item characteristic curves, 2PL model.
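For readers who want to reproduce a figure like Figure 2 outside ShinyItemAnalysis, the sketch below (assuming numpy and matplotlib are available) draws the 2PL item characteristic curves for a few items using the estimates from Table 2; the full figure would simply loop over all 20 items.

```python
import numpy as np
import matplotlib.pyplot as plt

# A few 2PL estimates (a, b) taken from Table 2.
items = {"Item 3": (0.25, -3.58), "Item 14": (0.76, 0.47), "Item 18": (1.03, 0.56)}

theta = np.linspace(-4, 4, 200)
for label, (a, b) in items.items():
    p = 1 / (1 + np.exp(-a * (theta - b)))  # 2PL item characteristic curve
    plt.plot(theta, p, label=label)

plt.xlabel("Ability (theta)")
plt.ylabel("Probability of correct response")
plt.title("Item characteristic curves, 2PL model")
plt.legend()
plt.show()
```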

Defining educational measurement and describing its innovations and future

In the last eight weeks, I have learned about different topics regarding educational measurement as can be seen in the previous six blogpo...