PISA – Programme for International Student Assessment
The Organisation for Economic Co-operation and Development (OECD) commissions the PISA study every three years (“PISA,” n.d.). It is an international test that measures the skills and knowledge of 15-year-olds in reading, mathematics, and science (“PISA,” n.d.). Students all over the world participate in this study, which makes it possible to compare the (quality of) education of different countries and to track the development of education over time (Feskens, 2020). Each cycle has one main topic, which means that there are more items on either reading, mathematics, or science (“PISA (Programma for International Student Assessment),” n.d.). The test usually has only a few items, both because it is a low-stakes test (the result has no personal consequences for the student) and to keep motivation and attention high. Next to the test itself, there is also a survey, filled out by the parents of the participating students, to learn about the background of the test takers (Feskens, 2020).
A few analyses were done on the results gathered in 2018. First, the data were loaded; then a classical test theory (CTT) analysis was performed to obtain the item and test statistics; next, the results of the Netherlands and Germany were compared; and finally, the performance of all countries was compared.
Loading data in R
Before any analysis can be done, the data need to be loaded into the program. The data are analysed in R, using RStudio. Figure 1 shows how the necessary libraries are loaded, the working directory is set, and the files are assigned to variables so that they can be referred to more easily. Lastly, a dexter project is started with the scoring rules and item responses that are needed for the analysis. Additionally, previews of the data file and the item responses are shown.
Figure 1. Code showing data being loaded in RStudio
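As a rough illustration of the loading step, the sketch below builds a tiny scoring-rules table and a tiny response table in memory and, if the dexter package is installed, starts a project with them. The item names, person IDs, and values are made up for illustration and are not the PISA files. In practice the two data frames would be read from files in the working directory (for example with read.csv), but the file names used in the post are not given here.

```r
# Toy scoring rules: one row per (item, response) pair with the score it earns.
# Hypothetical items, not the PISA scoring rules.
rules <- data.frame(
  item_id    = rep(c("item1", "item2", "item3"), each = 2),
  response   = rep(c(0L, 1L), times = 3),
  item_score = rep(c(0L, 1L), times = 3)
)

# Toy item responses: one row per student, one column per item.
responses <- data.frame(
  person_id = c("s1", "s2", "s3", "s4"),
  item1 = c(1L, 0L, 1L, 1L),
  item2 = c(0L, 0L, 1L, 1L),
  item3 = c(1L, 0L, 0L, 1L)
)

# Preview both tables, as the post does for the real data.
print(head(rules))
print(head(responses))

# Start an in-memory dexter project only if the package is available.
if (requireNamespace("dexter", quietly = TRUE)) {
  db <- dexter::start_new_project(rules, db_name = ":memory:")
  dexter::add_booklet(db, responses, booklet_id = "toy_booklet")
}
```

Using ":memory:" as the database name keeps the toy project out of the working directory; a real analysis would normally give the project a file name instead.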
CTT analysis
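Classical test theory (CTT) is a first framework used to analyse test data (Feskens, 2020). The test and item statistics are shown in Figure 2. These statistics show the number of items (nItems), the alpha value, the mean p-, rit-, and rir-values, the maximum test score, and the number of responses (N). Here, the p-value of an item is its mean score as a proportion of the maximum score (so a higher p-value means an easier item), the rit-value is the correlation between the item score and the total test score, and the rir-value is the correlation between the item score and the rest score (the total score without that item). The test statistics show the average values for the test, while the item statistics show the values for each item in the data set. Cronbach’s alpha indicates the reliability (internal consistency) of the test; a value of 0.83 is above the commonly used rule-of-thumb threshold of 0.70, so the test can be considered reliable.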
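To make these statistics concrete, here is a minimal base R sketch that computes the p-, rit-, and rir-values and Cronbach's alpha by hand for a small matrix of scored responses. The data are invented and the function name ctt_stats is hypothetical; the post itself obtains these numbers from dexter, and this is not dexter's implementation.

```r
# Toy scored responses (rows = students, columns = dichotomous 0/1 items).
# Invented values, not PISA data.
scores <- matrix(c(
  1, 1, 1, 0,
  1, 0, 1, 0,
  0, 0, 1, 0,
  1, 1, 1, 1,
  0, 0, 0, 0,
  1, 1, 0, 1
), ncol = 4, byrow = TRUE, dimnames = list(NULL, paste0("item", 1:4)))

ctt_stats <- function(scores) {
  k <- ncol(scores)
  total <- rowSums(scores)
  items <- data.frame(
    item_id = colnames(scores),
    # For 0/1 items the p-value is simply the proportion of correct answers.
    pvalue = colMeans(scores),
    # rit: correlation of the item with the total score (item included).
    rit = apply(scores, 2, function(x) cor(x, total)),
    # rir: correlation of the item with the rest score (item excluded).
    rir = sapply(seq_len(k), function(i) cor(scores[, i], total - scores[, i])),
    row.names = NULL
  )
  # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).
  alpha <- k / (k - 1) * (1 - sum(apply(scores, 2, var)) / var(total))
  list(items = items, alpha = alpha)
}

result <- ctt_stats(scores)
print(result$items)
print(result$alpha)
```

Because the item itself is part of the total score, the rit-value is never lower than the rir-value, which is why the rir is often preferred for short tests.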
Comparing results
The data set can be analysed as a whole, but specific countries can also be analysed separately or compared with each other. In this example, the test statistics of the Netherlands and Germany are compared. As shown in Figure 3, the alpha value differs between the two countries: it is 0.82 for the Netherlands and 0.67 for Germany. This means that the test scores of the Dutch students are more reliable (more internally consistent) than those of the German students.
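A comparison like this amounts to computing the same statistic separately for each country. The base R sketch below does that for Cronbach's alpha on invented data with two placeholder countries, "A" and "B"; the helper cronbach_alpha and all values are assumptions for illustration, not the PISA results or the dexter code from the post.

```r
# Cronbach's alpha for a matrix of scored items (rows = students).
cronbach_alpha <- function(scores) {
  k <- ncol(scores)
  k / (k - 1) * (1 - sum(apply(scores, 2, var)) / var(rowSums(scores)))
}

# Toy data: five students in each of two placeholder countries.
toy <- data.frame(
  country = rep(c("A", "B"), each = 5),
  item1 = c(1, 1, 0, 0, 1,  1, 0, 1, 0, 1),
  item2 = c(1, 1, 0, 0, 1,  1, 1, 0, 0, 1),
  item3 = c(1, 0, 0, 0, 1,  1, 0, 0, 0, 1)
)

item_cols <- c("item1", "item2", "item3")

# Split the item columns by country and compute alpha within each group.
alpha_by_country <- sapply(
  split(toy[item_cols], toy$country),
  function(d) cronbach_alpha(as.matrix(d))
)
print(round(alpha_by_country, 2))
```

With samples this small the estimates are very unstable, so the sketch only shows the mechanics of a per-country comparison.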
Comparing performances
To rank the performance of all countries that participated in the PISA study, the test scores need to be compared. Figure 4 shows part of the individual test scores from this PISA study, Figure 5 shows part of the test scores per country, and Figure 6 shows a graph of the test scores of all participating countries. This last graph shows that Japan performed best.
Figure 4. Individual test scores
Figure 5. Test score per country (partly)
Figure 6. Test scores of all countries
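The step from individual test scores to a country ranking can be sketched in base R as follows. It assumes that the per-country score is the mean of the individual test scores, which the post does not state explicitly, and it uses invented scores for three placeholder countries.

```r
# Toy individual test scores, invented for illustration.
toy_scores <- data.frame(
  person_id  = paste0("s", 1:6),
  country    = c("A", "A", "B", "B", "C", "C"),
  test_score = c(12, 15, 18, 20, 9, 11)
)

# Mean test score per country (assumed aggregation).
country_means <- aggregate(test_score ~ country, data = toy_scores, FUN = mean)

# Sort from highest to lowest mean score to obtain the ranking.
ranking <- country_means[order(-country_means$test_score), ]
print(ranking)

# Name the best and lowest scoring countries in text as well.
best   <- ranking$country[1]
lowest <- ranking$country[nrow(ranking)]
cat("Highest mean score:", best, "\n")
cat("Lowest mean score:", lowest, "\n")
```

Printing the top and bottom of the ranking alongside a graph makes the result readable even when the plot itself is small.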
References
Feskens, R. (2020, June 8). Programme for International Student Assessment [Slides]. Retrieved from https://canvas.utwente.nl/courses/5049/pages/pisa?module_item_id=148084
PISA. (n.d.). Retrieved June 15, 2020, from https://www.oecd.org/pisa/
PISA (Programma for International Student Assessment). (n.d.). Retrieved June 12, 2020, from https://www.cito.nl/kennis-en-innovatie/onderzoek/in-opdracht/internationaal-pisa/
Hi Birgit,
I really liked reading your blogpost. I thought it was succinct and shows your understanding of the topic well. A possible area of improvement in Exercise B could be elaborating on what the different test and item statistics shown mean (Rit, Rir, p).
Regards,
Niveditha
Hi Birgit,
This is a great post! It clearly shows that you understand the topic well and that you know how to work in R. I personally appreciate that you included all the code and computed results in your post.
Well done!
Hi Birgit,
I think you've written a nice and clear blog. You showed that you managed to do the exercises through RStudio. The figures were a nice addition. As a tip: next time try to explain the different values of the CTT analysis more.
Greetings
Kim
I enjoyed reading your post. It was straight to the point and you gave an indication that you mastered the art of conducting analyses on a large scale within the CTT framework using R. Good job.
Hi Birgit,
You've written a nice blog where you show insight.
I see that we end up with the same p-, rit-, and rir-values, only I see that we end up with a different country among the best scoring countries. Because I found the program RStudio itself very difficult to handle, I dare not say if this is correct.
A small detail: you have visualized the scores of each country nicely. However, this is very small and difficult to see. You could use text to name the lowest and highest scores.
In general: nice blog!
Kind regards,
Sjanne