
Polytomous scoring correction and its effect on the model fit: A case of item response theory analysis utilizing R

Vol. 5 No. 1 (2022)

Agus Santoso (1), Timbul Pardede (2), Ezi Apino (3), Hasan Djidu (4), Ibnu Rafi (5), Munaya Nikma Rosyada (6), Heri Retnawati (7), Gulzhaina K. Kassymova (8)

(1) Universitas Terbuka, Indonesia
(2) Universitas Terbuka, Indonesia
(3) Universitas Negeri Yogyakarta, Indonesia
(4) Universitas Sembilanbelas November Kolaka, Indonesia
(5) Universitas Negeri Yogyakarta, Indonesia
(6) Universitas Negeri Yogyakarta, Indonesia
(7) Universitas Negeri Yogyakarta, Indonesia
(8) Satbayev University; Abai Kazakh National Pedagogical University, Kazakhstan

Abstract:

In item response theory, the number of response categories used in polytomous scoring affects the fit of the model used. When the initial scoring model yields unsatisfactory estimates, the initial scoring needs to be corrected. This exploratory descriptive study used response data from Take Home Exam (THE) participants in the Statistical Methods I course organized by the Open University (Universitas Terbuka), Indonesia, in 2022. The stages of data analysis included coding the raters' scores; analyzing response-category frequencies; analyzing model fit under the graded response, partial credit, and generalized partial credit models; analyzing the characteristic response function (CRF) curves; correcting the scoring (rescaling); and re-analyzing model fit. Model fit was assessed with the chi-square test and the root mean square error of approximation (RMSEA), and all model fit analyses were performed using R. The results revealed that the scoring correction affected model fit and that the partial credit model (PCM) produced the best item parameter estimates. All results and their implications for practice and future research are discussed.
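The analysis pipeline described in the abstract can be sketched in R with the mirt package (Chalmers, 2012), which the study's references point to. The sketch below is illustrative only: the simulated `dat` stands in for the actual THE response data, which is not reproduced here, and the collapse of the top score category is a hypothetical example of a scoring correction, not the study's actual rescaling rule.

```r
# Illustrative sketch, not the study's actual analysis or data.
library(mirt)  # Chalmers (2012)

set.seed(2022)
# Simulated stand-in for the THE responses: 10 items scored 0-4 by 200 examinees
dat <- as.data.frame(matrix(sample(0:4, 200 * 10, replace = TRUE), ncol = 10))

# Fit the three competing unidimensional polytomous models
grm  <- mirt(dat, model = 1, itemtype = "graded", verbose = FALSE)  # graded response model
pcm  <- mirt(dat, model = 1, itemtype = "Rasch",  verbose = FALSE)  # partial credit model
gpcm <- mirt(dat, model = 1, itemtype = "gpcm",   verbose = FALSE)  # generalized PCM

# Limited-information fit statistics (C2 variant), which report a
# chi-square-type statistic and RMSEA for each model
fits <- lapply(list(GRM = grm, PCM = pcm, GPCM = gpcm), M2, type = "C2")

# Scoring correction (rescaling), illustrated here by collapsing a sparse top
# category (scores of 4 merged into 3) and refitting the model
dat_rescored <- as.data.frame(lapply(dat, pmin, 3))
pcm_rescored <- mirt(dat_rescored, model = 1, itemtype = "Rasch", verbose = FALSE)
M2(pcm_rescored, type = "C2")
```

Comparing the RMSEA values in `fits` before and after rescoring mirrors the fit-reanalysis step described above.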

References

Abal, F. J. P., Auné, S. E., Lozzia, G. S., & Attorresi, H. F. (2017). Funcionamiento de la categoría central en ítems de confianza para la matemática [Functioning of the central category in confidence-in-mathematics items]. Revista Evaluar, 17(2). https://doi.org/10.35670/1667-4545.v17.n2.18717

Abdelhamid, G. S. M., Bassiouni, M. G. A., & Gómez-Benito, J. (2021). Assessing cognitive abilities using the WAIS-IV: An item response theory approach. International Journal of Environmental Research and Public Health, 18(13), 6835. https://doi.org/10.3390/ijerph18136835

Auné, S. E., Abal, F. J. P., & Attorresi, H. F. (2020). Análisis psicométrico mediante la Teoría de la Respuesta al Ítem: modelización paso a paso de una Escala de Soledad [Psychometric analysis using item response theory: Step-by-step modeling of a loneliness scale]. Ciencias Psicológicas, 14(1). https://doi.org/10.22235/cp.v14i1.2179

Brookhart, S. M., & Nitko, A. J. (2019). Educational assessment of students (8th ed.). Pearson.

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6). https://doi.org/10.18637/jss.v048.i06

Hambleton, R. K., & Jones, R. W. (1993). An NCME instructional module on: Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38–47. https://doi.org/10.1111/j.1745-3992.1993.tb00543.x

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.

Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology, 4(2), 73–79. https://doi.org/10.1027/1614-2241.4.2.73

Masters, G. N. (1988). The analysis of partial credit scoring. Applied Measurement in Education, 1(4), 279–297. https://doi.org/10.1207/s15324818ame0104_2

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. https://doi.org/10.1177/014662169201600206

Rafi, I., Retnawati, H., Apino, E., Hadiana, D., Lydiati, I., & Rosyada, M. N. (2023). What might be frequently overlooked is actually still beneficial: Learning from post national-standardized school examination. Pedagogical Research, 8(1), em0145. https://doi.org/10.29333/pr/12657

Retnawati, H. (2014). Teori respons butir dan penerapannya: Untuk peneliti, praktisi pengukuran dan pengujian, mahasiswa pascasarjana [Item response theory and its application: For researchers, measurement and testing practitioners, and graduate students]. Nuha Medika.

Retnawati, H. (2016). Analisis kuantitatif instrumen penelitian [Quantitative analysis of research instruments]. Parama Publishing.

Retnawati, H., Hadi, S., Nugraha, A. C., Ramadhan, M. T., Apino, E., Djidu, H., Wulandari, N. F., & Sulistyaningsih, E. (2017). Menyusun laporan hasil asesmen pendidikan di sekolah: Referensi untuk pendidik, mahasiswa, & praktisi pendidikan [Preparing reports on educational assessment results in schools: A reference for educators, students, & education practitioners]. UNY Press.

Retnawati, H., Kartowagiran, B., Arlinwibowo, J., & Sulistyaningsih, E. (2017). Why are the mathematics national examination items difficult and what is teachers' strategy to overcome it? International Journal of Instruction, 10(3), 257–276. https://doi.org/10.12973/iji.2017.10317a

Reynolds, C. R. (2010). Measurement and assessment: An editorial view. Psychological Assessment, 22(1), 1–4. https://doi.org/10.1037/a0018811

Safitri, A., & Retnawati, H. (2020). The estimation of mathematics literacy ability of junior high school students with partial credit model (PCM) scoring on quantity. Journal of Physics: Conference Series, 1581, 012030. https://doi.org/10.1088/1742-6596/1581/1/012030

Samejima, F. (1970). Erratum: Estimation of latent ability using a response pattern of graded scores. Psychometrika, 35(1), 139. https://doi.org/10.1007/BF02290599

Suciati, Munadi, S., & Sugiman. (2022). Estimation of test item parameters with polytomous item response using Partial Credit Model (PCM). Proceedings of the 2nd International Conference on Innovation in Education and Pedagogy (ICIEP 2020). https://doi.org/10.2991/assehr.k.211219.042

Sun, X., Zhong, F., Xin, T., & Kang, C. (2021). Item response theory analysis of general self-efficacy scale for senior elementary school students in China. Current Psychology, 40(2), 601–610. https://doi.org/10.1007/s12144-018-9982-8

The R Development Core Team. (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.yumpu.com/en/document/view/6853895/r-a-language-and-environment-for-statistical-computing

Yılmaz, H. B. (2019). A comparison of IRT model combinations for assessing fit in a mixed format elementary school science test. International Electronic Journal of Elementary Education, 11(5), 539–545. https://doi.org/10.26822/iejee.2019553350

Zanon, C., Hutz, C. S., Yoo, H., & Hambleton, R. K. (2016). An application of item response theory to psychological test development. Psicologia: Reflexão e Crítica, 29(1), 18. https://doi.org/10.1186/s41155-016-0040-x
