Comparison of the results of the generalizability theory with the inter-rater agreement coefficients

  • Mehmet Taha Eser, Aydın Adnan Menderes University
  • Gökhan Aksu


The agreement between raters is examined within the scope of the concept of “inter-rater reliability”. Although the concepts of inter-rater agreement and inter-rater reliability are clearly defined, there is no clear guidance on the conditions under which agreement methods and reliability methods are appropriate to use. In this study, eight different agreement coefficients used for the same purpose were compared with one another, and the similarity of their results to the G coefficient calculated within the framework of generalizability theory was examined. The data consisted of evaluations made by seven raters of 49 students' responses to six open-ended items, and the agreement coefficients computed from these evaluations were found to differ. Specifically, the agreement coefficients differed significantly depending on the method used, so that the level of agreement could be interpreted as low, medium, or high depending on the coefficient chosen. In addition, the generalizability analysis showed that the largest variance component was attributable to differences between raters, accounting for 40% of the total variance. For this reason, it is recommended that researchers determining inter-rater reliability first examine the variance components attributable to persons, items, and raters, and finally, if the inter-rater variance is low, report a few appropriate agreement coefficients.
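The recommendation to inspect person, item, and rater variance components before choosing an agreement coefficient can be illustrated with a short sketch. It assumes a fully crossed persons × items × raters (p × i × r) random-effects design with one score per cell (as with seven raters scoring 49 students on six items); the design, the function names, and the simulated data are assumptions for illustration, not the authors' analysis. Components are estimated from the standard ANOVA expected-mean-square equations, negative estimates are truncated to zero (a common convention), and the sketch reports the relative G coefficient (Eρ²), the absolute dependability coefficient (Φ), and each component's share of total variance.

```python
import numpy as np


def variance_components(scores):
    """Estimate random-effects variance components for a fully crossed
    p x i x r design with one score per cell.

    scores: array-like of shape (n_p, n_i, n_r).
    Returns a dict keyed by 'p', 'i', 'r', 'pi', 'pr', 'ir', 'pir_e'.
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i, n_r = x.shape
    m = x.mean()
    # Marginal means kept as broadcastable arrays
    mp = x.mean(axis=(1, 2), keepdims=True)
    mi = x.mean(axis=(0, 2), keepdims=True)
    mr = x.mean(axis=(0, 1), keepdims=True)
    mpi = x.mean(axis=2, keepdims=True)
    mpr = x.mean(axis=1, keepdims=True)
    mir = x.mean(axis=0, keepdims=True)

    # Sums of squares and degrees of freedom for each effect
    ss = {
        "p": n_i * n_r * ((mp - m) ** 2).sum(),
        "i": n_p * n_r * ((mi - m) ** 2).sum(),
        "r": n_p * n_i * ((mr - m) ** 2).sum(),
        "pi": n_r * ((mpi - mp - mi + m) ** 2).sum(),
        "pr": n_i * ((mpr - mp - mr + m) ** 2).sum(),
        "ir": n_p * ((mir - mi - mr + m) ** 2).sum(),
        "pir_e": ((x - mpi - mpr - mir + mp + mi + mr - m) ** 2).sum(),
    }
    df = {
        "p": n_p - 1, "i": n_i - 1, "r": n_r - 1,
        "pi": (n_p - 1) * (n_i - 1), "pr": (n_p - 1) * (n_r - 1),
        "ir": (n_i - 1) * (n_r - 1),
        "pir_e": (n_p - 1) * (n_i - 1) * (n_r - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations of the random model
    v = {
        "pir_e": ms["pir_e"],
        "pi": (ms["pi"] - ms["pir_e"]) / n_r,
        "pr": (ms["pr"] - ms["pir_e"]) / n_i,
        "ir": (ms["ir"] - ms["pir_e"]) / n_p,
        "p": (ms["p"] - ms["pi"] - ms["pr"] + ms["pir_e"]) / (n_i * n_r),
        "i": (ms["i"] - ms["pi"] - ms["ir"] + ms["pir_e"]) / (n_p * n_r),
        "r": (ms["r"] - ms["pr"] - ms["ir"] + ms["pir_e"]) / (n_p * n_i),
    }
    # Truncate negative estimates to zero
    return {k: max(val, 0.0) for k, val in v.items()}


def g_coefficients(v, n_i, n_r):
    """Relative G coefficient (E rho^2) and dependability coefficient (Phi)."""
    rel_err = v["pi"] / n_i + v["pr"] / n_r + v["pir_e"] / (n_i * n_r)
    abs_err = rel_err + v["i"] / n_i + v["r"] / n_r + v["ir"] / (n_i * n_r)
    return v["p"] / (v["p"] + rel_err), v["p"] / (v["p"] + abs_err)


def proportions(v):
    """Share of total estimated variance for each component."""
    total = sum(v.values())
    return {k: val / total for k, val in v.items()}


if __name__ == "__main__":
    # Simulated scores (hypothetical): 49 persons, 6 items, 7 raters,
    # with a deliberately large rater effect
    rng = np.random.default_rng(0)
    persons = rng.normal(0.0, 1.0, 49)
    raters = rng.normal(0.0, 1.5, 7)
    scores = (persons[:, None, None] + raters[None, None, :]
              + rng.normal(0.0, 0.5, (49, 6, 7)))
    v = variance_components(scores)
    print({k: round(val, 3) for k, val in proportions(v).items()})
    print(g_coefficients(v, n_i=6, n_r=7))
```

Because Φ counts the rater main effect as error while Eρ² does not, a large rater component like the one reported above lowers Φ much more than Eρ², which is one reason the choice of coefficient matters.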