Can Exams Be Shortened? Using a New Empirical Approach to Test in Finance Courses

Authors: Eric S. Lee, Connie Bygrave, Jordan Mahar, Naina Garg, Suzanne Cottreau

Abstract:

Marking exams is universally detested by lecturers. Final exams in many higher education courses often last 3.0 hrs. Do exams really need to be this long? Can we justifiably reduce the number of questions on them? Surprisingly, few researchers have studied these questions, arguably because of the complexity and difficulty of addressing them with traditional methods. To answer them empirically, we used a new approach based on three key elements: an unusual variation of a true experimental design (a synthetic design), equivalence hypothesis testing, and an expanded set of six psychometric criteria that any shortened exam must meet if it is to replace a current 3.0-hr exam (reliability, validity, justifiability, number of exam questions, correspondence, and equivalence). We compared student performance on each official 3.0-hr exam with performance on five shortened exams having proportionately fewer questions (2.5, 2.0, 1.5, 1.0, and 0.5 hrs) in a series of four experiments conducted in two classes in each of two finance courses (224 students in total). We found strong evidence that, in these courses, shortening final exams to 2.0 hrs was warranted on all six psychometric criteria. Shortening these exams by one hour should yield a substantial one-third reduction in lecturer marking time and effort, lower student stress, and more time for students to prepare for other exams. Our approach provides a relatively simple, easy-to-use methodology that lecturers can apply to examine the effect of shortening their own exams.
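As a concrete illustration of the kind of analysis the abstract describes, the sketch below (not the authors' code) checks two of the six criteria on simulated data: internal-consistency reliability via Cronbach's alpha, and equivalence of mean scores via a paired two one-sided tests (TOST) procedure. The score matrix, the choice of a 12-question subset as the 2.0-hr exam, and the 5-percentage-point equivalence margin are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_students, n_questions) matrix of per-question scores."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

def tost_paired(full, short, margin, alpha=0.05):
    """Paired TOST: are mean scores equivalent within +/- margin points?"""
    diff = short - full
    n = diff.size
    se = diff.std(ddof=1) / np.sqrt(n)
    t_low = (diff.mean() + margin) / se    # H0: mean difference <= -margin
    t_high = (diff.mean() - margin) / se   # H0: mean difference >= +margin
    p = max(1.0 - stats.t.cdf(t_low, df=n - 1), stats.t.cdf(t_high, df=n - 1))
    return p, p < alpha                    # equivalent only if both one-sided tests reject

# Illustrative run on simulated data; real input would be per-question marks
# from one class's scripts (all names and numbers here are assumptions).
rng = np.random.default_rng(0)
items = rng.integers(0, 11, size=(56, 18)).astype(float)   # 56 students, 18 questions, 0-10 each
full_pct = items.mean(axis=1) * 10.0                       # % score on the full 3.0-hr exam
short_pct = items[:, :12].mean(axis=1) * 10.0              # % score on a 12-question (2.0-hr) subset
print(f"alpha (2.0-hr subset): {cronbach_alpha(items[:, :12]):.3f}")
p, ok = tost_paired(full_pct, short_pct, margin=5.0)       # +/- 5 percentage points
print(f"TOST p = {p:.4f}; equivalent within +/-5 points: {ok}")
```

A paired test is used on the assumption that, as in a synthetic design, the shortened-exam scores are derived from the same students' marked scripts as the full-exam scores.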

Keywords: Exam length, psychometric criteria, synthetic experimental designs, test length.

Digital Object Identifier (DOI): https://doi.org/10.5281/zenodo.1336436

