
Admission testing for higher education: A multi-cohort study on the validity of high-fidelity curriculum-sampling tests

  • A. Susan M. Niessen,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    a.s.m.niessen@rug.nl

    Affiliation Department of Psychometrics and Statistics, Faculty of Behavioral and Social Sciences, University of Groningen, Groningen, the Netherlands

  • Rob R. Meijer,

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Psychometrics and Statistics, Faculty of Behavioral and Social Sciences, University of Groningen, Groningen, the Netherlands

  • Jorge N. Tendeiro

    Roles Formal analysis, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Psychometrics and Statistics, Faculty of Behavioral and Social Sciences, University of Groningen, Groningen, the Netherlands

Abstract

We investigated the validity of curriculum-sampling tests for admission to higher education in two studies. Curriculum-sampling tests mimic representative parts of an academic program to predict future academic achievement. In the first study, we investigated the predictive validity of a curriculum-sampling test for first year academic achievement across three cohorts of undergraduate psychology applicants, and for academic achievement after three years in one cohort. We also studied the relationship between the test scores and enrollment decisions. In the second study, we examined the cognitive and noncognitive construct saturation of curriculum-sampling tests in a sample of psychology students. The curriculum-sampling tests showed high predictive validity for first year and third year academic achievement, mostly comparable to the predictive validity of high school GPA. In addition, curriculum-sampling test scores showed incremental validity over high school GPA. Applicants who scored low on the curriculum-sampling tests decided not to enroll in the program more often, indicating that curriculum-sampling admission tests may also promote self-selection. Contrary to expectations, the curriculum-sampling test scores did not show any relationship with cognitive ability, but there were some indications of noncognitive saturation, mostly for perceived test competence. So, curriculum-sampling tests can serve as efficient admission tests that yield high predictive validity. Furthermore, when self-selection or student-program fit are major objectives of admission procedures, curriculum-sampling tests may be preferred over, or used in addition to, high school GPA.

Introduction

Curriculum-sampling tests are increasingly used in admission procedures for higher education across Europe. For example, in Finland, Belgium, the Netherlands, and Austria these tests are used across various academic disciplines, such as medicine [1–3], psychology [4,5], teacher education [6], economics and business [7], and computer science [8]. The rationale behind these tests is to mimic behavior that is expected later, during the academic program. Thus, curriculum samples often mimic representative parts of the academic program that the student is applying to. Often, these samples are small-scale versions of an introductory course of a program, because performance in such courses is a good indicator of later academic performance (e.g., [4,9,10]). An example is studying domain-specific literature or watching video-lectures, followed by an exam.

There are several arguments for using curriculum samples in admission to higher education in addition to, or instead of, traditional admission criteria such as high school GPA. High school GPA is a good predictor of academic performance in higher education (e.g., [11–14]). However, due to increasing internationalization and the different educational ‘routes’ to higher education [15], these grades are often difficult to compare across applicants. Also, Sackett et al. [16] and Kuncel et al. [17] found that matching the content of the predictor to the criterion was beneficial for predictive validity.

There are, however, few studies in which the validity of curriculum-sampling tests has been investigated. Two types of relevant validity evidence can be distinguished: predictive validity for academic achievement, showing the value of these tests in admission procedures, and construct validity or “construct saturation” (i.e., the degree to which the score variance reflects construct variance [18–20]), which can contribute to explaining the predictive validity of curriculum samples. In the first study described in this paper we investigated the predictive validity of curriculum sampling for academic achievement in a psychology program. In the second study, we investigated the construct saturation of curriculum samples.

Signs, samples, and construct saturation

Most traditional assessments for performance prediction, like cognitive ability tests and personality inventories, are based on a signs approach. Signs are psychological constructs that are theoretically linked to the performance or behavior of interest [21]. In contrast, samples are based on the theory of behavioral consistency: Representative past or current performance is the best predictor for future performance [21,22]. The samples approach originated in the context of personnel selection, where high-fidelity simulations like work sample tests and assessment centers were good predictors of future job-performance [18,23,24]. An explanation for the high predictive validity of sample-based assessments is the ‘point-to-point correspondence’ of the predictor and the criterion [25]. Curriculum-sampling tests are based on the same rationale as work sample tests; they are designed as high-fidelity simulations of (parts of) an academic program.

It is often assumed that sample-based assessments are multifaceted compound measures that are saturated with cognitive and noncognitive constructs that also underlie performance on the criterion task [22,26]. The construct saturation of an assessment represents the degree to which score variance reflects variance on different constructs. The concept of construct saturation may seem to conflict with the underlying idea of a samples approach, since such assessments are explicitly not designed to measure distinct constructs. However, construct saturation studies can help explain the predictive validity of sample-based assessments by investigating what scores on sample-based assessments represent in terms of psychological constructs [20]. In addition, construct saturation can affect the predictive validity and the size of subgroup score differences of test scores [18,19,27]. Noncognitive saturation in particular may provide great benefits in high-stakes assessment. Noncognitive constructs like personality traits, self-efficacy, and self-regulation are good predictors of future job performance and academic performance, and show incremental validity over cognitive abilities [28]. However, they are difficult to measure validly in high-stakes assessment due to faking [29,30]. A performance-based assessment method that is able to tap into noncognitive traits and skills may provide a solution to that problem.

Curriculum sampling: Existing research

Curriculum-sampling tests often require studying domain-specific literature or video-lectures, followed by an exam, but the approach can take many different forms, depending on the curriculum the test is designed for. It may involve the preparation of a demonstration lesson for teacher education [6], in which internships within schools make up a large proportion of the curriculum; a massive open online course (MOOC) for admission to a computer science program, in which applicants have to successfully complete programming assignments from the first academic year [8]; or a ‘virtual semester’ in medical school, followed by an exam [3].

Most studies on curriculum sampling compared the academic performance of students admitted through curriculum-sampling procedures with students admitted via other methods. Results showed that students admitted through a curriculum-sampling procedure earned higher grades, progressed through their studies faster, and dropped out less often compared to students admitted via lottery [1,5] or via traditional entrance tests or matriculation exam grades [8]. However, participation in the curriculum-sampling admission procedures was voluntary in these studies, and admission could also be achieved through other procedures. Thus, these differences may also be caused by, for example, motivation; highly motivated applicants may have chosen to participate in the curriculum-sampling procedure that requires effort, whereas less motivated applicants may have chosen an alternative route [31].

Reibnegger et al. [3] compared the dropout rate and time to completion of the first part of medical school for three cohorts of students admitted through open admission, a ‘virtual semester’ in medicine followed by a two-day examination, or secondary school-level knowledge exams about relevant subjects. The best results were found for the cohort admitted through the ‘virtual semester’. However, the selection ratio was also the lowest for that cohort, which may have influenced the results.

Lievens and Coetsier [2] examined the predictive validity of two curriculum-sampling tests for medical school for first year GPA, one using a video-based lecture (r = .20) and one using written medical material (r = .21). However, these tests had relatively low reliability (α = .55 and .56, respectively). Their study is, to our knowledge, also the only study that investigated the construct saturation of curriculum-sampling tests. They found moderate relationships between the curriculum-sampling exam scores and scores on a cognitive ability test (r = .30 and r = .31, respectively), indicating at least some cognitive ability saturation, but they did not find relationships with scores on the Big Five personality scales. In addition, Niessen et al. [4] found that the predictive validity of a curriculum-sampling test for undergraduate psychology applicants was r = .49 for first year GPA, r = .39 for credits obtained in the first year, and r = -.32 for dropping out of the program in the first year. Furthermore, the curriculum-sampling test scores were related to subsequent self-chosen enrollment, indicating that the test may serve as a self-selection tool. Booij and van Klaveren [7] found similar results in an experiment in which applicants to an economics and business program were randomly assigned to a non-binding placement procedure consisting of either an intake interview or a curriculum-sampling day without a formal exam or assignment. Compared to the interview condition, fewer students from the curriculum-sampling condition enrolled in the program and more students passed the first year. Enrolling students who participated in the curriculum-sampling procedure also reported more often than enrollees from the interview condition that the program met their expectations.

Aim of the current article

The few existing predictive validity studies [2,4] only used single cohorts and only used first year academic achievement as a criterion measure. To our knowledge, there are no predictive validity studies using outcomes such as graduation rates and long-term college GPA. Furthermore, there is only one study [2] in which the construct saturation of curriculum-sampling tests was investigated. To address these shortcomings in the literature, we conducted two studies. The aim of the first study was to investigate the predictive validity of curriculum-sampling tests for academic performance in a psychology program at a Dutch university. We studied predictive validity in different cohorts, using not only first year GPA but also third year GPA and bachelor-degree attainment as criterion measures, thus extending the Niessen et al. [4] study. The aim of the second study was to investigate the construct saturation of curriculum-sampling tests by gaining more insight into how these test scores are related to psychological constructs, in order to explain their predictive validity.

The present studies.

The academic outcomes in Study 1 were academic performance, study progress, and college retention. In addition, we studied (1) the incremental validity of the curriculum-sampling scores over high school GPA, (2) the predictive and incremental validity of the curriculum-sampling tests for achievement in specific courses, over specific skills tests designed to predict performance in those courses [16], and (3) the relationship between curriculum-sampling test scores and self-chosen enrollment decisions. We studied two types of curriculum-sampling tests: a literature-based test and a video-lecture test. Given the correspondence between the criterion and the predictors, we expected to replicate the high predictive validity results from Niessen et al. [4] for first year academic achievement, and we expected somewhat lower predictive validity for later academic achievement. In addition, we expected that the skills tests would predict performance in the courses they were designed to predict (the math test for statistics courses and the English test for theoretical courses), and that the curriculum-sampling tests would predict performance in both types of courses, but that their correlation with statistics course performance would be lower than that of the math test.

In the second study we investigated the hypothesis that curriculum-sampling tests are saturated with both cognitive and noncognitive constructs. Assuming that the curriculum-sampling test scores represent a ‘sample’ of future academic performance, we expected that they would be saturated with variables that also predict academic performance, such as cognitive ability [32,33] and several noncognitive constructs and behavioral tendencies [28,34]. To investigate this hypothesis, we studied whether and to what extent the scores on the curriculum-sampling tests can be explained by cognitive ability, conscientiousness, procrastination tendencies, study-related cognitions, and study strategies [28,34].

Study 1: Predictive validity

Method

Procedure.

All applicants to an undergraduate psychology program at a Dutch university were required to participate in an admission procedure. In 2013 and 2014, the admission procedure consisted of the administration of a literature-based curriculum-sampling test and two skills tests: a math test and an English reading comprehension test. In 2015, a math test and two curriculum-sampling tests (a literature-based test and a video-lecture test) were administered. Administration time for each test was 45 minutes, with 15-minute breaks in between. Each test score was the sum of the number of items answered correctly. Applicants were ranked based on a composite score with different weights for the individual tests in each cohort. The highest weight was always assigned to the literature-based curriculum-sampling test. All applicants received feedback after a few weeks, including their scores on each test and their rank. In addition, the lowest-ranking applicants (20% in 2013 and 15% in 2014 and 2015) received a phone call to discuss their results, with the advice to reconsider their application. However, the selection committee did not reject applicants, because the number of applicants willing to enroll did not exceed the number of available places. The applicants did not know this beforehand, so they perceived the admission procedure as high stakes. The study program and the procedure could be followed in English or in Dutch. Applicants to the English program were mostly international students.

Curriculum-sampling tests.

The literature-based curriculum-sampling test was used in each cohort and was designed to mimic the first-year course Introduction to Psychology. The applicants were instructed to study two chapters of the book used in that course. The second curriculum-sampling test, only administered in 2015, required applicants to watch a twenty-minute video lecture on the topic Psychology and the Brain. The lecture was given by a lecturer who taught a related course in the first year. On the selection day, the applicants completed multiple-choice exams about the material. The exams were similar to the exams administered in the first year of the program and were designed by faculty members who taught in the first year. The first curriculum-sampling test consisted of 40 items in 2013 and 2014, and of 39 items in 2015. The second curriculum-sampling test consisted of 25 items. The exams consisted of different items each year. Cronbach’s alpha for each test is displayed in S1 Table.

Skills tests.

The English test was included in the procedure because most of the study material is in English, also in the Dutch-taught program, and the math test was included because statistics courses are a substantial part of the program. The English test consisted of 20 multiple-choice items on the meaning of different English texts. The math test consisted of 30 multiple-choice items in 2013 and 2014, and 27 multiple-choice items in 2015, testing math knowledge at the high school level. The applicants did not receive specific material to prepare for these tests, but example items were provided for the math test. The tests consisted of different items each year.

High school performance.

High school grades of enrolled students who completed the highest level of Dutch secondary education (pre-university education, in Dutch: vwo) were collected through the university administration. The grades were not part of the admission procedure. The mean high school grade (HSGPA) was the mean of the final grades in all high school courses, except courses that only resulted in a pass/fail grade. For most courses, 50% of the final course grade was based on a national final exam. The other 50% consisted of the grades obtained in the last three years of secondary education.

Academic achievement.

Outcomes on academic achievement were collected through the university administration. For all cohorts, the grade on the first course, the mean grade obtained in the first year (FYGPA, representing academic performance), the number of credits obtained in the first year (representing study progress), and records of dropout in the first year (representing retention) were obtained. For the 2013 cohort we also collected the mean grade obtained after three years (TYGPA, representing academic performance) and bachelor degree attainment after three years (representing study progress). The bachelor program can be completed in three years. All grades were on a scale of 1 to 10, with 10 being the highest grade and a six or higher representing a pass. Mean grades were computed for each student using the highest obtained grade for each course (including resits). Courses only resulting in a pass/fail decision were not taken into account. Credit was granted after a course was passed; most courses earned five credit points, with a maximum of 60 credits per year. The first and second year courses were mostly the same for all students; the third year consisted largely of elective courses. Since the skills tests were designed to predict performance in particular courses, we also computed a composite for statistics courses (SGPA) and for theoretical courses (all courses that required studying literature and completing an exam about psychological theories; TGPA) in the first year. The SGPA is the mean of the final grades on all statistics courses and the TGPA is the mean of the final grades on courses about psychological theory. In addition, we also obtained information on whether students chose to enroll after participating in the admission procedure. Because we only used data available at the university, there were no manipulations in this study, and no identifiable information was presented, informed consent was not obtained. This was in line with the university’s privacy policy. This study was approved by and in accordance with the rules of the Ethical Committee Psychology from the University of Groningen [35].

Applicants.

The 2013 cohort (the same cohort used in Niessen et al. [4]) consisted of 851 applicants, of whom 652 (77%) enrolled in the program and 638 participated in at least one course. For enrollees, the mean age was 20 (SD = 2.0), 69% were female, 46% were Dutch, 42% were German, 9% had another European nationality, 3% had a non-European nationality, and 57% followed the program in English. A high school GPA obtained at the highest level of Dutch secondary education was available for 201 enrollees. Third year academic performance was available for 492 students; the others dropped out of the program in the first or second year. A high school GPA was available for 159 of these students.

The 2014 cohort consisted of 823 applicants, of whom 650 (79%) enrolled in the program and 635 participated in at least one course. For enrollees, the mean age was 20 (SD = 2.1), 66% were female, 44% were Dutch, 46% were German, 7% had another European nationality, 3% had a non-European nationality, and 59% followed the program in English. A high school GPA obtained at the highest level of Dutch secondary education was available for 217 enrollees.

The 2015 cohort consisted of 654 applicants, of whom 541 (83%) enrolled in the program and 531 participated in at least one course. For enrollees, the mean age was 20 (SD = 2.0), 70% were female, 43% were Dutch, 46% were German, 9% had another European nationality, 2% had a non-European nationality, and 62% followed the program in English. A high school GPA obtained at the highest level of Dutch secondary education was available for 188 enrollees.

Analyses.

To assess the predictive validity of the curriculum-sampling tests, correlations were computed for each cohort between the different predictor scores and FYGPA, obtained credits in the first year, dropout, TYGPA, and degree attainment after three years. Because the literature-based curriculum-sampling test was designed to mimic the first course, and because the first course was previously found to be a good predictor of subsequent first year performance [4,10], correlations between the curriculum-sampling test scores and the first course grade, and between the first course grade and subsequent first year performance, were also computed. For these analyses, results from the first course were excluded from the FYGPA and the number of obtained credits. In addition, to compare the predictive validity of the curriculum-sampling tests to that of HSGPA, correlations between HSGPA and the academic performance outcomes were also computed. Furthermore, the relationship between the admission test scores and enrollment was studied by computing point-biserial correlations between the scores on the admission tests and enrollment. Low-ranking applicants were contacted by phone and were encouraged to reconsider their application. To further investigate the relationships between enrollment and the scores on the admission tests, logistic regression analyses controlling for receiving a phone call were conducted in each cohort, with the admission test scores as independent variables and enrollment as the dependent variable. To ease interpretation of the odds ratios, the admission test scores were standardized first.
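To make the correlational analyses concrete, a minimal R sketch is given below. The data frame, variable names, and values are simulated for illustration and are not the study’s data or code; it only shows that the point-biserial correlations reported here are Pearson correlations with a dichotomous (0/1) variable.

```r
# Minimal sketch (simulated data, hypothetical variable names), not the
# study's code: predictive validity as Pearson correlations, and the
# point-biserial correlation as a Pearson correlation with a 0/1 variable.
set.seed(1)
n <- 200
cohort <- data.frame(
  cs_lit   = rnorm(n),             # literature-based curriculum-sampling score
  fygpa    = rnorm(n),             # first year GPA
  enrolled = rbinom(n, 1, 0.8)     # 1 = enrolled, 0 = did not enroll
)

cor(cohort$cs_lit, cohort$fygpa)     # predictive validity for FYGPA
cor(cohort$cs_lit, cohort$enrolled)  # point-biserial correlation with enrollment
```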

Corrections for range restriction.

Although applicants were not rejected by the admission committee, indirect range restriction occurred due to self-selection (self-chosen non-enrollment) for the first year results, and due to dropout in earlier years of the program for the third year results, which may result in underestimation of operational validities. Therefore, the individual correlations (r) were corrected for indirect range restriction (IRR) using the Case IV method [36], resulting in an estimate of the true score correlation (ρ), corrected for unreliability in the predictor and the criterion and for IRR, and the operational validity (rc), corrected only for IRR and unreliability in the criterion variable. Corrections for criterion unreliability were only made for GPA criterion variables, and not for obtained credits and dropout. The corrected correlations were computed using the selection package in R [37]. In the discussion of the results we focus on the operational validities [38]. Statistical significance (α = .05) of individual correlations was determined before corrections were applied. When applicable, the correlations were aggregated across cohorts, resulting in aggregated estimates of r, ρ, and rc.
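The following base-R sketch illustrates the two general building blocks behind such corrections: correction for attenuation due to unreliability, and a classical (Thorndike Case II) range restriction correction. It is a simplified stand-in for illustration only; the analyses reported here used the more involved Case IV method for indirect range restriction as implemented in the selection package, and all numbers below are made up.

```r
# Simplified illustration (made-up numbers) of the building blocks behind the
# reported corrections. The article used the Case IV method for indirect range
# restriction from the 'selection' package; this sketch is not that method.

# Correction for attenuation: the operational validity corrects for criterion
# unreliability only; the true-score correlation also corrects the predictor.
operational_validity   <- function(r, rel_y)        r / sqrt(rel_y)
true_score_correlation <- function(r, rel_x, rel_y) r / sqrt(rel_x * rel_y)

# Thorndike's Case II correction for direct range restriction on the predictor,
# shown only to convey the general logic of range restriction corrections.
case2 <- function(r, sd_unrestricted, sd_restricted) {
  u <- sd_unrestricted / sd_restricted
  (r * u) / sqrt(1 + r^2 * (u^2 - 1))
}

operational_validity(case2(.40, sd_unrestricted = 1.0, sd_restricted = 0.8),
                     rel_y = .80)
```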

Because the number of cohorts was small and the admission procedures and samples were very similar, the validity estimates (r, ρ, and rc) were aggregated by applying a fixed-effects model, using the metafor package in R [39]. It was not possible to correct the correlations between the first year academic results and HSGPA and the first course grade for IRR, since only data of enrolled students were available for these variables. Therefore, these correlations were only corrected for unreliability in the predictor and the criterion (yielding ρ) and for unreliability in the criterion only (yielding rc). For the third year results, the correlations for the first course grade and high school GPA could be corrected for range restriction due to dropout in earlier years. The reliability of the first course grade was only known for the 2013 sample (α = .74) and was assumed constant across cohorts.
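As an illustration of the aggregation step, the sketch below combines per-cohort correlations with a fixed-effects model in metafor. The use of the Fisher z transformation, the correlations, and the sample sizes are illustrative assumptions, not the reported analysis or results.

```r
# Illustrative aggregation of per-cohort validities with a fixed-effects model
# in metafor; the correlations and sample sizes below are made up.
library(metafor)

ri <- c(.45, .50, .48)                    # hypothetical per-cohort correlations
ni <- c(638, 635, 531)                    # per-cohort ns (students with data)

dat <- escalc(measure = "ZCOR", ri = ri, ni = ni)  # Fisher z and sampling variances
fit <- rma(yi, vi, data = dat, method = "FE")      # fixed-effects aggregation
predict(fit, transf = transf.ztor)                 # back-transform to r
```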

In addition, the incremental validity of the literature-based curriculum-sampling test over HSGPA was studied based on the observed and corrected aggregated correlations, including an IRR correction on the correlation between HSGPA and the curriculum-sampling test scores. We conducted these analyses using the full samples, and using only data from students who had a high school GPA at the highest level of Dutch secondary education. Furthermore, the skills tests (math and English reading comprehension) were included in the admission procedure to predict performance in first year statistics courses and theoretical courses, respectively. We studied the predictive validity for these courses by computing correlations between the scores on the admission tests and the mean grade on these two types of courses in the first year, and the incremental validity of the skills tests over the literature-based curriculum-sampling test. The corrections and aggregation procedures described above were also applied in these analyses.
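For readers unfamiliar with incremental validity computations based on correlations alone, the sketch below shows how ΔR² can be obtained from a (corrected) correlation matrix using standard regression algebra (R² = r′R⁻¹r for standardized predictors); the correlations in the example are placeholders, not the study’s estimates.

```r
# Sketch: incremental validity from (corrected) correlations alone, using
# R^2 = r' R^{-1} r for standardized predictors. All values are placeholders.
r2_from_cors <- function(r_hc, r_tc, r_ht) {
  R <- matrix(c(1, r_ht, r_ht, 1), 2, 2)   # HSGPA-test intercorrelation matrix
  r <- c(r_hc, r_tc)                       # predictor-criterion correlations
  drop(t(r) %*% solve(R) %*% r)
}

r2_both  <- r2_from_cors(r_hc = .50, r_tc = .45, r_ht = .35)  # HSGPA + test
r2_hsgpa <- .50^2                                             # HSGPA alone
r2_both - r2_hsgpa                                            # Delta R^2
```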

Estimating GPA reliability.

Reliability estimates for the GPA variables were obtained in the same way as described in Bacon and Bean [9]. First, we computed intraclass correlations (ICCs) between the grades that were used to compute each GPA variable, using the mean squares resulting from ANOVAs with grade as the dependent variable and student as the independent variable. Next, the Spearman-Brown prophecy formula was applied to the ICCs, using the mean number of grades in each GPA variable. The resulting reliability estimates are shown in S2 Table and were in line with previous results on the reliability of college GPA [9,40]. The same procedure was used to compute the reliability of high school GPA, shown in S1 Table.
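A minimal R version of this two-step procedure (one-way ANOVA mean squares to a single-grade ICC, then the Spearman-Brown prophecy formula) is sketched below with a toy data set; it illustrates the computation described above, not the actual grade data.

```r
# Toy illustration of the GPA reliability procedure: single-grade ICC from
# one-way ANOVA mean squares, stepped up with the Spearman-Brown formula.
grades <- data.frame(
  student = rep(1:3, each = 4),
  grade   = c(7, 6, 8, 7,  5, 6, 6, 5,  8, 9, 7, 8)
)

ms  <- anova(aov(grade ~ factor(student), data = grades))[["Mean Sq"]]
k   <- mean(table(grades$student))                   # mean number of grades
icc <- (ms[1] - ms[2]) / (ms[1] + (k - 1) * ms[2])   # single-grade ICC
(k * icc) / (1 + (k - 1) * icc)                      # Spearman-Brown reliability
```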

Results

Short-term predictive validity.

S1 and S2 Tables contain all descriptive statistics and S3 Table shows the observed correlations between the predictors and the first year academic outcomes in each cohort. Table 1 shows the aggregated observed, true, and operational validities of each predictor measure for the first year academic performance outcomes.

Table 1. Correlations between predictors and first year academic outcomes aggregated across cohorts.

https://doi.org/10.1371/journal.pone.0198746.t001

Curriculum samples as predictors.

The validity of the literature-based curriculum-sampling test was consistent across cohorts; the aggregated operational validity was high for first year academic performance in terms of GPA, and moderate for obtained credits and for dropout in the first year (Table 1). The video-lecture test (only administered in 2015) showed moderate predictive validity for FYGPA (rc = .36) and obtained credits (rc = .29), and a small negative correlation with dropout (rc = -.15). In the entire applicant sample, the correlation between the scores on the two curriculum-sampling tests was r = .51. In addition, the video-lecture test showed very small incremental validity over the literature-based curriculum-sampling test for predicting FYGPA (ΔR2 = .01, R2 = .20, ΔF(1, 528) = 5.69, p = .02, and based on the corrected correlations, ΔR2c = .01, R2c = .29).

Specific skills tests and grades as predictors.

The operational validities of the math and English skills tests were less consistent across cohorts (S3 Table), and the aggregated operational validities were moderate to small for all outcome measures. The data needed to check for, and if necessary correct for, range restriction in HSGPA were not available, so for HSGPA we only computed correlations with the academic outcomes corrected for unreliability.

High school GPA showed high predictive validity for FYGPA and moderate predictive validity for the number of obtained credits and dropout (Table 1). The first course grade showed very high predictive validity for subsequent performance in the first year, for FYGPA, obtained credits, and dropout alike (Table 1). These correlations were substantially higher than those for the literature-based curriculum-sampling test, which was modeled after this first course. The literature-based curriculum-sampling test showed a substantial aggregated correlation with the first course grade, both before and after correction for IRR.

Long-term predictive validity.

Table 2 shows observed and corrected correlations between the predictors and academic performance after three years (only studied for the 2013 cohort). The literature-based curriculum-sampling test showed a high operational validity of rc = .57 for third year GPA, and a moderate operational validity of rc = .32 with bachelor’s degree attainment in three years. The math skills test showed small validities (rc = .28 for TYGPA and rc = .19 for TYBA), and the English test showed small validity for TYGPA of rc = .18, and small, non-significant validity for TYBA (rc = .06). High school GPA had a high correlation with TYGPA of rc = .65 and a moderate correlation with TYBA of rc = .31. Lastly, the first course grade on Introduction to Psychology obtained in the first year showed large correlations with TYGPA (rc = .73) and with TYBA (rc = .48). Thus, the curriculum-sampling test scores, high school GPA, and the first course grade were good predictors of academic performance after three years of studying in the Psychology program.

Table 2. Correlations between predictors and third year academic outcomes.

https://doi.org/10.1371/journal.pone.0198746.t002

Incremental validity.

The incremental validity of the literature-based curriculum-sampling test over high school GPA was computed based on the aggregated correlations across cohorts, both observed and operational. These analyses were conducted using data from the entire samples (Table 3) and from the subsets of applicants with high school GPA data (S1 Appendix). The correlations between the curriculum-sampling test scores and academic achievement were similar or slightly lower in these subsets, compared to the results presented in Table 1 (see S1 Appendix). When the analyses were conducted for each cohort separately (results not shown but available upon request), the incremental validity of the literature-based curriculum-sampling test over HSGPA was statistically significant in each cohort and for each criterion. The curriculum-sampling test scores and high school GPA were substantially correlated, and the curriculum-sampling test showed a substantial increase in explained variance over high school GPA for predicting FYGPA and for predicting the number of obtained credits (Table 3). Together, high school GPA and the curriculum-sampling test scores explained a large percentage of the variance in FYGPA and in obtained credits in the first year. For predicting TYGPA, the curriculum-sampling test and high school GPA combined explained 49% of the variance, but the incremental validity of the curriculum-sampling test over high school GPA was modest. The results based on the subsets of applicants with a high school GPA were similar, with slightly lower incremental validity over high school GPA for FYGPA and FYECT, and slightly higher incremental validity for TYGPA (see S1 Appendix).

Table 3. Incremental validity of the literature-based curriculum-sampling test over high school GPA.

https://doi.org/10.1371/journal.pone.0198746.t003

Specific course achievement.

The predictive validity of the admission tests for specific course achievement in the first year is shown in Table 4. The mean grade on the statistics courses (SGPA) and the mean grade on the theoretical courses (TGPA) were positively correlated. As expected, the scores on the English test showed only a small correlation with performance in the statistics courses. The curriculum-sampling tests and the math test predicted performance in the statistics courses about equally well (Table 4). Their scores combined accounted for a large percentage of the variance in statistics GPA, with some incremental validity of the math test over the curriculum-sampling test. The curriculum-sampling tests were the strongest predictors of performance in the theoretical courses, for both the literature-based test and the video-lecture test, whereas the English test and the math test showed small correlations with theoretical course performance. The scores on the English test and the literature-based curriculum-sampling test combined accounted for a large percentage of the variance in theoretical GPA, with virtually no incremental validity of the English test over the curriculum-sampling test. The results based on the observed, uncorrected correlations are described in S2 Appendix.

Table 4. Predictive validity for specific course achievement in the first year.

https://doi.org/10.1371/journal.pone.0198746.t004

Enrollment.

The aggregated operational and true score correlations between the admission test scores and enrollment are shown in Table 1, and the observed correlations per cohort are in S3 Table. The literature-based curriculum-sampling test showed the largest operational correlation with enrollment; for the video-lecture test this correlation was r = .18, and the math test scores and the English test scores showed small correlations with enrollment. So, the admission test scores showed small relationships with enrollment, with the largest correlation for the literature-based curriculum-sampling test.

To further assess the relationship between the scores on the admission tests and enrollment, enrollment was regressed on the admission test scores, controlling for receiving a discouraging phone call. To ease interpretation of the odds ratios, the admission test scores were standardized first. The results of the logistic regressions are shown in Table 5. In each cohort, the score on the literature-based curriculum-sampling test was significantly related to enrollment, with higher scores associated with higher odds of enrollment. With each standard deviation increase in the test score, the odds of enrollment increased by a factor of eB = 1.33, eB = 2.04, and eB = 1.55 in 2013, 2014, and 2015, respectively. In the 2013 and 2014 cohorts, the curriculum-sampling test score was the only significant predictor of enrollment. The video-based curriculum-sampling test score and the math test score also showed statistically significant relationships with enrollment in the 2015 cohort (eB = 1.34 and eB = 0.78, respectively). Notably, a higher score on the math test was associated with lower odds of enrollment.
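The following R sketch mirrors the form of these regressions on simulated data with hypothetical variable names; it is meant only to show how the standardized scores and the phone-call dummy enter the model and how the reported odds ratios are obtained.

```r
# Sketch of the enrollment regressions on simulated data with hypothetical
# variable names: standardized test scores plus a phone-call dummy.
set.seed(2)
n <- 200
d <- data.frame(
  cs_lit     = rnorm(n, 25, 5),        # literature-based test score
  math       = rnorm(n, 18, 4),        # math test score
  phone_call = rbinom(n, 1, 0.15)      # 1 = received a discouraging call
)
d$enrolled <- rbinom(n, 1, plogis(-1 + 0.4 * scale(d$cs_lit)))

fit <- glm(enrolled ~ scale(cs_lit) + scale(math) + phone_call,
           family = binomial, data = d)
exp(coef(fit))                         # odds ratios per SD increase in a score
```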

Table 5. Logistic regression results for predicting enrollment based on admission test scores.

https://doi.org/10.1371/journal.pone.0198746.t005

Discussion

The results showed that the predictive validity of the literature-based curriculum-sampling test for first year academic achievement was consistent across cohorts, with high correlations with FYGPA and moderate correlations with the number of obtained credits and dropout. These results replicated the results obtained by Niessen et al. [4]. However, the validity of the curriculum-sampling test was lower than the predictive validity of the grade in the course that it was designed to mimic; this course grade showed very large correlations with later academic achievement. In addition, the video-lecture based curriculum-sampling test showed moderate predictive validity and little incremental validity over the literature-based test. Notably, the operational predictive validity of the literature-based curriculum-sampling test for academic performance after three years of bachelor courses was still high. The predictive validity of the curriculum-sampling test was mostly comparable to or slightly higher than the predictive validity of high school GPA; in comparison, high school GPA was a better predictor for third year GPA. Furthermore, whereas the literature-based curriculum-sampling test scores and high school GPA were strongly related, the unique explanatory power of the curriculum test scores over high school GPA was substantial.

For the prediction of specific course achievement, our expectations were partially confirmed. The math test scores did not predict statistical course performance better than the curriculum-sampling tests, but they did show some incremental validity over the curriculum-sampling test. The English test scores did not add incremental validity over the curriculum-sampling test for predicting theoretical course performance. A possible explanation may be that English reading comprehension was also implicitly assessed by the curriculum-sampling test, as the material to be studied was in English. Finally, in the regression analyses, relationships between test scores and enrollment decisions were only found for the curriculum-sampling test scores, indicating that this relationship may be an advantage of representative sample-based admission tests in particular.

Study 2: Construct saturation

Method

Participants and procedure.

The sample consisted of 104 first year students who enrolled in the Dutch-taught psychology program in 2015 and participated in the study as part of a course on research methods, for which they had to participate in a number of studies of their own choice. The participants were slightly younger than the non-participants in this cohort (mean age 19 and 20, respectively) and included a similar percentage of females (71% and 70%, respectively). All participants had Dutch nationality. In addition, compared to the non-participants, the participants scored slightly lower on the literature-based curriculum-sampling test (t(539) = -2.16, p = .03, d = -0.22) and slightly higher on the video lecture-based curriculum-sampling test (t(539) = 2.73, p < .01, d = 0.32), and there was no significant difference in FYGPA (t(529) = 0.60, p = .55, d = 0.08).

The data were collected in two sessions scheduled three days apart. In the first session, participants completed self-report questionnaires about their personality, procrastination tendencies, study skills, and study habits. In the second session they completed a cognitive ability test that took about an hour to complete. All measures were administered on a computer in the Dutch language. Both sessions were proctored and took place in a research lab at the university, with several participants completing the measures simultaneously. The participants received their scores on the cognitive test immediately after completion to encourage effortful participation. All participants provided written informed consent and provided permission to link the data to their admission test scores and academic performance. This study was approved by and in accordance with the rules of the Ethical Committee Psychology from the University of Groningen [35]. Participation took place between October 2015 and April 2016.

Cognitive ability.

Cognitive ability was measured with the Q1000 Capaciteiten Hoog (in Dutch [41]), developed for personnel selection and career development purposes for adults with a higher level of education. The test consisted of seven subscales across three domains: analogies, syllogisms, and vocabulary for the verbal domain; digit series and sums for the numerical domain; and matrices and cubes for the figural domain. The test also yields a higher-order general cognitive ability score consisting of a combination of the three domain scores. The test showed convergent validity with other measures of cognitive ability (the Dutch version of the General Aptitude Test Battery and Raven’s Progressive Matrices [41]), and the Dutch Committee on Tests and Testing Affairs (Dutch: COTAN) evaluated the reliability and construct validity of the test as sufficient [42]. The proportion of items answered correctly per scale was used to compute the scale scores, which were averaged across scales to obtain the total score representing general cognitive ability. These scores were converted to a scale of 0 to 10 for convenience. The estimated reliability of the full test score was α = .87, which was similar to the reliability reported in the test manual [41].

Conscientiousness.

Conscientiousness was measured using the Dutch version of the Big Five Inventory (BFI; [43]). The entire BFI was administered, but the other scale scores were not used in this study. The conscientiousness scale of the BFI consisted of nine items answered on a five-point Likert scale (1 = strongly disagree through 5 = strongly agree), and Cronbach’s α = .86.

Procrastination.

Lay's Procrastination Scale [44] was used to measure procrastination tendencies. This scale consisted of 20 items answered on a five-point Likert scale (1 = never through 5 = all of the time), with Cronbach’s α = .84.

Study skills and study habits.

The Study Management and Academic Results Test (SMART [45]) consisted of four scales measuring study-related cognitions (academic competence and test competence), and study management skills (time management and strategic studying). Academic competence was defined as being able to understand the study material and enjoying studying the material; test competence was defined as being able to separate essentials from details, managing the amount of study material, coping with tensions, and preparing for examinations; time management was defined as the ability to plan study time and combine studying with leisure activities; and strategic studying was defined as utilizing strategies such as summarizing and testing one’s own knowledge and deliberating how to approach the study material. The SMART contained 29 items, with four to six items per subscale. All items were answered on a four-point Likert scale (1 = almost never through 4 = nearly always). The scales yielded estimated reliabilities of αacademic competence = .69, αtest competence = .76, αtime management = .82, and αstrategic studying = .66.

Analyses.

First, we computed observed correlations, and correlations corrected for unreliability in the criterion and the predictors, between the curriculum-sampling test scores and the scores on the cognitive and noncognitive measures. Corrections for range restriction could not be applied because the necessary information was not available for the explanatory variables. We expected the curriculum-sampling tests to be saturated with cognitive and noncognitive constructs, because these constructs also predict academic performance. To check our assumptions about the relationships between the explanatory variables and academic performance, the same analyses were conducted with FYGPA as the dependent variable. The estimate of the reliability of FYGPA in 2015 from Study 1 (see S2 Table) was used in the corrections. Second, we conducted multiple regression analyses with the cognitive and noncognitive measures described above as independent variables and each of the curriculum-sampling test scores and FYGPA as dependent variables. The setCor function in the psych R package [46] was used to conduct regression analyses based on the correlation matrices corrected for unreliability.
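A hedged sketch of this two-step approach is given below: an observed correlation matrix is disattenuated using the scale reliabilities and then analyzed with setCor (renamed lmCor in recent versions of the psych package). All matrix entries, reliabilities, and variable names are placeholders chosen for illustration, and the exact call may need adjustment depending on the psych version.

```r
# Hedged sketch: disattenuate an observed correlation matrix and regress the
# curriculum-sampling score on the other measures from that matrix.
# R_obs, the reliabilities, and the variable names are placeholders.
library(psych)

vars <- c("cs_lit", "cog", "consc", "test_comp")
rel  <- c(cs_lit = .82, cog = .87, consc = .86, test_comp = .76)

R_obs <- matrix(c(1,   .05, .25, .35,
                  .05, 1,   .10, .05,
                  .25, .10, 1,   .30,
                  .35, .05, .30, 1), 4, 4, dimnames = list(vars, vars))

R_cor <- R_obs / sqrt(outer(rel, rel))   # correct for unreliability
diag(R_cor) <- 1                         # keep unit diagonal

# setCor (lmCor in recent psych versions) accepts a correlation matrix + n.obs
setCor(y = "cs_lit", x = c("cog", "consc", "test_comp"),
       data = R_cor, n.obs = 104)
```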

Results

Descriptive statistics for all variables are in S4 Table. Table 6 shows the correlations between the variables and the multiple regression results after correcting the correlations for unreliability. We focus on the results after correction for unreliability, because we were interested in the theoretical relationships between the variables. The results based on the uncorrected correlations are shown in S5 Table. The variance inflation factors (VIFs) were all smaller than 4 (the largest VIF before correction was 2.04, and 3.66 after correction, both for the time management variable). Although these values indicate no serious problems with multicollinearity according to a conservative rule of thumb [47] (10 is another often-used threshold, e.g., [48]), some of the results do suggest multicollinearity problems, as we discuss below, which makes them difficult to interpret.
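For reference, the VIFs in such an analysis can be obtained directly as the diagonal of the inverse of the predictor intercorrelation matrix, as the self-contained sketch below shows with placeholder values.

```r
# Sketch: VIFs equal the diagonal of the inverse of the predictor
# intercorrelation matrix; R_pred is a placeholder 3 x 3 correlation matrix.
R_pred <- matrix(c(1,  .4, .5,
                   .4, 1,  .6,
                   .5, .6, 1), 3, 3)
diag(solve(R_pred))   # one VIF per predictor; values near 1 = little collinearity
```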

Table 6. Construct saturation multiple regression results based on correlations corrected for unreliability.

https://doi.org/10.1371/journal.pone.0198746.t006

In the model with the literature-based curriculum-sampling test score as the dependent variable, cognitive ability was not a statistically significant predictor, and the zero-order correlation was also small and not statistically significant. Only conscientiousness, test competence, and time management were significant predictors of the literature-based curriculum-sampling test score in the regression model. However, the independent variables did explain a substantial proportion of the variance in the curriculum-sampling test scores (R2c = .38), and the regression coefficient for the time management score was negative, which contradicts theoretical expectations and the zero-order correlation. These somewhat contradictory results are signs of multicollinearity problems [49]. The model with the video-based curriculum-sampling test score as the dependent variable showed similar results.

For the model with FYGPA as the dependent variable, there was also no statistically significant relationship with the cognitive ability scores. The zero-order correlations for the noncognitive independent variables were all statistically significant and in the expected direction, with conscientiousness and time management as significant predictors in the regression model. Again, some coefficients (for procrastination and academic competence) switched signs compared to theoretical expectations and the zero-order correlations. So, the zero-order correlations confirmed our theoretical expectations, but the regression results again indicated problems with multicollinearity, which makes them difficult to interpret.

Discussion

We hypothesized that, similar to academic achievement, the curriculum-sampling test scores were multifaceted compound measures, saturated with cognitive ability and several noncognitive constructs [22,26]. Contrary to our expectations, the scores on the curriculum-sampling tests and first year GPA were not significantly related to cognitive ability as measured by the Q1000 test. One explanation may be range restriction in the cognitive ability scores: the students in our sample were a selective group compared to the general population [50]. Another explanation may be that the participants did not put maximum effort into completing the cognitive test because it was administered under low-stakes conditions, which could have affected the validity of the test scores [51].

We did not find consistent results for the relationships between most noncognitive scale scores and the curriculum-sampling test scores. The unexpected switch of signs for some regression coefficients, and the few statistically significant predictors in the models paired with the relatively high explained variance are indicative of problems with multicollinearity [49], and make it difficult to draw conclusions based on the results. However, the results did indicate a relationship between test competence and the scores on both curriculum-sampling tests. So, we cautiously conclude that there does seem to be some noncognitive saturation in the curriculum-sampling tests, especially of test competence, but the results on the explanatory power of the rest of the noncognitive variables were inconclusive.

General discussion

The central question in this study was: Do curriculum-sampling test scores predict later academic achievement, and what do they measure in terms of cognitive and noncognitive constructs? The results of the first study showed that the literature-based curriculum-sampling test was a good predictor for short-term and long-term academic achievement. These results were consistent across cohorts and provided an important extension of earlier findings [4].

An interesting and quite remarkable result was that the literature-based curriculum-sampling test, a simple 40-item multiple-choice test, yielded equally good or somewhat better predictions of first year academic achievement than high school GPA (they predicted FYGPA about equally well, but the curriculum-sampling test was a slightly better predictor of progress and retention in the first year). High school GPA is commonly considered to be the best predictor of academic performance in higher education [11–14] and is a very rich summary measure, containing grades on thoroughly developed national final exams and on secondary school exams aggregated over the last three years of high school. Furthermore, the literature-based curriculum-sampling test showed substantial incremental validity over high school GPA, and a combination of the two explained 37% of the variance in first year GPA and 49% of the variance in third year GPA. In addition, the relationship between the curriculum-sampling test scores and enrollment decisions indicated that curriculum sampling may encourage self-selection and thus may help applicants gain insight into their fit with the academic program, which can be a considerable advantage.

Curriculum samples and subject tests

A video-lecture test may have the advantage that the study material is presented in a vivid and attractive format. However, this test explained very little unique variance in addition to the literature-based test and showed lower predictive validity. In contrast, assessing math skills to predict performance in statistics courses added to the prediction of performance in those courses over the curriculum-sampling test scores. This confirms that assessing specific skills that are needed for distinct components of the curriculum can add to the prediction of performance in those specific components. These findings are in agreement with the content-matching approach of predictors and outcomes [16], including previous findings that scores on the GRE subject tests and other domain-specific graduate school admission tests, such as the Medical College Admission Test and the Maths Admission Test, which are more proximal to actual graduate school requirements, were better predictors of graduate school performance than scores on more ‘distal’ tests such as the GRE verbal, quantitative, and analytical tests [17,32]. Similar findings have been reported for the SAT subject tests as compared to SAT I scores [52–54]. However, in these studies the domain of the subject test was usually not matched to the study domain or major, so we cannot conclude whether these results can also be explained by content matching. A distinction between subject or skills tests and the curriculum-sampling approach is that the former aim to assess current knowledge and achievement, while curriculum samples do not; SAT subject tests assess knowledge at the high school level, while curriculum samples require preparation using college-level material. This may or may not matter much in terms of predictive validity, but requiring preparation and assessment at the college level, matched to the program students are applying to, can have benefits when it comes to self-selection and face validity [55]. In addition, for many undergraduate programs there are no matching high school courses on the basis of which prior achievement can be assessed. That is the case for many theory-oriented programs, but it especially applies in the context of assessing nonacademic skills, such as communication skills for medical school (e.g., [56]) or practical skills in vocational education. Note that in these cases, specific skills can also be measured using a samples approach. In future research, it would be interesting to investigate whether a samples approach to measuring specific skills would increase their predictive validity.

Construct saturation

The aim of the second study was to investigate the cognitive and noncognitive construct saturation of the curriculum-sampling tests, in order to explain their predictive validity through possible underlying mechanisms. The results did not completely coincide with our expectations. We did not find relationships between the cognitive ability test scores and the two curriculum-sampling tests scores or first year GPA. On the one hand, this was a surprising finding given the large body of literature that shows a relationship between cognitive abilities and academic performance [32,33], and given the rather obvious cognitive nature of the curriculum-sampling tests. On the other hand, our results are in line with findings based on restricted samples (e.g.,[57]), and with the observation that correlations between cognitive ability and educational achievement tend to decline in every subsequent educational phase due to ongoing selectivity in the educational careers of students [50,58]. In the Netherlands, students are selected for different levels of secondary education around age 12, largely based on a strongly cognitively-loaded test [59]. This degree of early selection restricts the relation between cognitive ability and educational achievement measured in later years (e.g., [60,61]), although small positive relationships between cognitive ability and academic achievement were found in Dutch student samples in earlier studies (e.g., [10,62]). Although our results are based on a relatively small sample and we are, therefore, cautious to draw firm conclusions, our results corresponded to other recent publications that showed that it can be difficult to discriminate between applicants to higher education based on tests of general cognitive skills in restricted or homogeneous samples (e.g., [57]).

Whereas FYGPA was related to and saturated with the noncognitive variables in line with expectations based on previous studies [28,34], the relation between the curriculum-sampling test scores and the noncognitive variables was not straightforward. Our findings showed that there was some noncognitive saturation in the curriculum-sampling tests, mostly for conscientiousness and test competence scores. This may indicate that the predictive validity of curriculum-sampling tests can partly be explained by their saturation with the competence to prepare for examinations, separate essentials from details, manage the amount of study material, and cope with tensions, which are important learning skills [28,34]. Furthermore, the same rationale applies to the high and robust predictive validity of high school GPA, which can be explained by its multifaceted nature, including cognitive abilities, personality traits, and study skills [63–65].

Limitations

A first limitation of this study was that the data in each cohort were collected within a single academic program at a single university. Hence, the results do not necessarily generalize to other academic disciplines or to programs focused less on theoretical knowledge and more on practical skills. Future research may extend our study to different programs where different skills may be important, such as communication skills or ethical reasoning (e.g., [66]). A second limitation was that we could not correct for range restriction when calculating the validity of high school GPA, which could lead to an underestimation of its validity. However, since high school GPA was not part of the admission procedure, we expect that the effect of range restriction is small. In addition, the Case IV method to correct for indirect range restriction [36] can result in over- or underestimation of the operational validity, although it yields the most accurate estimates of operational correlations compared to other correction methods [67]. Also, we did not differentiate between students who followed the program in English and students who followed it in Dutch. Investigating possible differential validity for these groups would be an interesting topic for a future study.

Furthermore, we cannot conclude whether the positive relationship between the curriculum-sampling test scores and enrollment was driven by the experience of studying for and taking the tests, or by obtaining low scores on the tests. Future studies could include a measure of enrollment intentions administered before the admission tests. In addition, the potential of curriculum sampling to promote self-selection may also be exploited in procedures aimed at placement decisions or at advising on student-program fit. Future research could investigate whether the predictive validity of curriculum-sampling tests generalizes to such lower-stakes procedures.

In Study 2, the sample was not entirely representative of the cohort in terms of admission test scores, although the differences were small. A possible explanation for the lower noncognitive saturation of the curriculum-sampling tests, as compared to first year GPA, is that the curriculum samples were not sufficiently representative of first year courses, which may require more prolonged effort. This is supported by the finding that the grade on the first course in the program was a much better predictor of short-term and long-term academic performance than the scores on the curriculum-sampling tests. Although the literature-based curriculum-sampling test mimicked that course, the course required studying an entire book, whereas the test required studying only two chapters. A more comprehensive curriculum sample may be more saturated with constructs related to effortful behavior.

Conclusions

From existing research we know that high school GPA is a good predictor of future academic performance in higher education and is arguably the most efficient measure to use for admission decisions. In the present study we showed that the literature-based curriculum-sampling tests mostly showed similar or slightly higher predictive validity than high school GPA for first year academic outcomes, and similar or slightly lower predictive validity for third year academic outcomes. In addition, these tests showed incremental validity over high school GPA. The hypothesized cognitive and noncognitive saturation of curriculum samples is often used to explain their high predictive validity. However, our results were difficult to interpret and showed only some noncognitive saturation; further research is needed before conclusions can be drawn about the construct saturation of curriculum-sampling tests. A caveat is that curriculum-sampling tests need to be of acceptable psychometric quality and that extra resources are needed to construct and administer them. However, in the format adopted in this study, constructing and administering the tests took relatively little time and effort, and the reliability was sufficient. A final advantage is that curriculum sampling is perceived as a favorable admission method, whereas applicants dislike the use of high school GPA in admission procedures [55]. So, in cases where using high school GPA is not feasible, or when self-selection or applicant perceptions are of major interest, curriculum-sampling tests may be preferred over, or may be used in addition to, high school GPA.

Supporting information

S1 Table. Descriptive statistics for predictor variables in Study 1.

https://doi.org/10.1371/journal.pone.0198746.s001

(PDF)

S2 Table. Descriptive statistics for criterion variables in Study 1.

https://doi.org/10.1371/journal.pone.0198746.s002

(PDF)

S3 Table. Observed correlations between predictors and first year academic outcomes per cohort.

https://doi.org/10.1371/journal.pone.0198746.s003

(PDF)

S4 Table. Descriptive statistics for the variables in Study 2.

https://doi.org/10.1371/journal.pone.0198746.s004

(PDF)

S5 Table. Construct saturation multiple regression results based on uncorrected correlations.

https://doi.org/10.1371/journal.pone.0198746.s005

(PDF)

S1 Appendix. Predictive- and incremental validity of curriculum-sampling test scores over high school GPA, based on data of applicants for whom high school GPA data were available.

https://doi.org/10.1371/journal.pone.0198746.s006

(PDF)

S2 Appendix. Incremental validity of specific skills tests over the curriculum-sampling test based on observed correlations.

https://doi.org/10.1371/journal.pone.0198746.s007

(PDF)

Acknowledgments

We thank Sippie Overwijk and Johan Romeijn for their help in providing data, and Florian Sense for his help in collecting the data for Study 2. We also thank the three anonymous reviewers for their helpful feedback.

References

  1. de Visser M, Fluit C, Fransen J, Latijnhouwers M, Cohen-Schotanus J, Laan R. The effect of curriculum sample selection for medical school. Adv Health Sci Educ. 2017;22:43–56. pmid:27107882
  2. Lievens F, Coetsier P. Situational tests in student selection: An examination of predictive validity, adverse impact, and construct validity. Int J Sel Ass. 2002;10:245–257.
  3. Reibnegger G, Caluba HC, Ithaler D, Manhal S, Neges HM, Smolle J. Progress of medical students after open admission or admission based on knowledge tests. Med Educ. 2010;44:205–214. pmid:20059671
  4. Niessen ASM, Meijer RR, Tendeiro JN. Predicting performance in higher education using proximal predictors. PLoS One. 2016;11(4):e0153663. pmid:27073859
  5. Visser K, van der Maas H, Engels-Freeke M, Vorst H. Het effect op studiesucces van decentrale selectie middels proefstuderen aan de poort [The effect on study success of student selection through trial studying]. TvHO. 2012;30:161–173.
  6. Valli R, Johnson P. Entrance examinations as gatekeepers. Scand J Educ Res. 2013;51:493–510.
  7. Booij AS, van Klaveren C. Trial lectures or admission talks? How to improve students' choice of major. Paper presented at the Onderwijs Research Dagen [Education Research Days]; Antwerp, Belgium; 2017 Jun 28–30.
  8. Vihavainen A, Luukkainen M, Kurhila J. MOOC as semester-long entrance exam. In: Proceedings of the 14th annual ACM SIGITE conference on information technology education; 2013 Oct 10–12; Orlando, Florida, USA. New York, USA: ACM; 2013. p. 177–182.
  9. Bacon DR, Bean B. GPA in research studies: An invaluable but neglected opportunity. J Mark Educ. 2006;28:35–42.
  10. Busato VV, Prins FJ, Elshout JJ, Hamaker C. Intellectual ability, learning style, personality, achievement motivation and academic success of psychology students in higher education. Pers Individ Dif. 2000;29:1057–1068.
  11. Westrick PA, Le H, Robbins SB, Radunzel JR, Schmidt FL. College performance and retention: A meta-analysis of the predictive validities of ACT scores, high school grades, and SES. Educ Ass. 2015;20:23–45.
  12. Zwick R. Disentangling the role of high school grades, SAT scores, and SES in predicting college achievement. Princeton (NJ): Educational Testing Service; 2013. Report No.: ETS RR–13–09. doi:10.1002/j.2333-8504.2013.tb02316.x
  13. Vulperhorst J, Lutz C, de Kleijn R, van Tartwijk J. Disentangling the predictive validity of high school grades for academic success in university. Assess Eval High Educ. 2018;43:399–414.
  14. de Gruijter DNM, Yildiz M, ‘t Hart J. Presteren in het vwo en het ho. Deelonderzoek van experimenten met selectie: selectie op basis van vooropleidingsgegevens [Performance in pre-university education and higher education. Experimenting with selective admission: selecting based on prior education data]. Leiden, the Netherlands: Leiden University, ICLON; 2005. Report No.: 148. Available from: https://openaccess.leidenuniv.nl/handle/1887/7808
  15. Schwager ITL, Hülsheger UR, Bridgeman B, Lang JWB. Graduate student selection: Graduate record examination, socioeconomic status, and undergraduate grade point average as predictors of study success in a western European university. Int J Sel Ass. 2015;23:71–79.
  16. Sackett PR, Walmsley PT, Koch AJ, Beatty AS, Kuncel NR. Predictor content matters for knowledge testing: Evidence supporting content validation. Hum Perform. 2016;29:54–71.
  17. Kuncel NR, Ones DS, Hezlett SA. A comprehensive meta-analysis of the predictive validity of the graduate record examinations: Implications for graduate student selection and performance. Psychol Bull. 2001;127:162–181. pmid:11271753
  18. Roth PL, Bobko P, McFarland L, Buster M. Work sample tests in personnel selection: A meta-analysis of black-white differences in overall and exercise scores. Pers Psychol. 2008;61:637–661.
  19. Dahlke JA, Sackett PR. The relationship between cognitive-ability saturation and subgroup mean differences across predictors of job performance. J Appl Psychol. 2017;102:1403–1420. pmid:28530415
  20. Lievens F, De Soete B. Simulations. In: Schmitt N, editor. The Oxford handbook of personnel assessment and selection. New York, NY, US: Oxford University Press; 2012. p. 383–410. https://doi.org/10.1093/oxfordhb/9780199732579.013.0017
  21. Wernimont PF, Campbell JP. Signs, samples, and criteria. J Appl Psychol. 1968;52:372–376. pmid:5681116
  22. Callinan M, Robertson IT. Work sample testing. Int J Sel Ass. 2000;8:248–260.
  23. Hermelin E, Lievens F, Robertson IT. The validity of assessment centers for the prediction of supervisory performance ratings: A meta-analysis. Int J Sel Ass. 2007;15:405–411.
  24. Schmidt FL, Hunter JE. The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychol Bull. 1998;124:262–274.
  25. Asher JJ, Sciarrino JA. Realistic work sample tests: A review. Pers Psychol. 1974;27:519–533.
  26. Thornton GI, Kedharnath U. Work sample tests. In: Geisinger KF, Bracken BA, Carlson JF, Hansen JC, Kuncel NR, Reise SP, et al., editors. APA handbook of testing and assessment in psychology, Vol. 1: Test theory and testing and assessment in industrial and organizational psychology. Washington, DC, US: American Psychological Association; 2013. p. 533–550. https://doi.org/10.1037/14047-029
  27. Lievens F, Sackett PR. The effects of predictor method factors on selection outcomes: A modular approach to personnel selection procedures. J Appl Psychol. 2017;102:43–66. pmid:27618408
  28. Richardson M, Abraham C, Bond R. Psychological correlates of university students' academic performance: A systematic review and meta-analysis. Psychol Bull. 2012;138:353–387. pmid:22352812
  29. Niessen ASM, Meijer RR, Tendeiro JN. Measuring non-cognitive predictors in high-stakes contexts: The effect of self-presentation on self-report instruments used in admission to higher education. Pers Individ Dif. 2017;106:183–189.
  30. Peterson MH, Griffith RL, Isaacson JA, O'Connell MS, Mangos PM. Applicant faking, social desirability, and the prediction of counterproductive work behaviors. Hum Perform. 2011;24:270–290.
  31. Schripsema NR, van Trigt AM, Borleffs JCC, Cohen-Schotanus J. Selection and study performance: Comparing three admission processes within one medical school. Med Educ. 2014;48:1201–1210.
  32. Kuncel NR, Hezlett SA. Standardized tests predict graduate students’ success. Science. 2007;315:1080–1081. pmid:17322046
  33. Kuncel NR, Hezlett SA. Fact and fiction in cognitive ability testing for admissions and hiring decisions. Curr Dir Psychol Sci. 2010;19:339–345.
  34. Credé M, Kuncel NR. Study habits, skills, and attitudes: The third pillar supporting collegiate academic performance. Perspect Psychol Sci. 2008;3:425–453. pmid:26158971
  35. Ethical Committee Psychology, University of Groningen. 2017. Available from: http://www.rug.nl/research/heymans-institute/organization/ecp/criteria
  36. Hunter JE, Schmidt FL, Le H. Implications of direct and indirect range restriction for meta-analysis methods and findings. J Appl Psychol. 2006;91:594–612. pmid:16737357
  37. Fife D. selection: Correcting biased estimates under selection. R package version 1.0. 2016 Mar 18. Available from: http://CRAN.R-project.org/package=selection
  38. Hunter J, Schmidt F. Methods of meta-analysis: Correcting error and bias in research findings. Thousand Oaks, CA, US: Sage Publications; 2004.
  39. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36:1–48.
  40. Beatty AS, Walmsley PT, Sackett PR, Kuncel NR, Koch AJ. The reliability of college grades. Educ Meas Iss Prac. 2015;34:31–40.
  41. van Bebber J, Lem J, van Zoelen L. Q1000 Capaciteiten Hoog [Q1000 High Capacities] [Cognitive ability test]. Woerden, the Netherlands: Meurs HRM; 2010.
  42. Egberink IJL, Holly-Middelkamp FR, Vermeulen CSM. Q1000 Capaciteiten Hoog [COTAN review 2010, Q1000 High Capacities] [Internet]. Amsterdam, the Netherlands: Boom Test Uitgevers; 2010. Available from: www.cotandocumentatie.nl
  43. Denissen JA, Geenen R, van Aken MG, Gosling SD, Potter J. Development and validation of a Dutch translation of the Big Five Inventory (BFI). J Pers Assess. 2008;90:152–157. pmid:18444109
  44. Lay CH. At last, my research article on procrastination. J Res Pers. 1986;20:474–495.
  45. Kleijn WC, van der Ploeg HM, Topman RM. Cognition, study habits, test anxiety, and academic performance. Psychol Rep. 1994;75:1219–1226. pmid:7892384
  46. Revelle W. psych: Procedures for personality and psychological research. R package version 1.5.8. 2015 Aug 30. Available from: http://CRAN.R-project.org/package=psych
  47. Miles J, Shevlin M. Applying regression & correlation: A guide for students and researchers. London, UK: Sage Publications; 2001.
  48. Field AP. Discovering statistics using SPSS. London, UK: Sage Publications; 2013.
  49. O’Brien RM. A caution regarding rules of thumb for variance inflation factors. Qual Quant. 2007;41:673–690.
  50. Furnham A, Chamorro-Premuzic T, McDougall F. Personality, cognitive ability, and beliefs about intelligence as predictors of academic performance. Learn Individ Differ. 2003;14:47–64.
  51. Barry CL, Finney SJ. Modeling change in effort across a low-stakes testing session: A latent growth curve modeling approach. Appl Meas Educ. 2016;29:46–64.
  52. Zwick R, Brown T, Sklar JC. California and the SAT: A reanalysis of University of California admissions data. Berkeley (CA): Center for Studies in Higher Education, University of California, Berkeley; 2004. Report No.: CSHE.8.04. Available from: https://cshe.berkeley.edu/publications/california-and-sat-reanalysis-university-california-admissions-data
  53. Geiser S, Studley R. UC and the SAT: Predictive validity and differential impact of the SAT I and SAT II at the University of California. Educ Ass. 2002;8:1–26.
  54. Kobrin JL, Camara WJ, Milewski GB. The utility of the SAT I and SAT II for admissions decisions in California and the nation. New York (NY): The College Board; 2002. Report No.: 2002–6.
  55. Niessen ASM, Meijer RR, Tendeiro JN. Applying organizational justice theory to admission into higher education: Admission from a student perspective. Int J Sel Ass. 2017;25:72–84.
  56. Lievens F. Adjusting medical school admission: Assessing interpersonal skills using situational judgement tests. Med Educ. 2013;47:182–189. pmid:23323657
  57. Moneta-Koehler L, Brown AM, Petrie KA, Evans BJ, Chalkley R. The limitations of the GRE in predicting success in biomedical graduate school. PLoS ONE. 2017;12(1):e0166742. pmid:28076356
  58. Chamorro-Premuzic T, Furnham A. Personality and intellectual competence. Mahwah, NJ: Lawrence Erlbaum; 2005.
  59. Bartels M, Rietveld MH, Van Baal GM, Boomsma DI. Heritability of educational achievement in 12-year-olds and the overlap with cognitive ability. Twin Res. 2002;5:544–553. pmid:12573186
  60. Crombag HF, Gaff JG, Chang TM. Study behavior and academic performance. Tijdschrift voor Onderwijsresearch. 1975;1:3–14.
  61. Resing WCM, Drenth PJD. Intelligentie: Weten en meten [Intelligence: Measuring and knowing]. Amsterdam, the Netherlands: Uitgeverij Nieuwezijds; 2007.
  62. Kappe R, van der Flier H. Predicting academic success in higher education: What’s more important than being smart? Eur J Psychol Educ. 2012;27:605–619.
  63. Borghans L, Golsteyn BHH, Heckman JJ, Humphries JE. What grades and achievement tests measure. Proc Natl Acad Sci U S A. 2016;113:13354–13359. pmid:27830648
  64. Deary IJ, Strand S, Smith P, Fernandes C. Intelligence and educational achievement. Intelligence. 2007;35:13–21.
  65. Dumfart B, Neubauer AC. Conscientiousness is the most powerful noncognitive predictor of school achievement in adolescents. J Individ Differ. 2016;37:8–15.
  66. Pau A, Jeevaratnam K, Chen YS, Fall AA, Khoo C, Nadarajah VD. The Multiple Mini-Interview (MMI) for student selection in health professions training – A systematic review. Med Teach. 2013;35:1027–1041. pmid:24050709
  67. Beatty AS, Barratt CL, Berry CM, Sackett PR. Testing the generalizability of indirect range restriction corrections. J Appl Psychol. 2014;99:587–598. pmid:24661276