The design of the onSET
The onSET is a standardized language placement test using the C-test format. It enables quick and accurate measurement of general language proficiency in German or English. Participants edit several short texts, each containing 20 gaps created by deleting parts of certain words. The texts cover different topics and are presented in increasing order of difficulty. Each participant edits a different combination of texts, but the overall level of difficulty is the same for every participant. One point is awarded for each correctly filled gap, and the sum of the points across all texts is assigned to one of the CEFR levels A2 to C1.
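The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the cut scores below are hypothetical placeholders, not the official onSET values.

```python
# Hypothetical sketch of C-test scoring: one point per correctly filled
# gap, total mapped to a CEFR level. Cut scores are illustrative only.

CUT_SCORES = [  # (minimum total score, CEFR band) -- hypothetical values
    (0, "below A2"),
    (40, "A2"),
    (60, "B1"),
    (90, "B2"),
    (120, "C1 or higher"),
]

def score_text(responses, solutions):
    """Award one point per gap filled exactly as in the solution."""
    return sum(r == s for r, s in zip(responses, solutions))

def cefr_level(total_score):
    """Map a total score to a CEFR band via the hypothetical cut scores."""
    level = CUT_SCORES[0][1]
    for minimum, band in CUT_SCORES:
        if total_score >= minimum:
            level = band
    return level
```

With these placeholder cut scores, a total of 64 points would fall into the B1 band; the real boundaries are determined by the standard-setting procedure described below.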
Trained task designers from g.a.s.t. research suitable texts. They check the texts' suitability as C-test tasks using a program called "ErPEL" (Erlanger Programm zur Erstellung von Lückentexten). Experts examine the texts from a theoretical, linguistic and mother-tongue perspective, and only then do they release suitable tasks for trialling. Regular communication takes place between all parties involved.
Each newly created C-test is tested thoroughly before it is used operationally in the onSET. At least 200 language learners edit the texts. They belong to the same target group as the participants in the onSET. The same rules apply during the trial tests as later in the real test. This means, for example, that there is a time limit for completing the task and it is not possible to return to an already completed text. Participants' responses are then evaluated centrally at the TestDaF-Institut.
CALIBRATED ITEM BANK
The difficulty and measurement precision of the new tasks, as well as the ability of the participants, are determined by means of psychometric analyses based on Rasch models. The tasks are checked for differential item functioning (DIF) and other deviations from model assumptions. Tasks with acceptable statistical properties are included in the item bank.
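The Rasch family of models can be illustrated by its simplest, dichotomous member: the probability of filling a gap correctly depends only on the difference between the participant's ability θ and the item's difficulty b. (The onSET analyses actually use polytomous Rasch models at the text level, per Eckes 2011; this sketch is for illustration only.)

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model:
    P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

When ability equals difficulty the probability is exactly 0.5, and it rises monotonically as ability exceeds difficulty; this single-parameter structure is what makes calibrated item banking possible.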
The Rasch analyses provide essential information for the standard setting, i.e. for the definition of test scores on which the proficiency levels of the participants are based (cut scores). The definition of the cut scores rests on the prototype group method (PGM) in combination with an analysis employing the receiver operating characteristic (ROC) model.
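One common way to combine prototype groups with a ROC analysis is to pick the cut score that best separates prototype members of adjacent levels, for example by maximizing Youden's J (sensitivity + specificity − 1). The following is a minimal sketch of that idea with hypothetical inputs, not the institute's actual procedure:

```python
def roc_cut_score(scores, is_higher_prototype):
    """Choose the cut score maximizing Youden's J for separating, e.g.,
    B1 prototypes (True) from sub-B1 prototypes (False).
    scores: list of total scores; is_higher_prototype: parallel booleans."""
    positives = [s for s, p in zip(scores, is_higher_prototype) if p]
    negatives = [s for s, p in zip(scores, is_higher_prototype) if not p]
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sensitivity = sum(s >= cut for s in positives) / len(positives)
        specificity = sum(s < cut for s in negatives) / len(negatives)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut
```

With cleanly separated prototype groups, the chosen cut falls at the lowest score of the higher group; with overlapping groups, Youden's J trades off the two classification errors.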
STANDARDIZED TEST ADMINISTRATION
The test is carried out according to what is known as "Linear-on-the-Fly Testing" (LOFT). Each participant receives a different, newly compiled task set. When selecting the individual tasks from the item bank, the topic and level of difficulty are taken into account. The overall level of difficulty of the tasks within the sets is the same for all participants. The use of the LOFT method and the automatic evaluation of the participant responses ensure an equally objective and fair test for all participants.
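The LOFT assembly described above can be sketched as drawing one text per difficulty band while avoiding topic repeats, so that every compiled form has the same overall difficulty profile. The item-bank structure and band count here are hypothetical, not the onSET's actual data model:

```python
import random

def assemble_loft_form(item_bank, n_bands):
    """Compile an individual LOFT task set: one text per difficulty band,
    with no topic repeated across the form.
    item_bank: dict mapping band (1..n_bands) to a list of
    (text_id, topic) tuples. Hypothetical sketch of the LOFT idea."""
    used_topics = set()
    form = []
    for band in range(1, n_bands + 1):
        candidates = [t for t in item_bank[band] if t[1] not in used_topics]
        text_id, topic = random.choice(candidates)
        used_topics.add(topic)
        form.append(text_id)
    return form
```

Because each band contributes exactly one text, any two participants' forms differ in content but cover the same difficulty range.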
The TestDaF-Institut conducts ongoing research into C-tests and the onSET in particular. In addition to issues of construct validity, research is being carried out into testlet effects, the standard setting and the difficulty of individual test tasks. The results of the investigations are published in specialist journals and books.
What do C-tests do? Which linguistic structures or competencies are reflected in the responses to C-test tasks? What cognitive processes underlie the responses?
How can the onSET cut scores be determined in accordance with the CEFR? Which standard setting methods are suitable for this? How well-defined are the cut scores?
Which psychometric models are suitable for scaling C-tests? How well do the estimated parameters for tasks (difficulty level) and participants (ability level) agree across the different models? Which models and software packages allow a calibrated item bank to be created and extended rapidly?
What influence does the dependency between gaps in the onSET tasks (testlets) have on the reliability estimation, as well as the estimation of person parameters (general language competence) and item parameters (difficulty, separation, precision)?
How does the overall language competence measured by the onSET relate to competencies such as reading comprehension, listening comprehension, and written and oral production, which are assessed separately in standardized language examinations? How well, then, can performance in the TestDaF be predicted on the basis of onSET results?
What factors determine the difficulty of C-test tasks, at the level of both the texts and the individual gaps?
The following list contains only publications which refer directly to the onSET or to German language C-tests closely connected to the onSET.
Eckes, T. (2017). Setting cut scores on an EFL placement test using the prototype group method: A receiver operating characteristic (ROC) analysis. Language Testing, 34, 383–411.
Kaufmann, N. (2016). Die Vorhersage der Schwierigkeit deutscher C-Test-Texte: Untersuchungen am Beispiel des onDaF. Zeitschrift für Interkulturellen Fremdsprachenunterricht, 21(2), 111–126.
Beinborn, L., Zesch, T. & Gurevych, I. (2015). Candidate evaluation strategies for improved difficulty prediction of language tests. In Proceedings of the 10th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 1–11). Denver, CO: Association for Computational Linguistics.
Eckes, T. & Baghaei, P. (2015). Using testlet response theory to examine local dependence in C-tests. Applied Measurement in Education, 28, 85–98.
Beinborn, L., Zesch, T. & Gurevych, I. (2014). Predicting the difficulty of language proficiency tests. Transactions of the Association for Computational Linguistics, 2, 517–529.
Eckes, T. (2014). Die onDaF-TestDaF-Vergleichsstudie: Wie gut sagen Ergebnisse im onDaF Erfolg oder Misserfolg beim TestDaF vorher? In R. Grotjahn (Hrsg.), Der C-Test: Aktuelle Tendenzen/The C-test: Current trends (S. 137–162). Frankfurt: Lang.
Eckes, T. (2012). Examinee-centered standard setting for large-scale assessments: The prototype group method. Psychological Test and Assessment Modeling, 54, 257–283.
Eckes, T. (2011). Item banking for C-tests: A polytomous Rasch modeling approach. Psychological Test and Assessment Modeling, 53, 414–439.
Eckes, T. (2010). Der Online-Einstufungstest Deutsch als Fremdsprache (onDaF): Theoretische Grundlagen, Konstruktion und Validierung. In R. Grotjahn (Hrsg.), Der C-Test: Beiträge aus der aktuellen Forschung/The C-test: Contributions from current research (S. 125–192). Frankfurt: Lang.
Eckes, T. (2010). Rasch models for C-tests: Closing the gap on modern psychometric theory. In A. Berndt & K. Kleppin (Hrsg.), Sprachlehrforschung: Theorie und Empirie – Festschrift für Rüdiger Grotjahn (S. 39–49). Frankfurt: Lang.
Eckes, T. (2010). Standard-Setting bei C-Tests: Bestimmung von Kompetenzniveaus mit der Prototypgruppenmethode. Diagnostica, 56, 19–32.
Eckes, T. (2007). Konstruktion und Analyse von C-Tests mit Ratingskalen-Rasch-Modellen. Diagnostica, 53, 68–82.
Eckes, T. (2006). Rasch-Modelle zur C-Test-Skalierung. In R. Grotjahn (Hrsg.), Der C-Test: Theorie, Empirie, Anwendungen/The C-test: Theory, empirical research, applications (S. 1–44). Frankfurt: Lang.
Eckes, T. & Grotjahn, R. (2006). A closer look at the construct validity of C-tests. Language Testing, 23, 290–325.
Eckes, T. & Grotjahn, R. (2006). C-Tests als Anker für TestDaF: Rasch-Analysen mit dem kontinuierlichen Ratingskalen-Modell. In R. Grotjahn (Hrsg.), Der C-Test: Theorie, Empirie, Anwendungen/The C-test: Theory, empirical research, applications (S. 167–193). Frankfurt: Lang.
Arras, U., Eckes, T. & Grotjahn, R. (2002). C-Tests im Rahmen des "Test Deutsch als Fremdsprache" (TestDaF): Erste Forschungsergebnisse. In R. Grotjahn (Hrsg.), Der C-Test: Theoretische Grundlagen und praktische Anwendungen (Bd. 4, S. 175–209). Bochum: AKS-Verlag.
Differentiation of the CEFR levels
The onSET result is reported as a total score. In addition, participants receive a classification in accordance with the Common European Framework of Reference for Languages (CEFR). To this end, the onSET total score is assigned to one of the following CEFR levels: A2, B1, B2, or C1 (or higher). If an application requires a more refined classification, such as B1.1 or B1.2, this is possible with reference to the core ranges:
If, for example, a participant whose language competence in the onSET-Deutsch has been classified as B1 reaches a total score of 64 points, his or her performance could be assigned to the lower B1 range, since 64 lies 10 points below the lower limit of the B1 core range. With a total score of 87 points, the upper B1 range would apply (8 points above the upper limit of the B1 core range).
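The worked example above implies a B1 core range of roughly 74 to 79 points (64 lies 10 below the lower limit; 87 lies 8 above the upper limit). A minimal sketch of such a refinement follows, assuming that B1.1 and B1.2 label the lower and upper B1 ranges; the boundary values are derived from the example, not official onSET constants:

```python
# Core-range limits inferred from the worked example above (hypothetical).
B1_CORE_RANGE = (74, 79)

def refine_b1(total_score):
    """Refine a B1 classification into sublevels (assumed labels)."""
    lower, upper = B1_CORE_RANGE
    if total_score < lower:
        return "B1.1"  # below the core range -> lower B1 range
    if total_score > upper:
        return "B1.2"  # above the core range -> upper B1 range
    return "B1"        # within the core range
```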
The Society for Academic Study Preparation and Test Development (Gesellschaft für Akademische Studienvorbereitung und Testentwicklung e. V. – g.a.s.t.) and the Association of Language Centres (Arbeitskreis der Sprachenzentren – AKS) / UNIcert© promote scientific research on questions of tests and exams and provide onSET data for promising research projects.
The prerequisites for obtaining data vary according to the type of data required for the particular research project.
1. Publicly available information and materials
- Information published on g.a.s.t. websites for the public without restriction.
- Published example tasks.
- Publications by g.a.s.t. employees in specialist journals, books or on the internet.
The use of these data and materials is not subject to any prerequisites except that, as is good scientific practice, the source (e.g. website or publication) should be cited. Employees of g.a.s.t. can advise on where to source the information.
2. Summary statistics on test results or participant data in individual test centres, regions or countries
g.a.s.t. provides these statistics to licensed test centres in an anonymised form. The request must indicate clearly and in detail what information is required and how it will be used. Requests of this type from non-licensed institutions or individuals will be answered only if a genuine research interest is demonstrated and justified. In such cases, g.a.s.t. must be cited as the source for any use of the statistical data in publications.
3. Raw data
Raw data include, for example, results of individual participants in the onSET, demographic information on the participants, evaluations of participant performance in the productive sections of the TestDaF examination, etc. Access to such data is subject to a number of prerequisites:
- The applicant must submit a multi-page research proposal that shows the scientific question, the hypotheses to be tested and the planned data analysis.
- This outline is subject to a careful assessment procedure.
In the case of a positive assessment, g.a.s.t. will make available only the data required to answer the question referred to in the proposal. The applicant undertakes to explicitly acknowledge both the data source and g.a.s.t. in his or her work, and to provide g.a.s.t. with offprints or copies of the published works. This also applies to academic qualification work (bachelor's and master's theses, dissertations).
In justified cases, direct access to the data may be granted for such purposes to universities or individuals, or arranged in co-operation with licensed partners of g.a.s.t. The g.a.s.t. management decides on such requests.