Indicators of the difficulty of test items. Tests in the technology of block teaching of mathematics to high school students

Itr is an indicator of the ease or difficulty of a test item (statement), taking into account all correct answers given by the test takers.

The difficulty index Itr is calculated as follows:

Itr = (H + L)/n x 100,

where: H is the number of correct answers in the “strong” group;

L is the number of correct answers in the “weak” group;

n is the total number of subjects in both groups (the upper third plus the lower third).

A difficulty index of 95% indicates that for 95% of the subjects the item posed no problem. An item with a low difficulty index is either too difficult or incorrectly designed. The optimal value of the difficulty index is 50-60%, and acceptable fluctuations range from 30% to 70%. Items with an Itr value below 30% or above 70% are excluded from the test program (or are not taken into account in the final calculation of the total score for the whole test program).
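As an illustration, here is a minimal sketch (in Python, with invented numbers) of how the Itr index and the 30-70% acceptance band described above might be computed; the function and the data are ours, not the source's.

```python
# A minimal sketch of the Itr index; the numbers are invented for illustration.

def difficulty_index(strong_correct: int, weak_correct: int, n_total: int) -> float:
    """Itr = (H + L) / n * 100, in percent."""
    return (strong_correct + weak_correct) / n_total * 100

# Example: 12 correct answers in the "strong" third, 6 in the "weak" third,
# 30 subjects in the two thirds together.
itr = difficulty_index(strong_correct=12, weak_correct=6, n_total=30)
print(f"Itr = {itr:.0f}%")   # Itr = 60% -> inside the optimal 50-60% band
print(30 <= itr <= 70)       # True -> the item stays in the test program
```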

The quality of each test item can also be characterized by the discrimination index (ID). The discrimination index shows how well a given item can distinguish (discriminate) more trained test takers from less trained ones:

ID = 2 x (H - L)/n

(notations are the same as for calculating Itr).

It has been established experimentally:

An ID value of 0.35 and above indicates an excellent test item;

0.25-0.34 - a good item;

0.15-0.24 - a questionable item;

below 0.15 - the item is poorly designed and should be excluded from the “bank” of test items.

After the difficulty and discrimination indices have been calculated, the test program is revised: items with unsatisfactory Itr and ID values, above all items with an Itr value above 70% (easy items) and an ID below 0.25, are excluded from it.
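A similarly hedged sketch of the discrimination index and the quality thresholds quoted above; the data are again invented for illustration.

```python
# ID = 2 * (H - L) / n, with the same notation as for Itr.

def discrimination_index(strong_correct: int, weak_correct: int, n_total: int) -> float:
    return 2 * (strong_correct - weak_correct) / n_total

def rate(id_value: float) -> str:
    # Thresholds quoted in the text above.
    if id_value >= 0.35:
        return "excellent"
    if id_value >= 0.25:
        return "good"
    if id_value >= 0.15:
        return "questionable"
    return "poorly designed: exclude from the task bank"

id_value = discrimination_index(strong_correct=12, weak_correct=6, n_total=30)
print(id_value, rate(id_value))   # 0.4 excellent
```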

Test reliability is characterized by the reproducibility of results when the same group of subjects is tested repeatedly and, like difficulty, is determined experimentally.

All learning tasks can be conditionally divided into three types: subject, logical, and psychological, which in turn can be divided into groups that differ in the mechanism of the mental actions they evoke.

Subject types of tasks. When solving them, the student has to orient himself in a certain subject field, which can contain not only objects (things) but also people, living organisms, as well as their models (drawings, sketches, diagrams, etc.). Orientation in the subject field consists of mental actions by which a person, relying on certain signs known to him, finds objects in it and mentally classifies them in order to operate only with the significant objects that allow him to solve the problem.

Logical types of tasks. These are tasks whose solution requires reasoning according to the laws of logic, i.e. actions in the mind, without any reliance on material reference objects. The reasoning is aimed at identifying which data are really needed to solve the problem, which data should be discarded as unnecessary, and which necessary data are missing from the formulation of the problem; the missing data must be requested from the teacher or found by the student himself.

What types of logical-type tasks can be included in a set for teaching mental actions?

There are four types in total (the first two are reconstructed here from the A/B notation of the surviving items):

    tasks in which all the necessary data are present and there are no unnecessary data (A+B–);

    tasks in which all the necessary data are present but unnecessary data are included as well (A+B+);

    tasks in which there are no unnecessary data, but some of the necessary data are also missing (A–B–);

    and, finally, tasks in which, because of the presence of unnecessary data, not all the necessary data are available (A–B+).

Psychological types of tasks. They can provoke erroneous actions by the student, because essential points directly related to solving the problem may be hidden behind unimportant ones. The subject needs intelligence and will in order not to succumb to the temptation of the easy path, as well as thoughtfulness in action and prudence in analyzing the conditions of the task.

Psychological types of tasks can be distinguished by the following characteristics:

a) the signs of the phenomenon presented in the problem resemble those that characterize the sought-for (required, relevant to the given activity) phenomenon, but in fact it is something else (similar, but not it);

b) the observed signs resemble the sought-for phenomenon, and it really is that phenomenon (looks like it, and is it);

c) the observed signs do not seem to relate clearly to the sought-for phenomenon, yet it turns out that they are precisely its signs (does not seem to be it, but is it);

d) from the visible signs of a phenomenon that does not quite resemble the sought-for one, it can be concluded that they do not belong to it (does not look like it, and is not it).

Since such situations can occur in real life, the student needs to develop the ability to focus accurately on the essential signs of what he is looking for, not to succumb to illusions of sight or hearing, not to mistake what is merely similar for what is needed for the correct performance of the activity, and not to miss what is outwardly different but inwardly inherent in it. Therefore, a set of practical educational tasks must necessarily contain data that lure the student into a trap, provoke an erroneous action, and give false signals. Knowing this, the student must be extremely careful, must not fall for a cleverly disguised trick, and must strictly follow the accepted criteria for assessing situations.

The ability to solve tasks of the psychological type indicates that the student has mastered the activity sufficiently and comprehensively; all his actions are meaningful and have a high degree of consciousness.

The methodological development of a thematic lesson includes the following items:

    Topic title.

    Objectives of the lesson.

    Total class time.

    Equipment of the lesson: material and technical (equipment, instruments), methodological, and information support (a list of educational tables, stands, teaching aids, preparations, programs, etc.).

    Plan (according to the diagram above).

    Educational and training materials.

    Control materials (tasks for initial and final control of assimilation).

The difficulty indicator of a test task as the most important test-forming factor

Krasheninnikova Galina Gennadievna

Candidate of Pedagogical Sciences, Magadan Branch of the Russian State University for the Humanities

One of the main characteristics of a test task is its difficulty. The level of difficulty of a task, like the level of preparedness of the person being tested, is a latent parameter that cannot be observed directly. To evaluate these parameters, indicators closely related to them must be used. When students' knowledge is tested, the test tasks themselves serve as the indicator. The task then arises of converting the indicator values into values of the latent parameters. There are various approaches to solving this problem: classical and modern test theory each offer their own methods for estimating latent parameters.

For many years, the traditional measure of item difficulty in classical test theory has been the ratio of the number of correct answers to a given task to the total number of subjects in the group. The easier the task, the higher the percentage of those who complete it.

However, this definition carries a semantic inaccuracy: an increase in the numerical value of the statistical indicator signals a decrease in the difficulty of the task, and vice versa. Attempts have therefore recently been made to introduce new units of difficulty. The classical measure of difficulty is replaced by its opposite, the proportion of incorrect answers in the group of subjects, which, in our opinion, more accurately reflects the meaning of the “task difficulty” parameter.

Modern test theory - Item Response Theory (IRT) - is based on the theory of latent structural analysis (LSA) created by P. Lazarsfeld. In IRT, unlike in classical theory, the latent parameter is treated not as a constant but as a continuous variable. IRT models can be classified by the number of parameters they use; the best known are the one-parameter model of G. Rasch and the two- and three-parameter models of A. Birnbaum.

Georg Rasch placed both the level of preparedness of the test taker and the level of difficulty of the task on the same scale, introducing a common unit of measurement for them: the logit. One logit of task difficulty equals the natural logarithm of the ratio of the proportion of incorrect answers to a task to the proportion of correct answers.
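A small sketch of the logit computation just described; the example proportions of correct answers are illustrative.

```python
import math

def logit_difficulty(p_correct: float) -> float:
    """ln(q / p): proportion of wrong answers over proportion of right ones."""
    return math.log((1.0 - p_correct) / p_correct)

print(logit_difficulty(0.5))   #  0.0   -> task of average difficulty
print(logit_difficulty(0.8))   # -1.39  -> easy task (negative logit)
print(logit_difficulty(0.2))   #  1.39  -> hard task (positive logit)
```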

Although IRT has recently become widespread, it has notable shortcomings. In particular, when testing educational achievements, significant discrepancies are observed between calculated values and empirical data. At the same time, a high correlation (about 0.9) has been demonstrated between results obtained with the Rasch model and results obtained by classical methods. This allows us, without compromising the accuracy of calculations, to use the methods of classical test theory to characterize the difficulty of test tasks.

Although the classical formula for calculating task difficulty is convenient to apply and to interpret, it is, in our opinion, not free of subjectivity: the difficulty of a task depends directly on the sample of test takers. In this regard, let us consider another view of assessing the level of difficulty of a test task which, although not widespread, is of some interest.

To approach the essence of the latent parameter “difficulty”, let us turn to the classification of levels of knowledge acquisition adopted in the pedagogical literature. One can notice a quite objective increase in the degree of difficulty of assimilation at each subsequent level of knowledge acquisition. We can therefore conclude that there is a direct relationship between the levels of mastery and the difficulty levels of the tasks corresponding to each of them. This allows us to identify the concepts of “level of difficulty” and “level of mastery” in relation to test tasks. Taking V.P. Bespalko's classification as a basis, we distinguish four levels of difficulty: “student”, typical, heuristic, and creative.

Expert methods are now widely used in pedagogy, so expert assessment of the difficulty of test tasks deserves attention as another way of estimating the difficulty indicator. For example, A.P. Ivanov describes such an assessment, in which, before a test experiment begins, several experts are asked to score the difficulty of the tasks in all test variants. To obtain the expert assessment, the author provides a list of eight factors, each rated on a scale from 1 to 5 points.

In a well-designed test, item difficulty should be affected neither by the form of the tasks nor by the organization of testing; the difficulty indicator should depend only on the content and on the level of preparedness of the test takers. True, there is an opinion that the degree of difficulty of a task is influenced by its position in the test structure; in this case it is recommended to use several test variants that differ in the sequence of tasks. V.S. Avanesov believes that the main principle for developing the content of pedagogical tests is the increasing difficulty of the test tasks. In his opinion, only after its degree of difficulty has been determined does a task have a chance of becoming a test task; until then it remains merely a task in test form.

Including a large number of tasks of average difficulty in a test increases its reliability but reduces its content validity. A test consisting of easy tasks that check minimal knowledge cannot give an idea of the actual level of knowledge. Selecting test items of a high degree of difficulty can help increase motivation to study, but it can also have the opposite effect; thus, tests made up of difficult tasks also distort test results. In addition, the content of a test should vary with the level of preparedness of the student groups: the difficulty of a test for weak students differs markedly from the difficulty of a test offered to strong students.

According to A. Anastasi and S. Urbina, the choice of task difficulty depends on the purpose of the test and on how its indicators are to be used. For subject-oriented tests, the difficulty of tasks should be at the level of 0.8-0.9. Judging the informativeness of a task by its level of difficulty, the authors show that the most informative is a task with an average difficulty of 0.50.

Thus, we can conclude that tasks of average difficulty have the greatest differentiating ability. If the purpose of testing is to differentiate test takers and comparatively assess their level of knowledge, the easiest and the most difficult tasks should be excluded from the test. If the purpose is to determine whether the student has sufficiently mastered a certain set of competencies needed to move to the next stage of training, the test may contain both very easy and very difficult tasks.
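One way to see why tasks of medium difficulty differentiate best: the score variance of a dichotomous item, p(1 − p), peaks at p = 0.50. This is a standard fact of classical test theory, offered here as our own numeric illustration rather than a claim of the cited authors.

```python
# The item score variance p * (1 - p) is largest at p = 0.50.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}   variance = {p * (1 - p):.2f}")
# The maximum, 0.25, is reached exactly at p = 0.50.
```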

Bibliography

1. Avanesov V.S. Using tasks in test form in new educational technologies // School Technologies. – 2007. – No. 3. – P. 146–163.

2. Anastasi A., Urbina S. Psychological Testing. – St. Petersburg: Piter, 2002. – 688 p.

3. Bespalko V.P. Components of Educational Technology. – Moscow: Pedagogika, 1989. – 192 p.

4. Ivanov A.P. Systematization of Knowledge in Mathematics in Specialized Classes Using Tests. – Moscow: Fizmatkniga, 2004. – 416 p.

5. Ingenkamp K. Pedagogical Diagnostics. – Moscow: Pedagogika, 1991. – 240 p.

6. Kim V.S. Analysis of test results in the process of Rasch measurement // Pedagogical Measurements. – 2005. – No. 4. – P. 39–45.

7. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. – Chicago & London, 1980. – 199 p.


If a pedagogical test is briefly defined as a system of tasks of uniformly increasing difficulty, it becomes clear that the difficulty of the tasks is the most important, so to speak, test-forming indicator. Many school leaders believe that their teachers can “think up” as many “tests” as they want in a short time. In fact, one can think up quite a lot of tasks in test form; but these are not tests at all, only tasks. They cannot be included in a real test until the measure of their difficulty, as well as their other characteristics, becomes known. The measure of difficulty is checked empirically. From this requirement it follows that preliminary empirical testing of every task is mandatory before testing begins. In the course of such verification, many tasks fail to meet the requirements imposed on them and are therefore not included in the test. The first requirement for test tasks: the tasks in a test must vary in level of difficulty, which follows from the definition of a test given earlier and from the principle under consideration.

The attentive reader has probably already noticed the differences in vocabulary among the three basic concepts of the theory of pedagogical measurements introduced here, as if “imperceptibly”: the pedagogical test, the task in test form, and the test task. The requirements for the first of these have already been discussed in the article “Definition of a pedagogical test” (USh No. 30, August 1999).

It is better to introduce the requirements for the second concept now, at least by listing them briefly, so as not to be distracted from the main topic of the article. The following requirements apply to tasks in test form:

  • brevity;
  • technological effectiveness;
  • correct form;
  • correctness of content;
  • logical form of the statement;
  • identical rules for evaluating answers;
  • availability of a specific place for answers;
  • identical instructions for all subjects;
  • correct arrangement of the task elements;
  • adequacy of the instructions to the form and content of the task.
Avanesov V.S. Fundamentals of the pedagogical theory of measurements // Pedagogical Measurements. – 2004. – No. 1. – P. 17.

A detailed interpretation of these requirements will follow in subsequent articles, but for now I would like to draw the reader's attention to the fact that the list contains no requirement of known difficulty, while such a requirement is imposed on the test and on the test task. From reflection on this and previously published material, two conclusions can be drawn. The first is that a test has no place for tasks with an unknown measure of difficulty. The second is that not every task in test form can become a test task: these are different concepts. For the first concept, the most essential requirements are those of content and form. For test tasks there is, first of all, the requirement of known difficulty, which is clearly not required of tasks in test form. It can be repeated that tasks have a chance of becoming test tasks only after empirical verification of the measure of their difficulty on typical groups of subjects.

The difficulty indicator of a test and of test tasks is both meaningful and formal at the same time. Meaningful, because in a good test difficulty can depend only on the difficulty of the content of the tasks and on the level of preparedness of the subjects themselves, whereas in a bad test the results begin to be noticeably influenced by the form of the tasks (especially if it is not adequate to the content), by poor organization of the testing, and by opportunities for cheating and information leakage. Related to this is the harmful practice of purposefully coaching students for the unified state exam. The Russian Minister of Education in 1907, I. Tolstoy, called teachers engaged in this kind of work “trainers”. But the teachers are least of all to blame; the fault lies with the flawed system, which encourages such erroneous practice. As is the control, so is the teaching.

The formal component of the difficulty indicator arises when testing is viewed as a process of confrontation between the test taker and the task offered to him. The outcome is usefully regarded as the result of this confrontation. In a simplified interpretation of each presentation of a task, two outcomes are usually, though not necessarily, considered: victory of the subject, with a correct solution of the task, for which he receives one point, or defeat, for which he receives zero points. The assessment of the result of the confrontation depends on the ratio of the test taker's level of knowledge to the level of difficulty of the task, on the chosen unit of measurement of knowledge, and on the previously adopted rule (convention) of what counts as a “victory” of the test taker and whether a draw is acceptable, to put it in the language of sport.

The principle of increasing difficulty is used in presenting the content of many textbooks and manuals, especially in academic disciplines built on a cumulative principle, meaning that knowledge of subsequent elements of a course depends explicitly on knowledge of previous educational elements. This structure is inherent in textbooks of mathematics, logic, foreign languages, statistics, technical and many other sciences, in which previously studied concepts are actively used in subsequent topics. Such disciplines must therefore be studied from the very beginning, and without gaps.

Most authors, especially foreign ones, do not distinguish between the concepts of “difficulty” and “complexity”, and the same is true of many test developers. However, there are works in which these concepts are defined differently. For example, A.N. Zakharov and A.M. Matyushkin note that the degree of difficulty of a learning task does not coincide with its complexity. The degree of complexity of educational material is characterized by the real (objective) richness of the learning task and the form of its presentation, while the degree of difficulty always presupposes a correlation of the educational material to be mastered with previously acquired material and with the intellectual capabilities of the students (1).

L.N. Landa explained the difficulty of a learning task by the fact that students often do not know the operations that must be performed to find a solution. If a system of operations for solving a certain class of problems is called a method of solution, then, in his opinion, difficulty is associated with ignorance of the method, with not knowing how to think in the course of the solution, how and in what sequence to act with the conditions of the task (2). The difficulties that arise are explained by the fact that the teacher often tries to impart knowledge about the content of what is being studied and cares much less about how to think and reason (ibid.). This interpretation intersects with the idea that the complexity of a task is related to the number of operations that must be performed to achieve success. These definitions of difficulty and complexity are largely psychological; they are useful for the psychological analysis of the content of test tasks.

For many years the traditional measure of the difficulty of each task was the proportion of correct answers in a group of subjects, denoted by the symbol pj, where the subscript j indicates the number of the task of interest (1, 2, etc.). For example, if correct answers of subjects to the third task of a test are scored one point and incorrect ones zero, the value of the indicator p3 can be found from the elementary relation:

p3 = R3/N,

where R3 is the number of correct answers to the given task and N is the total number of subjects in the group. The general formula for calculating the proportion of correct answers to any task j has the form

pj = Rj/N.
The indicator pj has long served as the measure of difficulty in so-called classical test theory (3). Later, the semantic inaccuracy contained in it was recognized: an increase in the value of pj indicates not an increase in difficulty but, on the contrary, an increase in easiness, if such a word may be used. Therefore, in recent years the opposite statistic has come to be associated with task difficulty: the proportion of incorrect answers (qj), calculated as the ratio of the number of incorrect answers (Wj, from the English word Wrong) to the number of subjects (N):

qj = Wj/N.

It is naturally assumed that pj + qj = 1. In classical test theory, for many years only empirical indicators of difficulty were considered. In new versions of psychological and pedagogical test theories, more attention is paid to the nature of the students' mental activity when performing test tasks of various forms (4).
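A minimal sketch of the indicators pj and qj computed from a 0/1 response matrix; the matrix is invented for illustration.

```python
# Rows = subjects, columns = tasks; 1 = correct answer, 0 = incorrect.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
N = len(responses)                            # number of subjects
for j in range(len(responses[0])):
    R_j = sum(row[j] for row in responses)    # correct answers to task j
    p_j = R_j / N                             # proportion of correct answers
    q_j = 1 - p_j                             # proportion of wrong answers, Wj / N
    print(f"task {j + 1}: p = {p_j:.2f}, q = {q_j:.2f}")
```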

The content of a test cannot consist only of easy, only of medium, or only of difficult material. The well-known idea that results depend on the method used manifests itself here in full. Easy test items give students only the appearance of knowledge, because they check minimal knowledge. In this connection it may be noted that the orientation of the federal education authority toward checking a minimum level of knowledge does not, and by definition cannot, give an idea of the real level of knowledge, i.e. provide the information that society and government bodies have long needed. The selection of deliberately difficult tasks also distorts test results: the majority of schoolchildren then end up with low scores. An orientation toward difficult tasks is often seen as a means of increasing motivation to learn, but this remedy cuts both ways: difficult tasks can push some toward study and push others away from it. Such an orientation distorts the results and ultimately reduces the quality of the pedagogical measurement. If, however, a test is built strictly of tasks of increasing difficulty, this opens the way to the creation of one of the most interesting measurement scales, the L. Guttman scale.

When the test was defined, it was already noted that all test tasks, it should be emphasized, regardless of the content of topics, sections, and academic disciplines, are arranged in order of increasing difficulty. The recommendation, widespread until recently, to include more tasks of average difficulty in a test is justified from the point of view of determining the reliability of measurement by the formulas of so-called classical test theory. The methods of assessing test reliability that exist in this theory yield lower reliability when easy and difficult tasks are included in a test. At the same time, an enthusiasm for tasks of only average difficulty leads to a serious deformation of the test content: the test loses the ability to reflect normally the content of the discipline being studied, in which there is always both easy and difficult material. Thus, in pursuit of an abstractly high theoretical reliability, the content validity of the test results is lost; and the striving to raise the validity of test results is often accompanied by a decrease in their accuracy. This phenomenon is known in theory as the paradox of the American psychometric theorist F. Lord.

If a weak group of students is tested, the difficult test items simply do not work: not a single student can answer them correctly. Such tasks are removed from further data processing, and in adaptive control systems they are not even offered. The content of a test for weak students will thus differ markedly from the content of a test for strong students; for the latter, conversely, it is the easy tasks that do not work, since all knowledgeable subjects answer them correctly. The content of a traditional test therefore varies substantially with the level of preparedness of the groups of students whose knowledge the test is meant to measure.

Optimal mapping of the content of educational material into test tasks of the required level of difficulty presupposes the possibility of choosing an appropriate form. The content of a test is expressed in one of four main forms of tasks: 1) tasks with a choice of one or several correct answers from those offered; 2) open-form tasks, where the subject writes the answer himself in the space provided; 3) matching tasks; and 4) tasks on establishing the correct sequence of actions.

Literature
  1. Zakharov A.I., Matyushkin A.M. Problems of adaptive training systems // Cybernetics and Learning Problems. – Moscow: Progress, 1970. – 389 p.
  2. Landa L.N. Algorithmization in Teaching. – Moscow: Prosveshchenie, 1966.
  3. Gulliksen H. Theory of Mental Tests. – N.Y.: Wiley, 1950. – 486 p.
  4. Tatsuoka K.K. Item Construction and Psychometric Models Appropriate for Constructed Response. – Princeton, N.J., 1993. – 56 p.; Frederiksen N., Mislevy R.J., Bejar I.J. (Eds.). Test Theory for a New Generation of Tests. – Hillsdale, N.J.: Lawrence Erlbaum Associates, 1993. – 404 p.

We will define the complexity and difficulty of test tasks on the basis of the definitions of the words “complex” and “difficult” in Ushakov's Explanatory Dictionary of the Russian Language.

So, “complex” means “consisting of several parts or elements, formed by the connection or combination of parts.” From this definition it is clear how the complexity of a test task can be determined: it is enough to analyze the number of knowledge elements the task covers and to establish how deeply the test taker must know the subject area in order to answer the task correctly. In other words, complexity can be related to the number of mental operations that must be performed to arrive at the correct answer; if the subject does not know some of these operations, the task will be difficult for him, and if he knows them, it will be easier.

“Difficult” means “requiring great mental effort and strain; hard; tricky.” This definition suggests one criterion for assessing the difficulty of a test task: how much time the test taker needs so that the effort spent searching for the correct answer is not wasted. The concept of difficulty can also rest on statistical estimates: for example, the fewer the correct answers, the more difficult the task.

In the general case, complexity and difficulty are set by the developer of the test specification and reflect a subjective judgment of how hard it will be for a test taker with a minimal level of training to solve a given test task in a certain time.

Two kinds of complexity and difficulty can be distinguished: theoretical (a priori) and actual (a posteriori). A priori complexity and difficulty are determined by experts before testing; a posteriori complexity and difficulty are obtained in the course of trying out a bank of test tasks and calculating by certain methods.

It should be borne in mind that when a bank of test tasks is used in testing, difficulty and complexity can be adapted to the audience: by collecting statistics on the answers to tasks, one can determine how easily a task is perceived by test takers and what answers are given to it. On this basis the actual (a posteriori) difficulty and complexity of the test task are established.

The most important purpose of task complexity and difficulty is their use in adaptive testing algorithms. Without information about complexity and difficulty it is impossible to adapt test tasks to the test taker's current level of knowledge. Moreover, if difficulty and complexity are specified incorrectly, the adaptive testing algorithms will work incorrectly, and as a result the assessment of the level of educational achievement will carry a large error.
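As a hedged illustration of why adaptive algorithms need difficulty estimates, here is a common simplified selection rule, our own sketch rather than the source's: present next the unused task whose difficulty is closest to the current ability estimate, with both values on the same (e.g. logit) scale. The task names and difficulties are invented.

```python
def next_task(ability: float, bank: dict, used: set) -> str:
    """Pick the unused task whose difficulty is nearest the ability estimate."""
    candidates = {task: d for task, d in bank.items() if task not in used}
    return min(candidates, key=lambda task: abs(candidates[task] - ability))

bank = {"T1": -1.5, "T2": -0.3, "T3": 0.4, "T4": 1.8}   # assumed logit difficulties
print(next_task(ability=0.0, bank=bank, used={"T1"}))   # -> "T2"
```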

As follows from the definition, the difficulty of a task can be calculated from the time allotted for solving it to a test taker with an average level of knowledge: for example, 30 seconds, or 1 minute 50 seconds. Obviously, difficulty will then depend on complexity, since the more complex the test task, the more time its solution requires and the more difficult it is. On the other hand, the more complex the task, the more knowledge is needed to find the answer, and the more difficult the task becomes. Thus difficulty and complexity clearly depend on each other, which is why the theory of pedagogical measurements, as a rule, uses a single concept, difficulty. Let us consider in more detail how the difficulty of a test task can be determined. It can be determined by the following factors:

  1. the number of concepts required to solve the test task;
  2. the way of thinking at which the test task is aimed;
  3. the form of the test task;
  4. the depth of the test task's location in the test specification;
  5. the number of distractors and correct answers;
  6. the level of significance.

By a concept we further understand a certain statement (formula, rule, axiom, etc.) that brings us closer to the correct solution of the test task. The more steps that must be completed to obtain the correct answer, the higher the difficulty and the more difficult the test task is considered. Difficulty must, of course, be assessed in relation to the number of concepts involved in finding the correct solution.

Let's give the following examples:

Easy test task

To solve such a task, nothing needs to be done except to recall the name of a famous Russian poet known to everyone from the school curriculum. An easy (simple) test task involves a single concept.

Medium difficulty task

Find the roots of the quadratic equation [the equation is given in the original only as an image]. (Answer: 1 and -1.)

A difficult task

Compose an equation one of whose roots equals [the expression is given in the original only as an image].

The second factor, the way of thinking at which a task is aimed, can be detailed as follows:

  • space and time;
  • maximizing the positive and minimizing the negative;
  • induction-deduction;
  • cause-and-effect (analytical, positive, deductive) thinking;
  • dialectical-algorithmic (synthetic, negative, deductive) thinking;
  • holographic, or fully descriptive, thinking;
  • vortex, or synergetic, thinking.

    Each test taker has his own subjective world of perception, formed from what the person pays most attention to. Consequently, if the test taker is initially determined to perceive the testing procedure as something difficult, unattainable, and psychologically traumatic, then even the simplest test task may be perceived by him as very difficult. If, on the contrary, the subjects are initially disposed to see testing as a fairly objective procedure for checking what they have been taught and what still needs work, and are themselves interested in finding out what they have learned well, then their attitude toward the procedure will be positive, and the difficulty of a task will be perceived more objectively.

    Let's try to identify the ways of thinking in relation to which the difficulty of test tasks is formed.

    A test task of the simple difficulty level:

    • “identification” of some object, or testing of “knowledge-acquaintance”;
    • choosing one answer from several offered, using knowledge of just one concept;
    • an open-type task aimed at testing knowledge of the definition of a single basic term.

    A test task of the medium difficulty level:

    • is aimed at applying previously acquired knowledge in typical situations (i.e. situations with which the subject is familiar), or at testing “copy-reproduction knowledge”. Tasks of this difficulty level include those aimed at thinking associated with statements of a conjunctive or disjunctive type, and tasks with several concepts that require selecting a subset of correct answers from a given set of options. In some cases, matching and sequencing tasks may also belong to this difficulty level.

    Difficult test tasks:

    • are aimed at applying acquired knowledge and skills in non-standard conditions (i.e. conditions previously unfamiliar to the subject), or at testing “knowledge of skills and application”. Tasks of this difficulty level include those whose answers are formulated as statements of an implicative type. Such tasks require reasoning in the form of deductive or inductive inference and analogy, and a certain sequence of inferences (several concepts) is needed to reach the final answer.

    It should also be borne in mind that the difficulty of a test task can be determined with allowance for the form of the test statement. The number of concepts must be taken into account here, because if selecting the correct answer requires some additional knowledge or solving a problem, the difficulty of the task increases. The simplest form of test task is considered to be the closed form, in which the test taker is asked to choose the correct option(s) from those offered. The most difficult is considered to be the open form, because to give the correct answer one must understand the meaning of the test statement and select the required definition from several existing ones. Sequence and matching forms most often belong to tasks of medium difficulty.

    Difficulty can also be assigned according to the “depth” of the test task's location in the test specification. If a task reveals the lowest level of the specification hierarchy (for example, some “Concept”), it will be easy. Belonging to the middle levels of the hierarchy (for example, some “Topic” or “Subtopic”) increases the difficulty; such tasks can be considered of medium difficulty. Finally, tasks related to the top level, the root of the hierarchy tree (for example, a “Section” or “Chapter”), can be considered difficult. Consequently, when difficulty is considered with regard to the specification of the bank of test tasks, we assume that a task dealing with a more specific case has less difficulty than a task dealing with a more general topic.

    Increasing the number of distractors and correct answers affects the level of difficulty of the task. The larger the number of distractors and correct answers, the longer the reasoning required to reach a correct conclusion and the longer the response time, and the more difficult the task is considered.

    The difficulty of a task can also be determined by whether it belongs to the basic or the supplementary material (the task's level of significance). Obviously, every discipline has a certain set of basic concepts, for example those prescribed in the State Educational Standard, and there are concepts belonging to supplementary material, that is, material given only to the most successful groups of students. In addition, about 10% of the whole course material may be given at the teacher's discretion. Therefore, if a task reveals a basic concept, it can be considered simple, but if it belongs to supplementary material (i.e. giving the correct answer requires additional knowledge and operating with several concepts), it can be considered difficult. A hypothetical scoring sketch combining these factors is given below.
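To make the factors concrete, here is an entirely hypothetical scoring sketch; the weights, level names, and thresholds are our assumptions, not the source's (factor 2, the way of thinking, is folded into the concept count for brevity).

```python
def a_priori_difficulty(n_concepts: int, form: str, depth: str,
                        n_options: int, extra_material: bool) -> str:
    score = n_concepts                                                       # factor 1
    score += {"closed": 0, "matching": 1, "sequence": 1, "open": 2}[form]    # factor 3
    score += {"concept": 0, "topic": 1, "section": 2}[depth]                 # factor 4
    score += n_options // 4                 # factor 5: more options, more reasoning
    score += 2 if extra_material else 0     # factor 6: supplementary material
    if score <= 2:
        return "easy"
    return "medium" if score <= 5 else "difficult"

print(a_priori_difficulty(1, "closed", "concept", 4, False))  # easy
print(a_priori_difficulty(3, "open", "section", 6, True))     # difficult
```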

    When determining the difficulty of test materials, it is important to be able to compare the factors listed above across different cases and to take into account all the features of the subject area.

    To simplify the task of drawing conclusions about the difficulty of test tasks, let us define quantitative indicators for these qualitative factors.

    For example, let us take two tasks from the bank with the following difficulty levels:

    • task No. 1 is theoretically interpreted as difficult;
    • task No. 2 is theoretically interpreted as of medium difficulty.

    A sample of 10 groups of subjects tested in the same discipline is considered. As an example, we take the average score obtained by the subjects of each group on the two proposed tasks.


    Group    Score on task 1    Score on task 2
    No. 1    32.1               45
    No. 2    20                 65
    No. 3    55                 34
    No. 4    70                 58
    No. 5    64.2               40
    No. 6    45                 36
    No. 7    46.1               67
    No. 8    80                 54
    No. 9    72.3               44
    No. 10   46.7               53

    Various scales can be chosen to assess the difficulty of a test task. Let us take the following scale, which we will a priori (theoretically) regard as the reference one. Let Wi be the score obtained by the subjects on the i-th task during testing. Then a 5-point scale for the distribution of percentages (out of 100%) and grades may look as follows:

    [The formula defining the scale boundaries is given in the original only as an image.]

    where n is the number of grading levels on the chosen scale (for example, “unsatisfactory”, “satisfactory”, “good”, “excellent”). Thus, in this example the difference between adjacent grades equals 15%.

    The scales for this sample would look as follows [given in the original only as an image].

    Based on the calculated standards, we carry out the final calculation of points, as a result of which we find that the task was:

    • “easy” for 30% of the subjects;
    • “of medium difficulty” for 50% of the subjects;
    • “difficult” for 20% of the subjects.

    Let us calculate the average of the scores obtained by the ten groups on the second task:

    (45 + 65 + 34 + 58 + 40 + 36 + 67 + 54 + 44 + 53) / 10 = 49.6,

    which corresponds to medium difficulty on the reference (a priori) scale.

    Initially this task was interpreted by the test developer as being of medium difficulty. Consequently, the a priori value of the task's difficulty coincides in this case with the a posteriori value, which we consider true for this sample. The a posteriori scale for the second task can be calculated in the same way.
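The a posteriori check above can be reproduced in a few lines; the scores come from the table, while the grade boundaries with a 15% step are assumptions inferred from the text, since the original scale survives only as an image.

```python
task2_scores = [45, 65, 34, 58, 40, 36, 67, 54, 44, 53]
mean = sum(task2_scores) / len(task2_scores)
print(f"mean score on task 2: {mean}")       # 49.6

def posterior_difficulty(mean_score: float) -> str:
    if mean_score >= 70:                     # hypothetical boundary
        return "easy"
    if mean_score >= 40:                     # hypothetical boundary
        return "medium difficulty"
    return "difficult"

print(posterior_difficulty(mean))            # medium difficulty
```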