The average value of the interval series. Variation indicators: concept, types, formulas for calculations

According to the sample survey, depositors were grouped according to the size of their deposit in the city’s Sberbank:

Define:

1) range of variation;

2) average deposit size;

3) average linear deviation;

4) variance;

5) standard deviation;

6) coefficient of variation of deposits.

Solution:

This distribution series contains open intervals. In such series, the width of the first group's interval is conventionally taken to be equal to the width of the next interval, and the width of the last group's interval is taken to be equal to the width of the previous one.

The width of the second interval is 200, so the width of the first interval is also taken to be 200. The width of the penultimate interval is 200, so the last interval is also given a width of 200.

1) Let us define the range of variation as the difference between the largest and smallest values of the attribute:

R = Xmax - Xmin = 1200 - 200 = 1000

The range of variation in the deposit size is 1000 rubles.

2) The average deposit size is determined using the weighted arithmetic mean formula.

Let us first determine a discrete value of the attribute in each interval. To do this, using the simple arithmetic mean formula, we find the midpoints of the intervals.

The midpoint of the first interval will be:

(200 + 400) / 2 = 300,

the second is 500, etc.

Let's enter the calculation results in the table:

Deposit amount, rub. | Number of depositors, f | Midpoint of the interval, x | xf
200-400 | 32 | 300 | 9600
400-600 | 56 | 500 | 28000
600-800 | 120 | 700 | 84000
800-1000 | 104 | 900 | 93600
1000-1200 | 88 | 1100 | 96800
Total | 400 | - | 312000

The average deposit in the city's Sberbank will be 780 rubles:

x̄ = Σxf / Σf = 312000 / 400 = 780
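The calculation of the average deposit can be checked with a short script. This is a minimal sketch, assuming the closed interval boundaries and frequencies from the table above; the names `groups` and `weighted_mean` are illustrative and do not come from the original.

```python
# Interval series from the table above: (lower bound, upper bound, frequency f).
groups = [
    (200, 400, 32),
    (400, 600, 56),
    (600, 800, 120),
    (800, 1000, 104),
    (1000, 1200, 88),
]


def weighted_mean(groups):
    """Weighted arithmetic mean of the interval midpoints."""
    total_xf = sum((lo + hi) / 2 * f for lo, hi, f in groups)
    total_f = sum(f for _, _, f in groups)
    return total_xf / total_f


mean_deposit = weighted_mean(groups)
print(mean_deposit)  # 780.0
```

Because each interval is replaced by its midpoint, the result is an approximation of the true average deposit.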

3) The average linear deviation is the arithmetic mean of the absolute deviations of the individual values of a characteristic from the overall mean:

d = Σ|x - x̄|·f / Σf

The procedure for calculating the average linear deviation in the interval distribution series is as follows:

1. The weighted arithmetic mean x̄ is calculated, as shown in paragraph 2).

2. The absolute deviations from the mean are determined: |x - x̄|.

3. The resulting deviations are multiplied by the frequencies: |x - x̄|·f.

4. The weighted deviations are summed without regard to sign: Σ|x - x̄|·f.

5. The sum of the weighted deviations is divided by the sum of the frequencies: Σ|x - x̄|·f / Σf.

It is convenient to use the calculation data table:

Deposit amount, rub. | Number of depositors, f | Midpoint of the interval, x | x - x̄ | |x - x̄| | |x - x̄|·f
200-400 | 32 | 300 | -480 | 480 | 15360
400-600 | 56 | 500 | -280 | 280 | 15680
600-800 | 120 | 700 | -80 | 80 | 9600
800-1000 | 104 | 900 | 120 | 120 | 12480
1000-1200 | 88 | 1100 | 320 | 320 | 28160
Total | 400 | - | - | - | 81280

The average linear deviation of the size of the deposit of Sberbank clients is 203.2 rubles.
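The five steps for the average linear deviation can be sketched in code as follows. This is an illustrative sketch that assumes the midpoints and frequencies from the table; the variable names are hypothetical.

```python
# (midpoint x, frequency f) pairs taken from the calculation table.
series = [(300, 32), (500, 56), (700, 120), (900, 104), (1100, 88)]

total_f = sum(f for _, f in series)

# Step 1: weighted arithmetic mean.
mean = sum(x * f for x, f in series) / total_f

# Steps 2-4: absolute deviations, weighted by frequency and summed.
weighted_abs_dev = sum(abs(x - mean) * f for x, f in series)

# Step 5: divide by the sum of the frequencies.
avg_linear_deviation = weighted_abs_dev / total_f

print(weighted_abs_dev)      # 81280.0
print(avg_linear_deviation)  # 203.2
```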

4) Variance (dispersion) is the arithmetic mean of the squared deviations of each attribute value from the arithmetic mean.

In interval distribution series the variance is calculated using the formula:

σ² = Σ(x - x̄)²·f / Σf

The procedure for calculating variance in this case is as follows:

1. Determine the weighted arithmetic mean x̄, as shown in paragraph 2).

2. Find the deviations from the mean: x - x̄.

3. Square the deviation of each variant from the mean: (x - x̄)².

4. Multiply the squared deviations by the weights (frequencies): (x - x̄)²·f.

5. Sum the resulting products: Σ(x - x̄)²·f.

6. Divide the resulting sum by the sum of the weights (frequencies): Σ(x - x̄)²·f / Σf.

Let's put the calculations in a table:

Deposit amount, rub. | Number of depositors, f | Midpoint of the interval, x | x - x̄ | (x - x̄)² | (x - x̄)²·f
200-400 | 32 | 300 | -480 | 230400 | 7372800
400-600 | 56 | 500 | -280 | 78400 | 4390400
600-800 | 120 | 700 | -80 | 6400 | 768000
800-1000 | 104 | 900 | 120 | 14400 | 1497600
1000-1200 | 88 | 1100 | 320 | 102400 | 9011200
Total | 400 | - | - | - | 23040000
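The remaining indicators requested in the problem (variance, standard deviation and coefficient of variation) follow directly from the table total. The sketch below is one way to finish the calculation, assuming the same midpoints and frequencies; the printed values are computed here from the table and are not quoted from the original text.

```python
import math

# (midpoint x, frequency f) pairs taken from the calculation table.
series = [(300, 32), (500, 56), (700, 120), (900, 104), (1100, 88)]

total_f = sum(f for _, f in series)
mean = sum(x * f for x, f in series) / total_f

# Variance: weighted mean of the squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 * f for x, f in series)
variance = sum_sq_dev / total_f

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean, in percent.
coeff_variation = std_dev / mean * 100

print(sum_sq_dev)                 # 23040000.0
print(variance)                   # 57600.0
print(std_dev)                    # 240.0
print(round(coeff_variation, 2))  # 30.77
```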

When research results of various kinds are processed statistically, the obtained values are often grouped into a sequence of intervals. To calculate summary characteristics of such sequences, it is sometimes necessary to find the midpoint of an interval, the "central variant". The methods for calculating it are quite simple, but they have some peculiarities that depend both on the measurement scale used and on the nature of the grouping (open or closed intervals).

Instructions

1. If the interval is a section of a continuous numerical sequence, find its midpoint with the ordinary arithmetic mean: add the minimum value of the interval (its start) to the maximum value (its end) and divide the sum in half. This rule applies, for example, to age intervals: the midpoint of the age interval from 21 to 33 years is 27 years, because (21+33)/2=27.

2. Sometimes it is more convenient to use another method of calculating the arithmetic mean of the upper and lower limits of the interval. First determine the width of the range by subtracting the minimum value from the maximum value. Then divide the resulting value in half and add the result to the minimum value of the range. For example, if the lower limit is 47.15 and the upper limit is 79.13, the width of the range is 79.13-47.15 = 31.98. The midpoint of the interval is then 63.14, because 47.15+(31.98/2) = 47.15+15.99 = 63.14.

3. If the interval is not part of an ordinary number sequence, calculate its midpoint in accordance with the periodicity and units of the measurement scale used. For example, for a historical period the midpoint of the interval will be a certain calendar date. So for the interval from January 1, 2012 to January 31, 2012, the midpoint will be January 16, 2012.

4. In addition to ordinary (closed) intervals, statistical methods can also work with "open" ones. For such ranges, one of the boundaries is not defined. For example, an open interval can be specified by the wording "50 years and older". The midpoint in this case is determined by analogy: if all other ranges of the sequence in question have identical widths, then the open interval is assumed to have the same width. Otherwise, determine how the widths of the intervals preceding the open one change and derive its conditional width from that trend.
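The midpoint rules from steps 1 to 3 can be sketched as small helper functions. This is an illustrative sketch; the function names are hypothetical, and the date example relies on the standard `datetime` module.

```python
from datetime import date


def midpoint(lower, upper):
    """Step 1: half the sum of the interval boundaries."""
    return (lower + upper) / 2


def midpoint_via_width(lower, upper):
    """Step 2: lower boundary plus half the width of the interval."""
    width = upper - lower
    return lower + width / 2


def date_midpoint(start, end):
    """Step 3: midpoint of a calendar interval."""
    return start + (end - start) / 2


print(midpoint(21, 33))                                     # 27.0
print(round(midpoint_via_width(47.15, 79.13), 2))           # 63.14
print(date_midpoint(date(2012, 1, 1), date(2012, 1, 31)))   # 2012-01-16
```

Both numeric methods give the same result; the second is sometimes preferred when the width of the interval is already known.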

Occasionally in everyday life you may need to find the midpoint of a straight line segment, for example when making a pattern or a sketch of a product, or simply when sawing a wooden block into two equal parts. Geometry and a little everyday ingenuity come to the rescue.

You will need

  • Compass, ruler; pin, pencil, thread

Instructions

1. Use ordinary tools designed for measuring length. This is the easiest way to find the midpoint of a segment. Measure the length of the segment with a ruler or tape measure, divide the resulting value in half and measure off the result from one end of the segment. You will get a point corresponding to the middle of the segment.

2. There is a more accurate method for finding the midpoint of a segment, known from the school geometry course. For this, take a compass and a ruler; the ruler can be replaced by any object of suitable length with a straight edge.

3. Set the distance between the legs of the compass so that it is greater than half the length of the segment (for example, equal to the full length of the segment). Then place the compass needle at one end of the segment and draw a semicircle so that it intersects the segment. Move the needle to the other end of the segment and, without changing the span of the compass, draw a second semicircle in exactly the same way.

4. You now have two points where the semicircles intersect, one on each side of the segment whose midpoint you want to find. Connect these two points using a ruler or a straight block. The connecting line will pass exactly through the middle of the segment.

5. If you don't have a compass at hand, or the length of the segment significantly exceeds the possible span of its legs, you can use a simple improvised device. It can be made from an ordinary pin, a thread and a pencil. Tie the ends of the thread to the pin and the pencil; the length of the thread should slightly exceed the length of the segment. With this improvised substitute for a compass, all that remains is to follow the steps described above.

Helpful advice
You can locate the middle of a board or block quite accurately using an ordinary thread or cord. To do this, cut the thread so that it matches the length of the board or block. Then fold the thread in half and cut it into two equal parts. Attach one end of the resulting measure to the end of the object being measured; the second end will correspond to its middle.

When calculating the arithmetic mean for an interval variation series, first determine the midpoint of each interval as the half-sum of its upper and lower limits, and then the mean of the entire series. In the case of open intervals, the width of the first or last interval is taken from the width of the interval adjacent to it.

Example 3. Determine the average age of evening students.

Age in years | Number of students | Midpoint of the interval | Product of the midpoint of the interval (age) and the number of students
up to 20 | | (18 + 20) / 2 = 19 |
20 - 22 | | (20 + 22) / 2 = 21 |
22 - 26 | | (22 + 26) / 2 = 24 |
26 - 30 | | (26 + 30) / 2 = 28 |
30 or more | | (30 + 34) / 2 = 32 |
Total | | |

Here 18 is the conditional lower boundary of the first interval, calculated as 20 - (22 - 20). The upper boundary of the last interval, 34, is obtained in the same way as 30 + (30 - 26).

Averages calculated from interval series are approximate.
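The rule for closing open intervals can be sketched in code. The example below is a minimal illustration that uses the age groups from Example 3, with `None` marking an undefined boundary; the function name `close_open_intervals` is hypothetical.

```python
def close_open_intervals(bounds):
    """Replace open boundaries using the width of the adjacent interval.

    bounds is a list of (lower, upper) pairs; None marks an open boundary.
    """
    closed = list(bounds)

    first_lo, first_hi = closed[0]
    if first_lo is None:
        next_lo, next_hi = closed[1]
        # The first interval gets the width of the second one.
        closed[0] = (first_hi - (next_hi - next_lo), first_hi)

    last_lo, last_hi = closed[-1]
    if last_hi is None:
        prev_lo, prev_hi = closed[-2]
        # The last interval gets the width of the penultimate one.
        closed[-1] = (last_lo, last_lo + (prev_hi - prev_lo))

    return closed


ages = [(None, 20), (20, 22), (22, 26), (26, 30), (30, None)]
closed = close_open_intervals(ages)
midpoints = [(lo + hi) / 2 for lo, hi in closed]

print(closed)     # [(18, 20), (20, 22), (22, 26), (26, 30), (30, 34)]
print(midpoints)  # [19.0, 21.0, 24.0, 28.0, 32.0]
```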

  1. Structural averages

In addition to power means, statistics uses structural averages, the mode and the median, to characterize the value of a varying attribute and the structure of distribution series.

Mode is the most frequently occurring variant of the series. The mode is used, for example, to determine the sizes of clothes and shoes that are most in demand among buyers.

The mode of a discrete series is the variant with the highest frequency.

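When calculating the mode for an interval variation series, you must:

    first determine the modal interval (by the maximum frequency),

    then calculate the modal value of the attribute according to the formula:

Mo = x_Mo + h × (f_Mo - f_Mo-1) / ((f_Mo - f_Mo-1) + (f_Mo - f_Mo+1)),

where x_Mo is the lower boundary of the modal interval, h is its width, f_Mo is the frequency of the modal interval, and f_Mo-1 and f_Mo+1 are the frequencies of the intervals preceding and following it.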

Determining the mode graphically: the mode is found from the histogram of the distribution. To do this, the upper right vertex of the modal rectangle is connected to the upper right corner of the previous rectangle, and the upper left vertex of the modal rectangle is connected to the upper left corner of the subsequent rectangle. The abscissa of the intersection point of these lines is the mode of the distribution.
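The mode of an interval series can be sketched with the standard interpolation formula for grouped data. As an illustration it is applied to the deposit series from the first example (this particular calculation is not in the original); the function name `interval_mode` is hypothetical, and a missing neighbouring interval is assumed to have frequency 0.

```python
def interval_mode(groups):
    """Mode of an interval series given as (lower, upper, frequency) tuples."""
    # Modal interval: the one with the highest frequency.
    i = max(range(len(groups)), key=lambda j: groups[j][2])
    lower, upper, f_mo = groups[i]
    f_prev = groups[i - 1][2] if i > 0 else 0
    f_next = groups[i + 1][2] if i < len(groups) - 1 else 0

    d_prev = f_mo - f_prev
    d_next = f_mo - f_next
    width = upper - lower
    return lower + width * d_prev / (d_prev + d_next)


deposits = [
    (200, 400, 32),
    (400, 600, 56),
    (600, 800, 120),
    (800, 1000, 104),
    (1000, 1200, 88),
]

mode_deposit = interval_mode(deposits)
print(mode_deposit)  # 760.0
```

The modal interval here is 600-800, and the mode is shifted toward its upper boundary because the following interval has a higher frequency than the preceding one.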

Median

The median is the value of the attribute that divides the ranked variation series into two equal parts.

Median for a discrete series.

To determine the median of a discrete series with an odd number of observation units, first find the median number using the formula N_Me = (n + 1) / 2, and then determine which variant has an accumulated frequency that reaches the median number.

If the series contains an even number of elements, the median is equal to the average of the two attribute values located in the middle. The number of the first of these values is determined by the formula n / 2, and of the second by n / 2 + 1, where n is the number of elements in the series.
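The median rule for a discrete series can be sketched as follows. The data in the usage lines are made up for illustration, and the function name `discrete_median` is hypothetical.

```python
def discrete_median(values, freqs):
    """Median of a discrete series given variants and their frequencies."""
    pairs = sorted(zip(values, freqs))
    n = sum(freqs)

    def variant_at(position):
        # Variant whose accumulated frequency first reaches `position` (1-based).
        cumulative = 0
        for value, freq in pairs:
            cumulative += freq
            if cumulative >= position:
                return value
        raise ValueError("position exceeds the number of observations")

    if n % 2 == 1:
        # Odd n: the median number is (n + 1) / 2.
        return variant_at((n + 1) // 2)
    # Even n: average of the values numbered n / 2 and n / 2 + 1.
    return (variant_at(n // 2) + variant_at(n // 2 + 1)) / 2


# Hypothetical data: three variants observed 2, 5 and 2 times (n = 9).
print(discrete_median([10, 20, 30], [2, 5, 2]))      # 20
# Hypothetical data: four variants observed once each (n = 4).
print(discrete_median([1, 2, 3, 4], [1, 1, 1, 1]))   # 2.5
```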

Median for an interval series

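When calculating the median of an interval variation series, first determine the median interval, that is, the interval within which the median lies. This is the first interval whose accumulated frequency equals or exceeds half the sum of all frequencies. The median is then calculated by the formula:

Me = x_Me + h × (Σf / 2 - S_Me-1) / f_Me,

where x_Me is the lower boundary of the median interval, h is its width, S_Me-1 is the accumulated frequency before the median interval, and f_Me is the frequency of the median interval.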

Example. Find the mode and median for the interval series.

Age groups | Number of students | Sum of accumulated frequencies ΣS
25 - 30 | 1054 | 2272
45 years or more | |

Solution :

    Let's determine the mode.

In this example, the modal interval is the 25-30 age group, since this interval has the highest frequency (1054).

Let's calculate the value of the mode:

This means that the modal age of students is 27 years.

    Let's determine the median.

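The median interval is the 25-30 age group, since it contains the variant that divides the population into two equal parts (Σf / 2 = 3462 / 2 = 1731, and the accumulated frequency of this group, 2272, is the first to exceed that value). Next, we substitute the necessary numerical data into the formula and get the median value:

Me = 25 + 5 × (1731 - 1218) / 1054 ≈ 27.4,

where 1218 = 2272 - 1054 is the accumulated frequency before the median interval.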

This means that one half of the students are under 27.4 years old, and the other half are over 27.4 years old.

Graphically, the median is determined from the cumulative frequency curve (cumulate). To do this, the height of the largest ordinate, which corresponds to the sum of all frequencies, is divided in half. Through the resulting point, a straight line is drawn parallel to the abscissa axis until it intersects the cumulate. The abscissa of the intersection point is the median.
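The interpolation inside the median interval can be checked with a short sketch. It uses only the figures quoted in the example (the 25-30 group with frequency 1054, accumulated frequency 2272 and total 3462); the function name `interval_median` and its parameters are illustrative.

```python
def interval_median(lower, width, total_freq, cum_before, freq):
    """Median of an interval series by interpolation inside the median interval.

    lower      - lower boundary of the median interval
    width      - width of the median interval
    total_freq - sum of all frequencies
    cum_before - accumulated frequency before the median interval
    freq       - frequency of the median interval
    """
    return lower + width * (total_freq / 2 - cum_before) / freq


median_age = interval_median(
    lower=25,
    width=5,
    total_freq=3462,
    cum_before=2272 - 1054,  # accumulated frequency before the 25-30 group
    freq=1054,
)
print(round(median_age, 1))  # 27.4
```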

When analyzing a phenomenon or process in statistics, it is often necessary to take into account not only the average levels of the indicators being studied, but also the scatter, or variation, in the values of individual units, which is an important characteristic of the population being studied.

Stock prices, supply and demand volumes, and interest rates in different periods of time and in different places are the most subject to variation.

The main indicators characterizing variation are the range, the variance, the standard deviation and the coefficient of variation.

The range of variation is the difference between the maximum and minimum values of the attribute: R = Xmax - Xmin. The disadvantage of this indicator is that it evaluates only the limits of variation of an attribute and does not reflect its variability within these boundaries.

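The variance (dispersion) does not have this shortcoming. It is calculated as the mean squared deviation of the attribute values from their mean (simple and weighted forms):

σ² = Σ(x - x̄)² / n; σ² = Σ(x - x̄)²·f / Σf

A simplified way of calculating the variance, as the mean of the squares minus the square of the mean, uses the following formulas (simple and weighted):

σ² = Σx² / n - (Σx / n)²; σ² = Σx²·f / Σf - (Σx·f / Σf)²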

Examples of application of these formulas are presented in tasks 1 and 2.
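That the simplified formula gives the same result as the definition can be verified numerically. The sketch below reuses the deposit series from the first example as illustrative data; the variable names are hypothetical.

```python
# (midpoint x, frequency f) pairs from the deposit example.
series = [(300, 32), (500, 56), (700, 120), (900, 104), (1100, 88)]

total_f = sum(f for _, f in series)
mean = sum(x * f for x, f in series) / total_f

# Definition: weighted mean of the squared deviations from the mean.
variance_by_definition = sum((x - mean) ** 2 * f for x, f in series) / total_f

# Simplified formula: mean of the squares minus the square of the mean.
mean_of_squares = sum(x ** 2 * f for x, f in series) / total_f
variance_simplified = mean_of_squares - mean ** 2

print(variance_by_definition)  # 57600.0
print(variance_simplified)     # 57600.0
```

The simplified form avoids computing each deviation separately, which is convenient for manual calculation.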

An indicator widely used in practice is the standard deviation:

σ = √σ²

The standard deviation is defined as the square root of the variance and has the same units as the attribute being studied.

The indicators considered above give the absolute value of the variation, i.e. they evaluate it in the units of measurement of the attribute being studied. Unlike them, the coefficient of variation measures variability in relative terms, relative to the average level, which in many cases is preferable.

Formula for calculating the coefficient of variation:

V = (σ / x̄) × 100%

Examples of solving problems on the topic “Indicators of variation in statistics”

Problem 1. To study the influence of advertising on the size of the average monthly deposit in the banks of a region, 2 banks were examined. The following results were obtained:

Determine:
1) for each bank: a) the average deposit per month; b) the variance of the deposit;
2) the average monthly deposit for the two banks together;
3) the deposit variance for the 2 banks that depends on advertising;
4) the deposit variance for the 2 banks that depends on all factors except advertising;
5) the total variance using the addition rule;
6) the coefficient of determination;
7) the empirical correlation ratio.

Solution

1) Let's create a calculation table for the bank with advertising. To determine the average monthly deposit, we find the midpoints of the intervals. The width of the open interval (the first) is conditionally taken to be equal to the width of the interval adjacent to it (the second).

We find the average deposit size using the weighted arithmetic mean formula:

29,000 / 50 = 580 rub.

We find the variance of the deposit using the formula:

23,400 / 50 = 468

We perform similar calculations for the bank without advertising:

2) Let's find the average deposit size for the two banks together: x̄ = (580 × 50 + 542.8 × 50) / 100 = 561.4 rub.

3) We find the variance of the deposit for the two banks that depends on advertising using the formula σ² = pq (the formula for the variance of an alternative attribute). Here p = 0.5 is the proportion of factors dependent on advertising and q = 1 - 0.5, so σ² = 0.5 × 0.5 = 0.25.

4) Since the share of the other factors is also 0.5, the variance of the deposit for the two banks that depends on all factors except advertising is also 0.25.

5) Determine the total variance using the addition rule.

Mean of the within-group variances (the effect of all factors except advertising): σ²rest = (468 × 50 + 636.16 × 50) / 100 = 552.08

Between-group variance (the effect of advertising): σ²fact = [(580 - 561.4)² × 50 + (542.8 - 561.4)² × 50] / 100 = 34,596 / 100 = 345.96

σ² = σ²fact + σ²rest = 345.96 + 552.08 = 898.04

6) Coefficient of determination: η² = σ²fact / σ² = 345.96 / 898.04 ≈ 0.39 = 39%, i.e. 39% of the variation in deposit size is explained by advertising.

7) Empirical correlation ratio: η = √η² = √0.39 ≈ 0.62; the relationship is fairly close.
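The rule of addition of variances used in steps 5 to 7 can be sketched in code. The group sizes, means and variances below are the figures quoted in the solution; the dictionary layout and variable names are illustrative.

```python
import math

# Group summaries quoted in the solution.
groups = [
    {"n": 50, "mean": 580.0, "variance": 468.0},    # bank with advertising
    {"n": 50, "mean": 542.8, "variance": 636.16},   # bank without advertising
]

n_total = sum(g["n"] for g in groups)
overall_mean = sum(g["mean"] * g["n"] for g in groups) / n_total

# Mean of the within-group variances (all factors except the grouping factor).
within = sum(g["variance"] * g["n"] for g in groups) / n_total

# Between-group variance (effect of the grouping factor, here advertising).
between = sum((g["mean"] - overall_mean) ** 2 * g["n"] for g in groups) / n_total

# Addition rule: total variance is the sum of the two components.
total = within + between

eta_squared = between / total  # coefficient of determination
eta = math.sqrt(eta_squared)   # empirical correlation ratio

print(round(overall_mean, 2), round(within, 2), round(between, 2), round(total, 2))
print(round(eta_squared, 2), round(eta, 2))
```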

Problem 2. Enterprises are grouped according to the volume of marketable output:

Determine: 1) the variance of the volume of marketable output; 2) the standard deviation; 3) the coefficient of variation.

Solution

1) The problem gives an interval distribution series. It must be expressed discretely, that is, we need to find the midpoint of each interval (x′). In groups with closed intervals, we find the midpoint as a simple arithmetic mean. In a group that has only an upper limit, the midpoint is the difference between this upper limit and half the width of the next interval (200 - (400 - 200) / 2 = 100).

In a group that has only a lower limit, the midpoint is the sum of this lower limit and half the width of the previous interval (800 + (800 - 600) / 2 = 900).

We calculate the average value of marketable output using the formula:

x̄ = k × (Σ(((x′ - a) / k) × f) / Σf) + a. Here a = 500 is the value of the variant with the highest frequency, and k = 600 - 400 = 200 is the width of the interval with the highest frequency. Let's put the result in the table:

So, the average value of marketable output for the period under study is x̄ = (-5 / 37) × 200 + 500 = 472.97 thousand rubles.

2) We find the variance using the following formula:

σ² = (33 / 37) × 200² - (472.97 - 500)² = 35,675.67 - 730.62 = 34,945.05

3) Standard deviation: σ = √σ² = √34,945.05 ≈ 186.94 thousand rubles.

4) Coefficient of variation: V = (σ / x̄) × 100 = (186.94 / 472.97) × 100 = 39.52%
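The shifted-origin calculation used in Problem 2 can be sketched as follows. Only the sums quoted in the solution are used (Σf = 37, Σ((x′ - a) / k)·f = -5, Σ((x′ - a) / k)²·f = 33), since the underlying table is not reproduced here; the variable names are illustrative. Because the sketch does not round intermediate values, the variance differs from the hand calculation in the first decimal place.

```python
import math

a = 500        # variant with the highest frequency (conditional origin)
k = 200        # width of the interval with the highest frequency
sum_f = 37     # sum of the frequencies
sum_uf = -5    # sum of ((x' - a) / k) * f, as quoted in the solution
sum_u2f = 33   # sum of ((x' - a) / k)**2 * f, as quoted in the solution

m1 = sum_uf / sum_f    # first conditional moment
m2 = sum_u2f / sum_f   # second conditional moment

mean = m1 * k + a
variance = m2 * k ** 2 - (mean - a) ** 2
std_dev = math.sqrt(variance)
coeff_variation = std_dev / mean * 100

print(round(mean, 2))             # 472.97
print(round(std_dev, 2))          # 186.94
print(round(coeff_variation, 2))  # 39.52
```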

Example: It is required to determine the average age of correspondence students according to the data given in the following table:

Age of students, years (x) | Number of students, people (f) | Midpoint of the interval (x′) | xi*fi
26 and older | | |
Total: | | |

To calculate the average in interval series, first determine the average value of the interval as the half-sum of the upper and lower limits, and then calculate the average using the arithmetic weighted average formula.

Above is an example with equal intervals, with the 1st and last being open.

Answer: The average student age is 22.6 years, or approximately 23 years.

The harmonic mean has a more complex structure than the arithmetic mean. It is used when the statistical information does not contain frequencies for the individual values of the attribute but instead gives the product of the attribute value and its frequency. The harmonic mean as a type of power mean looks like this:

Depending on the form in which the source data are presented, the harmonic mean can be calculated as simple or weighted. If the source data are not grouped, the simple harmonic mean is used:

x̄ = n / Σ(1 / x)

It is used, for example, to determine the average costs of labor, materials, etc. per unit of production for several enterprises.

When working with grouped data, use the weighted harmonic mean:

x̄ = Σw / Σ(w / x), where w = x·f
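The simple and weighted harmonic means can be sketched as two small functions. The input numbers are made up for illustration, and the function names are hypothetical; the weights `w` stand for the products of the attribute value and the frequency.

```python
def harmonic_mean_simple(values):
    """Simple harmonic mean: n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


def harmonic_mean_weighted(values, weights):
    """Weighted harmonic mean: sum of w divided by the sum of w / x."""
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


# Hypothetical ungrouped data.
print(harmonic_mean_simple([2, 4, 4]))  # 3.0

# Hypothetical grouped data: unit prices 10 and 20, total sales 100 and 200.
# The implied quantities are 100 / 10 = 10 and 200 / 20 = 10 units.
print(harmonic_mean_weighted([10, 20], [100, 200]))  # 15.0
```

In the second call the result equals the total sales divided by the total quantity, which is why the harmonic form is appropriate when only the products x·f are known.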

The geometric mean applies in cases where the total volume of the averaged attribute is a multiplicative quantity, i.e. it is determined not by summing but by multiplying the individual values of the attribute.

The weighted form of the geometric mean is not used in practical calculations.

The quadratic mean (mean square) is used in cases where, when individual values of an attribute are replaced with an average value, the sum of squares of the original values must remain unchanged.

Its main area of application is measuring the degree of fluctuation of the individual values of an attribute around the arithmetic mean (the standard deviation). In addition, the quadratic mean is used when it is necessary to calculate the average value of an attribute expressed in square or cubic units of measurement (when calculating the average size of square plots, the average diameters of pipes, tree trunks, etc.).

The quadratic mean is calculated in two forms (simple and weighted):

x̄ = √(Σx² / n); x̄ = √(Σx²·f / Σf)

All power means differ from each other in the value of the exponent. The higher the exponent, the greater the value of the mean:

x̄ harmonic ≤ x̄ geometric ≤ x̄ arithmetic ≤ x̄ quadratic

This property of power means is called the majorant property of means.
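The ordering of the power means can be illustrated numerically. This is a minimal sketch for positive, unweighted data; the function name `power_mean` and the sample values are hypothetical, and the geometric mean is treated as the limiting case of exponent 0.

```python
import math


def power_mean(values, k):
    """Simple power mean of positive values with exponent k."""
    n = len(values)
    if k == 0:
        # Limiting case: the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** k for v in values) / n) ** (1 / k)


data = [1.0, 2.0, 4.0]

harmonic = power_mean(data, -1)
geometric = power_mean(data, 0)
arithmetic = power_mean(data, 1)
quadratic = power_mean(data, 2)

print(round(harmonic, 3), round(geometric, 3),
      round(arithmetic, 3), round(quadratic, 3))
print(harmonic <= geometric <= arithmetic <= quadratic)  # True
```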