chapter three common descriptive statistics, Summaries of Statistics


Typology: Summaries

2022/2023

Uploaded on 02/28/2023

amritay 🇺🇸


Partial preview of the text

CHAPTER THREE

COMMON DESCRIPTIVE STATISTICS

The analysis of data begins with descriptive statistics such as the mean, median, mode, range, standard deviation, variance, standard error of the mean, and confidence intervals. These statistics are used to summarize data and provide information about the sample from which the data were drawn and the accuracy with which the sample represents the population of interest. The mean, median, and mode are measurements of the “central tendency” of the data. The range, standard deviation, variance, standard error of the mean, and confidence intervals provide information about the “dispersion” or variability of the data about the measurements of central tendency.

MEASUREMENTS OF CENTRAL TENDENCY

The appropriateness of using the mean, median, or mode in data analysis depends on the nature of the data set and its distribution (normal vs non-normal).

The mean (denoted by $\bar{x}$) is calculated by dividing the sum of the individual data points (where Σ means “sum of”) by the number of observations (denoted by n). It is the arithmetic average of the observations and is used to describe the center of a data set.

$$\text{mean} = \bar{x} = \frac{\sum x}{n}$$

The mean is commonly used to describe numerical data that are normally distributed. It is very sensitive to extreme values in the data set. For example, the mean of the data set {1,2,3,4,5} is 15/5 or 3. If the number 20 is substituted for the 5, the data set becomes {1,2,3,4,20} and the mean is 30/5 or 6. Whereas a mean of 3 accurately describes the “center” of the first data set, a mean of 6 does not accurately describe the second data set, in which four of the five values lie below 6. Thus, the mean is sensitive to extreme values or “outliers” and may not accurately represent the true center of the data if such outlying values are present.
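The two data sets in the example above can be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` and the variable names are illustrative choices, not anything defined in the chapter.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 20]  # 20 substituted for the 5

mean_original = arithmetic_mean(original)     # 15 / 5
mean_outlier = arithmetic_mean(with_outlier)  # 30 / 5

print("mean of original data set:    ", mean_original)
print("mean of data set with outlier:", mean_outlier)

# Cross-check against the standard library implementation.
assert mean_original == mean(original)
assert mean_outlier == mean(with_outlier)
```

Replacing a single value doubles the mean from 3 to 6, which is the sensitivity to outliers that the text describes.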
The mean is most appropriate when the data are approximately normally distributed, or at least symmetric, as is the case in the first data set.

The median is another method of describing the center of a data set. It is the middle value of the ordered data if the number of observations, n, is odd, or the average of the two middle values if n is even. By definition, half of the data points lie above the median and half lie below it. For example, the median for each of the above data sets is 3 despite the outlying value of 20 in the second data set. The median is therefore useful for describing the center of a data set that is non-normally distributed, as is the case in the second data set. It is also commonly used with non-numerical ordinal data.

The mode is the value which occurs most frequently in the data. There may be one or more modes for each data set, and this makes the mode a useful method for describing a population that is bimodal. In the data set {2,3,4,4,5,6,7,7,8,9}, for example, there are two modes, 4 and 7. Whereas a data set may have one or more modes, there can only be one mean and one median.

The mean, median, and mode can be used to evaluate the symmetry of a data distribution. This is essential to choosing the appropriate statistical test. If the mean and median are equal or nearly so, the distribution is usually symmetrical, as a normal distribution is, and a test for normally distributed data may be used. If the mean and median differ markedly, the data are likely skewed and a test for non-normally distributed data is appropriate.

MEASUREMENTS OF VARIABILITY

As we have seen, the mean, median, and mode are used to describe the central tendency of the data. Used alone, however, they do not adequately describe a data set. We also need a way of accurately describing the “dispersion” or variability of the data about the measurements of central tendency. Remember that it is this variability that mandates the use of statistical analysis.
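Returning briefly to the measures of central tendency, the median, mode, and mean-versus-median comparison described above can be sketched in Python. The helper `mean_median_gap` is a hypothetical name introduced here for illustration. It is a crude symmetry check and not a formal test of normality.

```python
from statistics import mean, median, multimode

first = [1, 2, 3, 4, 5]
second = [1, 2, 3, 4, 20]                 # contains the outlying value 20
bimodal = [2, 3, 4, 4, 5, 6, 7, 7, 8, 9]  # data set with two modes

# The median is the middle value of the ordered data, so the outlier
# in the second data set does not move it.
median_first = median(first)
median_second = median(second)

# multimode returns every value tied for the highest frequency.
modes = multimode(bimodal)


def mean_median_gap(values):
    """Difference between mean and median.

    A gap near zero suggests a roughly symmetric distribution;
    a large gap suggests skew.
    """
    return mean(values) - median(values)


print("medians:", median_first, median_second)
print("modes of bimodal set:", modes)
print("mean - median (first): ", mean_median_gap(first))
print("mean - median (second):", mean_median_gap(second))
```

For the first data set the gap is zero. For the second it is 3, a marked difference that points to a skewed distribution. How large a gap counts as "marked" is a judgment call that depends on the scale of the data.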
One method of describing this variability is the range, which is defined as the difference between the smallest and largest values in the data set. It is frequently given as the minimum and maximum values and is used to demonstrate the presence of extreme values or “outliers” which would tend to skew the mean in one direction or another.

14 / A PRACTICAL GUIDE TO BIOSTATISTICS

Perhaps the most commonly used method to describe the variability of a data set is the standard deviation. It is the cornerstone of most of the commonly used statistical tests. The standard deviation is an estimate of the average distance of the values from their mean. Assuming the data are normally distributed (i.e., they follow a bell-shaped curve with half of the data lying on either side of the mean), approximately 68% of the data will lie within 1 standard deviation of the mean, approximately 95% within 2 standard deviations, and more than 99% (about 99.7%) within 3 standard deviations.

Figure 3-1: The Normal Distribution (horizontal axis marked at $\bar{x}-2sd$, $\bar{x}-1sd$, $\bar{x}$, $\bar{x}+1sd$, $\bar{x}+2sd$; regions labeled 68%, 95%, 99%)

The standard deviation is easily calculated by virtually any computer, but the equation is included below as we will use it repeatedly in describing the commonly used statistical tests for data analysis. The variance is the square of the standard deviation.

$$\text{standard deviation} = sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where x = each data point, $\bar{x}$ = mean, and n = number of observations.

Assume we wish to calculate the mean and standard deviation for a series of serum sodium measurements to determine their central tendency and variability. Such calculations are the basis for the “normal laboratory ranges” which we use every day. The normal range for most laboratory tests is defined as the mean ± 2 standard deviations, which encompasses the test results for approximately 95% of the population. Although the standard deviation is easily calculated with a computer, a table such as the one below illustrates the steps involved in the calculation.
Serum Na (x)    $(x - \bar{x})$    $(x - \bar{x})^2$
138             -2                 4
137             -3                 9
143              3                 9
141              1                 1
138             -2                 4
142              2                 4
136             -4                 16
140              0                 0
145              5                 25
143              3                 9

mean = $\bar{x}$ = 140 (140.3 rounded to the nearest whole number)
sum of squared deviations = 81

$$\text{standard deviation} = sd = \sqrt{\frac{81}{9}} = \sqrt{9} = 3$$

In describing these data, we would state that the mean is 140 mEq/L with a standard deviation of 3 mEq/L and a variance of 9, and that approximately 95% of the values are expected to lie within 6 mEq/L of the mean (2 standard deviations). This is similar to the normal clinical range for serum sodium (135-145 mEq/L).

[...] alone. Further, confidence intervals provide information on important effects that, although not statistically significant, may be useful clinically. They are also helpful in determining whether the sample size was large enough to detect a significant difference to begin with (i.e., whether the study had sufficient statistical power).

COEFFICIENT OF VARIATION, BIAS, AND PRECISION

Several descriptive statistics exist for the special situation in which we wish to compare two laboratory tests or measurements to determine whether they accurately measure the same physiologic parameter. The coefficient of variation is one method that is frequently used to describe the reliability of a laboratory test or measurement. It is calculated by dividing the standard deviation by the mean and multiplying by 100%.

$$\text{coefficient of variation} = cov = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$

As it has no units, the coefficient of variation can be used to compare two tests which are measured on different scales. It is commonly used to describe the reliability and measurement error associated with electronic instruments and monitors. For example, a cardiac output computer might be described as having a coefficient of variation of 5%, meaning that the standard deviation of sequential cardiac output measurements is 5% of their mean.

Two other statistics which are used to gauge measurement accuracy are bias and precision.
Note that this bias is different from statistical bias, which was discussed in the previous chapter. The bias of two methods is simply the difference between their measurements. For example, if a cardiac output by thermodilution is 3.4 L/min and that by radionuclide ventriculography is 3.2 L/min, the bias involved in these two measurements is 3.4 - 3.2 or 0.2 L/min. Precision is defined as the standard deviation of the bias (the difference between measurements) and is a measure of the variability of the difference. Bias and precision are both necessary to evaluate the agreement between two methods of measurement.

DESCRIBING NOMINAL DATA

Proportions, percentages, ratios, and rates are used to describe nominal data. A proportion is defined as the number of observations possessing a given characteristic of interest divided by the total number of observations. If we are evaluating successful extubation of patients from mechanical ventilation, the proportion of patients successfully extubated is:

$$\text{proportion} = \frac{\text{successfully extubated patients}}{\text{successfully extubated patients} + \text{reintubated patients}}$$

A percentage is defined as a proportion multiplied by 100%. A ratio compares the incidence of an event or disease in one group with that in another group. The ratio of successfully extubated to reintubated patients is therefore:

$$\text{ratio} = \frac{\text{successfully extubated patients}}{\text{reintubated patients}}$$

The odds for an event (see Chapter One) is an example of a ratio.

A rate is defined as a proportion multiplied by a particular “base” and is expressed per unit time. If, for example, it is known that the proportion of male patients with angina who have an acute myocardial infarction is 0.05 (i.e., 5 out of every 100 patients) per year, the incidence rate for myocardial infarction in male patients with angina will be 5,000 per 100,000 patients per year (0.05 × 100,000). Rates will be discussed further in the next chapter.
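The worked calculations in this chapter (the serum sodium standard deviation, the coefficient of variation, bias and precision, and the conversion of a proportion to a rate) can be reproduced with a short Python sketch. Several things here are assumptions for illustration only: the function names `sample_sd` and `rate_per_base`, the coefficient of variation for the sodium values (the text does not report one), and every cardiac output pair except the first (3.4 and 3.2 L/min), which is the only pair given in the text.

```python
from math import sqrt
from statistics import mean, stdev

# Serum sodium values (mEq/L) from the chapter's table.
serum_na = [138, 137, 143, 141, 138, 142, 136, 140, 145, 143]


def sample_sd(values, center=None):
    """Square root of the summed squared deviations divided by n - 1.

    `center` lets the caller supply a rounded mean, as the table does.
    """
    if center is None:
        center = mean(values)
    squared = sum((x - center) ** 2 for x in values)
    return sqrt(squared / (len(values) - 1))


sd_rounded = sample_sd(serum_na, center=140)  # table's method: sqrt(81 / 9) = 3.0
sd_exact = sample_sd(serum_na)                # uses the unrounded mean, 140.3
variance = sd_rounded ** 2                    # variance is the square of the sd

# Coefficient of variation: sd divided by mean, times 100%.
# Computed here purely as an illustration of the formula.
cv_percent = sd_rounded / 140 * 100

# Bias and precision for paired cardiac outputs (L/min).
# HYPOTHETICAL data, apart from the first pair taken from the text.
thermodilution = [3.4, 4.1, 5.0, 2.9]
radionuclide = [3.2, 4.3, 4.8, 2.8]
differences = [a - b for a, b in zip(thermodilution, radionuclide)]
mean_bias = mean(differences)   # average difference between the methods
precision = stdev(differences)  # standard deviation of the differences


def rate_per_base(proportion, base=100_000):
    """Proportion multiplied by a base; the time unit is carried by context."""
    return proportion * base


mi_rate = rate_per_base(0.05)   # 0.05 x 100,000 events per 100,000 per year

print("sd (rounded mean):", sd_rounded, " sd (exact mean):", round(sd_exact, 3))
print("variance:", variance, " coefficient of variation (%):", round(cv_percent, 2))
print("mean bias:", round(mean_bias, 3), " precision:", round(precision, 3))
print("MI rate per 100,000 per year:", mi_rate)
```

Using the unrounded mean gives a standard deviation slightly under 3, which shows that rounding the mean to 140 in the table has little effect on the result.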