Statistical Tools for Tropical Animal Feeding Experiments, Lecture notes of Designs and Groups

Agricultural Statistics, Animal Nutrition, Experimental Design in Animal Science

An overview of designing and analyzing experiments in tropical animal feeding research. It covers the collection and management of experimental data, the use of parametric statistics such as regression analysis and analysis of variance, and the importance of considering interactions between factors. It also discusses chi-squared analysis for experiments involving yes/no data or counts, together with the limitations of this approach, and includes examples and formulas for calculating the sample size required to detect significant differences.

What you will learn

  • What is the importance of considering interactions between factors in tropical animal feeding experiments?
  • How is data collected and managed in tropical animal feeding experiments?
  • What statistical tools are used to analyze experimental data in tropical animal feeding research?

Typology: Lecture notes

2021/2022

Uploaded on 07/05/2022

paul.kc
Partial preview of the text

Tropical animal feeding: a manual for research workers

Chapter 8
Design and analysis of experiments

This chapter was contributed by Andrew Speedy, University of Oxford, UK. The objective is to assist researchers to compile and analyze data. To this end, one of the simpler statistics programs (MINITAB, Minitab Inc., Philadelphia, USA) is used as the model. More powerful statistical packages may be required for studies in plant and animal genetics and agricultural economics. But, in line with the general philosophy of this manual, simplicity and ease of understanding are considered the principal attributes required of a computer program; in this respect Minitab has much to commend it and is therefore selected as the example. Various other, often more sophisticated, statistical software packages are of course available on the market.

THE OVERALL APPROACH

The objectives

The most important aspect of conducting good research is the definition of the objective(s). No matter how good the design of the experiment, how sophisticated the methods used or how clever the statistical treatment of results, the work is of little value if it does not answer a question of scientific importance and practical relevance. Studying the literature, thinking about the questions and discussing them with colleagues, and especially with the farmers who will ultimately apply the technology, is the most important part of planning a research programme. Research must be oriented to solving farmers' problems.

The methodology

Once the objectives are clear, the methodology can be considered. This should be planned to provide the data to answer the questions raised and to satisfy the needs of the researcher and also others who may wish to adopt the findings and apply them in other situations.
It must also be possible within the confines of the resources available (land, animals, buildings, pens, laboratory equipment, etc.). Some of these problems (such as numbers of replicates and land resources) may be overcome by conducting the research 'on-farm', which also has important implications for short-cutting the process of research application or technology transfer.

Analysis of the data

When the data are finally collected, they must be analyzed in a way that will provide meaningful conclusions. Planning the analysis of the data is part of the initial process of setting up the research programme. Knowing how the data can be correctly analyzed and interpreted will affect how the data are collected and the numbers of observations required. It is often valuable to produce a 'dummy' set of data, calculated on the computer, to test the statistical method.

The following section describes the rules and basic methods for planning, analyzing and interpreting data relating to feed resources and their use by animals.

PLANNING, ANALYZING AND INTERPRETING DATA

Statistical programs

There are many computer packages available for statistical analysis. Throughout this chapter, examples will be given from data analyzed using the package MINITAB, which is available for IBM-compatible, Apple Macintosh and also mainframe computers. The necessary inputs and outputs for this package will be shown. It is taken as an example of a simple yet accurate system for the research worker as well as the student.

Management of experimental data

Collection of data on a daily or weekly basis will yield results that must be used to calculate the variables required for analysis: average daily gain (kg) for each animal, average daily food intake, etc. Such initial calculations (although they may be managed with MINITAB) are best

[...]

The actual size of the experiment will vary with both the number of replicates and the number of treatments.
Fewer replicates are needed in factorial experiments where the overall total is greater. Again, as a general rule, ensure that the design has at least 15 degrees of freedom for error (residual degrees of freedom).

Blocks

Blocking is a way to deal with known sources of variation which may be sites on a gradient of fertility down a slope, different litters of pigs, different farms, etc. Each block contains all treatments with replicates. The analysis enables the variable 'block' to be measured and removed from the error variation, eg:

    GLM output = block + treatment

It is good to block experiments wherever a known source of variation occurs. There is little point in including an interaction between treatment and block because this will be difficult to interpret even if it is significant.

Covariates

Inclusion of covariates in an analysis is another way of taking out known variation. Covariates are continuous variables such as initial weight, initial milk yield, etc. Their use is vital in experiments involving dairy production where it is normal for animals of different ages, stages of lactation and potential yield to be used. The command is:

    GLM yield = initial + treatment;
    covariate initial.

ANALYSIS OF CONTINUOUS DATA

Experiments with two treatments

The simplest experiment compares the results of two treatments. We may wish to compare two or more populations (breeds of animal or varieties of plant) and take samples from each. Our samples must be taken at random and must represent the populations and their variation.

Experiments often involve applying some action or actions to a sample of the population to measure its effect. The sample of the population is divided and the treatment(s) applied to part(s) of the sample. If we want to know how the treated sample differs from the untreated one, we need to keep the untreated ones as a 'control'. Treatments must be applied at random.
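The requirement that treatments be applied at random can be met with any random number source. The following is a minimal sketch in Python rather than MINITAB; the function name allocate_at_random, the 20 animal identifiers and the group labels are all assumptions made for illustration and do not come from the manual.

```python
import random


def allocate_at_random(animal_ids, treatments, seed=None):
    """Shuffle the animals and deal them into equal-sized treatment groups."""
    if len(animal_ids) % len(treatments) != 0:
        raise ValueError("animals must divide evenly among the treatments")
    rng = random.Random(seed)          # a fixed seed makes the allocation repeatable
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    per_group = len(shuffled) // len(treatments)
    return {
        name: sorted(shuffled[i * per_group:(i + 1) * per_group])
        for i, name in enumerate(treatments)
    }


# Hypothetical example: 20 animals numbered 1-20, one control and one treated group.
animals = list(range(1, 21))
groups = allocate_at_random(animals, ["control", "treated"], seed=1)
for name, members in groups.items():
    print(name, members)
```

Recording the seed alongside the experimental records is one way to make the allocation auditable later; the same function handles more than two treatments as long as the animals divide evenly.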
When the data have been collected, we want to analyze the results to compare the two (or more) samples or the treated groups with the control. This is done by calculating the variance and partitioning it between that due to treatment and the natural ('residual' or 'error') variance. The process is called an 'analysis of variance'.

An example involves two treatments (or a treated group and control) with 10 replicates of each. The means of the two treatments are 10 and 11.

    MTB > PRINT C1-C2

     ROW     C1     C2
       1   11.5    9.7
       2   11.6   11.6
       3   10.5   10.9
       4   10.1   10.8
       5   10.2    9.6
       6    9.0   10.7
       7    9.1   12.5
       8    8.5   11.2
       9   10.2   12.6
      10    9.3   11.5

    MTB > TWOSAMPLE C2 C1;
    SUBC> POOLED.

    TWOSAMPLE T FOR C2 VS C1
          N    MEAN   STDEV  SE MEAN
    C2   10   11.11    1.01     0.32
    C1   10   10.00    1.04     0.33

    95 PCT CI FOR MU C2 - MU C1: (0.15, 2.07)
    TTEST MU C2 = MU C1 (VS NE): T= 2.43  P=0.026  DF= 18
    POOLED STDEV = 1.02

Explanation: The data consist of two sets of values (two treatments) stored in C1 and C2. These are listed with the MINITAB command 'PRINT'. Then the data are compared using a 't-test' with the command 'TWOSAMPLE'. The printout shows the means, standard deviations and standard errors of the means, and the calculated t value. The probability value of 0.026 is less than 0.05 and therefore the null hypothesis that C2 is NOT different to C1 is rejected, i.e. C2 is significantly greater than C1 (P<0.05).

Relationships between variables

In some types of data, the objective is to test the relationship between two variables and to produce an equation which describes this relationship. This is frequently done by regression analysis. In the example here, the alternative 'GLM' command (General Linear Model) is used to perform the regression analysis. The example is to test the relationship between OIL and ENERGY in feed samples:

    MTB > brief 3
    MTB > glm Energy=Oil;
    SUBC> covariate Oil.
    Analysis of Variance for Energy

    Source   DF    Seq SS    Adj SS    Adj MS     F      P
    Oil       1   0.26167   0.26167   0.26167  7.94  0.011
    Error    18   0.59334   0.59334   0.03296
    Total    19   0.85501

    Term        Coeff    Sdev  t-value      P
    Constant  10.9324  0.9046    12.09  0.000
    Oil        0.5116  0.1816     2.82  0.011

    Unusual Observations for Energy
    Obs.   Energy      Fit  Sdev.Fit  Residual  Sd.Resid
      5   13.9461  13.4981    0.0412    0.4480    2.53R
     17   13.6340  13.7359    0.1000   -0.1020   -0.67 X

    R denotes an obs. with a large Sd.Resid.
    X denotes an obs. whose X value gives it large influence.

Explanation: The two variables are stored in columns C1-C2 and labelled Energy and Oil. The GLM model to test is C2=C1 and the subcommand COVARIATE C1 (abbreviated to 'cova C1') tells MINITAB to treat C1 as a continuous variable and not a discrete series of treatments. The probability value (P=0.011) tells us that there IS a significant relationship between Energy and Oil (P<0.05) and the equation is given below. Badly fitting data are also indicated. The constant and coefficient of the regression equation are given and the equation can be derived as:

[...]

    Unusual Observations for LWG
    Obs.      LWG      Fit  Stdev.Fit  Residual  Sd.Resid
     27   627.328  517.695     19.620   109.633    2.24R

    R denotes an obs. with a large sd. resid.

    Means for LWG
    treat    Mean  Stdev
    1       546.7  15.20
    2       604.6  15.20
    3       607.7  15.20

In this example, both block and treatment are significant (P<0.05). There are significant differences between treatment 1 and both the other two treatments but not between T2 and T3.

Latin square design

A Latin Square is a special sort of block design with symmetrical arrangement of treatments in two directions. It is particularly useful in experiments where numbers are restricted by facilities. Take an animal experiment to measure protein degradability by the nylon bag technique, using 4 fistulated animals. Four feeds (A, B, C, D) are studied and each feed is incubated in the rumen of each animal in turn.
The design looks as follows:

                   Animal
    Period    1    2    3    4
      1       B    A    D    C
      2       A    D    C    B
      3       C    B    A    D
      4       D    C    B    A

The analysis would appear as follows:

    MTB> table c1 c2;
    SUBC> means c4.

    ROWS: Row   COLUMNS: Column

               1        2        3        4      ALL
    1     42.800   42.900   69.100   49.400   51.050
    2     47.400   53.500   47.100   56.800   51.200
    3     52.300   61.200   40.500   51.800   51.400
    4     61.300   51.200   54.000   39.700   51.550
    ALL   50.950   52.200   52.625   49.425   51.300

    CELL CONTENTS - C4:MEAN

    MTB> table c3;
    SUBC> stats c4.

    ROWS: Feed

             Dg       Dg        Dg
              N     MEAN   STD DEV
    1         4   42.575     3.504
    2         4   48.525     4.176
    3         4   52.300     2.061
    4         4   62.100     5.117
    ALL      16   51.300     8.138

    MTB> glm dg = row column feed

    Factor   Levels   Values
    Row           4   1 2 3 4
    Column        4   1 2 3 4
    Feed          4   1 2 3 4

    Analysis of Variance for Dg

    Source   DF   Seq SS   Adj SS   Adj MS      F      P
    Row       3     0.58     0.58     0.19   0.01  0.999
    Column    3    24.81    24.81     8.27   0.32  0.811
    Feed      3   812.88   812.88   270.96  10.49  0.008
    Error     6   155.04   155.04    25.84
    Total    15   993.32

Explanation: The analysis shows a significant effect of feed (P<0.01); the table of feed means is given above, together with their standard deviations. In general, a 4x4 (or better, a 6x6) latin square is suitable for this type of experiment. The design can be chosen at random from lists of latin square designs in statistical textbooks.

Experiments with interactions

When there are two factors in an experiment, we need to know not only whether there is an effect of each factor alone but also whether there is an INTERACTION between them (one factor affects the response of the animal to the other). This can be analyzed using GLM by specifying the terms:

    MTB> GLM Y = A B A*B

Alternatively, the above expression can be abbreviated to:

    MTB> GLM Y = A ! B

The following example refers to an experiment with three energy treatments and three protein treatments.

    MTB > table 'energy' 'protein';
    SUBC> stats 'LWG'.
    ROWS: Energy   COLUMNS: Protein

               1        2        3      ALL
    1          3        3        3        9
          513.11   636.95   650.69   600.25
           50.23    63.41    35.10    79.06

    2          3        3        3        9
          640.75   650.09   711.54   667.46
           65.69    18.29    22.36    48.96

    3          3        3        3        9
          649.58   737.08   627.28   671.32
           40.65    59.81    54.66    67.68

    ALL        9        9        9       27
          601.15   674.71   663.17   646.34
           80.60    64.84    50.98    71.94

    CELL CONTENTS -- LWG:N
                         MEAN
                         STD DEV

    MTB > glm LWG = energy ! protein;
    SUBC> means energy ! protein.

    Factor    Levels   Values
    Energy         3   1 2 3
    Protein        3   1 2 3

[...]

response. By including more levels of the treatment we can test the linear, quadratic and cubic effects. That is, we can see if the response is curved. We can also find the equation which describes the curve. To do this, we treat the factors as CONTINUOUS variables.

As a general rule, it is better to include more levels of treatments in this type of experiment as we obtain more information about the response. We make better use of the available experimental material and, provided we have a reasonable number, we lose very little in precision (only 1 degree of freedom for each level).

The following is an experiment with 5 levels of energy and 5 levels of protein. We can test for the response to both and also for the interaction between energy and protein.

    MTB > table 'Energy' 'Protein';
    SUBC> stats 'LWG'.
    ROWS: Energy   COLUMNS: Protein

               1        2        3        4        5      ALL
    1          3        3        3        3        3       15
          522.31   522.62   565.98   535.65   596.00   548.51
           20.46    32.52    18.15    29.77    48.59    39.96

    2          3        3        3        3        3       15
          555.63   564.93   631.32   638.35   643.72   606.79
           28.50    23.54    19.19    29.58    18.51    44.64

    3          3        3        3        3        3       15
          558.41   620.30   638.36   646.82   641.30   621.04
            5.65    25.63    41.76     8.58    27.69    40.03

    4          3        3        3        3        3       15
          621.73   664.42   673.26   666.18   667.57   658.63
           30.48    34.30    36.86    26.19    36.15    33.97

    5          3        3        3        3        3       15
          616.58   667.53   690.55   698.05   691.20   672.78
           17.21     4.94    14.17     9.19    36.06    35.10

    ALL       15       15       15       15       15       75
          574.93   607.96   639.89   637.01   647.96   621.55
           43.91    62.68    50.47    59.79    44.08    58.05

    CELL CONTENTS -- LWG:N
                         MEAN
                         STD DEV

    MTB > glm LWG=Energy Protein Energy*Energy Protein*Protein Energy*Protein;
    SUBC> cova Energy Protein;
    SUBC> test Energy Protein Energy*Energy Protein*Protein Energy*Protein /error.

    Analysis of Variance for LWG

    Source             DF   Seq SS   Adj SS   Adj MS      F      P
    Energy              1   135343    18105    18105  22.13  0.000
    Protein             1    45991    14473    14473  17.69  0.000
    Energy*Energy       1     4514     4514     4514   5.52  0.022
    Protein*Protein     1     6682     6682     6682   8.17  0.006
    Energy*Protein      1      414      414      414   0.51  0.479
    Error              69    56438    56438      818
    Total              74   249382

    Term                Coeff   Stdev  t-value      P
    Constant           396.39   26.68    14.86  0.000
    Energy              61.38   13.05     4.70  0.000
    Protein             54.88   13.05     4.21  0.000
    Energy*Energy      -4.636   1.974    -2.35  0.022
    Protein*Protein    -5.641   1.974    -2.86  0.006
    Energy*Protein      1.175   1.651     0.71  0.479

    F-test with denominator: Error
    Denominator MS = 817.94 with 69 degrees of freedom

    Numerator          DF   Seq MS       F      P
    Energy              1   135343  165.47  0.000
    Protein             1    45991   56.23  0.000
    Energy*Energy       1     4514    5.52  0.022
    Protein*Protein     1     6682    8.17  0.006
    Energy*Protein      1      414    0.51  0.479

Explanation: The first TABLE gives the means for each sub-treatment with standard deviations. The mean for each main treatment is shown at the right hand side and bottom of the table. Then the analysis of variance is performed.
Notice that both Energy and Protein are set as continuous variables with the subcommand COVA Energy Protein. Notice also an additional subcommand TEST. This requires some explanation.

TEST is used as a sub-command to GLM to force MINITAB to use the sequential sums-of-squares and consequent mean squares in the test of significance, rather than the adjusted sums-of-squares and mean squares, which is the default action. The difference between them is that the adjusted sum-of-squares refers to each factor when all the others have been accounted for; the sequential sum-of-squares is calculated sequentially from the top so that each factor is taken out in turn.

The TEST sub-command should always be used when the factors are NOT independent, as is inevitably the case with linear, quadratic and cubic effects (X, X*X, X*X*X). In other experiments where the sequential sums-of-squares and adjusted sums-of-squares are very different, non-independence is implied and the TEST sub-command should be used to force the use of the sequential sums-of-squares.

The factors tested by the above commands are:

    Energy:            linear effect of energy
    Protein:           linear effect of protein
    Energy*Energy:     quadratic effect of energy
    Protein*Protein:   quadratic effect of protein
    Energy*Protein:    Energy x Protein interaction

In assessing significance, the LAST table should be used (F test with denominator: Error). In the example, the linear and quadratic effects for both Energy and Protein are significant but there is no interaction (NS). This shows that the effects of Energy and Protein are curvilinear (diminishing response in this case as the quadratic coefficients are negative). There is little reason for a farmer to increase either energy or protein above the third level in both cases.
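The F values in that last table follow directly from the sequential sums-of-squares: each term has 1 degree of freedom, so its sequential mean square equals its sequential sum-of-squares, and F is that mean square divided by the error mean square (817.94). A short Python sketch of the arithmetic, using the figures printed in the MINITAB output; the variable and function names are illustrative only:

```python
# Sequential sums of squares (1 df each) and error mean square as printed
# in the MINITAB output for the 5 x 5 energy-by-protein experiment.
seq_ss = {
    "Energy": 135343,
    "Protein": 45991,
    "Energy*Energy": 4514,
    "Protein*Protein": 6682,
    "Energy*Protein": 414,
}
error_ms = 817.94


def sequential_f(seq_ss, error_ms, df=1):
    """F ratio for each term: sequential mean square over error mean square."""
    return {term: (ss / df) / error_ms for term, ss in seq_ss.items()}


f_values = sequential_f(seq_ss, error_ms)
for term, f in f_values.items():
    print(f"{term:<16} F = {f:7.2f}")
```

Comparing these ratios with the F column of the first analysis-of-variance table (22.13 and 17.69 for the linear terms) shows how much the choice between adjusted and sequential sums-of-squares matters when the terms are not independent.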
An accurate equation can be obtained by rerunning the analysis with the interaction removed (because it was not significant) and using the constant and coefficients to construct the equation.

Note that in experiments with two treatments where we wish to test the interaction, the model can be abbreviated to:

    MTB> GLM LWG = FEED ! SYSTEM

This will test the main effects and the interaction (FEED, SYSTEM and FEED*SYSTEM). This could not be used in the above example because we excluded some of the more complex interactions.

Dealing with unbalanced designs

Particularly in on-farm research, we may not be able to apply all of the treatments, all of the time. With ANOVA, this presented serious problems and necessitated calculating 'missing plots'. However, GLM

[...]

Four treatments are applied to 100 cows each and the results measured as 'conceived' or 'failed' to conceive:

    Treatment                     Conceived   Failed
    High energy - high protein           81       19
    High energy - low protein            88       12
    Low energy - high protein            75       25
    Low energy - low protein             43       57

First, compute the chi-squared value for the whole table (3 d.f.):

    Total treatment effect (3 df)   chi² = 58.549 > 11.3   significant (P<0.01)

Now combine rows 1+2 and 3+4 into a 2x2 table and calculate chi-squared (1 df) to test the energy effect, and combine rows 1+3 and 2+4 into another 2x2 table and calculate chi-squared to test the protein effect:

    Energy effect (1 df)    chi² = 32.080 > 6.63   significant (P<0.01)
    Protein effect (1 df)   chi² =  7.709 > 6.63   significant (P<0.01)

Subtract the energy and protein chi-squared values from the total chi-squared to get the remaining effect, which is due to the interaction (58.549 - 32.080 - 7.709):

    Energy x protein (1 df)   chi² = 18.760 > 6.63   significant (P<0.01)

There is a significant effect of energy and protein, and there is also an interaction between energy and protein.
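The partition described above can be checked with a few lines of arithmetic. The sketch below, in Python rather than MINITAB, computes the ordinary Pearson chi-squared statistic for the full 4 x 2 table and for the two collapsed 2 x 2 tables; the helper names chi_squared and combine_rows are assumptions for illustration, and no continuity correction is applied.

```python
def chi_squared(table):
    """Pearson chi-squared for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def combine_rows(table, groups):
    """Collapse a table by summing the rows listed in each index group."""
    return [[sum(table[i][j] for i in group) for j in range(len(table[0]))]
            for group in groups]


# Rows: HE-HP, HE-LP, LE-HP, LE-LP; columns: conceived, failed.
cows = [[81, 19], [88, 12], [75, 25], [43, 57]]

total = chi_squared(cows)                                       # 3 df
energy = chi_squared(combine_rows(cows, [(0, 1), (2, 3)]))      # 1 df
protein = chi_squared(combine_rows(cows, [(0, 2), (1, 3)]))     # 1 df
interaction = total - energy - protein                          # 1 df, by subtraction

print(f"total {total:.3f}  energy {energy:.3f}  "
      f"protein {protein:.3f}  interaction {interaction:.3f}")
```

On these counts the remainder attributed to the interaction works out to about 18.76, well above the 6.63 threshold for 1 df at P<0.01.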
Note how the chi-squared values are additive and we partition the original 3 df into 1 for each main effect and 1 for the interaction.

Numbers required for chi-squared analysis (eg: animal reproductive performance)

The numbers required to obtain significant differences in this type of analysis are usually greater than with measurements such as growth or yield. Consider the results of chi-squared analysis where there is a difference of 10% in fertility of cows:

     25 cows per treatment    conceived   failed
                                     20        5    chi² = 0.439
                                     18        7

     50 cows per treatment    conceived   failed
                                     40       10    chi² = 0.877
                                     36       14

    100 cows per treatment    conceived   failed
                                     80       20    chi² = 1.754
                                     72       28

    150 cows per treatment    conceived   failed
                                    120       30    chi² = 2.632
                                    108       42

    225 cows per treatment    conceived   failed
                                    180       45    chi² = 3.947
                                    162       63

It is only when we have 225 cows per treatment that we can detect the 10% difference in fertility (P<0.05), which is an important practical difference.

Limitations of chi-squared analysis

Certain rules must be considered when applying chi-squared analysis. One of these is that all cells should contain values greater than 5 (Snedecor). Otherwise, chi-squared is unreliable, particularly with only 1 df. As an improvement, Yates (1939) proposed an adjustment known as 'Yates' correction factor'. This is simply an adjustment of the formula as follows:

$$\chi^2_{\text{adjusted}} = \sum \frac{\left(\lvert \text{observed} - \text{expected} \rvert - 0.5\right)^2}{\text{expected}}$$

Exact probabilities

Occasionally it is possible to obtain only limited amounts of data, for example, if obtaining data would destroy experimental units. When the numbers in a 2 x 2 table are very small, it may be best to compute exact probabilities rather than to rely on the chi-squared approximation.
Example:

                 Have   Have not   Total
    Standard        5          2       7
    Treatment       3          3       6
    Total           8          5      13

We compute the probability of obtaining the observed distribution or a more extreme one, the more extreme ones being:

    6   1    7           7   0    7
    2   4    6    and    1   5    6
    8   5   13           8   5   13

We require the sum of the probabilities associated with the three distributions. Marginal totals are the same for all three tables. The sum of the probabilities will be used in judging significance. The probability associated with the distribution:

    n11   n12   n1.
    n21   n22   n2.
    n.1   n.2   n..

is

[...]

Percentage data based on counts and a common denominator, where the range of percentages is 0-20% or 80-100% (but not both), may also be analyzed using √x. Percentages between 80-100 should be subtracted from 100 before the transformation is made.

It can be seen that when there are mostly low counts with a few very high ones, the probability will be skewed and taking the square root will pull in the high 'tail'. Notice also that this type of data will have a fixed end, 0 (or 100% in the case of high percentages), which prevents it from showing a two-sided normal distribution shape.

When very small values are involved, √x tends to overcorrect and √(x+0.5) should be used when some of the values are <10 and especially when zeros are present.

The logarithmic transformation

The logarithmic transformation (log10 x) is used with positive integers which cover a wide range. This will again pull in a high 'tail', particularly when the high values are 100's or 1000's. When values are low (and obviously with 0), log(x+1) should be used. (The log transformation is also appropriate in experiments in which the variable is the variance.)

The angular transformation

The angular transformation (arcsin √x or sin⁻¹ √x) is applicable to binomial data expressed as a decimal fraction or percentages when the percentages cover a wide range. (√x was recommended for percentages 0-20 and 80-100.
For percentages 30-70 it is doubtful if any transformation is required.) Data may need to be divided by the denominator, or by 100 in the case of percentages, to produce the decimal fractions required.

Classical binomial data are the 'success or failure' type variables: conception rate, germination rate, etc. When given as a proportion, the angular transformation is appropriate. The square root may be applied when they are given as percentages (80-100%).

It is not always obvious which type of transformation is required. It may be helpful to plot the data and data transformed by various methods to check the effect on the shape of the curve.

SIMULATING EXPERIMENTAL DATA

The data referred to here are not real. They were produced by simulation, using MINITAB to produce sets of random data conforming to normally distributed probabilities and with appropriate variance. This can be a very useful technique, used to run the experiment in a theoretical way (using appropriate means and SE's obtained from previous experience or the literature). We can then try the statistical analysis before the experiment starts and identify any limitations in the design.

The appropriate MINITAB commands to create a set of 20 normally distributed data, with mean 10 and SD±1 in column 1, are:

    MTB> RANDOM 20 C1;
    SUBC> NORMAL 10 1.

We might do the same in C2, with mean 12 and SD±1 and perform an ANOVAR on the two columns. The technique can be used to simulate factorial experiments, randomized block designs, latin squares, etc., using appropriate columns for different effects and variances. These can be summed to produce the simulated values for the data column and the appropriate analysis performed. It is a good method to 'practise' statistics, while gaining an appreciation of the effects of numbers, different levels of variation and different methods of analysis.
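The same kind of dry run can be done outside MINITAB. The sketch below is a Python analogue, not a reproduction of MINITAB's random number generator: it draws two columns of 20 normally distributed values (means 10 and 12, SD 1) and compares them with a pooled two-sample t statistic. The function names and the seed are assumptions for illustration; for two groups, the one-way analysis of variance F value is simply the square of this t.

```python
import math
import random
import statistics


def simulate_column(n, mean, sd, rng):
    """Draw n normally distributed values, analogous to RANDOM n; NORMAL mean sd."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a), with its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / se_diff, df


rng = random.Random(2024)              # arbitrary seed so the run is repeatable
c1 = simulate_column(20, 10, 1, rng)
c2 = simulate_column(20, 12, 1, rng)

t_value, df = pooled_t(c1, c2)
print(f"mean C1 = {statistics.mean(c1):.2f}, mean C2 = {statistics.mean(c2):.2f}")
print(f"t = {t_value:.2f} on {df} df (equivalent one-way F = {t_value ** 2:.2f})")
```

Changing the number of values, the difference between the means or the SD, and rerunning, gives a quick feel for how many replicates a planned experiment would need before the difference becomes detectable.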