Package Corpora - Elementary Latin - 2009 | LATIN 1, Exams of Latin language

Material Type: Exam; Class: Elementary Latin; Subject: Latin; University: University of California - Los Angeles; Term: Spring 2009;

Uploaded on 08/30/2009

Package ‘corpora’
April 17, 2009

Type              Package
Title             Statistics for corpus linguists
Version           0.3-2.1
Depends           R (>= 2.0.0)
Date              2009-02-25
Author            Stefan Evert <stefan.evert@uos.de>
Maintainer        Stefan Evert <stefan.evert@uos.de>
Description       Utility functions for the statistical analysis of corpus frequency data
Encoding          latin1
License           GPL
URL               http://purl.org/stefan.evert/SIGIL/
Repository        CRAN
Date/Publication  2009-02-25 08:29:17

R topics documented:

binom.pval
BNCcomparison
BNCdomains
BNCInChargeOf
chisq
chisq.pval
cont.table
fisher.pval
prop.cint
rel.risk.cint
VSS
z.score
z.score.pval
Index


binom.pval    P-values of the binomial test for frequency counts (corpora)

Description

This function computes the p-value of a binomial test for frequency counts. In the two-sided case, a fast approximation is used that may be inaccurate for small samples.

Usage

    binom.pval(k, n, p = 0.5,
               alternative = c("two.sided", "less", "greater"))

Arguments

k            frequency of a type in the corpus (or an integer vector of frequencies)
n            number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples)
p            null hypothesis, giving the assumed proportion of this type in the population (or a vector of proportions for different types and/or different populations)
alternative  a character string specifying the alternative hypothesis; must be one of two.sided (default), less or greater

Details

When alternative is two.sided, a fast approximation of the two-sided p-value is used (multiplying the appropriate single-sided tail probability by two), which may be inaccurate for small samples. Unlike the exact algorithm of binom.test, this implementation can be applied to large frequencies and samples without a serious impact on performance.

Value

The p-value of a binomial test applied to the given data (or a vector of p-values).

Author(s)

Stefan Evert

See Also

z.score.pval, prop.cint


BNCInChargeOf

Usage

    data(BNCInChargeOf)

Format

A data set with 250 rows and the following columns:

collocate: a collocate of the key phrase in charge of (word form)
f.in: occurrences of the collocate within a distance of 3 tokens from the key phrase, i.e. inside the span
N.in: total number of tokens inside the span
f.out: occurrences of the collocate outside the span
N.out: total number of tokens outside the span

Details

Punctuation, numbers and any words containing non-alphabetic characters (except for -) were not considered as potential collocates.
Likewise, the number of tokens inside / outside the span given in the columns N.in and N.out only includes simple alphabetic word forms.

Author(s)

Stefan Evert (http://purl.org/stefan.evert)

References

Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.

Sinclair, John (1991). Corpus, Concordance, Collocation. Oxford University Press, Oxford.


chisq    Pearson’s chi-squared statistic for frequency comparisons (corpora)

Description

This function computes Pearson’s chi-squared statistic (often written as X²) for frequency comparison data, with or without Yates’ continuity correction. The implementation is based on the formula given by Evert (2004, 82).

Usage

    chisq(k1, n1, k2, n2, correct = TRUE, one.sided = FALSE)

Arguments

k1         frequency of a type in the first corpus (or an integer vector of type frequencies)
n1         the sample size of the first corpus (or an integer vector specifying the sizes of different samples)
k2         frequency of the type in the second corpus (or an integer vector of type frequencies, in parallel to k1)
n2         the sample size of the second corpus (or an integer vector specifying the sizes of different samples, in parallel to n1)
correct    if TRUE, apply Yates’ continuity correction (default)
one.sided  if TRUE, compute the signed square root of X² as a statistic for a one-sided test (see details below; the default value is FALSE)

Details

The X² values returned by this function are identical to those computed by chisq.test. Unlike the latter, chisq accepts vector arguments so that a large number of frequency comparisons can be carried out with a single function call.

The one-sided test statistic (for one.sided=TRUE) is the signed square root of X². It is positive for k1/n1 > k2/n2 and negative for k1/n1 < k2/n2. Note that this statistic has a standard normal distribution rather than a chi-squared distribution under the null hypothesis of equal proportions.
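
As a rough illustration of the computation described for chisq, the following base-R sketch evaluates X² for several frequency comparisons at once and checks one value against chisq.test. The helper name chisq_sketch and the example counts are assumptions made for illustration; this is not the package's own code, which follows the formula in Evert (2004, 82).

```r
# Hypothetical helper, not the corpora implementation: vectorised Pearson X2
# for comparing k1/n1 with k2/n2 in a 2x2 contingency table.
chisq_sketch <- function(k1, n1, k2, n2, correct = TRUE, one.sided = FALSE) {
  # work in doubles so that products of large counts cannot overflow
  k1 <- as.numeric(k1); n1 <- as.numeric(n1)
  k2 <- as.numeric(k2); n2 <- as.numeric(n2)
  n  <- n1 + n2                          # total sample size
  r1 <- k1 + k2                          # tokens of the type in both corpora
  r2 <- n - r1                           # all other tokens
  d  <- k1 * (n2 - k2) - k2 * (n1 - k1)  # O11 * O22 - O12 * O21
  a  <- abs(d)
  if (correct) a <- pmax(0, a - n / 2)   # Yates' correction, clamped at zero
  x2 <- n * a^2 / (r1 * r2 * n1 * n2)
  # signed square root: positive when k1/n1 > k2/n2
  if (one.sided) sign(d) * sqrt(x2) else x2
}

# made-up counts: two types compared across two corpora of 1000 tokens each
x2  <- chisq_sketch(k1 = c(30, 5), n1 = 1000, k2 = c(10, 8), n2 = 1000)
ref <- chisq.test(matrix(c(30, 970, 10, 990), nrow = 2))$statistic
all.equal(x2[1], unname(ref))
```

The scalar sample sizes are recycled against the vector of frequencies, which is what makes a single call cover many comparisons.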

Value

The chi-squared statistic X² corresponding to the specified data (or a vector of X² values). This statistic has a chi-squared distribution with df = 1 under the null hypothesis of equal proportions.

Author(s)

Stefan Evert

References

Evert, Stefan (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Ph.D. thesis, Institut für maschinelle Sprachverarbeitung, University of Stuttgart. Published in 2005, URN urn:nbn:de:bsz:93-opus-23714. Available from http://www.collocations.de/phd.html.

See Also

chisq.pval, chisq.test, cont.table


chisq.pval    P-values of Pearson’s chi-squared test for frequency comparisons (corpora)

Description

This function computes the p-value of Pearson’s chi-squared test for the comparison of corpus frequency counts (under the null hypothesis of equal population proportions). It is based on the chi-squared statistic X² implemented by the chisq function.

Usage

    chisq.pval(k1, n1, k2, n2, correct = TRUE,
               alternative = c("two.sided", "less", "greater"))

Arguments

k1           frequency of a type in the first corpus (or an integer vector of type frequencies)
n1           the sample size of the first corpus (or an integer vector specifying the sizes of different samples)
k2           frequency of the type in the second corpus (or an integer vector of type frequencies, in parallel to k1)
n2           the sample size of the second corpus (or an integer vector specifying the sizes of different samples, in parallel to n1)
correct      if TRUE, apply Yates’ continuity correction (default)
alternative  a character string specifying the alternative hypothesis; must be one of two.sided (default), less or greater

Details

The p-values returned by this function are identical to those computed by chisq.test (two-sided only) and prop.test (one-sided and two-sided) for two-by-two contingency tables.

Value

The p-value of Pearson’s chi-squared test applied to the given data (or a vector of p-values).
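
The correspondence with chisq.test and prop.test stated for chisq.pval can be checked with a small base-R sketch. The function name chisq_pval_sketch and the counts are hypothetical. The sketch assumes that one-sided p-values come from the signed square root of X² under a standard normal distribution, as described for chisq; it is not the package's own code.

```r
# Hypothetical sketch of a vectorised p-value for Pearson's chi-squared test
# on 2x2 frequency comparisons (assumption: not the corpora implementation).
chisq_pval_sketch <- function(k1, n1, k2, n2, correct = TRUE,
                              alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  k1 <- as.numeric(k1); n1 <- as.numeric(n1)
  k2 <- as.numeric(k2); n2 <- as.numeric(n2)
  n <- n1 + n2
  d <- k1 * n2 - k2 * n1                 # sign tells which proportion is larger
  a <- abs(d)
  if (correct) a <- pmax(0, a - n / 2)   # Yates' continuity correction
  x2 <- n * a^2 / ((k1 + k2) * (n - k1 - k2) * n1 * n2)
  z  <- sign(d) * sqrt(x2)               # signed root, standard normal under H0
  switch(alternative,
         two.sided = pchisq(x2, df = 1, lower.tail = FALSE),
         less      = pnorm(z),
         greater   = pnorm(z, lower.tail = FALSE))
}

# made-up counts, compared with the base-R tests for a single 2x2 table
p_two     <- chisq_pval_sketch(30, 1000, 10, 1000)
p_greater <- chisq_pval_sketch(30, 1000, 10, 1000, alternative = "greater")
all.equal(p_two, chisq.test(matrix(c(30, 970, 10, 990), nrow = 2))$p.value)
all.equal(p_greater,
          prop.test(c(30, 10), c(1000, 1000), alternative = "greater")$p.value)
```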

Author(s)

Stefan Evert

See Also

chisq, fisher.pval, chisq.test, prop.test, rel.risk.cint


fisher.pval

References

Fisher, R. A. (1934). Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh, 2nd edition (1st edition 1925, 14th edition 1970).

See Also

fisher.test, chisq.pval, rel.risk.cint


prop.cint    Confidence interval for proportion based on frequency counts (corpora)

Description

This function computes a confidence interval for a population proportion from the corresponding frequency count in a corpus. The confidence interval can be based on a binomial test or on a z-score test (with or without continuity correction).

Usage

    prop.cint(k, n, method = c("binomial", "z.score"), correct = TRUE,
              conf.level = 0.95,
              alternative = c("two.sided", "less", "greater"))

Arguments

k            frequency of a type in the corpus (or an integer vector of frequencies)
n            number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples)
method       a character string specifying whether the confidence interval is based on the binomial test (binomial) or the z-score test (z.score)
correct      if TRUE, apply Yates’ continuity correction for the z-score test (default)
conf.level   the desired confidence level (defaults to 95%)
alternative  a character string specifying the alternative hypothesis, yielding a two-sided (two.sided, default), lower one-sided (less) or upper one-sided (greater) confidence interval

Details

The confidence intervals computed by this function correspond to those returned by binom.test and prop.test, respectively. However, prop.cint accepts vector arguments, allowing many confidence intervals to be computed with a single function call. In addition, it uses a fast approximation of the two-sided binomial test that can safely be applied to large samples.
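
For comparison, reference intervals can be obtained from binom.test and prop.test one count at a time. The following is a minimal sketch of that loop, assuming a hypothetical wrapper ref_cint and made-up counts; it only shows the data-frame shape described for prop.cint and is far slower than a vectorised implementation would be.

```r
# Hypothetical wrapper (not part of corpora): loop over the base-R tests,
# which accept only one frequency count per call.
ref_cint <- function(k, n, method = c("binomial", "z.score"),
                     conf.level = 0.95) {
  method <- match.arg(method)
  one <- function(ki, ni) {
    if (method == "binomial") {
      ht <- binom.test(ki, ni, conf.level = conf.level)
    } else {
      ht <- prop.test(ki, ni, conf.level = conf.level)
    }
    as.numeric(ht$conf.int)            # drop the conf.level attribute
  }
  ci <- mapply(one, k, n)              # 2 x length(k) matrix
  data.frame(lower = ci[1, ], upper = ci[2, ])
}

k <- c(19, 42, 7)                      # made-up frequencies
n <- c(100, 500, 1000)                 # made-up sample sizes
ci_binom <- ref_cint(k, n, method = "binomial")
ci_z     <- ref_cint(k, n, method = "z.score")
ci_binom
```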

The confidence interval for a z-score test is computed by solving the z-score equation

$$\frac{k - np}{\sqrt{np(1 - p)}} = \alpha$$

for p, where α is the z-value corresponding to the chosen confidence level (e.g. ±1.96 for a two-sided test with 95% confidence). This leads to the quadratic equation

$$p^2 (n + \alpha^2) + p (-2k - \alpha^2) + \frac{k^2}{n} = 0$$

whose two solutions correspond to the lower and upper boundary of the confidence interval.

When Yates’ continuity correction is applied, the value k in the numerator of the z-score equation has to be replaced by $k^*$, with $k^* = k - 1/2$ for the lower boundary of the confidence interval (where k > np) and $k^* = k + 1/2$ for the upper boundary of the confidence interval (where k < np). In each case, the corresponding solution of the quadratic equation has to be chosen (i.e., the solution with k > np for the lower boundary and vice versa).

Value

A data frame with two columns, labelled lower for the lower boundary and upper for the upper boundary of the confidence interval. The number of rows is determined by the length of the longest input vector (k, n and conf.level).

Author(s)

Stefan Evert

See Also

z.score.pval, prop.test, binom.pval, binom.test


rel.risk.cint    Conservative confidence interval for the relative risk ratio (corpora)

Description

This function approximates a conservative confidence interval for the relative risk coefficient, i.e. the ratio r = p1/p2 between two population proportions, based on frequency counts from two corpora. The approximation is computed from individual confidence intervals for the two proportions, with confidence levels adjusted accordingly.
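
Both prop.cint and the approximation used by rel.risk.cint rest on confidence intervals for a single proportion. As a minimal sketch of the z-score variant without continuity correction, the quadratic equation given under prop.cint can be solved directly in base R. The function name zscore_cint_sketch and the counts are assumptions for illustration; the uncorrected result should agree with prop.test(..., correct = FALSE).

```r
# Hypothetical sketch: two-sided z-score confidence interval obtained from
#   p^2 (n + alpha^2) + p (-2k - alpha^2) + k^2 / n = 0
# (no Yates' correction; not the corpora implementation).
zscore_cint_sketch <- function(k, n, conf.level = 0.95) {
  k <- as.numeric(k); n <- as.numeric(n)
  alpha <- qnorm((1 + conf.level) / 2)   # e.g. 1.96 for 95% confidence
  a  <- n + alpha^2                      # coefficient of p^2
  b  <- -2 * k - alpha^2                 # coefficient of p
  c0 <- k^2 / n                          # constant term
  disc <- sqrt(b^2 - 4 * a * c0)         # square root of the discriminant
  data.frame(lower = (-b - disc) / (2 * a),
             upper = (-b + disc) / (2 * a))
}

ci  <- zscore_cint_sketch(k = c(19, 42), n = c(100, 500))   # made-up counts
ref <- as.numeric(prop.test(19, 100, correct = FALSE)$conf.int)
all.equal(c(ci$lower[1], ci$upper[1]), ref)
```

The smaller root of the quadratic is the lower boundary and the larger root the upper boundary, so no case distinction is needed when the correction is switched off.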

Usage

    rel.risk.cint(k1, n1, k2, n2, conf.level = 0.95,
                  alternative = c("two.sided", "less", "greater"),
                  method = c("binomial", "z.score"), correct = TRUE)

Arguments

k1           frequency of a type in the first corpus (or an integer vector of type frequencies)
n1           the sample size of the first corpus (or an integer vector specifying the sizes of different samples)
k2           frequency of the type in the second corpus (or an integer vector of type frequencies, in parallel to k1)
n2           the sample size of the second corpus (or an integer vector specifying the sizes of different samples, in parallel to n1)
conf.level   the desired confidence level (defaults to 95%)
alternative  a character string specifying the alternative hypothesis, yielding a two-sided (two.sided, default), lower one-sided (less) or upper one-sided (greater) confidence interval
method       a character string specifying whether the individual confidence intervals for the two proportions are based on the binomial test (binomial) or the z-score test (z.score)
correct      if TRUE, apply Yates’ continuity correction for the z-score test (default)

Details

This function computes individual confidence intervals for the two population proportions p1 (from k1 and n1) and p2 (from k2 and n2). Then, a confidence interval for the relative risk ratio r = p1/p2 is determined in such a way that r lies within the interval whenever p1 and p2 lie in their respective confidence intervals. Thus, when these intervals are computed with a confidence level of e.g. .975, r is certain to fall within its confidence interval in .975² ≈ .95 of all cases. This adjustment of confidence levels is made automatically.

Note that r might fall within its confidence interval even when either p1 or p2 is outside the respective interval, hence rel.risk.cint computes a conservative confidence interval that will be larger than necessary.
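
The construction can be sketched for a single pair of counts using binom.test from base R. The function name rel_risk_sketch and the counts are hypothetical. Taking the square root of conf.level as the adjusted level for each proportion is an assumption based on the .975² ≈ .95 example; the package may adjust the levels differently.

```r
# Hypothetical sketch of a conservative two-sided interval for r = p1 / p2
# (assumption: individual levels are sqrt(conf.level); scalar inputs only).
rel_risk_sketch <- function(k1, n1, k2, n2, conf.level = 0.95) {
  level <- sqrt(conf.level)              # roughly .975 for an overall .95
  ci1 <- binom.test(k1, n1, conf.level = level)$conf.int   # interval for p1
  ci2 <- binom.test(k2, n2, conf.level = level)$conf.int   # interval for p2
  # r is smallest for small p1 and large p2, largest for the opposite case
  c(lower = ci1[1] / ci2[2], upper = ci1[2] / ci2[1])
}

ci <- rel_risk_sketch(30, 1000, 10, 1000)   # made-up counts, observed ratio 3
ci
```

Because the interval combines the extreme ends of both individual intervals, it is wider than an interval computed directly for r would need to be, which is the conservativeness noted in the text.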

Exact confidence intervals for the odds ratio coefficient θ = (p1/(1 − p1))/(p2/(1 − p2)) can be computed with the fisher.test function. However, these exact intervals are computationally very expensive and may cause R to run out of memory for large frequency counts. In addition, fisher.test only computes a single confidence interval for each function call (i.e., it cannot be applied to vectorised data).

Value

A data frame with two columns, labelled lower for the lower boundary and upper for the upper boundary of the confidence interval. The number of rows is determined by the length of the longest input vector (k1, n1, k2, n2 and conf.level).

Author(s)

Stefan Evert

See Also

prop.cint, chisq.pval, fisher.pval, fisher.test


z.score

Details

The z statistic is given by

$$z := \frac{k - np}{\sqrt{np(1 - p)}}$$

When Yates’ continuity correction is enabled, the absolute value of the numerator d := k − np is reduced by 1/2, but clamped to a non-negative value.

Value

The z-score corresponding to the specified data (or a vector of z-scores).

Author(s)

Stefan Evert

See Also

z.score.pval


z.score.pval    P-values of the z-score test for frequency counts (corpora)

Description

This function computes the p-value of a z-score test for frequency counts, based on the z-score statistic implemented by z.score.

Usage

    z.score.pval(k, n, p = 0.5, correct = TRUE,
                 alternative = c("two.sided", "less", "greater"))

Arguments

k            frequency of a type in the corpus (or an integer vector of frequencies)
n            number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples)
p            null hypothesis, giving the assumed proportion of this type in the population (or a vector of proportions for different types and/or different populations)
correct      if TRUE, apply Yates’ continuity correction (default)
alternative  a character string specifying the alternative hypothesis; must be one of two.sided (default), less or greater

Value

The p-value of a z-score test applied to the given data (or a vector of p-values).

Author(s)

Stefan Evert

See Also

z.score, binom.pval, prop.cint


Index

Topic array: cont.table
Topic datasets: BNCcomparison, BNCdomains, BNCInChargeOf, VSS
Topic htest: binom.pval, chisq, chisq.pval, cont.table, fisher.pval, prop.cint, rel.risk.cint, z.score, z.score.pval
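
To make the z statistic and its p-value concrete, here is a base-R sketch of the formula given under z.score, alongside the doubled-tail approximation described for binom.pval. The helper names z_score_sketch, z_pval_sketch and binom_pval_sketch and the counts are assumptions for illustration; none of this is the package's own code.

```r
# Hypothetical sketches (not the corpora implementation).

# z = (k - n p) / sqrt(n p (1 - p)), optionally with Yates' correction:
# |k - n p| is reduced by 1/2 but never below zero.
z_score_sketch <- function(k, n, p = 0.5, correct = TRUE) {
  d <- k - n * p
  if (correct) d <- sign(d) * pmax(0, abs(d) - 0.5)
  d / sqrt(n * p * (1 - p))
}

z_pval_sketch <- function(k, n, p = 0.5, correct = TRUE,
                          alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- z_score_sketch(k, n, p, correct)
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z)),
         less      = pnorm(z),
         greater   = pnorm(z, lower.tail = FALSE))
}

# Two-sided binomial p-value approximated by doubling the smaller tail.
binom_pval_sketch <- function(k, n, p = 0.5) {
  lower <- pbinom(k, n, p)                          # P(X <= k)
  upper <- pbinom(k - 1, n, p, lower.tail = FALSE)  # P(X >= k)
  pmin(1, 2 * pmin(lower, upper))
}

z  <- z_score_sketch(60, 100)      # (10 - 0.5) / 5 = 1.9
pz <- z_pval_sketch(60, 100)
pb <- binom_pval_sketch(60, 100)
all.equal(pz, prop.test(60, 100)$p.value)
```

All three helpers accept vectors for k, n and p, so many types can be tested in one call, which is the main practical difference from binom.test and prop.test.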