Limits of Generalizing in Education Research: Why Criteria for Research Generalization Should Include Population Heterogeneity and Uses of Knowledge Claims*

Kadriye Ercikan, University of British Columbia
Wolff-Michael Roth, University of Victoria

* Accepted for publication in Teachers College Record.

Abstract

Generalization is a critical concept in all research that describes the process of developing general knowledge that applies to all elements of a unit (population) while studying only a subset of these elements (sample). Commonly applied criteria for generalizing that focus on experimental design or representativeness of samples of the population of units neglect the targeted uses of the knowledge generated from the generalization. This paper (a) articulates the structure and discusses limitations of different forms of generalizations across the spectrum of quantitative and qualitative research; and it (b) argues for an overarching framework that includes population heterogeneity and uses of knowledge claims as part of the rationale for generalizations from educational research.

A recent special issue of this journal was dedicated to data use as an integral part of current reform efforts (Turner & Coburn, 2012). Other researchers highlight data use and research evidence as perhaps the most central dimension of today's political climate that shapes the field of education (Cohen-Vogel, 2011; Moss, 2012; Roderick, 2012). This emphasis on data use and evidence crowns empirical research findings with the highest status in guiding policy and practice.
It is therefore legitimate to ask, "To what extent is typical education research designed to provide evidence to inform policy and practice?" The evidence educators use for educational policy analysis, evaluation, and decision-making tends to be produced through educational research that takes population samples or case studies to make claims valid for jurisdictions at different levels, such as classrooms, schools, and districts. However, whether research evidence at one level of educational practice scales up to another level is a non-trivial question (e.g., Ercikan & Roth, in press; Stein et al., 2008). The question of the extent to which educators can generalize from educational research has led in many contexts to a predilection for experimental and quantitative over qualitative studies – although it has been suggested that without the examination of qualitative evidence, "variations in quantitative studies are difficult to interpret" (Kennedy, 2008, p. 344). But in education and other fields, "[f]indings from a large number of qualitative research investigations have had little impact on clinical practice and policy formation" (Finfgeld-Connett, 2010, p. 246). In this article, we argue that the issue of generalization of empirical findings for the purpose of education practice, policy analysis, evaluation, and decision-making not only needs to transcend the traditional divide between quantitative and qualitative research but also requires an overarching framework that includes population heterogeneity and uses of knowledge claims as criteria that establish the quality of generalizations that meet policy makers' demands "for relevant and rigorous research" (Brewer, Fuller, & Loeb, 2010, p. 4). In so doing, we contribute to establishing a theoretical framework for methodological rigor related to educational research generalization.
[…] on a logic of within-individual differences and causations (Borsboom, Mellenbergh, & van Heerden, 2003).

This article has two connected purposes: (a) to articulate the structure and discuss limitations of different forms of generalizations across the spectrum of quantitative and qualitative research and (b) to argue for considering population heterogeneity and for including future uses of knowledge claims when judging the appropriateness of generalizations that are used as evidence on which educational policy analysis, evaluation, and decision-making are based. In the first part of the paper we present two forms of generalization that rely on statistical analysis of between-group variation: analytic and probabilistic generalization. These are the most commonly understood notions of generalizing in educational research (Eisenhart, 2009; Firestone, 1993). We then describe a third form of generalization: essentialist generalization. [2] Essentialist generalization moves from the particular to the general in small-sample studies. This form of generalization exists in medical, (historical-) genetic, and scientific research in general, but is not well understood and is infrequently used in social science or education research. We discuss limitations of each kind of generalization and propose two additional criteria for evaluating the validity of evidence based on generalizations from education research. In the second part of the paper, we first make a case for taking into account population heterogeneity when evaluating the validity of generalizations from educational research. Second, we demonstrate a need to consider future use as an integral and essential aspect of the question about the extent to which research claims are generalizable.
Generalizing in Educational Research

In this section we present and discuss – cutting across the quantitative-qualitative divide that exists in educational research methodology – three main forms of generalization and their limitations in view of how they inform different users in policy and practice. The three forms of generalization – analytic, probabilistic, and essentialist – are presented as distinctly different with respect to the rationale and evidence required to support them. The criteria used for judging the supporting evidence are described. The distinctions between the three forms of generalization are important to clarify in discussing the limitations of each in informing policy and practice. None of them is presented as superior to the others; rather, they are considered complementary.

Analytic Generalization

Structure. Analytic generalization relies on the design of the research to make causal claims. It involves making arguments that support claims in relation to a theory. It may involve the testing of a new theory as well as the application of a theory in a context for which the theory was not originally developed. The researcher may hypothesize, for example, that an intervention operationally defining a theoretical construct leads to better learning. This operationalization requires a specific research design (Shadish, Cook, & Campbell, 2002). First, it must logically allow making causal inferences: Instances where a cause operates have to lead to significantly different observations than those instances where the cause is disabled. Usually, this requires randomly assigning participants to control and experimental groups in the hope of achieving equivalence of these groups with respect to all moderating and mediating variables, and an identical implementation of the intervention to the experimental and comparison (control) groups.
The groups are not expected to be representative samples of any particular target population. Random equivalence is intended to rule out any potential alternative explanations of differences between the control and experimental groups. The arguments in analytic generalization are closely tied to the degree to which the experimental design is truly implemented. The statistical support for the hypothesis about the effectiveness of the intervention – which provides sufficient evidence to reject the null hypothesis that there is no difference between the control and the experimental groups after the intervention – is used to make claims about effects of the intervention in the target population. The claim is with respect to the causal relationship between the intervention and the outcome. The outcome of an intervention is determined by comparing the difference between the means of the control and experimental groups to the standard error of the mean differences. If, on average, a statistically significant difference in the hypothesized direction is identified between the two groups, the theory is supported, which in turn implies effectiveness of the intervention, such as a new instruction method that includes using technology in mathematics teaching.

Limitation. In analytic generalization, there are two key criteria for judging the causal inference from the experimental design. One is whether there is a systematic difference between experimental and control groups that can be supported by statistical evidence; the other is the degree to which a true experiment has been conducted, so that the change in experimental group outcomes can be attributed to the specific operating cause deriving from the intervention. Even when such a generalization is fully supported based on these two criteria, only a loose causal link is established.
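The group-comparison logic just described – comparing the difference between group means to the standard error of the mean differences – can be sketched with synthetic data. The group sizes, means, and standard deviation below are illustrative assumptions, not values from any study discussed here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical experiment: 200 students per group; the experimental
# group receives the intervention. All numbers are invented.
control = rng.normal(loc=500, scale=90, size=200)
experimental = rng.normal(loc=530, scale=90, size=200)

mean_diff = experimental.mean() - control.mean()
# Standard error of the mean difference (equal group sizes)
se_diff = np.sqrt(experimental.var(ddof=1) / 200 + control.var(ddof=1) / 200)

# Two-sample t-test: is the mean difference large relative to its
# standard error?
t, p = stats.ttest_ind(experimental, control)
print(f"mean difference = {mean_diff:.1f}, t = {t:.2f}, p = {p:.4f}")
```

A significant p-value here licenses only the group-level claim that the intervention worked "on the average"; it says nothing, by itself, about any particular student or sub-group.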
A causal claim that applies to the overall group does not necessarily apply to subgroups or to individuals because the logic of such studies is based on between-subjects rather than within-subjects variation (Borsboom et al., 2003). In other words, the intervention may have been effective "on the average" but […] probabilistic generalizations are common. One type of generalization claim is with respect to relationships between variables, for example, between IQ and achievement (Figure 2a). In this case, statistics is used to estimate the probability that a systematic relation between IQ and achievement exists beyond chance level. The second type of research generalization is related to relative frequency (e.g., the proportion of students identified with learning disabilities) or group differences (e.g., differences in achievement between boys and girls) (Figure 2b). For example, the Programme for International Student Assessment (PISA) 2009 data for Canada suggest that there are statistically significant differences between boys and girls on the reading score (Mboys = 507, Mgirls = 542, SD = 90 for both groups) (see Figure 2b) based on the differences in the sample. In both of these probabilistic generalizations, generalization claims are derived from observations from the sample. The criterion by which the generalization is judged – i.e., the validity of claims about the correlation between IQ and achievement or gender differences in reading in Canada – centers on one of the same criteria used for judging analytic generalization, namely whether there is statistical evidence of a systematic pattern in the data. Even though probabilistic generalizations may include group comparisons, such as comparing gender or ethnic groups, these generalizations do not require a specific research design such as random equivalence of groups or standardized implementation of an intervention.
Instead, the representativeness of the samples of the target populations is the second key criterion used for probabilistic generalizations.

Limitation. The within-group heterogeneity that limits the meaningfulness of causal claims in analytic generalization for sub-groups or individuals leads to similar limitations in probabilistic generalization. National surveys of achievement are primary data sources for making probabilistic generalizations. For example, one of the primary foci of large-scale surveys of achievement – e.g., the National Assessment of Educational Progress (NAEP) or international assessments such as PISA – is to compare outcome levels of males and females, countries, or ethnic groups. Using the recent PISA reading results, we plotted the distribution of reading scores for Canadian boys and girls (Figure 2b). These distributions of scores have a great degree of overlap, so that claims such as "girls are outperforming boys" are not meaningful. At each score level we find boys and girls, though at higher scoring levels there are more girls than boys with a given score (right, Figure 2b), whereas there are more boys than girls with a given score at lower scoring levels (left, Figure 2b). Which girls are outperforming which boys? Clearly some boys are outperforming some girls. In fact, as recent results in the UK show, although girls tend to exhibit higher achievement scores on average (e.g., number of A's in A-level courses), there are more boys than girls among the very highest scoring students (Clark & Preece, 2012). Thus, the claims for generalizing group differences become even more complex and problematic when we look at gender differences between sub-groups, such as those from different socio-economic backgrounds, language groups, and others. A similar limitation exists when making knowledge claims related to relationships between variables.
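Under a normality assumption, the degree of overlap behind a "girls outperform boys" claim can be quantified directly from the summary statistics quoted above (M = 507 for boys, M = 542 for girls, SD = 90); a minimal sketch:

```python
from scipy.stats import norm

m_boys, m_girls, sd = 507.0, 542.0, 90.0

# Standardized mean difference (Cohen's d), common SD assumed
d = (m_girls - m_boys) / sd

# Probability that a randomly chosen girl outscores a randomly chosen
# boy, assuming two normal distributions with common SD:
# P(G > B) = Phi((m_girls - m_boys) / (sd * sqrt(2)))
p_girl_higher = norm.cdf((m_girls - m_boys) / (sd * 2 ** 0.5))

print(f"Cohen's d = {d:.2f}")
print(f"P(random girl > random boy) = {p_girl_higher:.2f}")
```

On these figures the effect is modest (d of roughly 0.4), and roughly four in ten randomly paired girl-boy comparisons go the "wrong" way – exactly the overlap visible in Figure 2b.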
Probabilistic generalization that focuses on describing population characteristics can lead to knowledge claims – involving statistical concepts such as means, frequencies, mean differences, or correlations – that may not apply to sub-groups and may have limited value for guiding policy and practice.

[Insert Figure 2 about here]

Essentialist Generalization

Structure. Essentialist generalization is the result of a systematic interrogation of "the particular case by constituting it as a 'particular instance of the possible' . . . in order to extract general or invariant properties that can be uncovered only by such interrogation" (Bourdieu, 1992, p. 233). In this approach, every case is taken as expressing the underlying law or laws; the approach intends to identify invariants in phenomena that on the surface look like they have little or nothing in common (Roth, 2012). Thus, for example, Vygotsky (1971) derived a general theory of the psychology of art based on the analysis of three very different literary genres: a fable, a short story, and a tragedy. He concludes:

We have ascertained that contradiction is the essential feature of artistic form and material. We have also found that the essential part of aesthetic response is the manifestations of the affective contradiction which we have designated by the term catharsis. (p. 217, original emphasis, underline added)

Having derived his psychology of art based on individual case studies generally and the role of catharsis more specifically, Vygotsky notes that "it would be very important to show how catharsis is achieved in different art forms, what its chief characteristics are, and what auxiliary processes and mechanisms are involved in" (p. 217).
That is, although Vygotsky developed the categories of affective contradiction and catharsis and their role in human development from the analysis of a concrete case, which he subsequently verifies by means of analogy in two further cases, he arrives at generalizations that are much broader than the three texts he analyzed and much broader than the written forms of art. Thus, as shown in Figure 3, because the categories constitute the essential feature of artistic form and material, they equally can be found in painting and music (blues, classical, or any other form). In a subsequent text he summarily states: "the principle of art as well is dealing with a reaction which in reality never manifested itself in a pure form, but […] that it conceals under the appearance of singularity" (p. 234). One achieves this by completely immersing oneself "in the particularity of the case at hand without drowning in it . . . to realize the intention of generalization . . . through this particular manner of thinking the particular case which consists of actually thinking it as such" (pp. 233–234). Case-based research too frequently does not lead to generalization and furthermore "inclines us toward a sort of structural conservatism leading to the reproduction of scholarly doxa" (p. 248). In the case of phenomenography, researchers tend to catalogue the kinds of experiences research participants have but tend not to seek generalizations that would explain why participants experience a situation in this or that manner under given conditions (e.g., Roth, 2009a, 2009b).

Additional Research Generalization Criteria: Population Heterogeneity and Uses

[I]n the case of stating truly or falsely, just as much as in the case of advising well or badly, the intents and purposes of the utterance and its context are important; what is judged true in a school book may not be so judged in a work of historical research. (Austin, 1962/1975, p.
143, emphasis added)

The criteria for generalization – i.e., the types of evidence needed to support knowledge claims – vary across types of generalizations. In analytic generalization, the key criteria are (1) whether a systematic difference between experimental and control groups can be supported by statistical evidence and (2) whether the change in experimental group outcomes can be causally linked to the intervention. In probabilistic generalization, the key criteria are (1) whether systematic patterns in the sample can be supported by statistical evidence and (2) whether the sample is representative of the population. In essentialist generalization, the degree to which essential (i.e., common to all cases) aspects of the case are found in other cases of people, interventions, and contexts determines whether generalization claims are supported. To what extent are these currently used criteria for research generalization sufficient for determining the meaningfulness and applicability of knowledge to inform policy and practice? In analytic generalization, the causal claim "the intervention causes the difference between the control and experimental groups," or in probabilistic generalization the claim "girls are performing higher than boys in the reading assessment," is targeted at the group level. Generalization of such claims is based on statistical analysis of between-group variation – also referred to as the "variable model" (Holzkamp, 1983; Maxwell, 2004) or the "snapshot, bookend, between-groups paradigm" (Winne & Nesbitt, 2010, p. 653). This approach entails within-group homogeneity. Researchers have criticized the use of between-group analyses for making claims about within-individual processes.
Thus, there is an almost universal – but surprisingly silent – reliance on what may be called a uniformity-of-nature assumption in doing between-subject analyses; the relation between mechanisms that operate at the level of the individual and models that explain variation between individuals is often taken for granted, rather than investigated. (Borsboom et al., 2003, p. 215)

A great deal of other research findings parallel this position (cf. Ercikan, Roth, Simon, Sandilands, & Lyons-Thomas, in press; Molenaar, 1999, 2004; Molenaar, Huizenga, & Nesselroade, 2003; Oliveri, Ercikan, & Zumbo, in press-a, in press-b). These findings demonstrate that "if a model fits in a given population, this does not entail the fit of the same model for any given element from a population, or even for the majority of elements from that population" (Borsboom et al., 2003, p. 213). Similarly, qualitative research often fails to recognize that in the apparent diversity of phenomena there are fundamental commonalities in the processes of their generation (Garfinkel, 2002; Vygotsky, 1927/1997). In our introductory quotation to this section, Austin points out that to establish the truth or falsity of a statement we need to know its context and its intents and purposes (i.e., uses). In this section we introduce two additional criteria for making generalization claims that address the context and use of knowledge claims. Respectively, these are (a) heterogeneity in the target population and (b) the degree to which claims apply to the targeted uses.

Population Heterogeneity as a Criterion in Research Generalization

The question about the degree to which some research claim provides useful direction for practice and policy depends on the degree to which findings apply to the relevant sub-groups or individuals.
The applicability of research findings to the relevant units (individual, sub-group, group) is at the core of the potential for research to inform pedagogy, policy, or social theory. Research inferences targeted to broadly defined populations have significant limitations in their applicability to understanding or to making decisions regarding sub-groups of the populations, such as gender, ethnic, and ability groups of students. Cronbach (1982) highlights diversity in the population and its potential effect on inferences by stating that "the summary statistics on the sample, or the estimates for UTOS or a sub-UTOS, are usually not an adequate base for inference about *UTOS. Insofar as there is diversity in the data, the consumer should be told about that diversity and any factors associated with it" (p. 167). [5] As a result, the researcher will have "to work back […] on such models suffer from the problems Rogosa and his colleagues elaborated on almost three decades ago. Similar to proponents of inter-individual research designs, Winne (2006) suggests that there is a need to examine individual student learning traces, using interactive learning software such as gStudy, to inform reform efforts to improve learning. Another widely used research approach draws on correlational studies to determine factors that are associated with better educational outcomes. Commonly, such correlational research uses statistical methods that employ ecological correlations (Robinson, 1950), such as Pearson correlations computed on group-level data, which capture associations between variables for groups. These correlations use marginal frequencies for estimating group-level associations. An alternative statistic, the individual correlation, is defined as "a correlation in which the statistical object or thing described is indivisible" (p. 351).
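Robinson's point that correlations at the group level can diverge sharply from correlations among individuals is easy to demonstrate by simulation; the three groups, their means, and the within-group slope below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical groups whose group means rise together, while the
# relation between x and y *within* each group is negative.
xs, ys = [], []
for mx, my in [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]:
    x = rng.normal(mx, 1.0, 300)
    y = my - 0.8 * (x - mx) + rng.normal(0.0, 0.5, 300)
    xs.append(x)
    ys.append(y)

# Ecological correlation: computed on the three group means.
gx = np.array([x.mean() for x in xs])
gy = np.array([y.mean() for y in ys])
ecological_r = np.corrcoef(gx, gy)[0, 1]

# Individual (within-group) correlations.
within_r = [np.corrcoef(x, y)[0, 1] for x, y in zip(xs, ys)]

print(f"ecological r = {ecological_r:.2f}")                    # near +1
print(f"within-group r = {[round(r, 2) for r in within_r]}")   # all negative
```

Aggregation here reverses the sign of the relationship: a policy inference drawn from the strongly positive group-level correlation would be wrong for every individual in the simulation.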
The individual correlation is based on individual-level variable values, such as gender, height, or education level, rather than marginal frequencies for groups. Robinson demonstrates that ecological correlation differs by level of aggregation and that ecological correlations cannot be used as indicators of individual correlations. Some researchers argue that accounting for within-group heterogeneity by multi-level modeling in correlational research may address the problems of ecological correlation, so that individual correlations may not be needed (Subramanian, Jones, Kaddour, & Krieger, 2009). This rationale against individual correlations is not convincing to some researchers (Oakes, 2009). First, multi-level models have several assumptions that are often not met by real data. Second, multi-level models are targeted to address group-level associations and do not capture associations for individuals or sub-groups, which may have very different associations (Oakes, 2009). The issue of heterogeneity poses itself differently in essentialist generalization. This is so because this form of generalization inherently acknowledges, and is based on, the diversity in which a generalization manifests itself (see Figure 3). Read from left to right, the figure exemplifies how a generalization leads to the diversity of particulars inherent in it, but not to the particulars of other generalizations (Vygotsky, 1971). The problem lies in the identification of the generalization to which the particular case of interest belongs. Thus, for example, Piaget's work on reasoning is problematic not because he did not generalize; rather, it is problematic because it does not apply in the case of the fundamental restructuring of reasoning that (schooling) culture and language bring about (e.g., Harris, 2001; Luria, 1976).
Once a true generalization has been found, however, it will apply to every case; it only manifests itself differently in different cases. Therefore, in contrast to the two other forms of generalization, essentialist generalization inherently addresses heterogeneity as long as we take into account the contextual particulars relevant to the manifestation of the generalization.

Uses of Knowledge Claims as a Criterion for Generalizing

A study on research use suggests that there tends to be a lack of uptake of research evidence on the part of teachers (Williams & Coles, 2003). The study shows that links between research output and practice often are not apparent. Moreover, often overlooked in the research on knowledge use is the relation between knowledge and interests (e.g., Habermas, 2008). Thus, as the introductory quotation from Austin shows, the truth or falsehood of statements (knowledge claims) depends on the intents and purposes (i.e., uses) of a statement (knowledge claim). Similarly, the question about the extent to which we can generalize research results cannot be limited to evaluating consistency, reliability across observations, or validity of interpretations (Bachman, 2009). Rather, the evaluation of the extent to which research claims are generalizable needs "to consider the uses that may be made of our research results, and the consequences of these uses for various individuals who may be affected by them" (p. 127). Granting councils around the world already are sensitive to the relationship between knowledge and use. Thus, for example, the Canadian Institute for Health Research defines a knowledge-user as an individual who is likely to be able to use the knowledge generated through research to make informed decisions about health policies, programs, and/or practices.
A knowledge-user's level of engagement in the research process may vary in intensity and complexity depending on the nature of the research and his/her information needs. A knowledge-user can be, but is not limited to, a practitioner, policy-maker, educator, decision-maker, health care administrator, community leader, or an individual in a health charity, patient group, private sector organization, or media outlet. (CIHR, 2011)

There now exists extensive empirical evidence that knowledge is situated and specific to the circumstances, so that what is useful in one setting is not useful in another (e.g., Lave, 1988; Lobato, 2006; Packer, 2001; Saxe, 1991; Tuomi-Gröhn & Engeström, 2003). It may therefore not come as a surprise that some scholars refer to knowledge in the plural form, as in "situated knowledges" (e.g., Haraway, 1991). In this section, we discuss a research use argument in the light of the preceding discussion of the three forms of generalization. The alternate levels of generalization allow us to understand that there are different ways in which change in education may be brought about. For example, much of current educational policy practice is to target tendencies, such as the overall positive correlations between educational practice and learning outcomes or an increase of group-level learning outcomes. This, as in analytic generalization, […] Figure 2a that statisticians treat as error variance. A superintendent of schools might decide, based on the results of (quasi-) experiments, to foster teaching science using a hands-on approach over lecture-style approaches. She may make available funding to assist teachers in learning how to teach with this new method. All of these decisions need to be guided by different types of evidence that the resulting actions at the student, classroom, teacher, and school levels will lead to improvement.
The idea that a generalization meets the needs of particular cases underlies the concept of phronesis sometimes discussed by teacher educators (e.g., Eisner, 2002), whereby the practitioner invents conduct such that the rule/law derived from generalization is violated to the minimum while satisfying the exceptional circumstances required by solicitude (Ricœur, 1990). To provide another example, general interests are distinguished from particular interests, most often represented in and by "interest groups" and the lobbyists that represent them. Effective generalization means that the interests of all interest groups are met. Is this possible? In the context of education, the cogenerative dialogue is one form of praxis that brings together every different stakeholder group – e.g., students, teacher, department head, and assistant principal – for the purpose of making decisions about concrete next steps that are in the interest of all those using and being affected by the decisions (Roth & Tobin, 2002; Tobin, 2009). Consideration of a set of generalizations at different levels – individual, sub-population, and population – therefore occurs at the very heart of educational praxis, whereby all stakeholders commit to act in the general interest rather than in the particular interests of one or the other special (interest) group. Knowledge underlying the common plan inherently is shared and therefore of a generalized nature rather than of a nature particular to an individual or group. Responding to our rhetorical question: yes, it is possible to produce useful generalizations if these are tailored beforehand to the needs of the particular user. Educational researchers therefore need to include the uses in their evaluations of research generalization in addition to evaluating consistency, reliability, or validity.
Our recommendation thereby is consistent with the suggestion that research should be concerned with tactical authenticity by providing stakeholders with the means that allow them to empower themselves (Guba & Lincoln, 1989); but we extend this argument beyond the particular epistemological underpinnings to which it was initially applied and to all forms of generalization discussed in this article.

Final Note

The purpose of this paper is to provide an overarching framework that includes population heterogeneity and uses of knowledge as integral aspects in the process of research generalization and in the production of evidence on which educational policy analysis, evaluation, and decision-making are based. The power of research derives from the fact that it produces knowledge that can be used in multiple settings. In educational research, however, the question too often has been more about the use of qualitative or quantitative method than about the potential of research to contribute to the improvement of education. Yet, to paraphrase Bourdieu for our own purposes, educational research "is something much too serious and too difficult to allow ourselves to mistake scientific rigidity, which is the nemesis of intelligence and invention, for scientific rigor" (Bourdieu, 1992, p. 227, original emphasis). Mistaking rigidity for rigor would dismiss some research methods and lead us to miss out on the "full panoply of intellectual traditions of our discipline and of the sister disciplines of anthropology, economics, history, etc." (p. 227).

The problems deriving from over-generalizing exist in both quantitative and qualitative research. It is such over-generalizing that we need to guard against most vigorously, by taking into account (a) the diversity in the populations of interest and (b) uses of knowledge from educational research as indicators of the quality of empirical evidence for policy and practice.
Here we argue for the inclusion of population heterogeneity and knowledge uses when considering educational research generalization. With respect to the latter, one may only speculate about the absence of uses as a criterion. It may well be that the research communities represented in journals and the authors of journal articles hope to reach the widest audience possible and therefore generalize their findings across specific uses. However, the different knowledge interests and needs that characterize teachers, politicians, evaluators, analysts, policymakers, and high-level administrators highlight that knowledge use is an important dimension of its generality. Including population heterogeneity as a criterion of the extent to which it is possible to generalize research findings simply means recognizing (a) diversity along a virtually infinite number of dimensions within society and (b) that what is beneficial for one identifiable group may be neutral or detrimental for another group even though they appear to be very similar. This recognition needs to be accompanied by clarity in how research findings are reported and an explicit identification of the limits of generalizations. These can include clear identification of the specifics of the domain about which the research question is asked, including units (U), treatments (T), observing operations (O), and settings (S) of UTOS (Cronbach, 1982). We refer to these as referents in research reporting (Roe, 2012). In addition to descriptions of UTOS, there is a need to consider and discuss the degree to which research findings would be invariant in contexts not represented by UTOS. These will constitute the boundary conditions for the research claims.

[…] language, student (mis-)conceptions inherently are cultural (common, general) rather than personal (singular, special) phenomena (Roth, Lee, & Hwang, 2008).
This has far-reaching implications: this research suggests the impossibility of "eradicating misconceptions," a long-held ideal of many science educators working from a conceptual change perspective.

Notes

[1] Some more radical "constructivist" educators favor the term "transportability" of findings (e.g., Guba & Lincoln, 1989); but the underlying concern is the same: making use of research findings in a setting other, and therefore wider, than the one in which they were originally produced.
[2] The adjective "essentialist" is based on Vygotsky's (1927/1997) description of this form of generalization, which, as shown below, has as its goal "not a systematic exposition of a psychological theory . . . but precisely the analysis of the processes in their essence" (p. 319, original emphasis, underline added).
[3] The design experiment is a research method that combines experimental and case-based methods to investigate complex interventions; it is intended to produce generalizations while being useful to the particular case (Brown, 1992).
[4] In fact, many "qualitative" researchers insist that their research ought not to pursue generalization because "[t]he trouble with generalizations is that they don't apply to particulars" (e.g., Lincoln & Guba, 1985, p. 110).
[5] UTOS refers to the domain about which the research question is asked, involving units (U), treatments (T), observing operations (O), and settings (S). UTOS* refers to the specific situation or class about which a conclusion is wanted (Cronbach, 1982).
[6] In the context of classical logic, this statement may sound contradictory. In dialectical logic, however, a thing is not self-identical, so that it will never manifest itself in identical form, a wisdom also captured in the Heraclitean observation that we can never step into the same river twice.

References

Austin, J. L. (1975). How to do things with words (2nd ed.).
Cambridge, MA: Harvard University Press. (First published in 1962)
Bachman, L. F. (2009). Generalizability and research use arguments. In K. Ercikan & W.-M. Roth (Eds.), Generalizing from educational research: Beyond qualitative and quantitative polarization (pp. 127–148). New York, NY: Routledge.
Borkenau, P., & Ostendorf, F. (1998). The big five as states: How useful is the five factor model to describe intraindividual variations over time? Journal of Research in Personality, 32, 202–221.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110, 203–219.
Bourdieu, P. (1992). The practice of reflexive sociology (The Paris workshop). In P. Bourdieu & L. J. D. Wacquant, An invitation to reflexive sociology (pp. 216–260). Chicago, IL: University of Chicago Press.
Brewer, D. J., Fuller, B., & Loeb, S. (2010). Editor's introduction. Educational Evaluation and Policy Analysis, 32, 3–4.
Brown, A. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. Journal of the Learning Sciences, 2, 141–178.
Campbell, D. T. (1986). Relabeling internal and external validity for applied social scientists. New Directions for Program Evaluation, 31, 67–77.
Canadian Institute for Health Research (CIHR). (2011). More about knowledge translation at CIHR. Accessed February 23, 2012 at http://www.cihr-irsc.gc.ca/e/39033.html
Cervone, D. (1997). Social–cognitive mechanisms and personality coherence: Self-knowledge, situational beliefs, and cross-situational coherence in perceived self-efficacy. Psychological Science, 8, 43–50.
Clark, L., & Preece, R. (2012, August 16). Boys are top of the class! Teenagers celebrate as they get A-level marks . . . and lads do better than girls at getting A* grades. Mail Online.
Accessed December 18, 2012 at http://www.dailymail.co.uk/news/article-2188974/A-Level-Results-Day-2012-Boys-better-girls-achieving-A-grades.html
Cohen-Vogel, L. (2011). "Staffing to the test": Are today's school personnel practices evidence based? Educational Evaluation and Policy Analysis, 33, 483–505.
Corno, L., Cronbach, L. J., Kupermintz, H., Lohman, D. F., Mandinach, E. B., Porteus, A. W., & Talbert, J. E., for the Stanford Aptitude Seminar. (2002). Remaking the concept of aptitude: Extending the legacy of Richard E. Snow. Mahwah, NJ: Lawrence Erlbaum Associates.
Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San Francisco, CA: Jossey-Bass.
Derrida, J. (1988). Limited inc. Chicago, IL: University of Chicago Press.
Eisenhart, M. (2009). Generalization from qualitative inquiry. In K. Ercikan & W.-M. Roth (Eds.), Generalizing from educational research: Beyond qualitative and quantitative polarization (pp. 51–66). New York, NY: Routledge.
Eisner, E. W. (2002). From episteme to phronesis to artistry in the study and improvement of teaching. Teaching and Teacher Education, 18, 375–385.
Ercikan, K. (2009). Limitations in sample to population generalizing. In K. Ercikan & W.-M. Roth (Eds.), Generalizing from educational research: Beyond qualitative and quantitative polarization (pp. 211–235). New York, NY: Routledge.
Ercikan, K., & Roth, W.-M. (Eds.). (2009). Generalizing from educational research: Beyond qualitative and quantitative polarization. New York, NY: Routledge.
Molenaar, P. C. M. (1999). Longitudinal analysis. In H. J. Adèr & G. J. Mellenbergh (Eds.), Research methodology in the life, behavioural and social sciences (pp. 143–167). Thousand Oaks, CA: Sage.
Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement, 2, 201–218.
Molenaar, P. C. M., Huizenga, H. M., & Nesselroade, J.
R. (2003). The relationship between the structure of inter-individual and intra-individual variability: A theoretical and empirical vindication of developmental systems theory. In U. M. Staudinger & U. Lindenberger (Eds.), Understanding human development (pp. 339–360). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Moss, P. A. (2012). Exploring the macro-micro dynamic in data use practice. American Journal of Education, 118, 223–232.
Oakes, W. S. (2009). Individual, ecological and multilevel fallacies. International Journal of Epidemiology, 38, 361–368.
Oliveri, M. E., Ercikan, K., & Zumbo, B. D. (in press-a). Accuracy of DIF detection methods for heterogeneous groups. Applied Measurement in Education.
Oliveri, M. E., Ercikan, K., & Zumbo, B. D. (in press-b). Analysis of sources of latent class DIF in international assessments. International Journal of Testing.
Packer, M. (2001). The problem of transfer, and the sociocultural critique of schooling. Journal of the Learning Sciences, 10, 493–514.
Poole, D. (1994). Routine testing practices and the linguistic construction of knowledge. Cognition and Instruction, 12, 125–150.
Ricœur, P. (1990). Soi-même comme un autre [Oneself as another]. Paris, France: Seuil.
Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357.
Roderick, M. (2012). Drowning in data but thirsty for analysis. Teachers College Record, 114(11).
Roe, R. (2012, July). Using referents to improve generalization in psychological research. Paper presented at the symposium "Limits of Generalizing in Psychological Research," 30th International Congress of Psychology, Cape Town, South Africa.
Rogosa, D. R. (1995). Myths and methods: "Myths about longitudinal research," plus supplemental questions. In J. M. Gottman (Ed.), The analysis of change (pp. 3–65). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rogosa, D. R., & Willett, J. B. (1985).
Understanding correlates of change by modeling individual differences in growth. Psychometrika, 50, 203–228.
Roth, W.-M. (2009a). Phenomenological and dialectical perspectives on the relation between the general and the particular. In K. Ercikan & W.-M. Roth (Eds.), Generalizing from educational research: Beyond qualitative and quantitative polarization (pp. 235–260). New York, NY: Routledge.
Roth, W.-M. (2009b). Specifying the ethnomethodological "what more?" Cultural Studies of Science Education, 4, 1–12.
Roth, W.-M. (2012). First person methods: Toward an empirical phenomenology of experience. Rotterdam, The Netherlands: Sense Publishers.
Roth, W.-M., & Gardner, R. (2012). "They're gonna explain to us what makes a cube a cube?" Geometrical properties as contingent achievement of sequentially ordered child-centered mathematics lessons. Mathematics Education Research Journal, 24, 323–346.
Roth, W.-M., Lee, Y. J., & Hwang, S.-W. (2008). Culturing conceptions: From first principles. Cultural Studies of Science Education, 3, 231–261.
Roth, W.-M., McGinn, M. K., Woszczyna, C., & Boutonné, S. (1999). Differential participation during science conversations: The interaction of focal artifacts, social configuration, and physical arrangements. Journal of the Learning Sciences, 8, 293–347.
Roth, W.-M., & Tobin, K. (2002). At the elbow of another: Learning to teach by coteaching. New York, NY: Peter Lang.
Saxe, G. B. (1991). Culture and cognitive development: Studies in mathematical understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Slavin, R. (2008). Perspectives on evidence-based research in education—What works? Issues in synthesizing educational program evaluation. Educational Researcher, 37(1), 5–14.
Snow, R. E. (1989).
Aptitude-treatment interaction as a framework for research on individual differences in learning. New York, NY: W. H. Freeman.
Song, M., & Herman, R. (2010). Critical issues and common pitfalls in designing and conducting impact studies in education: Lessons learned from the What Works Clearinghouse (phase I). Educational Evaluation and Policy Analysis, 32, 351–371.
Stein, M. L., Berends, M., Fuchs, D., McMaster, K., Saenz, L., Yen, L., Fuchs, L. S., & Compton, D. L. (2008). Scaling up an early reading program: Relationships among teacher support, fidelity of implementation, and student performance across different sites and years. Educational Evaluation and Policy Analysis, 30, 368–388.
Subramanian, S. V., Jones, K., Kaddour, A., & Krieger, N. (2009). Revisiting Robinson: The perils of individualistic and ecologic fallacy. International Journal of Epidemiology, 38, 342–360.
Tobin, K. (2009). Repetition, difference, and rising up with research in education. In K. Ercikan & W.-M. Roth (Eds.), Generalizing from educational research: Beyond qualitative and quantitative polarization (pp. 149–172). New York, NY: Routledge.
Tuomi-Gröhn, T., & Engeström, Y. (2003). Conceptualizing transfer: From standard notions to developmental perspectives. In T. Tuomi-Gröhn & Y. Engeström (Eds.), Between school and work: New perspectives on transfer and boundary-crossing (pp. 19–38). New York, NY: Pergamon.
Turner, E. O., & Coburn, C. E. (2012). Interventions to promote data use: An introduction. Teachers College Record, 114(11).

Figure 2. (a) In correlational research, a trend is observed and generalized to the population. (b) PISA 2009 reading results for Canadian boys and girls. The grey area represents the presence of boys who score higher than some of the girls and girls who score lower than some of the boys.

Figure 3.
Pattern of Vygotsky’s (1971) derivation of the psychology of art (contradiction of emotions that move in two opposing directions). The generalization is true for every case.