Digital Data: Problems and Opportunities in the Age of Information (course summary, Analisi Dei Dati Per La Ricerca Sociale)

The challenges and benefits of using digital data in various contexts. Topics include issues with data reliability, ethical concerns, and the difference between native and digitalized data. The document also discusses the importance of self-reported and behavioral data, and the role of big data in social sciences. Problems with access biases and distortions in social media data are also addressed.


1. Main problems and opportunities in using digital data

Problems: fake profiling, because the more the coding process requires a heavy computational structure, the more likely errors become; coverage of data, because not all of us produce the same amount of data and not all data have the same level of coverage; data capitalism, because we must always ask who buys the data, how they use them, and how aware we are of where the data will go; data reliability, because in the analog world it is easier to identify an error, while in the digital world we only see the final data and it is difficult to get inside the process that created them; ethical problems, because data always have an owner and can always be manipulated, positively or negatively, so we must understand what kind of manipulation is taking place and ask ourselves where the ethical limit lies.

Opportunities: digital data allow us to profile, classify and build ideal types of subjects; profiling can be both qualitative and quantitative, and in any case it gives us classification points on which to base our choices. Digital techniques allow a larger collection of information, a rapid organization of information and a greater computational capacity for putting information together. They allow forecasting, that is, predicting what will happen and where our actions will lead. They also allow evaluation, the most complicated operation because it depends on a set of randomness models, and the area where digital data have made the biggest difference.

2. Difference between digital native data and digitalized data

There are three large big data families, determined by three levels: level of structuration, level of digitalization and level of access in everyday life. The level of digitalization identifies two types of data: native digital data and digitalized data. Native digital data are data that originate in the digital world, information that comes from the natural process of life and is not collected directly for the researcher's purposes; they used to be called secondary data. They are flux data because they are the consequence of digital behaviour. Digitalized data are data born in the analog world and then digitized; by nature they are not digital, but they live in the digital world for our convenience. They are stock data because they are more episodic. For example, data from a Google form is not natural behaviour in the digital world.

3. Distortion and bias of digital native data

All social research methods have underlying assumptions about human nature and about the way people make decisions, form their opinions and behave. Two traditional models used to understand how people make decisions are rational choice theory and the oversocialized view. Rational choice theory says that people's preferences have a well-defined structure and that the choice between courses of action is an almost automatic mechanism in which the individual applies his or her system of preferences to a limited set of options. The oversocialized view says that people are sensitive to the opinions of others, so their decisions are made after socialization with other people.
The data used to study people's behaviour are of two types: self-reported data, created by what people report about themselves in surveys and interviews; and behavioural data, produced by the actual actions and behaviour carried out by someone and collected not with surveys but by tracking people's movements, for example through GPS phones. There are differences between these two types of data. Self-reported data create limits for researchers, such as social desirability bias: people want to appear a certain way to society, so they adjust their answers to social norms and give inaccurate information about sensitive behaviours such as hate behaviour. Another problem is the difficulty people have in reporting their own sensations. Two of the most common problems engendered by people's reactions are the Hawthorne effect and the social desirability effect. The Hawthorne effect refers to the fact that individuals modify their behavior in response to their awareness of being observed; the latest privacy data scandals have made users more aware that they are being watched and monitored. The social desirability effect is the tendency of subjects to report socially acceptable responses when they think they are under observation, a behavior adopted to project a favorable image of oneself and avoid negative evaluations. Social media are particularly affected by social desirability because people manage their online presence by creating a positive image of themselves.

4. In digital data, are self-reported data better than behavioural data?

We distinguish self-reported data and behavioral data. Self-reported data are used to collect information and study people's behavior as they report it: their opinions, social norms, attitudes and beliefs. They are called self-reported because researchers rely on the participants of a study to report on something they have done or on what they think or believe. Surveys and interviews of all sorts are self-reported data. Observational/behavioral data consider human behavior as the result of mutual influences, of conscious acting, of prejudices and of rational influences from the environment. Collecting observational/behavioural data has always been very difficult and expensive for social scientists: keeping track of people's actual behaviour could be done only for small groups and for a very limited amount of time. Now we have digital traces of people's actual behaviour that were quite simply never available before, because the availability of digital data has brought a large increase in behavioural data. Self-reported data will remain an important source of information for social scientists, but behavioural data will function as complementary data for understanding complex social phenomena. Self-reported data are not better than behavioural data: both types are required and are complementary. In particular, behavioural digital data are the object of attention of a new generation of social scientists who believe in their potential to bring about a regeneration of current theories.
5. What kind of human thinking method is closer to the digital native data strategy?

Since the late 1990s psychologists have distinguished between two systems of thought with different capacities and processes: System 1 and System 2. System 1, or automatic thinking, concerns the conative dimension: it responds reactively to a stimulus, it is the thought system that works when we are in a hurry, it is made up of intuitive thoughts of great capacity, it is based on associations acquired through experience and it processes information quickly and automatically. System 1 is quick, automatic, effortless, with no sense of voluntary control, and it uses shortcuts; it is a continuous construal of what is going on at any instant. The process associated with System 1 has been defined Type 1: fast, automatic and unconscious. Behavioral data are characteristic of System 1. System 2, or systematic reflective thinking, concerns the reflective dimension: it is the system of thought that works when we have to reflect and weigh all the characteristics of a situation, it involves low-capacity reflective thinking, it is based on rules acquired through culture or formal learning, and it processes information in a relatively slow and controlled manner. System 2 is slow, effortful, rule-based, and demands attention to the mental activities involved. Self-reported data are characteristic of System 2. The method closest to the digital native strategy is System 1, because most social networks are designed for small expressions of thought: the logic of the hashtag is reactive, and we do not put a like after reflection, it is something automatic. Depending on the type of thought we want to collect, we must design and build the appropriate tools: behaviours are more suitable for following the activated, rapid, reactive thinking system, while the tools that allow self-reporting permit greater depth in understanding the individual. In any case there are biases, and we should be careful about the implications of our methodological strategy.

6. The three Vs of big data

The three Vs of big data are volume, velocity and variety. Volume refers to the quantity of data produced; this is the most salient feature of big data, the sheer amount of data created by digital services and goods. Velocity stands for the fast-moving nature of digital data, which are often produced 'on the fly': for example, when we search something on a search engine, the exact list of results is generated instantly from our query. Variety refers to the multiplicity of formats that data can have in the digital world. The latter is a source of richness but also of trouble for social science researchers, because big data are generated by a vast range of largely invisible processes with frequently incomparable dimensions and different degrees of dimensionality. There are essentially three positive features of big data: increased size and resolution, where size refers to the sheer number of cases or participants we can include in our research, while another way of interpreting the 'big' in big data is not only the number of cases but their increased resolution, that is, the number of data points available for each subject; big data are long and longitudinal, which creates opportunities for the social sciences, which have historically faced high costs and technical difficulties in collecting data over long periods of time; and non-reactive heterogeneity, because in most cases big data are not collected by means of direct elicitation of people but in the background.

7. The problem of validation for digital data

The problem of validation for digital data is one of the main challenges of the digital environment. A measured variable is considered valid for a construct if it is an adequate representation of the theoretical construct of interest. The reuse of data collected for other purposes poses challenges to the construct validity of our indicators: the volume, variety and velocity of big data are often accompanied by a lack of control over how variables are measured and over which variables are most reliable and secure. As a result, researchers using big data are forced to establish construct validity using statistical measurements and analysis. There are two main means by which researchers demonstrate validity: CFA and MTMM. The dominant approach is to evaluate the internal structure of a measure using confirmatory factor analysis (CFA): researchers test both an a priori factor structure, to determine the overall fit of the model, and the hypothesized factor structure against plausible alternatives. Alternatively, and historically, researchers have used a multitrait-multimethod matrix (MTMM) to establish construct validity through convergent and discriminant validity tests: they collect data with multiple methods and compare the resulting correlations.
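As a minimal, hypothetical sketch of the MTMM idea just described, one could correlate the same traits measured by two different methods, for example a self-reported survey item and a behavioural digital trace. The trait names, the simulated data and the pandas-based comparison are illustrative assumptions, not part of the original text:

```python
# Hypothetical MTMM-style check: correlate the same traits measured by two methods.
# All variable names and the simulated data are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
political_interest = rng.normal(size=n)   # latent trait (unobserved in real research)
sociability = rng.normal(size=n)          # second latent trait

df = pd.DataFrame({
    # Method 1: self-reported survey items (trait + survey noise)
    "interest_survey":    political_interest + rng.normal(scale=0.5, size=n),
    "sociability_survey": sociability        + rng.normal(scale=0.5, size=n),
    # Method 2: behavioural digital traces (trait + tracking noise)
    "interest_clicks":    political_interest + rng.normal(scale=0.7, size=n),
    "sociability_posts":  sociability        + rng.normal(scale=0.7, size=n),
})

# Convergent validity: same trait, different methods -> correlations should be high.
# Discriminant validity: different traits -> correlations should be low.
print(df.corr().round(2))
```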
8. In what sense does the use of digital data imply a reversal of the indicator construction process?

It changes the conceptual construction structure of a research project. In conventional research we have the concept, the indicators, the questions, the variables and the construction of the data matrix, so from the general concept we work down to the variables. The researcher chooses the indicators and evaluates how many of them are valid and reliable; through the indication relationship, the researcher maximizes the overlap with the concept and minimizes the extraneous part. It is a deductive process: concept → indicators → property → variable. In digital research, on the other hand, we start from the variables: the properties referring to the subjects are defined, and the variables most suitable to represent the concept are selected. We start with many variables and keep only the useful information, running an optimization procedure. It is an inductive process: variable1/variable2/variable3/variable4 → property → indicators → concept.
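The inductive direction, from many digital variables to a smaller set of candidate indicators, can be sketched with a dimensionality-reduction step. The use of PCA here, and all variable names and data, are illustrative assumptions rather than the procedure prescribed by the text:

```python
# Illustrative sketch of the inductive path: many digital-trace variables -> fewer candidate indicators.
# PCA is used only as one possible reduction technique; the variables are simulated.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_subjects = 500
engagement = rng.normal(size=n_subjects)  # hypothetical latent property

# Four observed digital variables, three of them partly driven by the same latent property.
X = np.column_stack([
    engagement + rng.normal(scale=0.4, size=n_subjects),   # e.g. daily posts
    engagement + rng.normal(scale=0.6, size=n_subjects),   # e.g. likes given
    engagement + rng.normal(scale=0.8, size=n_subjects),   # e.g. comments written
    rng.normal(size=n_subjects),                            # mostly noise (extraneous variable)
])

pca = PCA(n_components=2).fit(X)
# Explained variance suggests how many underlying properties the variables point to;
# loadings suggest which variables could serve as indicators of the concept.
print("explained variance ratio:", pca.explained_variance_ratio_.round(2))
print("component loadings:\n", pca.components_.round(2))
```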
9. Obtrusive and unobtrusive data

In a research project, the third point to face is identifying the methods to be used: SCA. Sample: I select my sample. Collection: I explain how I collect the information. Analysis: I define the terms of the analysis. When talking about collection, some data collection techniques are neutral, while others involve an immediate reaction by the subject: there is an unobtrusive, non-reactive method and an obtrusive, reactive method. The distinction between these two modalities of data collection is important in the social sciences because people react to researchers' measurements and can also figure out what a researcher's goals are. The distinction between obtrusive and unobtrusive methods is not new to digital social research: in the pre-digital age, interviews and focus groups were typical reactive data collection methods, and the range of data collection methods has grown compared to the pre-digital past. Surveys, interviews and experiments are all examples of reactive modes of collection. The analysis of online content in its various forms, quantitative and qualitative, is instead a form of non-reactive data collection.

The opposition does not arise only in the digital world; it has always existed, because by definition subjects have always been reactive: whatever stimulus we propose, they will respond accordingly, and even when no stimulus is proposed they will react to the stimuli coming from the environment. Our behavior is always a reaction to something; we live in a strategic arena where we act strategically according to the actions of others, actions that are governed by shared rules and norms. From a methodological point of view, this is an opposition between observation and the asking of questions, and the claim is that observation is less reactive than questioning. Two of the most common problems engendered by people's reactions are the Hawthorne effect and the social desirability effect. The Hawthorne effect refers to the fact that individuals modify their behavior in response to their awareness of being observed; the latest privacy data scandals have made users more aware that they are being watched and monitored. The social desirability effect is the tendency of subjects to report socially acceptable responses when they think they are under observation, a behavior adopted to project a favorable image of oneself and avoid negative evaluations. Social media are particularly affected by social desirability because people manage their online presence by creating a positive image of themselves.

17. The use of focus groups in the digital world

A qualitative method of data collection is the focus group. The core purpose of focus groups stems from the interest in social groups and group behaviour shown by the many researchers who decided to employ group interviews in their research. Qualitative sociologists have used focus groups to study a myriad of group behaviour topics, including social interaction patterns. Online focus groups, also known as OLFGs, are the internet equivalent of traditional offline focus groups and use specialized chat software to bring people to a designated website to conduct discussions. When we have an active structure we are in the world of interviews, so chat interviews, chat groups, email interviews: everything that involves the active participation of the researcher has to do with group interviews. If instead the structure is passive, we are in the world of observation: dynamic tracking of social media, internet forums, retrieval of post archives. When we talk about focus groups in the digital world we can distinguish synchronous and asynchronous ones. Online focus groups lack the physical contact and the intimacy of a face-to-face group, but most of the characteristics of an offline focus group also apply online, in particular for the synchronous online focus group, in which the composition of the members and the appropriate selection of their number are similar to the offline case. In fact we talk about selecting pre-existing groups, also called natural groups, and purpose-constructed groups. Pre-existing groups may take a variety of forms: individuals who are no more than acquaintances, individuals in a work setting, social groups, family groups, friendship groups. These groups pose fewer problems in terms of creating the level of interaction between members that a focus group is meant to simulate. I select one group over another depending on the topic and the researcher's question; when pre-existing groups are not appropriate, I build purpose-constructed groups.
Purpose-constructed groups allow participants to speak freely because they do not know each other, something that may not happen in pre-existing groups, whose members can be influenced by the thinking of the people they know. When the researcher creates a purpose-built group, he or she must be careful to maintain the balance within it, without allowing any participant to prevail. In online focus groups, the researcher organizes warm-up tasks to make participants who do not know each other interact, and uses a focusing exercise to concentrate the group's attention and interaction on a particular topic. Warm-up sessions and focusing exercises are more necessary for online focus groups than for traditional ones, because participants interact remotely; face-to-face offline groups may interact in a different way. There are two main problems in conducting focus groups online. The first is the infrastructure: it is recommended to invite 8 to 10 participants, since technical problems such as a poor connection or poor video quality may occur. The second is the moderator function: the moderator traditionally has the task of not letting one participant prevail over another and of maintaining the balance of the conversation, so in an online focus group the moderator technically needs the ability to mute users and let everyone speak, a function that not all platforms have.

18. Pros of quantitative design in the digital world

The quantitative method uses numbers and a specific language, statistics, through tools such as the standardized questionnaire. Quantitative data become numbers, such as age or income: I simplify reality, I find simpler reference samples, I obtain data that can be compared faster, and the data construction process is faster. There are three large families of data construction, and one of them is digital quantitative design. In the pre-digital world standardized questionnaires were administered, for example, by an interviewer on the phone or in some cases by mail; now we can use digital quantitative tools for the administration of standardized questionnaires. The quantitative method has several advantages:
- more speed, more contacts, lower costs, because the fast administration of a form built from standardized questions brings a simplification of reality, precisely because it is highly standardized. More speed also means more contacts and negligible costs: physically submitting a questionnaire to a person costs much more, and in terms of cost the digital and the analog world are incomparable;
- no geographical boundaries: internet access is geographically tracked but not restricted;
- no time limitation: there is no time limit on the campaign, because we can decide that the questionnaire remains available for months, something that cannot happen in the physical world;
- interactive features in building questions and answers: it is very easy to prepare schemes that work on their own, and a good questionnaire is made up of different paths to model the different profiles of respondents. Filters were also used in face-to-face administration, and there they were the main source of error;
- complex track design.
19. Cons of quantitative design in the digital world

The quantitative method uses numbers and the language of statistics through tools such as the standardized questionnaire, and digital quantitative design is one of the three large families of data construction: in the pre-digital world standardized questionnaires were administered by an interviewer on the phone or by mail, while now quantitative tools can administer them online. However, the quantitative method also has several disadvantages; there is a no-go list:
- partial coverage of the target population online: for example, I will never run an online questionnaire on the elderly Italian population, because there is the obvious problem that few elderly people interact with an online questionnaire. This has to do with the digital divide: in our country there is a divide in access to the digital world that is generational, cultural, sometimes economic and in some cases even geographic;
- confidentiality, that is, the risk of security breaches or public exposure: you should never run an online study when there is a high degree of confidentiality, for two main reasons: there are no inviolable servers, and in some market research the client does not want it to be known that the research is being done;
- the online form does not fit structured online methods: the form I build does not work adequately for the online medium, so the idea behind the questionnaire may not be realizable online;
- interaction between population and proficiency with the tools: the online population exists but does not have the same level of proficiency, and if there are differences in the population, unwanted biases can be created;
- comparison among populations, that is, the distribution of the digital divide across contexts.
If any of these problems is evident in my research, then I should not choose a quantitative design.

20. Visual design of a questionnaire in the digital world

Web surveys represent an important advance in the evolution of self-administered questionnaires. A web survey is a questionnaire accessed via an internet browser, typically created and operated via specialist online survey software; it can be delivered through a range of tools, for example HTML or JavaScript. Web surveys are normally preferred to other online alternatives such as email surveys, which can take two forms: including the survey in the body of the email or sending it as an attachment. Web survey design exploits a wide range of visual features, so respondents are likely to use visual features of the questions or of the response options as supplementary information to help them pin down the meaning of the question or of the potential answers. Many aspects of design must be considered to account for potential medium effects, first of all the basic layout of a web survey: the scrolling approach or the paging approach. The scrolling approach means there is only one page and the respondent proceeds question by question with a progress bar; if there are filter questions, the responsibility for the filtering is left to the interviewee, which is not really a good idea. The paging approach allows the interviewee not to struggle and to have the simplest possible experience, and it lets the researcher manage better what is shown to the interviewee: it allows pages that appear only when filters require them and, for example, even allows sub-routes.
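A minimal sketch of the paging idea with conditional routing follows. The questions, the routing rules and the function names are hypothetical and only illustrate the filter logic described above, not the behaviour of any specific survey tool:

```python
# Hypothetical paging/skip-logic sketch: the next page shown depends on earlier answers.
# Questions, rules and names are illustrative.
def next_page(current_page, answers):
    if current_page == "employment_status":
        # Filter question: only employed respondents see the job-related sub-route.
        return "job_details" if answers.get("employment_status") == "employed" else "leisure"
    if current_page == "job_details":
        return "leisure"
    if current_page == "leisure":
        return None  # end of questionnaire
    return "employment_status"

# Simulated respondent path: an unemployed respondent skips the job_details page.
answers = {"employment_status": "unemployed"}
page = "employment_status"
while page is not None:
    print("showing page:", page)
    page = next_page(page, answers)
```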
The drawback is that building a page template takes much longer from the script construction point of view: in a scrolling design the questions follow one another and you just have to write them, whereas in a page design you have to design the pages, decide where the break between one page and the next falls, and take care of many other elements. It is also necessary to pay attention to some non-trivial technical characteristics of visual design: colors, because if I use different colors the respondent will give them a meaning and instinctively compare them; the layout, which must be optimized for any digital device the respondent may want to use; and position, because a sentence placed in the center takes on a particular meaning, for example as a crucial or recurring sentence, words on the left and at the top are read first, two questions placed together read as related, two similar questions read as belonging to the same semantic area, and the higher a sentence is placed, the better. These are not fixed rules but pieces of advice, and they can even be violated with the aim of provoking the interviewee.

There are several advantages in doing a web survey: the marginal cost per case is even smaller than with paper questionnaires, where there are printing and postage costs for each additional case; the speed of both deployment and data collection has greatly improved, because web surveys can be collected in a matter of days and the data are already inputted, ready to be analysed; geographical boundaries matter less, because participants can be located anywhere, which expands the chances of cross-national comparison as part of the research questions; time boundaries also matter less, because people can respond to a web questionnaire at a time of their choosing. This last point is a double-edged sword: participants can answer when time is available, but they might also fill in the questionnaire while engaged in other activities, with various distractions. Interactivity and multimedia are important in web surveys, because interactive elements such as 3D viewing tools or books and magazines with turnable pages can be used to diversify the interaction between respondents and the questionnaire. Finally, complex skip/routing takes the decision about which question to answer next out of the hands of the respondent. There are also disadvantages in using web surveys: the target sample is not, or is only partially, online; the study is highly confidential; the research topic is likely to interact with the online medium; the stimuli cannot easily be rendered online; there are comparability issues.

21. Data, paradata, metadata

The questionnaire does not just provide us with data: we can distinguish three categories of information that we can collect using a web survey, namely the main data, the paradata and the metadata. The main data are the survey data: I ask a question and get an answer that is encoded and ends up in the data matrix. Paradata are data about the process of collecting data, for example data about behaviour during the response: how long I stay on the same page, how many times I click on that page, how long it takes between one question and another, how many times I change my answer. In the physical world these things were observed by the interviewer, if there was one; in the digital world they can all be detected by the data collection system and become variables in our data matrix, allowing us to understand, for example, whether the interviewee who encountered question 8, which was difficult, went ahead with his answer or stopped to think about it. Paradata can be directly recorded (direct paradata), as an integral part of the software available to develop and carry out web surveys; direct paradata can be grouped in three categories: paradata about contact info, which concern the researcher's actions, and paradata about device type and about questionnaire navigation and interaction, which are more related to respondents' actions. Indirect paradata are those collected by adding extra instruments. Metadata are data about data: they concern access behaviour, how my population is making use of my questionnaire, for example the time taken to complete the survey, the channel by which the participant accessed the survey, the response rate, the participation trend over time, and the language in which the survey was completed.
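A minimal sketch of how navigation paradata of this kind could be recorded alongside the main data follows. The event names, fields and timing logic are illustrative assumptions, not the API of any real survey platform:

```python
# Hypothetical paradata logger: records per-page timings and answer changes
# alongside the main survey answers. Field names are illustrative.
import time

class SurveySession:
    def __init__(self):
        self.answers = {}        # main data: question -> final answer
        self.paradata = []       # process data: one event per action
        self._page_started = None

    def open_page(self, page):
        self._page_started = time.time()
        self.paradata.append({"event": "page_open", "page": page, "ts": self._page_started})

    def answer(self, question, value):
        changed = question in self.answers          # an answer change is itself paradata
        self.answers[question] = value
        self.paradata.append({"event": "answer", "question": question,
                              "changed_previous": changed, "ts": time.time()})

    def close_page(self, page):
        self.paradata.append({"event": "page_close", "page": page,
                              "seconds_on_page": time.time() - self._page_started})

session = SurveySession()
session.open_page("p1")
session.answer("q8", "agree")
session.answer("q8", "strongly agree")   # respondent changed their mind
session.close_page("p1")
print(session.answers)
print(session.paradata)
```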
22. The risks and the possible errors in quantitative design

There are four sources of error in web surveys:
- coverage error: the failure to give some members of the population any chance of being selected, which is the most common source of concern. The digital divide must always be taken into consideration, because there will always be people who are excluded from a digital point of view. Furthermore, we must be careful about theoretical errors, for example in the procedure for constructing indicators and questions: we must always run many tests and be sure that most errors have been kept under control. We cannot expect to make no mistakes, because we do not have reality itself but only a representation of reality that carries a series of errors; the goal is to minimize the error.
- sampling error: the failure due to a sample of participants having different characteristics from the overall population. Selection is always done by building a list; if the list is available we can build a random sample, and every time we build a random sample we run into a systematic error and an accidental error. The systematic error arises when we introduce a distortion, for example by sending the questionnaire only via email without bearing in mind that many people no longer look at their email. The accidental error consists in the fact that each type of sampling introduces a certain amount of error because it is a simplification. We can also use non-statistically random methods: when we do not have the list, we use a recruitment method in which we multiply the sampling sources and the channels through which we send information, and more multiplication allows for more heterogeneity. The data must then be related to the structure of the general population: if enough answers are obtained, we proceed with weighting, which does not solve the problem of the lack of randomness but helps to make the sample behave as similarly as possible to the target population.
- non-response error: the failure to collect data on all the people in the sample. People can decide not to participate, so you have to work on the presentation of the questionnaire and how it looks; people do not participate because they do not trust the safety of the procedure or its anonymity, and people can interrupt the questionnaire.
- measurement error: the failure due to inaccuracies in the responses recorded by the survey instruments. People can lie, so control questions need to be inserted in the questionnaire. We must take into account validity: how well the indicator we have chosen works and how little extraneous content it carries, how suitable the question is, how much the question produces similar answers in similar people, and how operational the question is.
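A minimal sketch of the weighting step mentioned under sampling error, in which respondents are re-weighted so that the sample distribution of a variable matches the target population, follows. The category shares and the cell-weighting approach are illustrative assumptions:

```python
# Hypothetical post-stratification weighting sketch: align the sample's age distribution
# with known population shares. Numbers are invented for illustration.
import pandas as pd

sample = pd.DataFrame({"age_group": ["18-34"] * 60 + ["35-64"] * 30 + ["65+"] * 10})

population_share = {"18-34": 0.25, "35-64": 0.50, "65+": 0.25}   # assumed target structure

sample_share = sample["age_group"].value_counts(normalize=True)
sample["weight"] = sample["age_group"].map(lambda g: population_share[g] / sample_share[g])

# Weighted shares now match the population structure (the lack of randomness remains unsolved).
weighted = sample.groupby("age_group")["weight"].sum() / sample["weight"].sum()
print(weighted.round(2))
```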
23. Main topics in experimental design

An experiment in the social sciences is possible, but we must keep in mind that in the physical world experiments are carried out in controlled environments, where we consider how multiple causes of the same effect combine and decide that some causes are conditions we keep under control. Social scientists study an object of enormous complexity: human behaviour. In the social sciences it is difficult to understand the relationship between the gears, to understand what is the cause and what is the effect; everything is more complicated than in the physical sciences. We need to be clear about one distinction between forms of causality in order to understand the difference between experiments and other methods. In the social sciences there are two separate ways of understanding causality: regularity theories of causation and manipulability theories of causation. Regularity theories see causation as determined by temporal priority, spatiotemporal contiguity and constant conjunction. Experiments are grounded on manipulability theories: this approach views causes as variables that can be manipulated, with an outcome that depends on the state of the manipulated variable. Experiments are therefore situated within a manipulability understanding of causality, and their design is characterized by three main aspects: manipulation of one or more independent variables; use of controls, such as randomly assigning participants to treatment (experimental) or control groups; and careful observation and measurement of one or more dependent variables. The primary goal of an experimental design is to establish a causal connection between the independent and dependent variables; the secondary goal is to extract the maximum amount of information with the minimum expenditure of resources. Two things are fundamental for understanding the logic of experiments: random assignment and the notion of validity. Random assignment distributes the individual characteristics of participants across the treatment and control groups so that they do not systematically affect the outcome of the experiment; it helps ensure that error effects, meaning effects on the outcome variable that are not attributable to the manipulated variable, are statistically independent; and it creates groups of respondents who, at the moment of the division, are probabilistically similar on average. The notion of validity is divided into five concepts: internal validity, causal validity, construct validity, statistical validity, and external/ecological validity. Internal validity concerns how close the experiment is to truth and reality and whether it permits representativeness of the context; causal validity concerns whether it models the correct causal scheme and permits control of influencing and biasing factors; construct validity concerns how similar and equivalent the variables we take into account are to the same ones in reality; statistical validity concerns whether we can use inference from the results to the general target of investigation; external/ecological validity concerns the consequences and the modelling of contextual results. In the social sciences we always use randomized controlled trials (RCTs), controlled but random tests: while in the physical sciences measurements are deterministic, in the social sciences the evaluation of relationships between cause and effect is probabilistic, based on inferential statistics.
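A minimal sketch of random assignment as described above follows; the group labels, sample size and balance check are illustrative assumptions:

```python
# Hypothetical random assignment sketch: split participants into treatment and control
# at random, then check that a background characteristic is balanced on average.
import numpy as np

rng = np.random.default_rng(42)
n = 200
age = rng.integers(18, 80, size=n)                     # background characteristic

group = rng.permutation(np.array(["treatment"] * (n // 2) + ["control"] * (n // 2)))

# Randomization makes the groups probabilistically similar on average,
# so pre-existing characteristics should not systematically differ.
print("mean age, treatment:", age[group == "treatment"].mean().round(1))
print("mean age, control:  ", age[group == "control"].mean().round(1))
```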
24. Causal model: what is it?

To operate in a quantitative research environment we use quantitative methods to estimate the relations among different kinds of objects; the idea is to find a way to represent, with some measure, something that exists in reality but that we cannot see and measure precisely. We are dealing with an extremely complex research object, so we cannot measure anything exactly; we can build estimates that allow us to say, in the quantitative world, how strongly something works. In the quantitative world we estimate relationships between variables: a variable is something that varies, for example age or qualification, and concepts become variables once they are given an operational definition. We can also estimate relationships between actors, organizations and any other object of research, using quantitative statistics to model networks, and we can use quantitative research to analyze texts, for example with the quantitative approach of text mining, to understand how often certain textual structures recur and what their recurrence means. There are causal and non-causal models. In a causal model there are one or more dependent variables explained by independent variables, a regression-type model: for example, there is a variable representing a phenomenon we want to explain, such as the final graduation grade, and one or more variables that are able to explain it. The model serves to establish how much each of the independent variables weighs on the final result:

y = ax + e

where y is the dependent variable, a is how much the independent variable weighs on the dependent variable (our goal), x is the independent variable, and e is the error. In a non-causal model there is no dependent variable: all the variables are considered together, and we try to understand how the relationships among them work, with the goal of extracting a smaller but clearer set of variables.
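A minimal worked sketch of the causal model y = ax + e, fitting the weight a from simulated data, follows; the example variables and the use of a least-squares fit are illustrative assumptions:

```python
# Hypothetical regression sketch for the causal model y = a*x + e.
# x could be, say, hours of study; y the final graduation grade. Data are simulated.
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=100)                  # independent variable
e = rng.normal(scale=2.0, size=100)               # error term
y = 3.0 * x + e                                   # dependent variable, true weight a = 3.0

# Estimate the weight a (and an intercept) by ordinary least squares.
A = np.column_stack([x, np.ones_like(x)])
a_hat, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print("estimated weight a:", round(a_hat, 2))
```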
25. Difference between the conventional analysis approach and the computational one

Summing up, the conventional approach is characterized by:
- Hypothesis testing and parameter estimation
- Simple or complex random samples assumed
- Significance tied to sampling assumptions, which are the basis of generalizability
- Main effects considered more frequently than interactions
- Nonlinear relations between X and Y often overlooked
The computational approach, on the other hand, involves models that analyse big data we did not build ourselves: in the world of native digital data we use algorithms that predict what will happen, in terms of consequences, given some causes. Algorithms allow us to make accurate predictions. Computational models are not probabilistic: they take all the data and proceed through cross-validation techniques. Summing up, the computational approach is characterized by:
- Prediction and model selection
- Convenience samples acceptable, and bootstrapping
- Cross-validation instead of significance testing
- Semi-automatic identification of interactions and of nonlinear relationships
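A minimal sketch of the cross-validation idea that characterizes the computational approach follows; the model, the simulated data and the scikit-learn choices are illustrative assumptions:

```python
# Hypothetical cross-validation sketch: judge a predictive model by out-of-sample accuracy
# instead of significance tests. Data are simulated; the model choice is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))                                           # digital-trace features (simulated)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)   # outcome to predict

scores = cross_val_score(LogisticRegression(), X, y, cv=5)   # 5-fold cross-validation
print("out-of-sample accuracy per fold:", scores.round(2))
print("mean accuracy:", scores.mean().round(2))
```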