Data analysis notes

Data analysis notes from Professor Raines's data analysis course

Type: Notes

2021/2022

Uploaded on 11 January 2023

Annasofia123

Archives are potentially information systems

The primary function of records and record keeping, enabling highly organized, large-scale social and material existence, has remained more or less constant throughout history.

An archive contains primary source documents that have accumulated over the course of an individual's or organization's lifetime and are kept to show the function of that person or organization (the creator).

Who is this creator? (soggetto produttore)
The entity, family or person that has set up, accumulated and/or stored documents in the conduct of its personal or corporate activity.

What is an information system?
Any organized system for the collection, organization, storage and communication of information (so that it can eventually be retrieved).

What are the information systems?
- Data warehouses
- Enterprise resource planning
- Enterprise systems
- Expert systems
- Search engines
- Geographic information system
- Global information system
- Office automation

Historical archives are information systems, but they do not necessarily use data management techniques:
1. Filing plan (titolario) → a classification system designed to sort logically the flow of documents that constitute the archive
2. Protocol register (protocollo) → a register containing unique, consecutive numbers assigned to records, together with additional information about the identity of the persons involved and the documentary context of the record

Logic of the archive
- Accumulates, preserves and provides access to historical records
- Arranges and describes documents (order, provenance, content)
This is called the documentary perspective.

Logic of the information system
- Collects, stores and provides access to information
- Arranges and describes data (entities, properties, relations)
This is called the database perspective.

Information is shared through a documentary flow between the offices: one office is authorized to read another's records but not to store them. The life of a person, or the story of an object, an institution, a place or a building, is therefore completely fragmented and scattered; each office holds only a tiny piece of the information. Behind every single record there is a long chain of records that corroborates the information it supplies.

What is the task of big data?
To create the necessary links between different contents while respecting their context, in order to put the pieces together and create new entities while leaving the information system intact.

We cannot access the data stored in historical archives using only the metadata description → this supplies the context. We are looking for the data stored in the archival contents.

What is the difference between metadata and data?
In order to give context to the data we need the metadata.

Metadata

The historical method
The records created, accumulated and used by a person or group in the course of life and work are to be kept together and not intermixed with records from other sources.

The method considers six aspects:
- The general and particular political context of a given period
- The institutions in general and in particular of a given period
- The institutions of the archive's creator
- Its bureaucratic reality
- Its management methods
- Its archival practice

What is archival science?
It is a systematic body of theory that supports the practice of appraising, acquiring, authenticating, preserving and providing access to recorded materials. Archival science is mainly interested in the context of the record.

The data become more specific, although in both cases no explanation is given that locates the meeting in a specific context.

Polish Minister Jerszy Kowalski met German Ambassador Johann Schmidt on Tuesday 6 September 1939 in Warsaw to discuss peace terms.
- Is it a fact? Definitely, and a complex one
- Is it a historical fact? Yes: persons, functions, date, place and objectives are identified and set into context

What is the difference between the last two sentences in terms of data?
The reason for the meeting gives the context for the complex historical data.

How do we treat the data of these clusters?

Common access keys to information systems
At the meta-content level only four categories recur:
- Persons
- Places + dates
- Institutions
- Objects

Organizing our data
- First document → first names and family names of the students in a specific course
- Second document → student numbers with exam grades
- Third document → first names, family names and student numbers of the students in the MA program

Can we assemble all the information included in these three documents into a single database?
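The three documents can be linked through the fields they share: names connect the course list to the MA program list, and student numbers connect the program list to the exam grades. The following is a minimal sketch of that idea using an in-memory SQLite database. All names, numbers and grades are invented sample data, the table and function names are hypothetical, and it assumes that everyone who sat the exam is enrolled in the course.

```python
import sqlite3

# Hypothetical sample data; none of these names or numbers come from the notes.
course_list = [("Anna", "Rossi"), ("Luca", "Bianchi"), ("Sara", "Verdi")]
exam_grades = [(1001, 28), (1002, 30)]
ma_program = [
    ("Anna", "Rossi", 1001),
    ("Luca", "Bianchi", 1002),
    ("Sara", "Verdi", 1003),
    ("Marco", "Neri", 1004),
]


def build_database():
    """Load the three documents into one in-memory database, one table each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE course (first_name TEXT, family_name TEXT);
        CREATE TABLE exam (student_number INTEGER, grade INTEGER);
        CREATE TABLE program (first_name TEXT, family_name TEXT,
                              student_number INTEGER);
        """
    )
    conn.executemany("INSERT INTO course VALUES (?, ?)", course_list)
    conn.executemany("INSERT INTO exam VALUES (?, ?)", exam_grades)
    conn.executemany("INSERT INTO program VALUES (?, ?, ?)", ma_program)
    return conn


def graded_students(conn):
    """Attach an identity to each grade via the shared student number."""
    query = """
        SELECT p.first_name, p.family_name, e.grade
        FROM exam AS e
        JOIN program AS p ON p.student_number = e.student_number
        ORDER BY p.family_name
    """
    return conn.execute(query).fetchall()


def count_rows(conn, table):
    """Count the rows of one of the three fixed tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = build_database()
graded = graded_students(conn)
# Share of course students who sat the exam, and share of MA students in the course.
exam_share = 100 * count_rows(conn, "exam") / count_rows(conn, "course")
course_share = 100 * count_rows(conn, "course") / count_rows(conn, "program")
print(graded)
print(exam_share, course_share)
```

None of the three documents alone could answer these questions: the grades carry no names, and the course list carries no student numbers. The answers only appear once the shared keys are used to join the tables.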
Each document contains a series of specific pieces of information that serve a specific purpose. A few pieces of information appear in more than one document and allow us to create a link between two types of information. Each document supplies specific information, but it is only when we reassemble them in a structured database that we have a full understanding of the event. Inserting the information in a single database allows us to structure the data as we please and to relate it from diverse viewpoints. Moreover, the database supplies us with information we could not have had before:
1. The percentage of the students who were present at the exam
2. The grade and identity of each student
3. The percentage of the students who attend the course out of the total number of students in the MA program

Document modelling

What are we interested in?
Identification of the objects of interest of a domain:
- Concepts and entities (classes)
- Information (attributes)
- Associations (relationships)

Model → describes the things of significance to a domain
Entity → a person or a house
Concept → the price of a house
Attributes → the name of the person, the location of the house
Relation → the person lives/works in the house, the house costs this price

Challenges and prerequisites

Modelling of historical data
- Incomplete data
- Uncertain data
- Disambiguation
- Multiple interpretations
- Space-time integration

Interoperability
- Syntactic (format)
- Semantic (model) → semantic web

Which format?
Resource Description Framework (RDF)
RDF triple (subject, predicate, object)

Which semantics?
RDF provides a generic, abstract data model for describing resources.
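As an illustration of the triple model, the person-and-house example can be written as (subject, predicate, object) statements. This is a minimal sketch using plain Python tuples rather than an RDF library. The `ex:` prefix, the identifiers, the predicate names and the values are all invented for the example and do not belong to any real vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers under an invented "ex:" prefix; not a real vocabulary.
triples = [
    Triple("ex:person1", "ex:name", "Maria"),           # attribute of the person
    Triple("ex:person1", "ex:livesIn", "ex:house1"),    # relation person -> house
    Triple("ex:house1", "ex:location", "Torino"),       # attribute of the house
    Triple("ex:house1", "ex:price", "250000"),          # concept: price of the house
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def home_price(graph, person):
    """Follow person -> livesIn -> house -> price across two triples."""
    for lives in match(graph, subject=person, predicate="ex:livesIn"):
        for price in match(graph, subject=lives.obj, predicate="ex:price"):
            return price.obj
    return None


price_of_home = home_price(triples, "ex:person1")
print(price_of_home)
```

The structure itself says nothing about what `ex:livesIn` or `ex:price` mean; the code only follows matching strings. That meaning has to be supplied separately.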
However, it is semantic-agnostic, in the sense that it does not provide any domain-specific terms to describe the things of the world and how they relate to each other.

Ontologies
- Description of shared meanings in a domain (database values)
- OWL → Web Ontology Language:
  o Built on existing standards (XML, RDFS)
  o Formally defined (based on logic)

RDF data model
Resource → person, theory, chair, book, temperature, event

What is an algorithm?
- It is a sequence of calculations carried out in a precise logical order to reach a given goal
- Algorithms are the basis of the instructions given to automatic machines and computers
- Examples from everyday life: Rubik's cube, chess, sudoku, recipes…

Everyday recipes can be expressed as an algorithm. The mathematical tool for doing so is a flow diagram.

The golden rules for content management
Before using whatever miraculous program will reveal to us the mysteries of the universe, some preliminary steps must be taken:
- Define the objective and the possible deliverable/s of the project
- Define the unit/unity of material to work on
- Consider the relationship between context and contents
- Consider the basic structure of each series
- Consider the role of each series in the overall unity/archive
- Consider the inner relationship of the series' items
- Consider the trans-relationship of each series to other information systems
- Distinguish between primary and secondary sources*
- Search for recurrences, structural repetitions, similar item repetition; in short, for patterns

Primary sources
A primary source provides direct or firsthand evidence about an event, object, person or work of art. Primary sources include historical and legal documents, eyewitness accounts, results of experiments, statistical data, pieces of creative writing, speeches and art objects.

Secondary sources
Secondary sources describe, discuss, interpret, comment upon, analyze, evaluate, summarize and process primary sources.