cis20.2 design and implementation of software applications II
spring 2008
session # II.1
information models and systems

topics:
• what are information systems?
• what is information?
• knowledge representation
• information retrieval

cis20.2-spring2008-sklar-lecII.1 1

what are information systems?
• the field of information systems (IS) comprises:
– a number of types of computer-based information systems
– objectives
– risks
– planning and project management
– organization
– the IS development life cycle
– tools, techniques and methodologies
– social effects
– integrative models

types of information systems
• informal
– evolve from patterns of human behavior (and can be complex)
– not formalized (i.e., not designed)
– rely on “word of mouth” (“the grapevine”)
• manual
– formalized but not computer-based
– how organizations historically handled information before computers (i.e., human “clerks” did all the work)
– some organizations still use aspects of manual IS (e.g., because computer systems are expensive, or none exist that can replace specialized human skills)
• computer-based
– automated, technology-based systems
– typically run by an “IT” (information technology) department within a company or organization (e.g., ITS at BC)

computer-based information systems
• data processing systems (e.g., accounting, personnel, production)
• office automation systems (e.g., document preparation and management, database systems, email, scheduling systems, spreadsheets)
• management information systems (MIS) (e.g., produce information from data; data analysis and reporting)
• decision support systems (DSS) (e.g., extensions of MIS, often with some intelligence; allow prediction and the posing of “what if” questions)
• executive information systems (e.g., extensions of DSS,
contain strategic modeling capabilities and data abstraction; support high-level decision making and reporting; often have fancy graphics for executives to use when reporting to non-technical/non-specialized audiences)

why do organizations have information systems?
• to make operations efficient
• for effective management
• to gain a competitive advantage
• to support an organization’s long-term goals

IS development life cycle
• feasibility study
• systems investigation
• systems analysis
• systems design
• implementation
• review and maintenance

social effects of IS
• change management
• broad implementation (not just about software)
• education and training
• skill change
• societal and cultural change

integrative models
• computers in society
• the internet revolution (internet 2, web 2.0)
• “big brother”
• ubiquitous computing

information theory today
• total annual information production, including print, film and other media, is between 1 and 2 exabytes (10^18 bytes) per year
• how do we organize all this?
• and remember, it accumulates!
• information hierarchy: data → information → knowledge → intelligence

information retrieval
• information organization versus retrieval
• organization: categorizing and describing information objects so that the people who need them can use them
• retrieval: being able to find the information objects you need when you need them
• two key concepts:
– precision: of the items retrieved, how many are actually relevant?
– recall: of all the relevant items in the collection, how many were retrieved?
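These two measures can be sketched in code. A minimal illustration (Python; not from the lecture, and the document ids are made up):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    retrieved: set of document ids the system returned
    relevant:  set of document ids that actually meet the information need
    """
    hits = retrieved & relevant  # relevant documents that were found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 4 returned documents are relevant (precision 0.75),
# but only 3 of the 6 relevant documents were found (recall 0.5)
print(precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7}))  # (0.75, 0.5)
```

Returning more documents tends to raise recall while lowering precision, which is why the two are usually in tension.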
• ideally, we want to maximize both precision and recall; this is the primary goal of the field of information retrieval (IR)

IR assumptions
• the information remains static
• the query remains static
• the value of an IR solution lies in how well the retrieved information meets the needs of the retriever
• are these good assumptions?
– in general, information does not stay static, especially on the internet
– people learn how to make better queries
• problems with the standard model on the internet:
– the “answer” is a list of hyperlinks that then need to be searched
– the answer list appears disorganized

IR process
• IR is iterative
• IR doesn’t end with the first answer (unless you’re “feeling lucky”...)
• because humans can recognize a partially useful answer; automated systems cannot always do that
• because humans’ queries change as their understanding improves with the results of previous queries
• because sometimes humans get an answer that is “good enough” to satisfy them, even if the initial goals of the IR task aren’t met

“berry-picking” model (from Bates 1989)
• interesting information is scattered like berries in bushes
• the eye of the searcher is continually moving
• new information may trigger new ideas about where to search
• searching is generally not satisfied by a single answer

information seeking behavior
• two parts of the process:
– search and retrieval
– analysis and synthesis of search results
• search tactics and strategies
– tactics ⇒ short-term goals, single actions, single operators
– strategies ⇒ long-term goals, complex actions, combinations of operators (macros)
• need to keep the search on track by monitoring it:
– check: compare the next move with the current “state”
– weigh: evaluate the cost/benefit of the next move/direction
– pattern: recognize common actions
– correct: fix mistakes
– record: keep track of where you’ve been (even wrong
directions)
• search tactics
– specify: be as specific as possible in the terms you are looking for
– exhaust: use all possible elements in a query
– reduce: subtract irrelevant elements from a query
– parallel: use synonyms (“term” tactics)
– pinpoint: focus the query
– block: reject terms
• relevance: how can a retrieved document be considered relevant?
– it answers the original question exactly and completely
– it partially answers the question
– it suggests another source for more information
– it provides background information for answering the question
– it triggers the user to remember other information that will help answer the question and/or retrieve more information about it

parametric search
• most documents have “text” and “meta-data”, organized in “fields”
• in parametric search, we can associate search terms with specific fields
• example: search for apartments in a certain geographic neighborhood, within a certain price range, of a certain size
• the data set can be organized using indexes to support parametric search

zone search
• a “zone” is an identified region within a document
• typically the document is “marked up” before you search
• the content of a zone is free text (unlike parametric fields)
• zones can also be indexed
• example: search for a book with a certain keyword in the title, a certain last name in the author field and a certain topic in the body of the document
• does this make the web a database?
not really (as you’ll see when we get into database definitions next week)

scoring and ranking
• search results can be either Boolean (match or no match) or scored
• scored results attempt to assign a quantitative value to how good each result is
• some web searches can return a ranked list of answers, ranked according to their scores
• some scoring methods:
– linear combination of zones (or fields)
– incidence matrices

linear combination of zones
• assign a weight to each zone (or field) and evaluate, e.g.:

score = 0.6 · (“Brooklyn” ∈ neighborhood) + 0.5 · (3 ∈ bedrooms) + 0.4 · (price = 1000)

• problem: it is frequently hard for a user to assign weights that adequately or accurately reflect their needs/desires

incidence matrices
• recall that a document (or a zone or field within a document) can be represented as a binary vector X ∈ {0, 1}^v
• a query is also such a vector, Y
• the score is an overlap measure: |X ∩ Y|
• example (terms as rows, plays as columns):

            Julius Caesar   The Tempest   Hamlet   Othello   Macbeth
Antony           1               0           0        0         1
Brutus           1               0           1        0         0
Caesar           1               0           1        1         1
Calpurnia        1               0           0        0         0
Cleopatra        0               0           0        0         0

• the score is the sum of the entries in a row (or column, depending on what the query is)
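The overlap-measure scoring can be sketched as follows (Python; the matrix values mirror the example above, and the function name is my own):

```python
# term -> binary incidence vector over the five plays
# (columns: Julius Caesar, The Tempest, Hamlet, Othello, Macbeth)
INCIDENCE = {
    "Antony":    [1, 0, 0, 0, 1],
    "Brutus":    [1, 0, 1, 0, 0],
    "Caesar":    [1, 0, 1, 1, 1],
    "Calpurnia": [1, 0, 0, 0, 0],
    "Cleopatra": [0, 0, 0, 0, 0],
}

def overlap_scores(query_terms):
    """Score each play by |X ∩ Y|: how many of the query terms it contains."""
    scores = [0] * 5
    for term in query_terms:
        # a term absent from the vocabulary contributes nothing
        for i, bit in enumerate(INCIDENCE.get(term, [0] * 5)):
            scores[i] += bit
    return scores

# query "Brutus Caesar": Julius Caesar and Hamlet contain both terms
print(overlap_scores(["Brutus", "Caesar"]))  # [2, 0, 2, 1, 1]
```

A Boolean AND query would keep only the plays scoring 2 here (Julius Caesar and Hamlet); the overlap score generalizes that match/no-match answer into a ranking.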