Techniques & Visualizations for High-Dimensional Data in CS 7450 - Prof. John Stasko, Assignments of Computer Science

This document from a CS 7450 information visualization course explores various techniques for representing high-dimensional data, including scatterplots, parallel coordinates, star plots, and more. The document also discusses potential limitations of these techniques and introduces mathematical approaches like multidimensional scaling and principal component analysis. The class focuses on visual techniques for understanding large, high-dimensional datasets and their applications.

Typology: Assignments

Pre 2010

Uploaded on 08/05/2009

koofers-user-t29-1

Partial preview of the text

High-Dimensional Data Representations
CS 7450 - Information Visualization
Feb. 2, 2006
John Stasko
Spring 2006

Last Time
• We looked at parallel coordinates, one way of projecting >2 variables down onto the 2D plane

Techniques So Far
Scatterplot Matrix, Chernoff Faces, Star Plots, Parallel Coordinates, Star Coordinates

Potential Limitations
• What happens when you have lots and lots of data cases?

Our Focus
• Visual techniques
• Many are simply graphic transformations from N-D down to 2-D

Use?
• What kinds of questions/tasks would you want such a technique to address?
  − Clusters of similar data cases
  − Useless dimensions
  − Dimensions similar to each other
  − Outlier data cases
  − …
• Think back to our “cognitive tasks” discussion

Today
• We’ll examine a number of other visual techniques intended for larger, high-dimensional data sets

Can We Make a Taxonomy?
• D. Keim proposes a taxonomy of techniques (TVCG ‘02)
  − Standard 2D/3D display: bar charts, scatterplots
  − Geometrically transformed display: parallel coordinates
  − Iconic display: needle icons, Chernoff faces
  − Dense pixel display: what we’re about to see…
  − Stacked display: treemaps, dimensional stacking

Dense Pixel Display
• Represent data case or a variable as a pixel
• Million or more per display
• Seems to rely on use of color
• Can pack lots in
• Challenge: What’s the layout?
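As a minimal sketch of the idea on the Dense Pixel Display slide, the following maps each value of one variable to a gray level and fills a grid row by row. The row-major layout, the gray scale, and the names `to_gray` and `row_major_grid` are illustrative assumptions, not the layout of any particular system; picking a better layout is exactly the challenge the slide raises.

```python
def to_gray(value, lo, hi):
    """Linearly map a value in [lo, hi] to an integer gray level 0..255."""
    if hi == lo:
        # All values identical: no contrast to show, use a single level.
        return 0
    return round(255 * (value - lo) / (hi - lo))


def row_major_grid(values, width):
    """Place one pixel per data case, left to right, top to bottom.

    This naive layout is an assumption made for illustration only.
    """
    lo, hi = min(values), max(values)
    pixels = [to_gray(v, lo, hi) for v in values]
    # Cut the flat pixel list into rows of `width` pixels each.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Six data cases of a single variable, shown as a 3-pixel-wide window.
grid = row_major_grid([0, 5, 10, 15, 20, 25], width=3)
for row in grid:
    print(row)
```

With one such grid per variable, windows that look alike suggest dimensions that behave alike.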
One Representation
Each variable is in a window
Data cases in grid in each window
Similarity of window views tells you about similarity of dimensions
Uses color scale

Relevance Factor
• How close an item is to the query
• Data items have some value that can be numerically quantified
• Each dimension is some distance away from query item
• Sum these up for total distance
• Relevance is inverse of distance

Example
• 5 dimensions, integers 0 to 255
• Query: 6, 210, 73, 45, 92
• Data item: 8, 200, 73, 50, 91
• Distance: 2 + 10 + 0 + 5 + 1 = 18
• Relevance: 1275 - 18 = 1257 (1275 = 5 × 255, the largest possible distance)

Issues
• What if dimensions are real numbers or text strings?
• What if they’re the same type, but of different orders of magnitude?
• Have to define some kind of distance, then a weight function to multiply by

Technique
• Calculate relevance of all data points
• Sort items based on relevance
• Use spiral technique to order the values
  − Emanate out from center
• Color items based on relevance

Relevance Colors
[Figure: color scale running from High to Low relevance; empirically established]

Technique
[Figure: spiral ordering of positions numbered 0 to 10]

Grouping Arrangement

Example Display
Multi-window
Grouping
8 dimensions
1000 items

Paper Recap
“Pixel bar charts: a visualization technique for very large multi-attribute data sets”
Daniel A. Keim, Ming C. Hao, Umesh Dayal, Meichun Hsu
Information Visualization 2002
Summer Adams

Overview
• Traditional bar charts are limiting
• Visualize data points within aggregations
• Introduce Pixel Bar Charts

Approach
• Consists of 5-tuple <Dx, Dy, Ox, Oy, C>
• Setting up the pixel bar chart
  − Partition the data set
  − Determine the pixel color
  − Place the pixels

Examples

Individual Dimensions
Let’s take a closer look...

The values of the data points mapped to colors of pixels in a given glyph...
1st, 2nd, 3rd, etc...

Slide 7: What’s Wrong…
• Want to display high dimensional data.
• We have parallel coordinate plots, scatter plot matrices, etc.?
  − Don’t scale well to MANY dimensions and many items.
• Project the data down?
  − Curse of dimensionality.

Slide 12: The Contribution
• Highly scalable way to view dimensional relationships.
• Computationally efficient.
• Uses MDS for dimensions, not just points.

Paper Recap
“A Taxonomy of Glyph Placement Strategies for Multidimensional Data Visualization”
Author: Matthew Ward
Information Visualization (2002)
Sheena Lewis

What is a Glyph?
• A graphical entity with p components, each of which may have r geometric attributes and s appearance attributes
• A powerful communication mechanism in that a large number of data dimensions can be incorporated into the attributes of a single shape or symbol
• Must be placed at a location in 2D or 3D display space

Goal of the Paper
• To present a comprehensive taxonomy of glyph placement strategies to support the design of effective visualizations

Glyph Placement Strategies
• Data-driven: based on two or more data dimensions
• Structure-driven: methods based on an explicit or implicit order or other relationship between data points

Data Driven Placement
Raw Data
• Up to three of the data dimensions are used as positional components
• Conveys detailed relationships between the dimensions selected
• Spotfire, 3D box glyphs
Derived Data
• An analytic process to generate positions using the data values and perhaps other information
• Reflects a combination of all the dimensions in order to convey N-dimensional relational information in a smaller number of display dimensions

The Curse of High Dimensionality
• Utilize low-dimensional representations
• Problems
  − 3D projections: hard to navigate, data is not always visible
  − 2D (non-axis-parallel projections): hard to interpret
  − 2D (axis-parallel projections): limited by 2 dimensions, large number of possible projections

Rank-By-Feature Framework
• Exploit familiarity of 1D and 2D projections
• Allow for task-oriented interaction
• Hierarchical Clustering Explorer (HCE)
• Graphics, Ranking, and Interaction for Discovery (GRID) principles:
  − User guides interaction by ranking data

Demo

Administrivia
• HW 3 due Tuesday
  − Questions?
• More project information

Upcoming
• InfoVis systems & toolkits
  − Reading: Eick paper; Heer paper (for those with programming inclinations)
• Commercial InfoVis systems
  − No papers to discuss (just demos for HW4)
• Tufte (please be reading)

Sources Used
CMS book
Referenced articles
Marti Hearst SIMS 247 lectures
Kosslyn ‘89 article
A. Marcus, Graphic Design for Electronic Documents and User Interfaces
M. Monmonier, How to Lie with Maps
W. Cleveland, The Elements of Graphing Data
C. H. Yu, Visualization Techniques of Different Dimensions
http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html
http://www.csc.ncsu.edu/faculty/healey/PP/PP.html
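The relevance computation and spiral ordering from the Relevance Factor, Example, and Technique slides can be sketched as follows. This is an illustration under stated assumptions rather than the original system's method: distance is taken as the sum of absolute per-dimension differences, relevance as the largest possible distance minus the actual distance (the form the Example slide uses), and the square spiral walk and the names `relevance` and `spiral_positions` are hypothetical choices.

```python
def relevance(query, item, max_value=255):
    """Largest possible distance minus the summed per-dimension distance.

    Assumes every dimension is an integer in 0..max_value.
    """
    distance = sum(abs(q - x) for q, x in zip(query, item))
    return len(query) * max_value - distance


def spiral_positions(n):
    """Return n (x, y) grid offsets walking a square spiral out from (0, 0)."""
    positions = []
    x = y = 0
    dx, dy = 1, 0          # start by heading in the +x direction
    step_len = 1
    while len(positions) < n:
        for _ in range(2):  # two legs share each step length
            for _ in range(step_len):
                if len(positions) == n:
                    return positions
                positions.append((x, y))
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # turn 90 degrees
        step_len += 1
    return positions


query = (6, 210, 73, 45, 92)
items = [
    (8, 200, 73, 50, 91),    # the data item from the Example slide
    (6, 210, 73, 45, 92),    # identical to the query
    (255, 0, 255, 0, 255),   # far from the query
]

# Most relevant item first, so it lands at the center of the spiral.
ranked = sorted(items, key=lambda it: relevance(query, it), reverse=True)
layout = list(zip(spiral_positions(len(ranked)), ranked))

for position, item in layout:
    print(position, relevance(query, item), item)
```

Coloring each position by its relevance value would complete the technique. The Issues slide still applies: real-valued or textual dimensions need their own distance and weighting functions.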