Big Data Implications
Giovanni Giuffrida
December 5, 2018

The Big Data based new society

New society: Flying objects

New society: Self driving cars
It’s happening!

Big Data is surely helping computers pass the Turing test!

Big Data revolution implies big shifts
• From exact to approximate
• From sampling to all (n = All)
• From causality to correlation

From exact to approximate
• Increasing data size and speed leads to “inexactitude”
• Data in databases are never “clean”: with small data you can afford to clean them
• Good enough is “good enough” with big data
• We are willing to sacrifice a little accuracy in favor of general trends
• Big data turns figures into probabilities... and this is OK in many fields

n = All
• No more sampling!!
• For the past hundred years statistics has been based on sampling: find the “best”, “smallest”, “most representative” sample
• This was accepted as a fact of life
• Reality: the technology was too poor to process ALL data
• Some industries developed around this concept, e.g. surveys

n = All... some issues there
• Sampling works well at macro-level
• Like a picture: good from the right distance, blurry when close
• In a sense, the sample is chosen depending on the “distance” you look at it from
• You may need to reprocess data with new samples in order to change “distance”
• Sampling may not work well for outlier detection

From causality to correlation
• Big data is about what, not why
• Stop searching for “causality”
• Correlation doesn’t tell us why something is happening
• If millions of cancer patients who drink orange juice and take one aspirin a day get better... do we really care why?
• Prediction based on correlations is central in Big Data
• It leads to big (BIG) social implications

Mexican lemon imports prevent highway deaths.
[Figure: Total US Highway Fatality Rate plotted against Fresh Lemons Imported to USA from Mexico (Metric Tons); R² = 0.97. Sources: U.S. NHTSA, DOT HS 810 780; U.S. Department of Agriculture.]

Eating organic food causes autism.
[Figure: “The real cause of increasing autism prevalence?” Autism (individuals diagnosed) and organic food sales plotted by year; r = 0.9971 (p < 0.0001).]

Using Internet Explorer leads to murder.
[Figure: “Internet Explorer vs Murder Rate”. Murders in US and Internet Explorer market share, 2006 to 2011.]
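
The charts above share one mechanism: two quantities that both drift in the same direction over time will correlate strongly even when they have nothing to do with each other. The sketch below is only an illustration under stated assumptions. The two series are synthetic and hypothetical (they are not the lemon, autism, or browser data from the slides), and comparing year-over-year changes is just one simple way to remove a shared trend.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def diffs(xs):
    """Year-over-year changes, which remove a shared linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
years = range(20)

# Two hypothetical series generated independently of each other.
# Their only common feature is that both grow over time.
series_a = [100 + 10 * t + rng.gauss(0, 5) for t in years]
series_b = [50 + 3 * t + rng.gauss(0, 2) for t in years]

r_levels = pearson(series_a, series_b)                 # raw yearly values
r_changes = pearson(diffs(series_a), diffs(series_b))  # trend removed

print(f"correlation of yearly values:  {r_levels:.3f}")
print(f"correlation of yearly changes: {r_changes:.3f}")
```

The correlation of the raw yearly values comes out very high because both series share a time trend, while the correlation of the yearly changes is much weaker. A high r on its own is therefore no evidence that one quantity drives the other.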

CYC vs Watson
• Two (very) different approaches
• CYC was “embedding” knowledge
• Watson is able to “learn” from huge amounts of data

Cyc is an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning.
The project was started in 1984 by Douglas Lenat at MCC and is developed by the Cycorp company. Parts of the project are released as OpenCyc, which provides an API, RDF endpoint, and data dump under an open source license.
Original author(s): Douglas Lenat
Developer(s): Cycorp, Inc.
Initial release: 1984
Written in: Lisp
Type: Ontology and inference engine

The IBM Jeopardy Challenge represents a milestone in the development of artificial intelligence, and is part of Big Blue's centennial celebration.
“We are at a very special moment in time," said Dr. John E. Kelly III, IBM Senior Vice President and Director of IBM Research. “We are at a moment where computers and computer technology now have approached humans. We have created a computer system that has the ability to understand natural human language, which is a very difficult thing for computers to do.”
Named after IBM founder Thomas J. Watson, the supercomputer is one of the most advanced systems on Earth and was programmed by 25 IBM scientists over the last four years. Researchers scanned some 200 million pages of content, or the equivalent of about one million books, into the system, including books, movie scripts and entire encyclopedias.

How It Works
IBM Watson Health is improving health by bringing the world's data to our daily lives.

IBM’s Watson – the language-fluent computer that beat the best human champions at a game of the US TV show Jeopardy! – is being turned into a tool for medical diagnosis. Its ability to absorb and analyse vast quantities of data is, IBM claims, better than that of human doctors, and its deployment through the cloud could also reduce healthcare costs.

AlphaGo
• 3000-year-old game
• Simple board
• Before 2016 it was considered “impossible” to model
• Many (many) more combinations than chess
• “the most elegant game that humans have ever invented”; “simple rules that give rise to endless complexity”; “more possible Go positions than there are atoms in the universe”
• Mostly based on “intuition”

The sharing economy
Linking people with surplus goods with people who can make use of them
• Natural way to optimize overall stock of cars, bedrooms, etc.
• Reduce the need to produce more
• Wealth redistribution
• Transactions made directly between provider and consumer

• Policymakers: How to regulate it?
• How to collect tax
• How to guarantee public safety
• How to protect old-style worker categories
• Not a clear winning strategy yet

Who owns the Big Data?
• Rio is the first city to collect real-time data from
  • Waze (drivers)
  • Moovit (public transportation)
  • Strava (cycling)
• Daily aggregated view of 110,000 drivers
• 60,000 incidents reported each day
• Huge cost saving: from cameras and sensors to smartphones
• Better traffic alerting for citizens
• Data exchange between Rio and Waze/Moovit
• Buying data from Strava

Significant social and political implications
Big Data are crucial for more and more public sectors
• “Silicon Valley” owns the largest portion of worldwide Big Data
• “Silicon Valley” knows a lot about our health
• “Silicon Valley” knows a lot about urban transportation
• “Silicon Valley” knows a lot about our education
• “Silicon Valley” knows a lot about travellers and housing

If “Silicon Valley” could offer basic needs (from health to education to public transportation), why do we maintain our fat government and/or why do we need to pay taxes?