UC Berkeley School of Information, May 8-9, 2014
Panel: Where's the Beef: Is Big Data Having a Real Impact on the Economy?
Panelists: Adam Ghobarah (Google Ventures), Tjarko Leifer (Climate Corporation), Joe Reisinger (Premise Data), Prasanna Tambe (NYU)
Unsurprisingly, the collective answer to this question was yes, and the panelists provided some examples of where and how business strategy was improved by making data-driven decisions.
Again, I felt a little uneasy that pretty much all discussions focused on centralized scenarios, and thus were essentially about Big Data. Few seem to even consider decentralization and service-driven approaches, but maybe that's fair since the event is called Data Edge and not Service Edge. But then again, given that at least some talks explicitly mentioned Data Science, I would have hoped for more focus on ecosystem thinking, and less focus on building/running data crunching systems.
Talk: When Data Science Meets Design
Panelists: Alan McConchie (Stamen Design)
Stamen focuses on using data to build visualizations, but they don't claim to do Data Science. Instead, they work with Data Science teams, which then generate data for visualization. The goal is to create beautiful, engaging, and accessible projects that delight and inform the public.
Alan presented many different visualizations, mostly focusing on geographical data and maps. This definitely makes you think about how many options there are to present and perceive data, which can make a very big difference even if the underlying data and maybe even the analytics are the same.
Panel: How Surveillants Think
Panelists: Nils Gilman (UC Berkeley), Jesse Goldhammer (Deloitte), Gilman Louie (Alsop Louie), Ejovi Nuwere (Kaori-san), Vivek Wadhwa (Singularity University), Pete Warden (Jetpac)
The panel starts by talking about 9/11 and Snowden. Panelists point out that Big Brother couldn't even have dreamed of smartphones. And yet we pay for them.
One decidedly absurd line of argument holds that just because, in retrospect, you can find some data pointing to something you know did happen (in this case, 9/11), you could have predicted the event, had you only had more resources and/or more willingness to violate people's rights. The simplicity and shortsightedness of this argument has bothered me since it was first brought up after 9/11, and yet people can still get away with it. This was a rather political panel, which is probably why absurd claims were made and nobody objected.
More specifically, when it comes to managing one's own data: does sharing have to mean giving away all information and all control over it? Is it possible to imagine that there's a different model out there? Maybe people can have tethers to all their information and still own it. Google shouldn't own the data people are generating; they should just have a license to use it, and people should be able to revoke that license. But is it really possible to control information this way, given that much, if not most, data cannot be easily tied to just one owner? One more overly simplistic and rather absurd claim, but again panelists got away with it.
Talk: Using Practical Data for Conversion Rate Optimization
Panelists: Kyle Rush (Optimizely)
An interesting story about better understanding your audience and then being able to tweak your data capture. In this case, the story is about the Obama 2012 campaign, and how online donations were improved by observing errors, understanding how they originated, and then making small improvements to the online form (such as help texts), which can result in a significant reduction in errors.
Talk: Using Data Science to Make Fun Games
Panelists: Kenneth Yu (Kabam)
The company's data scientist team acts as internal consultants, working in areas such as user acquisition, customer retention, monetization, strategy and competitive analysis, and game design.
Data can be used to inform game developers, looking at user feedback to design elements of games. This is possible because of the online nature of games, and cannot be done in the same way in more traditional games on consoles or computers. However, the secret sauce is combining this feedback with the art of game design to make and improve designs.
Talk: Constructing Experiments to Inform Business Innovation
Panelists: Stephen Brobst (Teradata)
Justifying decisions you've already made with data is not the best way to do Data Science. In hindsight, you'll always find some explanation to justify or explain a decision. The good way to use Data Science is to design experiments that allow you to make decisions that change the way you do business. The problem is how fast you can go through this cycle.
Success is the combination of Science and Art. Using only mathematics will get you stuck. You need crazy ideas to allow you to move towards more interesting ways to do business.
Data Scientist Skills: Curiosity, Intuition, Data gathering, Statistics, Analytic Modeling, Communication. Very different from computer scientists.
Data science needs to be cheap, so that experiments are cheap and their failures are cheap as well. In such an environment, even a low ratio of successes to failures still results in a positive ROI.
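To make the arithmetic behind this point concrete, here is a minimal sketch of the expected ROI of a batch of experiments. The function name and all numbers are hypothetical and were not part of the talk; the sketch simply assumes every experiment costs the same, succeeds with the same probability, and pays the same amount on success.

```python
def expected_roi(cost_per_experiment, success_rate, payoff_per_success, n_experiments):
    """Expected ROI of a batch of experiments (hypothetical, simplified model).

    Assumes identical cost, success probability, and payoff for every experiment.
    """
    total_cost = cost_per_experiment * n_experiments
    expected_gain = success_rate * payoff_per_success * n_experiments
    return (expected_gain - total_cost) / total_cost


# Made-up numbers: 100 experiments, 1 in 10 succeeds, each success pays 50,000.
cheap = expected_roi(1_000, 0.1, 50_000, 100)       # experiments cost 1,000 each
expensive = expected_roi(10_000, 0.1, 50_000, 100)  # experiments cost 10,000 each

print(f"cheap experiments:     ROI = {cheap:+.1f}")
print(f"expensive experiments: ROI = {expensive:+.1f}")
```

With the same 90% failure rate, the cheap batch comes out positive and the expensive one negative, which is the sense in which the cost of an experiment, rather than the hit rate alone, decides whether experimentation pays off.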
I think this was my favorite talk of the conference. What I liked was that it did not put the cart before the horse. Instead of declaring that data science and big data are great and then asking what we can do with them, it started from the valuable questions you might want to ask, acknowledged that it is not always easy to find and ask those questions, and treated data science and big data simply as tools that allow you to answer them, including some questions you could not answer before.
Panel: A Conversation with Hugh Williams
Panelists: Hugh Williams (Pivotal), Michael Chui (McKinsey)
Little to say here other than that I absolutely enjoyed the conversation as a story of how one pretty smart guy moved through an amazing number of interesting places throughout his career. Also, this conversation made me want to join Pivotal.
Panel: Relying on Data Science: Reproducible Research and the Role of Policy
Panelists: Fernando Pérez (Henry H. Wheeler Jr. Brain Imaging Center), Philip Stark (UC Berkeley), Victoria Stodden (Columbia University)
The goal is to provide evidence that your result is correct. Is it possible to generate evidence of correctness without making things completely reproducible?
OSTP guidelines have affected the thinking about practices and tools when it comes to data management and publishing. It seems like the topic of reproducibility has gone mainstream, and it is even mentioned in non-specialist publications.
Fernando talks about how there are some offerings in the curriculum, but there is not enough training, and based on the current structure, these offerings do not meet the demand. More content is needed in terms of matching various education levels, and various scientific disciplines where data science methods are relevant.
Talk: Can Microdata Tell Us Anything About Macroeconomics?
Speaker: Joe Reisinger (Premise Data Corporation)
The not very surprising answer to this question is: It depends. Data may allow us to learn interesting things, if we look in the right places. But you need to look, and it does not necessarily help you with explaining what is going on.
Talk: Understanding the Natural World Through Spatial Data
Speaker: Kevin Koy (Geospatial Innovation Facility, UC Berkeley)
Since I have a lot of interest in geospatial data and services, I thoroughly enjoyed this talk about Berkeley's Geospatial Innovation Facility. Like Alan McConchie's talk, this talk illustrated that humans are very spatial creatures, and that a lot of data that we have does have a spatial component to it. Combine these two facts, and it becomes apparent that geospatial data and services are a very large and important space, when it comes to Big Data and/or Data Science.