Saturday, June 21, 2014

Granularity and Big Data - 2nd Start: Examining the Chicago Tribune Point of Entry to the Problem

My prior blog entry got sidetracked: I spent it getting the lay of the land straight, along with where I want to go and the purpose of the journey. This entry gets the trip going.

Refresher:  It started in the prior blog entry with this report in the Chicago Tribune:  New Sensors Will Scoop Up Big Data On Chicago.

We are all Chicago!

This from the Tribune link:

"Researchers have dubbed their effort the "Array of Things" project. Gathering and publishing such a broad swatch of data will give scientists the tools to make Chicago a safer, more efficient and cleaner place to live, said (Charlie) Catlett, director (computer scientist) of the Urban Center for Computation and Data, part of a joint initiative between the University of Chicago and Argonne National Laboratory."

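Examine the Argonne National Lab.  It grew out of the Manhattan Project's Metallurgical Laboratory at the University of Chicago, which helped bring us the atom bomb, and it is still involved in national security.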

Wikipedia link to Charlie Catlett here

Catlett (bio at this link) is quoted frequently in the Tribune report.  He said this:

"Catlett said the fact that all of the data collected will immediately be published will also expose the project to ongoing scrutiny."  Later he says:  "we made the decision that the (sensors) will not save address data, and will only count nearby devices."

This is a fact:  A sensor that counts cell phones has to sense each cell phone's ID.  That is the data that cell phones broadcast...but, we are told, it will not be saved (inhaled).  Since sensors are directional they can also pick up direction and rough distance.  A network of them can follow a phone as it travels in the owner's pocket or car, as well as the stops along the way.  What if the owner stops to rob a bank?  Need to know trumps privacy.  From the outside there is no way of knowing whether all the data from every device is being saved.  The NSA motto: "Get it all, Save it all".
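To make the "only count nearby devices" promise concrete, here is a minimal sketch of what such a sensor might do. It is my own illustration, not the Array of Things design: the class name `CountingSensor`, its methods, and the device IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CountingSensor:
    """Hypothetical sensor that reports only how many devices it saw."""

    # Identifiers seen in the current window, held in memory only.
    _seen: set = field(default_factory=set)

    def observe(self, device_id: str) -> None:
        # The sensor must read the identifier, otherwise it would
        # count the same phone twice.
        self._seen.add(device_id)

    def publish(self) -> int:
        # Report the count, then throw the identifiers away.
        count = len(self._seen)
        self._seen.clear()
        return count


sensor = CountingSensor()
for device_id in ["aa:01", "bb:02", "aa:01", "cc:03"]:
    sensor.observe(device_id)

first_report = sensor.publish()   # distinct devices in this window
second_report = sensor.publish()  # nothing carried over
print(first_report, second_report)
```

Notice that the published number would look exactly the same if the `clear()` line were missing. The count by itself proves nothing about what was kept.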

By what wild stretch of the imagination will all the data "immediately be published"?  Never in a million years, I think.  Especially the data on cell phones.  All the data will never be public.  Some carefully screened data, maybe, or whatever best serves PR.

All?  Never gonna happen.  Publish it all and, sooner than real time, smart technicians are going to connect it up to real people and invade their privacy.  It will happen at about the same time the NSA publishes all the data it collects.  Highly redacted.  What will most likely happen is that all the data from one or a few compartmented operations will be published and it will be called "all".  At least all that the compartment truthfully knows about.

A more truthful statement by Catlett here:

To date, startup money for the first phase has come from Argonne, but a National Science Foundation grant application is pending, Catlett said, and he expects corporations that want to use the system to "pay their way" down the road.

Fred Cate says:

"Almost any data that starts with an individual is going to be identifiable," Cate said. When tracking activity from mobile phones, "you actually collect the traffic. You may not care about the fact that it's personally identifiable. It's still going to be personally identifiable."

He got that right!

Gary King, director of the Institute for Quantitative Social Science at Harvard University, is quoted as saying this:

"King, the Harvard sociologist and data expert, agreed that the Chicago scientists will inevitably scoop up personally identifiable data.  "If they do a good job they'll collect identifiable data. You can (gather) identifiable data with remarkably little information," King said. "You have to be careful. Good things can produce bad things."

Gary King writes about "Big Data" at this link:  "Why Big Data is a Big Deal", appearing in the Harvard Magazine.
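King's remark that you can gather identifiable data "with remarkably little information" can be shown with a toy example. The records below are made up by me and contain no names; the function `unique_fraction` is a hypothetical helper that measures how many rows are singled out by a given combination of columns.

```python
from collections import Counter

# Made-up records: (zip code, birth year, gender). No names anywhere.
records = [
    ("60601", 1970, "F"),
    ("60601", 1970, "M"),
    ("60601", 1985, "F"),
    ("60614", 1970, "F"),
    ("60614", 1970, "F"),
    ("60614", 1992, "M"),
]


def unique_fraction(rows, columns):
    """Fraction of rows whose values in `columns` match no other row."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    singled_out = sum(1 for key in keys if counts[key] == 1)
    return singled_out / len(rows)


zip_only = unique_fraction(records, [0])          # zip code alone
all_three = unique_fraction(records, [0, 1, 2])   # zip + birth year + gender
print(zip_only, all_three)
```

In this toy data, zip code alone singles out nobody, while the three fields together single out four of the six records. Real data sets are bigger, but the direction is the same: every added field makes the grain finer.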

The experts quoted in the Tribune report, along with the institutions they are associated with, are excellent starting points for my objective of learning more about Granularity and Big Data.

Granularity and Big Data.  Those two concepts fascinate me.  They are the two extremes of a continuum that runs from the smallest uniquely identified entity of information up to its SuperClass aggregate, viewed through my high level Object Oriented design approach to "things", their "methods" and their message interactions.
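Here is a rough sketch of that continuum in code. It is only an illustration under my own assumptions: `Reading` stands for the finest grain, `Aggregate` is a coarser "thing" whose `receive` method takes readings as messages, and the sensor names, streets and numbers are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Finest grain: one count from one sensor during one hour."""
    sensor: str
    block: str
    hour: int
    devices: int


class Aggregate:
    """A coarser 'thing' that sums the readings sent to it, grouped by a key."""

    def __init__(self, key):
        self.key = key                  # function that picks the grain
        self.totals = defaultdict(int)  # group -> summed device counts

    def receive(self, reading: Reading) -> None:
        # Summing hourly counts gives total sightings, not distinct phones.
        self.totals[self.key(reading)] += reading.devices


readings = [
    Reading("s1", "State St", 8, 40),
    Reading("s1", "State St", 9, 55),
    Reading("s2", "State St", 8, 30),
    Reading("s3", "Wacker Dr", 8, 20),
]

by_block = Aggregate(key=lambda r: r.block)
city = Aggregate(key=lambda r: "Chicago")
for r in readings:
    by_block.receive(r)
    city.receive(r)

print(dict(by_block.totals))
print(dict(city.totals))
```

The same four readings answer different questions depending on the key handed to `Aggregate`. Going up the continuum throws detail away; going back down is only possible if somebody kept the fine-grained records.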
