Saturday, September 28, 2013

The NSA Master Data Management and Sensemaking

Jeff Jonas has a conceptual pointer here to the idea that there has to be some unique entity (ultimately to be identified, if not already known) with which information is associated.  He says it much better than I can, an ability I wish I had, so I will simply defer to his statement:

"This article suggests that the single most fundamental capability required to make a sensemaking system is the system’s ability to recognize when multiple references to the same entity (often from different source systems) are in fact the same entity.  For example, it is essential to understand the difference between three transactions carried out by three people versus one person who carried out all three transactions.  Without the ability to determine when entities are the same, it quickly becomes clear that sensemaking is all but impossible."
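To make Jeff's "three people versus one person" example concrete for myself, here is a minimal sketch in Python. It is my own illustration, not Jeff's method or anything from his paper. The field names, the sample records, and the matching rule (same normalized name plus date of birth, or same phone digits) are all assumptions I made up for the example; a real entity resolution system would use far richer and fuzzier matching.

```python
def match_keys(record):
    """Build comparable keys for a record.

    Hypothetical rule: two records refer to the same entity if they share
    a normalized name + date of birth, or the same phone digits.
    """
    keys = set()
    name = " ".join(record.get("name", "").lower().split())
    if name and record.get("dob"):
        keys.add(("name_dob", name, record["dob"]))
    phone = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    if phone:
        keys.add(("phone", phone))
    return keys


def resolve(records):
    """Return one entity ID per record; records linked by any shared key get the same ID."""
    parent = list(range(len(records)))  # union-find forest, one node per record

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # match key -> index of the first record that carried it
    for idx, rec in enumerate(records):
        for key in match_keys(rec):
            if key in first_seen:
                a, b = find(idx), find(first_seen[key])
                parent[max(a, b)] = min(a, b)  # keep the lowest index as the entity ID
            else:
                first_seen[key] = idx
    return [find(i) for i in range(len(records))]


# Made-up transactions from three different source systems.
transactions = [
    {"source": "bank", "name": "John Q. Smith", "dob": "1970-01-01", "phone": "555-0100"},
    {"source": "airline", "name": "J. Smith", "phone": "(555) 0100"},
    {"source": "hotel", "name": "john q. smith", "dob": "1970-01-01"},
    {"source": "bank", "name": "Mary Jones", "dob": "1982-05-17", "phone": "555-0199"},
]

entity_ids = resolve(transactions)
print(entity_ids)
```

The first three records end up with the same entity ID even though the airline record has a different spelling of the name and no date of birth, because the shared phone number links it to the bank record. That is the whole point of the quote: without this step they would look like three different people.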

Jeff is giving me a whole new lexicon!  "Entity Resolution."  "Semantic Reconciliation."  Terms that encapsulate my concepts in a couple of words and become shorthand for a complex problem domain.  Oh, the joy of discovering what he says here.  He draws the conceptual entity picture so well and so clearly in his definitions of terms.  Must be channeling or something!

I continue to think that there must be an NSA master ID for all known citizens.  In light of Jeff's point-of-entry terminology for the problem domain of disambiguating entities, it might be called my "Entity Resolution ID."


Jeff structures big data in two categories: MDM (Master Data Management) and Sensemaking.  Each is defined here.  An excellent first cut at the problem.  MDM is the more in-house structuring of information, primarily related to specific corporate mission performance.  Sensemaking brings in all that external jumble of information, whose structure the company has little control over, and makes some sense of it through integration with the MDM.  Some enterprises are more oriented toward one or the other, but all of them, if they are to be competitive and successful, integrate both to an appropriate degree.  The NSA has both aspects: the known information of government-designed systems, and the external sea of information with infinite degrees of relationship to be sifted for sense.


In that sea that is the source of Sensemaking, it seems to me to make the most sense to capture all information all the time.  Or, given real-world limitations, get all you can with current tech capabilities.  Give it a first-pass sifting.  Pick out the real-time gems and prioritize/categorize the rest, even if it is only pen register data.  Store all the NSA can store.  Build as much more storage as possible, as fast as possible, to store more.  Data that is not captured, or that is boiled down by a distilling process imposed by various limitations, will often never be known in connection to any other data, or will be of limited value because it has been distilled from the raw source.

To me it is a given that the NSA can obtain and maintain a vast amount of MDM data, and that this data should carry some kind of entity resolution ID number that identifies at least 95% of all citizens of the USA???  It makes sense, so the multiple question marks are not really necessary!  I would bet on it.  With odds.

What is my NSA entity (virtual me) ID number?

In the Sensemaking domain I would imagine that the NSA deals more with placeholder entity resolution IDs.  They do this even when there are a number of probable entities with different attributes, some or all of which may in fact be the same entity.  The obvious goal is to resolve them down to a single, differentiated entity ID.  Ultimately it is a single person (or an entity like a location or account) with a unique identifier: the target of interest that is the mission objective.  Everyone must have an identifier with which all related info is associated, in order to identify the target of interest through the info related to that target.  Just like Google needs all info related to all possible search entries in order to satisfy specific search requests.

This is Jeff's paper on IBM InfoSphere Sensemaking.
