Wednesday, May 7, 2014

Photographic Fingerprints

Tracing photographs to the specific camera device that took the picture is an interesting idea that is discussed at this link. 

The tracing data is the noise that each sensor introduces into a picture, a pattern that differs from that of every other sensor.  In other words, it is a sensor fingerprint that exists by virtue of a fixed pattern of noise added to every picture the camera takes.  Logically, the same approach could work for video too.

Exif is metadata about a digital picture, explained at this Wikipedia link.  It is stored in the image file alongside the pixel data, not in the pixels themselves.  A digital fingerprint derived from the pixel content of the picture would, by contrast, be embedded data, just as our real fingerprints or other unique biometric identifiers are embedded in us.  A digital photo fingerprint could still be removed by some method, although there could be ways to prevent that or to perpetuate the original ID data.  Exif is much easier to lose: there are apps that remove it, or it can be disabled in user settings.
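As a rough illustration of how easily Exif can be separated from the picture itself, here is a minimal sketch for JPEG files, where Exif data normally sits in an APP1 segment ahead of the compressed image data.  The function name strip_exif and the fake test bytes are my own assumptions for illustration; real files can contain padding and other quirks that this sketch does not handle.

```python
import struct

def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string with Exif APP1 segments removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        marker = jpeg[pos:pos + 2]
        if marker[0] != 0xFF:
            raise ValueError("malformed segment marker")
        if marker == b"\xff\xda":
            # Start of scan: everything after this is pixel data, copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == b"\xff\xe1" and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)

def make_segment(marker: bytes, payload: bytes) -> bytes:
    """Helper that builds one length-prefixed segment for the fake file below."""
    return marker + struct.pack(">H", len(payload) + 2) + payload

# A fake, non-decodable byte string with a JPEG-like layout, for demonstration only.
exif_segment = make_segment(b"\xff\xe1", b"Exif\x00\x00" + b"camera-model-and-gps")
fake_jpeg = (
    b"\xff\xd8"
    + make_segment(b"\xff\xe0", b"JFIF\x00" + b"\x01\x01\x00\x00\x01\x00\x01\x00\x00")
    + exif_segment
    + b"\xff\xda" + b"\x00\x08pixel-data" + b"\xff\xd9"
)
cleaned = strip_exif(fake_jpeg)
```

The pixel bytes pass through unchanged, so anything carried in the pixel content itself, such as sensor noise, would survive this kind of stripping.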

What can be done with a digital photo fingerprint?  It can match all photos that were taken with the same photographic device.  That ability comes from the consistency of the noise pattern, which is evidently inherent in the physical sensor and/or the hardwired processing between sensor and storage.  So what is that matching capability worth?  What does it say?  Only that two photos were taken with the same device.  Devices can be linked to people, but not uniquely.
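To make the matching idea concrete, here is a small simulation, not a real forensic tool.  Everything in it is an assumption for illustration: the simulated sensors, the 32x32 image size, the 3x3 blur used as a crude denoiser, and the noise levels.  It estimates a fingerprint by averaging the noise left over after blurring several photos from one camera, then scores new photos by correlation against that fingerprint.

```python
import random

WIDTH, HEIGHT = 32, 32
N = WIDTH * HEIGHT

def make_sensor(seed):
    """Hypothetical sensor: a fixed per-pixel noise pattern that never changes."""
    rng = random.Random(seed)
    return [rng.gauss(0, 2.0) for _ in range(N)]

def take_photo(sensor, rng):
    """Simulated photo: a smooth scene, plus the fixed pattern, plus random noise."""
    base = rng.uniform(50, 200)
    slope = rng.uniform(-1, 1)
    return [base + slope * (i % WIDTH) + sensor[i] + rng.gauss(0, 1.0)
            for i in range(N)]

def residual(photo):
    """Noise residual: the photo minus a 3x3 box-blurred copy of itself."""
    out = []
    for y in range(HEIGHT):
        for x in range(WIDTH):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < HEIGHT and 0 <= xx < WIDTH:
                        total += photo[yy * WIDTH + xx]
                        count += 1
            out.append(photo[y * WIDTH + x] - total / count)
    return out

def estimate_fingerprint(photos):
    """Average the residuals of many photos so the random noise cancels out."""
    residuals = [residual(p) for p in photos]
    return [sum(column) / len(residuals) for column in zip(*residuals)]

def correlation(a, b):
    """Normalized correlation between two equal-length lists of numbers."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (sum((x - mean_a) ** 2 for x in a)
           * sum((y - mean_b) ** 2 for y in b)) ** 0.5
    return num / den

rng = random.Random(0)
camera_a, camera_b = make_sensor(1), make_sensor(2)
fingerprint_a = estimate_fingerprint([take_photo(camera_a, rng) for _ in range(20)])

# A new photo from camera A should score far higher than one from camera B.
score_same = correlation(fingerprint_a, residual(take_photo(camera_a, rng)))
score_other = correlation(fingerprint_a, residual(take_photo(camera_b, rng)))
```

Note what the score does and does not say: a high value suggests the same device, but nothing in it points to a person.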

What if the sensor were designed to put a unique device identifier, like a MAC address (Media Access Control address), into the picture by altering pixels distributed through the picture so as not to be visible or detectable?  It is certainly possible: with megapixels to work with, the minimum number of manipulated pixels needed to code a MAC address into the picture would not be noticed, especially if their placement were hashed across the picture.
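The idea of scattering an identifier through the pixels can be sketched in a few lines.  This is only one possible reading of it, and the scheme is my own assumption: a secret key is hashed to seed a pseudo-random choice of 48 pixel positions, and one bit of the MAC address is written into the least significant bit of each.  The names embed_mac and extract_mac, the key, and the example address are all hypothetical.

```python
import hashlib
import random

MAC_BITS = 48

def _positions(key: bytes, n_pixels: int):
    """Derive 48 distinct pixel positions from a key (hypothetical scheme)."""
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
    return random.Random(seed).sample(range(n_pixels), MAC_BITS)

def embed_mac(pixels, mac: str, key: bytes):
    """Hide a MAC address in the lowest bit of 48 scattered pixel values."""
    value = int(mac.replace(":", ""), 16)
    bits = [(value >> (MAC_BITS - 1 - i)) & 1 for i in range(MAC_BITS)]
    marked = list(pixels)
    for pos, bit in zip(_positions(key, len(pixels)), bits):
        marked[pos] = (marked[pos] & ~1) | bit  # overwrite only the lowest bit
    return marked

def extract_mac(pixels, key: bytes) -> str:
    """Read the 48 hidden bits back, in the same key-derived order."""
    value = 0
    for pos in _positions(key, len(pixels)):
        value = (value << 1) | (pixels[pos] & 1)
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))

# Stand-in for an image: 100,000 random 8-bit pixel values.
rng = random.Random(42)
image = [rng.randrange(256) for _ in range(100_000)]

marked = embed_mac(image, "00:1a:2b:3c:4d:5e", key=b"manufacturer-secret")
changed = sum(a != b for a, b in zip(image, marked))
recovered = extract_mac(marked, key=b"manufacturer-secret")
```

At most 48 pixels change, each by one brightness level, which is why the mark would not be noticed.  A mark this simple would likely not survive lossy recompression or resizing, so a practical design would need something sturdier.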

If it could be done with pictures then why not any stream of digital data?  Voice?  Video? It would be easy to do.

Identification by means of a built-in code, which the device embeds in the content of the digital data it produces, would give potentially absolute proof of the association between a device and the data it produced.  Other data in addition to the MAC address, such as geographic info, could also be introduced.  The device could be made to reveal all types of information in its digital output over the internet.

While the device itself could introduce its identification data into its digital output, much of the same information could be introduced into the digital stream at a second step of processing, once the data is transmitted via the internet.  More unique internet path info could be embedded into the original digital content stream at any downstream point.

The cryptographic hash function is nothing new and is explained at this Wikipedia link.

The idea of designing a cryptographic hash function into a device for the acknowledged or stealth introduction of information into the digital content it produces in order to identify the unique device that produced it is an interesting idea related to traceability and surveillance.

More on device fingerprints at this Wikipedia link.

So, what does this blog entry really have to say about uniquely identifying a device that originated a digital stream of data like a photo, video or voice transmission?

This:  Identify the device by embedding information in the stream of data that the device produces.  That is very different from identifying the device by association with information about the device that is external to the intended payload stream of raw digital content that the device produces.  Device ID is inherent in whatever the device produces.

Days after this posting I ran across this link.  

There are many unique characteristics introduced into digital data by the hardware component that acquires it, and they can link the data to one specific device.  These unique characteristics are inherent in the physical properties of the component.  Matching them to the device, however, takes some comparison.  If the characteristics were intentionally introduced into the components at manufacture and recorded in a database, then no comparison and association to the device would be necessary.

The most elegant way to trace is to have the tracing factor embedded in the production/processing of the digital content payload.

Electronic components are not the only devices that encode unique identifying characteristics into communications.  The human voice does that too.  Telephone calls can be recorded and traced to the caller, as well as data mined for revealing information carried not by the words used but by how the words are vocalized.  That is what this link discusses.  This is the lead of that link:
