Wednesday, June 12, 2013

Using Metadata to Find Paul Revere

Excerpts from a tongue-in-cheek article on what metadata can reveal. The suggestion that metadata is of little consequence when spying on citizens is, of course, a deliberate half-truth. Either metadata is (1) significantly revealing, as the government has emphasised in its pursuit of criminals, or (2) it is not, as the government has claimed about analysing the metadata of virtually all uninvolved and unsuspected citizens. It cannot be both.

“Social Networke Analysis,” a small encroachment on freedom, identifies terrorists in the Colonies.

London, 1772.
I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the newfangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty’s subjects. ... I shall also endeavour to show how these methods work in what might be called a relational manner.
Here is what the data look like.
...  So this Samuel Adams person (whoever he is) belongs to the North Caucus, the Long Room Club, the Boston Committee, and the London Enemies List. 
... I will simply start at the very beginning and follow a technique laid out in a beautiful paper by my brilliant former colleague, Mr Ron Breiger, called “The Duality of Persons and Groups.”...
At this point in the 18th century, a 254x254 matrix is what we call Bigge Data.... Anyway:
Notice ... We did not start with a “social networke” as you might ordinarily think of it, where individuals are connected to other individuals. We started with a list of memberships in various organizations. But now suddenly we do have a social network of individuals, where a tie in the network is defined by co-membership in an organization. This is a powerful trick.
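The trick described above can be sketched in a few lines. This is a minimal illustration, not the article's own code: only the Samuel Adams row is taken from the excerpt, and the other people (Person.B and so on) and the club named HypotheticalClub are invented placeholders. Multiplying the membership table by its own transpose gives a person-by-person table whose cells count shared organizations.

```python
# Toy membership table: person -> organizations.
# Only the Adams row comes from the excerpt; every other row is made up.
memberships = {
    "Adams.Samuel": {"NorthCaucus", "LongRoomClub", "BostonCommittee", "LondonEnemies"},
    "Person.B": {"NorthCaucus", "LongRoomClub"},
    "Person.C": {"LondonEnemies"},
    "Person.D": {"HypotheticalClub"},
}

people = sorted(memberships)
orgs = sorted(set().union(*memberships.values()))

# A[i][j] is 1 if person i belongs to organization j, else 0.
A = [[1 if org in memberships[p] else 0 for org in orgs] for p in people]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(x, y):
    y_columns = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_columns] for row in x]


# A times its transpose: cell (i, j) counts organizations shared by persons i and j.
# The diagonal holds each person's own number of memberships.
person_by_person = matmul(A, transpose(A))

for name, row in zip(people, person_by_person):
    print(name, row)
```

Off-diagonal cells greater than zero are the ties of the new person-to-person network; nobody had to record who actually talks to whom.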
... This time, the result is a 7x7 “Organization by Organization” matrix, where the numbers in the cells represent how many people each organization has in common. ...
Again, interesting! (I beg to venture.) Instead of seeing how (and which) people are linked by their shared membership in organizations, we see which organizations are linked through the people that belong to them both. People are linked through the groups they belong to. Groups are linked through the people they share. ... Here’s what that looks like.
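The same membership list can be read in the other direction. As a rough sketch (again with invented rows apart from Samuel Adams, and a placeholder club called HypotheticalClub), one can count, for every pair of organizations, how many people belong to both. The result is equivalent to the off-diagonal cells of the organization-by-organization matrix.

```python
from collections import Counter
from itertools import combinations

# Toy membership table: person -> organizations.
# Only the Adams row comes from the excerpt; every other row is made up.
memberships = {
    "Adams.Samuel": {"NorthCaucus", "LongRoomClub", "BostonCommittee", "LondonEnemies"},
    "Person.B": {"NorthCaucus", "LongRoomClub"},
    "Person.C": {"LondonEnemies", "BostonCommittee"},
    "Person.D": {"HypotheticalClub"},
}

# Each person contributes one shared member to every pair of
# organizations they belong to. Pairs are stored in alphabetical order.
shared_members = Counter()
for person, orgs in memberships.items():
    for org_a, org_b in combinations(sorted(orgs), 2):
        shared_members[(org_a, org_b)] += 1

for (org_a, org_b), count in sorted(shared_members.items()):
    print(org_a, org_b, count)
```

An organization with a single member who belongs nowhere else, like the placeholder club here, never appears in any pair and so stays unconnected.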
And, of course, we can also do that for the links between the people, using our 254x254 “Person by Person” table. Here is what that looks like.
 ... Look at that person right in the middle there. ... His name is Paul Revere.
Once again, I remind you that I know nothing of Mr Revere, .... All I know is this bit of metadata, based on membership in some organizations.  ... Here are the top betweenness scores for our list of suspected terrorists:
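For readers wondering what a betweenness score measures: it counts how often a person sits on the shortest paths between other pairs of people. The sketch below uses Brandes' algorithm on a small unweighted, undirected toy graph. The node names are hypothetical, and the original analysis may well have treated tie strengths differently, so this is only an illustration of the idea.

```python
from collections import deque


def betweenness(graph):
    """Betweenness for an undirected, unweighted graph.

    graph maps each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical toy graph: two pairs of friends joined only through "broker".
toy_graph = {
    "a1": {"a2", "broker"},
    "a2": {"a1", "broker"},
    "b1": {"b2", "broker"},
    "b2": {"b1", "broker"},
    "broker": {"a1", "a2", "b1", "b2"},
}

scores = betweenness(toy_graph)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, scores[name])
```

In this toy graph every shortest path between the two pairs runs through the broker, which is the kind of bridging position the score is designed to surface.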
Perhaps I should not say terrorists so rashly. But you can see how tempting it is. ...  There is something called eigenvector centrality, ... Here are our top scorers on that measure:
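Eigenvector centrality scores a person highly when their ties lead to other well-connected people. A minimal sketch of one common way to compute it, power iteration, follows. The graph, the node names, and the tie weights are all hypothetical; in a co-membership network the weights could be the counts of shared organizations.

```python
def eigenvector_centrality(weights, iterations=200, tolerance=1e-10):
    """Power iteration on a symmetric, connected, weighted graph.

    weights maps each node to a dict of neighbour -> tie strength.
    Scores are scaled so that the top scorer gets 1.0.
    """
    nodes = list(weights)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Adding x[v] itself iterates on (A + I), which has the same leading
        # eigenvector as A but does not oscillate on bipartite graphs.
        new = {
            v: x[v] + sum(w * x[u] for u, w in weights[v].items())
            for v in nodes
        }
        top = max(new.values())
        new = {v: value / top for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in nodes) < tolerance:
            return new
        x = new
    return x


# Hypothetical toy graph: "hub" knows everyone, "x" and "y" also know
# each other, and "z" knows only the hub. All ties have strength 1.
toy_weights = {
    "hub": {"x": 1, "y": 1, "z": 1},
    "x": {"hub": 1, "y": 1},
    "y": {"hub": 1, "x": 1},
    "z": {"hub": 1},
}

centrality = eigenvector_centrality(toy_weights)
for name in sorted(centrality, key=centrality.get, reverse=True):
    print(name, round(centrality[name], 3))
```

Unlike betweenness, this measure rewards being close to the well-connected core rather than bridging otherwise separate groups, so the two rankings need not agree.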

Here our Mr Revere appears to score highly alongside a few other persons of interest. And for one last demonstration, a calculation of Bonacich Power Centrality, ....
And here again, Mr Revere, along with Messrs Urann, Proctor, and Barber, appears towards the top of our list.

The full article originally appeared on Kieran Healy’s blog.
