There's been a lot of discussion about how the NSA is spying on everyone in the world, which is a huge civil-rights problem. If everything you've ever written to a lover or a friend, every file you've ever downloaded, and a minute-by-minute record of your whereabouts as determined by your phone are archived forever, someone who wants to blackmail you can probably find something in the archive that would offend whatever group they want you to offend, or that would get you prosecuted wherever you happen to live, and can therefore make demands on you. So the NSA, or anyone with access to its archives, will have extensive influence on the world's politicians throughout the next half-century, even if its spying were to stop tomorrow.
Beyond the utility of such an archive as a means of coercion per se, it can be used to amplify more traditional means of coercion. It's common for political leaders and even rich people to employ bodyguards and keep their whereabouts private in order to frustrate kidnapping and assassination attempts. But, even if your cellphone records from your teen years don't reveal where you're likely to vacation with your family, even if your bodyguards are always at your side day and night, and even if the NSA and their Russian counterparts don't have the political backing to blow you up with a Hellfire missile or shoot you with a quadcopter-mounted rifle, the social-graph data of who you know will provide ample "soft targets." Going after such targets is a tactic that has already been used, for example, in the FBI's malicious prosecution of US journalist Barrett Brown. (They raided his mother's house and filed trumped-up charges against her: "obstruction of justice," for not knowing he had his laptop at her house. She pled guilty.)
Any government agency armed with a computer-generated list of your thirty-two closest friends and family members can surely find one of them who is poorly guarded and easy to kidnap or threaten. By this means, social-graph data amplifies traditional means of coercion.
You might think that regulatory and administrative oversight of the NSA can solve this problem. I don't think it will, for two reasons.
First, the NSA is already in a strong position to use blackmail to retaliate against any politicians who attempt to rein it in.
Second, if the NSA doesn't do it, other agencies will. Current computer security is extremely poor; a smart kid in Iran can probably manage to download cellphone location records of people in the US en masse; and the relevant social-graph information is largely public on Facebook. As has been the case with bombings for over a century, our best defense is that not very many people want to do it badly enough to dedicate the necessary part of their life to the problem. However, as with bombing, given that this is now a viable path to geopolitical power, there will be no shortage of organized groups (such as the US military and the militaries of other countries) that set themselves up to take advantage of the opportunity. Indeed, any entity seeking power through coercion that fails to take advantage of these opportunities will probably be subverted by entities that do.
There's still the matter of budget. If we stipulate that the information is readily available, how much does it cost to turn it into a Big Brother Database that enables this kind of coercion? (Let's assume the agency pays for this itself instead of using storage it borrows without permission.)
There are currently about 7 billion people. Each of them might have 1000 social connections that matter; these social connections range over several orders of magnitude of importance. If you need 16 bits to represent the importance of the relationship and 40 bits to identify the person they are related to, that's 56 bits per connection and 7 trillion connections: 392 trillion bits, or 49 terabytes. A one-terabyte disk currently costs about US$100, so storing the world social graph, if you could get hold of it, would cost you about US$4900 of disk. You'd probably want to store it with some redundancy to handle disk failures, indices might add another factor of 2, and motherboards, power supplies, etc., might add another 25%. All in all we're talking about a budget of some US$17,500 to store the world social graph, assuming you can get hold of it.
However, disk prices are still in an exponential fall, halving every 15 months. In three years, in 2016 or so, the price will be around US$3300, and three years from then, around US$600.
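The storage estimate is just a product of a few assumed quantities, and the price projection is a geometric decay. Here is a minimal sketch of both. The function names are made up for illustration, and the defaults are the figures assumed in the text (7 billion people, 1000 connections each, 16 + 40 bits per connection, US$100 per terabyte, prices halving every 15 months), not measurements.

```python
def social_graph_bytes(people=7e9, links_per_person=1000,
                       weight_bits=16, id_bits=40):
    """Raw size of the social graph in bytes under the stated assumptions."""
    total_bits = people * links_per_person * (weight_bits + id_bits)
    return total_bits / 8


def disk_cost_usd(n_bytes, usd_per_terabyte=100.0):
    """Cost of raw disk for n_bytes, using decimal terabytes (1e12 bytes)."""
    return n_bytes / 1e12 * usd_per_terabyte


def projected_cost_usd(cost_now, months_ahead, halving_months=15):
    """Cost after months_ahead if prices keep halving every halving_months."""
    return cost_now * 0.5 ** (months_ahead / halving_months)


if __name__ == "__main__":
    raw = social_graph_bytes()
    cost = disk_cost_usd(raw)
    print(f"raw size: {raw / 1e12:.0f} TB, raw disk: US${cost:,.0f}")
    for years in (3, 6):
        later = projected_cost_usd(cost, 12 * years)
        print(f"raw disk after {years} years: US${later:,.0f}")
```

Redundancy, indices, and hardware overhead can be applied as further multipliers on the raw disk figure before projecting it forward.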
The next step, presumably, is the location information database. The Earth is about 20 million meters from pole to pole, so locating someone on Earth's surface to within one meter (good enough for targeting a bomb) requires about 50 bits of information. Let's round up to 64, 8 bytes. Collecting this information every ten minutes for a typical 30-year lifetime adds up to 12.6 megabytes per person, or 88 petabytes in all: about US$9 million of disk storage at present, shrinking, at a halving every 15 months, to US$1 million around 2017 and US$1000 around 2030 (although the population will be somewhat higher then, so the cost will be slightly higher). But this information is highly compressible, so these cost numbers might be high by an order of magnitude or so.
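For anyone who wants to check the location-database arithmetic, here is a rough sketch. It assumes a spherical Earth with a mean radius of 6371 km and a 365.25-day year. Those two constants and the function names are my own choices for illustration; the 8 bytes per fix, ten-minute interval, and 30 years come from the text.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean radius of a spherical Earth; an approximation


def position_bits(resolution_m=1.0):
    """Bits needed to name one resolution_m-sized cell on Earth's surface."""
    surface_m2 = 4 * math.pi * EARTH_RADIUS_M ** 2
    return math.log2(surface_m2 / resolution_m ** 2)


def location_log_bytes(years=30, interval_minutes=10, bytes_per_fix=8):
    """Per-person size of a location log sampled at a fixed interval."""
    fixes = years * 365.25 * 24 * 60 / interval_minutes
    return fixes * bytes_per_fix


if __name__ == "__main__":
    per_person = location_log_bytes()
    world = per_person * 7e9  # assumed world population
    print(f"bits for 1 m resolution: {position_bits():.1f}")
    print(f"per person: {per_person / 1e6:.1f} MB, world: {world / 1e15:.0f} PB")
```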
However, even if you collect this much information originally, you don't have to retain it all for it to be useful. If you summarize to the three most common places in which a particular person can be found, then instead of 12.6 megabytes per person, you only need 24 bytes per person, or 170 gigabytes for the world population, about US$20.
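The summarization step amounts to a frequency count over coarsened positions. The sketch below is one way to do it, not a description of any real system. The grid size of 0.001 degrees (roughly 100 meters of latitude) is an arbitrary assumption, and `top_places` is a hypothetical name.

```python
from collections import Counter


def top_places(fixes, k=3, cell_degrees=0.001):
    """Return the k most frequently visited grid cells.

    fixes is an iterable of (latitude, longitude) pairs in degrees.
    Each fix is snapped to a grid cell cell_degrees on a side, so that
    nearby readings count as the same place.
    """
    counts = Counter(
        (round(lat / cell_degrees), round(lon / cell_degrees))
        for lat, lon in fixes
    )
    return [
        (i * cell_degrees, j * cell_degrees)
        for (i, j), _ in counts.most_common(k)
    ]


if __name__ == "__main__":
    sample = [(10.0, 20.0)] * 5 + [(11.0, 21.0)] * 3 + [(12.0, 22.0)] * 2
    sample.append((13.0, 23.0))
    print(top_places(sample))
```

Three places at 8 bytes each gives the 24 bytes per person used in the estimate.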
Collating multiple sources of location information might inflate the 12.6 megabytes by a factor of three or so. For example, license-plate cameras, gait recognition from security cameras, public-transit card tracking, ticket-purchase information from airlines and long-distance bus lines, private jet flight plans, and location information on social-media postings can all provide supplementary location information. Disagreements among these sources might point to cases where someone's trying to hide something. For example, if someone's cell phone jumps from one city to another at over 200 kilometers per hour, they've probably taken a plane, and there should be a flight record unless they're traveling under an assumed name.
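The speed check in that last example is easy to sketch. The code below computes great-circle distance with the haversine formula and flags consecutive fixes whose implied speed exceeds a threshold. The record layout (hours, latitude, longitude) and the function names are assumptions made for illustration; the 200 km/h default comes from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of a spherical Earth; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def fast_hops(fixes, threshold_kmh=200.0):
    """Yield (earlier, later, speed_kmh) for hops faster than the threshold.

    fixes is a time-ordered list of (hours, latitude, longitude) tuples.
    Pairs with a zero or negative time difference are skipped.
    """
    for a, b in zip(fixes, fixes[1:]):
        dt = b[0] - a[0]
        if dt <= 0:
            continue
        speed = haversine_km(a[1], a[2], b[1], b[2]) / dt
        if speed > threshold_kmh:
            yield a, b, speed
```

Each flagged hop would then be compared against flight or ticket records covering the same time window.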
This kind of information also supplements the online-gathered social graph with information about who you move from place to place with (perhaps you're riding in their car) and who you lend your car to.
So what about the Blackmail Communications Database? This is more difficult. A basic version might be some ten 100-byte SMS messages per day per person, some 1000 bytes. 1000 bytes per day per person over 30 years is about 11.0 megabytes per person, similar in size to the location information database; you also need the metadata of who the message was sent to and when it was sent, which perhaps inflates it by 10% or 20%, so let's say 13 megabytes. If you also include IM and email, you might have another order of magnitude on top of that, or 130 megabytes. (Some people send more, some send less.) That brings us up to a little over 900 petabytes for the world: US$90 million.
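The same kind of back-of-the-envelope calculation can be written out for the communications estimate. The parameter names here are invented for illustration, and the default of 20% metadata overhead is the upper end of the range given in the text, so the result comes out close to, not exactly at, the rounded figures.

```python
def messages_bytes_per_person(bytes_per_day=1000, years=30,
                              metadata_overhead=0.2,
                              other_channels_factor=10):
    """Per-person archive size for text communications.

    bytes_per_day covers SMS only; metadata_overhead is the fractional
    increase for recipient and timestamp; other_channels_factor scales
    the total up to include IM and email.
    """
    sms_only = bytes_per_day * years * 365.25
    with_metadata = sms_only * (1 + metadata_overhead)
    return with_metadata * other_channels_factor


if __name__ == "__main__":
    per_person = messages_bytes_per_person()
    world = per_person * 7e9          # assumed world population
    cost = world / 1e12 * 100.0       # assumed US$100 per terabyte
    print(f"per person: {per_person / 1e6:.0f} MB")
    print(f"world: {world / 1e15:.0f} PB, about US${cost / 1e6:.0f} million")
```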
Any spy agency in the world with a billion-dollar budget has surely already done these calculations and has been running this program for years.
At some point in the near future, perhaps around 2020, DNA sequencing will be as inexpensive as license-plate scanning is today. Whenever you touch an object that isn't yours, such as a doorknob or hot-water tap, or release bodily fluids such as urine into a public receptacle, you'll be taking the chance that it's sampling your skin cells. Combined with the location database, this will rapidly provide a clear picture of the genetic ancestry tree of living humans, which can be added to the social-graph database to provide a more complete picture, including corrected biological paternity data. That is, you'll be able to infer whose biological father, grandfather, or great-grandfather is someone other than who it's conventionally assumed to be.
In its full form, the genome of a human being is 3 gigabasepairs, or 750 megabytes, although this is highly compressible. But you don't need 750 megabytes to uniquely identify a person; you only need a judiciously chosen 66 bits to provide a unique identifier with high probability. Reliably inferring biological relatedness probably requires more bits, but I don't think very many more.
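The two numbers in this paragraph can be checked with a short sketch. It assumes 2 bits per base pair, and it treats the chosen identifier bits as if they were independent and uniformly random, which is my simplifying assumption and not something real genetic markers guarantee. It also separates two readings of "unique": whether one given person shares an identifier with anyone else, and whether any pair in the whole population does.

```python
import math


def genome_bytes(base_pairs=3e9, bits_per_base=2):
    """Uncompressed genome size in bytes at a fixed number of bits per base."""
    return base_pairs * bits_per_base / 8


def chance_of_sharing_id(population=7e9, id_bits=66):
    """Approximate chance that one given person shares an ID with anyone else."""
    return -math.expm1(-(population - 1) / 2 ** id_bits)


def chance_of_any_collision(population=7e9, id_bits=66):
    """Birthday-bound approximation: chance that at least one pair shares an ID."""
    pairs = population * (population - 1) / 2
    return -math.expm1(-pairs / 2 ** id_bits)


if __name__ == "__main__":
    print(f"genome: {genome_bytes() / 1e6:.0f} MB")
    print(f"one person collides: {chance_of_sharing_id():.2e}")
    print(f"any pair collides:   {chance_of_any_collision():.2f}")
```

Under these assumptions the per-person figure is far smaller than the any-pair figure, so how many bits count as enough depends on which kind of uniqueness is wanted.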
This data, used alongside or in place of other biometric surveillance data such as facial recognition and gait recognition, will eliminate anonymity and pseudonymity for anyone who traverses public spaces, at least with respect to the agency or agencies that have access to it.
This seems to point to a fairly dystopian future, one in which a friendship with, or relatedness to, a powerful person could get you kidnapped or murdered to coerce their compliance, and in which access to a relatively small and easily copied database is sufficient to provide this ability to the NSA --- and whoever manages to steal a copy of it from them. What are our options for avoiding this?
You could try to keep your part of the social graph secret: don't post it on web sites, say. But that doesn't help if the other people you know, or the people who see you together, go ahead and post their photos online.
Today there are companies whose business is to deanonymize web visitors (http://www.forbes.com/sites/adamtanner/2013/07/01/heres-some-companies-who-unmask-anonymous-web-visitors-and-why-they-do-it/) and to sell listings of sightings of a given license plate (http://www.forbes.com/sites/adamtanner/2013/07/10/data-broker-offers-new-service-showing-where-they-have-spotted-your-car/), so you can tell where a car has been. These services will get more comprehensive, cheaper, and more numerous, as improving fabrication and analysis technology progressively drives down the cost of providing them.