
The NSA is Spying pt. 2: The Gloomier

Hello internet, and welcome to The Science They Don't Want You to Know.  As I mentioned in the first post of this series, I am researching the statistical viability of currently unconfirmed conspiracies (no leaked documents) by way of confirmed conspiracies (documents have been leaked).  The primary purpose of this initial research is to gather two specific pieces of information: how many people were involved in each confirmed conspiracy, and the length of time over which it took place.  If you have not read the first post, you can read it here.  Part one of the NSA story can be found here.



If you'll recall, last time I talked about how the Patriot Act allowed the NSA to tap into phone calls and emails during the Bush administration.  I covered the fact that this does not technically violate the Fourth Amendment, because conversations do not fall under the legal definition of property.  I left off with the FISA court telling the NSA that it has to get a warrant for every phone call and email from which it collects data.  That should have been the end of the story, since Obama couldn't honestly use the same line Bush did about the legality of the NSA spying on the American population.

But in 2013, Edward Snowden leaked classified NSA documents about the PRISM program, and the Guardian was the first to reveal the story.  These files show that the NSA had still been performing warrantless mass surveillance on the domestic front since 2007, which you'll notice is the opposite of what the FISA court dictated in its March 2009 ruling.  President Obama said in response that the NSA wasn't "listening to our conversations", just "collecting meta-data".  We don't have the documentation or the surveillance to confirm or deny his claims, so we cannot assume that he's lying.  Then-President Bush said much the same thing, even though he had less of an understanding of what "meta-data" means.

The logic behind this warrantless surveillance is that the information itself exited US borders and entered again.  So while both ends of the communication were wholly within the United States, the information packets used to get the data from point A to point B (both inside the US) travelled outside the United States.

These documents also show that a number of internet companies were involved in PRISM, including Microsoft, Google, and Facebook.  The mechanism which the NSA claims makes this collection legal is that the information it collects physically leaves the borders of the United States, then reenters the country, and the fact that it "enters from a geographically exterior location" makes it legal to collect, even though both the sender and the recipient are well within the borders of the United States.

So now we have both a start point and an end point, which gives us a time frame for this conspiracy.  That is, of course, assuming the NSA is no longer spying on Americans without warrants.  As we know, that may not be the case, but assuming it is, this conspiracy covers roughly 11.5 years, from the day the Patriot Act was signed into law to the day Snowden released the files relating to this case.
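As a quick sanity check on that time frame, here is a short back-of-the-envelope calculation.  The Patriot Act was signed on October 26, 2001, and the Guardian's PRISM reporting began on June 6, 2013, which comes out to a bit over eleven and a half years:

```python
from datetime import date

# Patriot Act signed into law, and the Guardian's first PRISM story.
patriot_act_signed = date(2001, 10, 26)
first_prism_story = date(2013, 6, 6)

# Span between the two dates, in years (365.25 accounts for leap years).
years = (first_prism_story - patriot_act_signed).days / 365.25
print(f"{years:.1f} years")  # ~11.6 years, i.e. the roughly-11.5-year span above
```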

Now we just need a number for how many people were involved in this conspiracy.  It is safe to assume that everyone at NSA headquarters worked on collecting these data, at least on a few cases, since the operation involves the collection of so many emails and phone calls.  It also must involve those at the very top of the companies involved, who knowingly participated in such activities.  There may also have been technicians who did a little bit of the coding that allowed data to leave and reenter American borders, but it is safe to assume these technicians didn't know why the code was being written, whether through lies or omission.

What's the final number that would make this good for statistical analysis?  A PowerPoint presentation revealed that the workforce consists of "more than 30,000 demographically diverse men and women located at NSA headquarters in Ft. Meade, MD".  There is also an interview in which Ambassador John D. Negroponte, Director of National Intelligence, mentioned "[...] an Intelligence Community of close to 100,000 individuals across 16 agencies [...]", so one might think that this is the number to use for our approximation.  That would be wrong, because the NSA is only one of the 16 intelligence agencies referenced, so the NSA alone must have fewer than the quoted "close to 100,000 individuals".  The number we will use for the approximation is therefore the "more than 30,000 [individuals]".  Since it's "more than" 30,000, and there are agents external to the NSA involved, we can loosely justify a figure of 31,000.

If you remember the first post in this series, the point of obtaining these numbers is to compute the statistical likelihood of supposed conspiracies.  Putting these numbers through the statistical machinery, the probability of failure per person per year for the NSA spying case comes out to 2.0054E-04%.  This brings the running statistical average for the per-person-per-year failure rate to 1.6529E-04%.  Now, if it turns out that the NSA is still spying on us -- which I wouldn't doubt, seeing as it has been throughout the first two presidencies since the 9/11 attacks -- these statistical values will change on us.  Remember that these are the numbers for the cases known to date.
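The exact machinery is laid out in the first post, so here is only a minimal sketch of one way a figure like this can be produced.  The 50%-exposure assumption and the inputs N = 31,000 and t = 11.5 are simplifications chosen for illustration, which is why the result lands near, rather than exactly on, the number quoted above:

```python
# Inputs from the estimates above.
N = 31_000   # people loosely assumed to be involved
t = 11.5     # years the conspiracy ran before the leak

# Simplifying assumption: take p to be the per-person-per-year failure
# probability at which the conspiracy had a 50% chance of being exposed
# by the time it actually was:
#   (1 - p)**(N * t) = 0.5   =>   p = 1 - 0.5**(1 / (N * t))
p = 1 - 0.5 ** (1 / (N * t))

print(f"p = {p:.4e} per person per year ({100 * p:.4e} %)")
# ~1.94e-06, i.e. about 1.9e-04 %: the same order of magnitude as the
# 2.0054E-04 % quoted above, though not the exact calculation used there.
```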

So until next time, take that as you will.
K. "Alan" Eister Î”αβ

Relevant Entries:
Start Here

