Group 4.1

Statistics in “NSA Files: Decoded”

On November 1st, 2013, The Guardian published the article ‘NSA Files: Decoded’, written by Ewen MacAskill and Gabriel Dance, with strong visual elements produced by Feilding Cage and Greg Chen. The article is divided into six parts, each illustrating the conduct of the NSA – as revealed by Snowden’s whistleblowing – and its consequent impact on both a societal and an individual level. The article contains a vast amount of information, displayed to the reader in an enticing way through a variety of interactive visual elements. The amount of statistics – in the sense of cold, hard numbers – is not excessive; rather, the article can be read as an engaging rendition of what Edward Snowden exposed and how it might affect the reader. One noteworthy aspect of the article is the lack of direct citation alongside some of the graphics. Presumably the information is embedded in the NSA files, as the article predominantly interprets the material released by Snowden.

‘Three degrees of separation’ and ‘Your digital trail’ are two of the interactive graphics depicting the extensive reach of digital espionage. ‘Three degrees of separation’ visualises the three ‘hops’ the NSA can make from an intended target in order to create a web of interactions and, subsequently, gather information on a far greater number of people. The interactive graphic lets you slide a pointer to indicate how many friends you have on Facebook and see how many people the NSA could technically monitor through you. Personally, my results came up as follows: tier 1 (your own friends), around 1,100; tier 2 (friends of friends), almost 180,000; and tier 3 (friends of friends of friends), 29,000,000. The amounts are based on an estimate that the average user has 190 friends, but considering the article is almost six years old, the numbers are likely far bigger today. Presumably the graphic is based on information in the NSA files, but no direct source is cited.
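The Guardian does not publish the formula behind the graphic, but the general idea of hop-based growth is easy to sketch. Assuming (and this is our assumption, not the article's stated method) that each hop simply multiplies by the cited average of 190 friends and ignoring overlap between friend circles, a naïve upper-bound estimate looks like this:

```python
# Naïve sketch of the 'three degrees of separation' estimate.
# ASSUMPTION: each hop multiplies by the average friend count (190) and
# friend circles never overlap, so these are upper bounds. The article's
# real figures are lower, presumably because they discount overlap.

AVG_FRIENDS = 190  # average Facebook friend count cited in the article

def hop_estimates(my_friends):
    """Return estimated reachable people at hops 1, 2 and 3."""
    hop1 = my_friends                        # your own friends
    hop2 = my_friends * AVG_FRIENDS          # friends of friends
    hop3 = my_friends * AVG_FRIENDS ** 2     # friends of friends of friends
    return hop1, hop2, hop3

print(hop_estimates(1100))  # → (1100, 209000, 39710000)
```

With 1,100 friends this sketch overshoots the graphic's tier 2 and tier 3 figures (180,000 and 29,000,000), which is consistent with the real graphic trimming duplicate people out of the web.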


‘Your digital trail’ is another interactive graphic, which lets the user find out what information they relinquish through various forms of communication. The reader can click on logos for different services – social media (e.g. Facebook, Twitter), e-mail, Google Search, phone calls, web browsing, and the camera – to see what information the provider, and consequently the NSA, can gather through their use. As with ‘Three degrees of separation’, no source is listed, but the graphic lets the user visualise the vast amount of information surrendered through the mere use of these services.

Other noteworthy motion graphics include one tallying the amount of data gathered for review by the NSA since the reader opened the article. Like the two previous graphics, no direct source is given, but it does the job of portraying how much information the NSA gathers. One motion graphic where the source is directly credited is ‘Connected by cables’, which depicts how many countries the US is directly connected to through fibre-optic cables. The graphic illustrates the ‘upstream’ collection carried out by the NSA, meaning the agency directly intercepts information travelling along fibre-optic cables connected to the US.

Apart from the interactive motion graphics, a variety of statistics are used throughout the article to strengthen its points and to clarify technological jargon in visual form. These include figures on public opinion about the government’s ability to protect privacy and on opposition to governmental monitoring of communications, among others.

Overall, the statistics in the article mostly serve the purpose of clarification. The examples mentioned above all simplify matters revealed in the NSA files – which might be hard to grasp in writing – through interactive motion graphics, allowing the reader to put things into a personal perspective.

Blog Post

NSA Files Decoded

Data Journalism

Group 4.1

Tom Westoe, Merve Aytas, Édua Varga, Tjerk de Vries, Pascal Friedrich Degenfeld, Dragoș Octavian Culcear


Problematization of this journalistic research

There has been an extensive debate in media scholarship and journalism about the ethics and potential dangers of social media filter bubbles. Filter bubbles are the result of personalization algorithms: these algorithms learn what a user is interested in, likes and dislikes, and accordingly show them content that keeps them engaged on the platform. The same data is used to target users with advertisements tailored to their interests. The result is that each user gets put in his or her own filter bubble, where they are not exposed to, for instance, differing political views. The algorithms determine the kind of ads each of us encounters by constantly gathering extensive data about items we have previously bought or liked. This is also why Facebook claims that its service will always be “free”: the platform generates its revenue from delivering personalized ads to different people.
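Facebook's actual ranking model is secret – which is the whole problem this post is about – but the mechanism described above can be illustrated with a toy sketch. Every function name, field and weight here is invented for the illustration, not taken from Facebook:

```python
# Toy illustration of feed personalization: score each candidate post by
# how much it overlaps with topics the user has engaged with before, then
# show the highest-scoring posts first. This is a deliberately simplified
# stand-in for Facebook's secret ranking model, not a description of it.

from collections import Counter

def rank_feed(posts, liked_topics):
    """Order posts by overlap with the user's past likes."""
    interest = Counter(liked_topics)  # the learned 'profile' of the user
    def score(post):
        return sum(interest[t] for t in post["topics"])
    return sorted(posts, key=score, reverse=True)

posts = [
    {"id": 1, "topics": ["cats"]},
    {"id": 2, "topics": ["politics"]},
    {"id": 3, "topics": ["cats", "music"]},
]
feed = rank_feed(posts, liked_topics=["cats", "cats", "music"])
print([p["id"] for p in feed])  # → [3, 1, 2]
```

Even in this toy version the bubble dynamic is visible: the post about an unfamiliar topic (politics) sinks to the bottom, so the user keeps seeing more of what they already like.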

Everyone gets stuck in filter bubbles. They can influence our political views by promoting fake or one-sided news and propaganda, as well as deepening discrimination and social divisions. One of the biggest events that triggered worldwide discussion of the topic was Donald Trump’s election, specifically the question of how filter bubbles aided his victory. Since then, many researchers have tried to uncover how the system works, but so far none has fully succeeded. Here is where Facebook Tracking Exposed comes into play.


Facebook Tracking Exposed, or simply fbTREX, is a Chrome/Firefox extension that tracks the data received by its users in their Facebook News Feeds in order to try to uncover how the algorithm actually functions.

The tool can be used by Facebook users who want to know more about their own filter bubbles, by researchers collecting data with control groups on Facebook, and by journalists interested in echo chambers and algorithmic personalization. The extension displays which elements are recorded and which elements are considered too private for collection, and chooses not to collect the latter.

The tool was created specifically for Facebook because the platform generates so much data that it is almost impossible for users to acquire and process this information in a meaningful way. The aim of the tool is to increase transparency around personalisation algorithms, so that people have more effective control over their Facebook experience. This benefits Facebook users and researchers. It does not, however, benefit Facebook, because fbTREX is trying to expose the very algorithm that Facebook keeps secret in the interest of staying competitive with other social media platforms.

Fake account

We created Lucie Dvorak, a 30-year-old lesbian who grew up in Prague and now lives in Amsterdam. She works at DAF as a trucker and is interested in country music, cats (she owns cats), cars, Orange Is the New Black, Tasty, photography, Opera, and WWE. But she is also a flat-earther and anti-vaxxer. We decided to focus on these last two interests, because they give us a good angle of inquiry for investigating Facebook’s personalization algorithm and the filter bubbles it creates.

We then liked pages and posts related to Lucie’s interests to make the profile as realistic as possible. We only received suggestions for other pages, but no ads were shown in her News Feed.

We found that Facebook’s algorithm does not do a great job of detecting fake accounts. Our group members logged in from a variety of places, but despite that the account remained active. We also received more than 400 friend requests, though most of these seem to be fake accounts as well. Facebook usually requires its members to verify their accounts through mobile phone number identification, yet we easily created our fake persona with only a fake Gmail account. These two observations reinforce the idea that Facebook pays little to no attention to bots and fakes invading its platform. Throughout the week we received only one ad and two page recommendations, and they did not match Lucie’s interests: the advertisement showed us special trousers “for people with reduced mobility”, when in reality our fake persona was a fully able-bodied lesbian truck driver.



Meet Lucie – a 30-year-old trucker from Prague who resides in Amsterdam and has a variety of knacks and wits about her. Lucie is a lesbian at a complicated stage of her romantic life. The complications compel her to look for comfort through other channels, such as cats or TV series. She owns a cat and, owing to her sexuality, is drawn to the highly regarded Netflix series ‘Orange Is the New Black’, as she relates to the characters’ struggles. She also spends a lot of time on social media; she feels alienated in life and seeks approval from strangers online. This has led Lucie down a dark path – engaging with conspiracy theories and disregarding scientific counter-evidence. She spews propaganda and feels empowered when like-minded souls lavish her with praise. It is evident to the people around her that she is spiralling further day by day, embracing the warm cocoon of her own filter bubble. And it is evident to a specific someone – or should I say something? Not a person, but a computer-generated mind: Facebook’s algorithm. And it feels no remorse for sending her further and further into the abyss.


Luckily, Lucie is not real. She is one of many unfortunate guinea pigs used in an attempt to break down Facebook’s News Feed algorithm. She is the creation of six creative twenty-somethings, and she is quite popular.
We created Lucie about two weeks ago and she has already garnered over 400 friends on Facebook. Granted, some of the accounts are obvious bots, but the remainder seem to be genuine people – the vast majority desperate, lonely men of various ages, some even downright distasteful, sending explicit photos and videos. Lucie’s purpose is to trick Facebook’s algorithm into feeding us content that strengthens her views, no matter how preposterous. We like, we share, we join groups – we do everything in our power to feed as much information as possible to the algorithm, while simultaneously recording the posts through the fbTREX browser extension. Over the course of two weeks we have recorded over 1,000 posts through Lucie’s profile. When downloaded as a .csv file, the data makes little sense; once organised, some patterns start to appear, but all in all the results are underwhelming.

Most recorded posts – almost fifty percent – come from groups Lucie has joined. The groups are all based around her interests, such as cats, country music and Orange Is the New Black, as well as the more controversial themes we were hoping to be fed: flat-earth conspiracy theories and anti-vaccine propaganda. Photos, posts by friends, and videos each make up around fifteen percent of the total, and the remaining five percent are ads. The small number of ads is a setback, as we were hoping to provoke ads related to the darker themes we instilled in Lucie’s persona.
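The percentages above come from tallying the exported .csv by post type. A minimal sketch of that tally, assuming the export has a column describing each post's type (the column name `type` and the sample values here are our assumption for the sketch; the real fbTREX export's field names may differ):

```python
# Sketch: turn an fbTREX-style .csv export into post-type percentages.
# ASSUMPTION: the export has a 'type' column (group / photo / ad / ...);
# the actual field names in the real export may differ.

import csv
from collections import Counter
from io import StringIO

def type_shares(csv_text):
    """Return {post_type: percentage} for a csv with a 'type' column."""
    counts = Counter(row["type"] for row in csv.DictReader(StringIO(csv_text)))
    total = sum(counts.values())
    return {t: round(100 * n / total, 1) for t, n in counts.items()}

sample = "type\ngroup\ngroup\nphoto\nad\n"
print(type_shares(sample))  # → {'group': 50.0, 'photo': 25.0, 'ad': 25.0}
```

In practice we read the downloaded file rather than an inline string, but the counting step is the same.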
The ads do teach you some things about Lucie: she is a cat owner and lives in the Netherlands. This can be derived from the various ads promoting parties and other social events in and around Amsterdam, as well as ads for cat food. Other than that, the ads are largely arbitrary. The rest of the recorded posts paint a picture of Lucie that could just as easily be painted by looking at her profile: the posts directly match her stated interests, which are clearly visible in her feed as well.

To juxtapose Lucie’s results, one member of our group had a very different experience with his recorded posts: almost sixty-five percent were ads. This points to a clear shortcoming of the project – the timeframe. Two weeks is apparently not enough time to ‘trick’ Facebook’s algorithm by feeding it data. My own results emphasise this further: my ads and feed are very clear representations of my interests, but then Facebook has over a decade of data on me, and only two weeks’ worth on Lucie.

To conclude, there are limitations to our experiment, but we won’t stop trying. Even though at this point it seems near impossible to achieve the desired results with so little time, we keep on feeding the algorithm with data hoping we will learn more about how it works.