Final Research Report:

From Maria To Rachel

Our research gave us insight into the intricate workings of Facebook’s moderation system. Our profile was active for only two days before Facebook took it down on suspicion of being a bot. Ultimately, we underestimated Facebook’s moderation and handled the profile carelessly. Luckily, we were able to retrieve some of the data in an Excel file. The removal of our profile, and the ongoing process of recovering it, became a big part of our research process. With the data we did retrieve for Mary, we calculated the percentage distribution of the formats of the posts she received and visualized it as a circle packing. We then classified the posts by authorship and traced a profile of the user as the algorithm presents it. The final step was to compare that profile with the fake persona we fed the algorithm. Is Mary trapped in the Facebook bubble she created for herself based on her personality?

According to our calculations, the main type of content was text: 162 text-based posts out of a total of 240 interactions. These were followed by 67 videos and 11 photos.

Photos made up 4.58% of the total, videos 27.9%, and articles and short text posts 67.5% of the interactions in Mary’s timeline.
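The percentages above can be reproduced from the raw counts. The following is a minimal Python sketch of that calculation; the function name and the dictionary keys are illustrative assumptions and are not taken from the FbTREX export.

```python
def format_shares(counts):
    """Return each format's share of all interactions, in percent."""
    total = sum(counts.values())
    return {fmt: 100 * n / total for fmt, n in counts.items()}


# Counts reported for Mary's timeline (240 interactions in total).
counts = {"text": 162, "video": 67, "photo": 11}

shares = format_shares(counts)
for fmt, pct in shares.items():
    print(f"{fmt}: {pct:.2f}%")
```

With two decimals, the video share comes out as 27.92%, which the report rounds to 27.9%.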

Our Mary has three kids, is of Catholic faith and does not believe in same-sex marriage, so she can be described as leaning towards the traditional side. Her political views are heavily conservative, as she is an adamant Trump supporter. Most of her content comes from a Facebook page called Being Conservative, with over 60 posts; the second biggest contributor is a page called Conservative Tribune, with over 40 posts, followed closely by Fox News. However, Mary is a complicated woman: being divorced while also being ‘pro-life’, she is open-minded about certain topics, which places her more on the liberal side of the scale. As a strong single mother, she holds firm beliefs about providing for her children without the support of a husband, which shows in her Facebook interaction with doula Kelly Winder, with 10 of whose posts she interacted. Her passion for cooking also sketches an image far removed from the firm, conservative picture painted by her political views. What do we learn from the data we collected? The profile we created for Mary is in line with the data that the Facebook algorithm showed us.

What we learned from dealing with Mary is that Facebook is not stupid. It has various systems in place to identify fake profiles such as ours. For our next profile, we have to be careful about logging in from the same IP address and from multiple computers. It could be argued that, due to the right-wing conservative nature of the pages and profiles we shared, liked and posted on, our Mary was subject to greater scrutiny by Facebook and its algorithms. We tried to go through the steps of verifying our profile, but without success. Currently, the profile is still under review.

To continue our research on Facebook’s personalisation algorithm, we have decided to create a new profile. To do so in a way that doesn’t abruptly end our previous investigation, we have created a persona of someone we believe would be in Mary’s friend circle. We introduce to you Rachel An. She is an American-born Korean woman, is within Mary’s 45-50 age group and shares similar conservative views: she is pro-life, condemns homosexuality and is happily married. She has a son and is of Catholic faith (they found each other in church). Her hobbies include sewing and doing Sudoku.

What did we learn? Facebook does not want us to do this. It is a secretive company that is reluctant to share data, so we must stay one step ahead if we are to be successful as investigative data journalists. The social media giant has been under intense pressure to moderate and control the darker areas of the site, and to limit and exclude hate speech and other forms of abuse.

Posted by Andreas, Natalie, Morris, Aishwarya and Isa


“Me llamo Maria y amo los Estados Unidos”

Maria Renee López is a 45-year-old Latin American woman. She likes to go by Mary, because she idolizes American culture and the English language. By calling herself Mary, she is attempting to fit in more. She is divorced and has three sons: Thomas, Adam and Joseph. She has a love-hate relationship with the father of her children, but tries to stay on amicable terms with him for the sake of their children. Mary is a very traditional woman of faith and therefore does not believe in same-sex marriage, abortion and the like. She is an adamant Trump supporter and, even though she is an immigrant herself, she does not believe that the United States of America should have open borders. Her main hobbies include cooking delicious food for her sons, advocating for the wall to be built, following and reading anti-vaccine pages on Facebook, and taxidermy.

It felt quite unethical to create a fake persona, especially since the primary purpose of creating this persona was to conduct a research project. The very notion of ‘liking’ pro-Trump pages and groups, and promoting anti-vax views, feels almost wrong. Additionally, in the constantly connected, overtly surveilled environment in which we find ourselves, installing the FbTREX application felt very counterintuitive, given the level of social media surveillance we now know goes on behind the scenes. Willingly inviting a data collection application onto your personal computer goes against logic; however, it is essential if we wish to fully understand how Facebook’s personalisation algorithms work.

Facebook Tracking Exposed, or FbTREX, is a free browser extension available for Firefox and Google Chrome. It allows users to investigate the information that they are exposed to through Facebook’s personalisation algorithms, and so increases transparency between users and Facebook, while also giving people more control over their Facebook experience. To turn this into a research project, we created a fake Facebook account with the purpose of understanding the extent of the filter bubble on Facebook for a profile with views and perspectives different from our own.

This became the Maria Renee López project: the creation of a fake Facebook profile that reflected the aforementioned characteristics. A stock photo was uploaded, a profile was built, pages were liked, and posts were made, reacted to and shared for a week. While this was being done, the FbTREX extension tracked the profile’s activity. FbTREX was used to investigate the extent to which a Latin American woman in the U.S. would be drawn into right-wing, Trump-supporting groups, when such groups usually do not support the presence of Latin American people in the U.S. The fake persona is connected to current events such as Trump’s political actions, various right-wing news organisations, Jennifer Lopez’s recent engagement, ‘hot’ news topics, etc. This should enable us to enter echo chambers and filter bubbles that we would be unable to access from our personal profiles. As a result, we will be able to identify the views, opinions and posts that Facebook draws on to present supportive posts that validate the persona’s opinions.

Though most of the liked and followed pages concerned right-wing news sources, Donald Trump and taxidermy, the suggested groups were, for some reason, mainly the likes of ‘Shania Twain Entertainment News Group’, ‘Billy Ray Cyrus Spirit Club fans!!!’ and ‘USA recipes, cooking and baking tips’. Our fake profile was not admitted to the right-wing groups that we attempted to join. Whether this is because our profile indicates that the user is Latin American or because the fake nature of the profile is obvious is unknown. Facebook reflected back to us the information that we put into it (and that our profile was allowed to put into it, by its environment), supporting the hypothesis that Facebook creates filter bubbles. However, this research came with a multitude of limitations. For one, Facebook tracks user IP addresses and thus knew we were in the Netherlands, even though the persona is meant to live in the United States. Furthermore, we had only one week to complete this project, and the profile had no friends or community, which will have produced misleading results.

Posted by Andreas, Natalie, Morris, Aishwarya and Isa


“There are three kinds of lies: lies, damned lies, and statistics” – Mark Twain

Issues surrounding privacy and surveillance have never been as applicable to our everyday lives as they have been following the NSA scandal. The Pulitzer Prize-winning article NSA Files: Decoded – What the revelations mean for you, published by The Guardian, relies heavily on percentages and projections as its types of statistics. It draws on multiple sources to support its arguments and to offer a variety of perspectives. For instance, one statistic, the number of friends of a typical Facebook user, came from a reliable academic source: an analysis of Facebook authored by Cornell scholars and linked from the article. Another statistic was gathered from TeleGeography, an authoritative telecom data website. Again, the link was embedded in the article; this time, however, the link had expired and was unavailable, so the credibility of the source could be questioned. Two statistics were taken from the Electronic Privacy Information Center, a public interest center and one of the best-known privacy websites based in the USA. Other statistics were gathered from the likes of the Federal Judicial Center, a judicial education and research agency of the US government, and The Associated Press-NORC Center for Public Affairs Research, an independent, objective and non-partisan research organization focused on providing data and analysis on social issues. Lastly, another statistic was gathered from the Clerk’s Office of the U.S. House of Representatives, the chief record-keeper for the House.

The figures were promoted by Edward Snowden, The Guardian and The Guardian’s journalists. The Guardian is a British newspaper which was founded in 1821. It published a series of articles regarding the Snowden discoveries in late 2013. The journalists who were involved with the public disclosures of Edward Snowden’s discoveries were Glenn Greenwald, Ewen MacAskill, Gabriel Dance and Fielding Cage.

The article depends on data and data representations to make its content more digestible. The data figure in part one of the article, Three degrees of separation by Kenton Powell and Greg Chen, illustrates that the NSA is allowed to travel “three hops” from its investigative targets. Three hops means: “people who talk to people who talk to people who talk to you”. The calculations in the figure are based on an analysis of Facebook in which the typical user has an average of 190 friends and 14% of those friends are friends with each other. This is a percentage-based statistic, so its results always scale with the base figures. The analysis then shows how three degrees of separation gets you to a network bigger than the population of Colorado. In other words, it shows how large a network the NSA can investigate based on your personal number of Facebook friends.
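To make the arithmetic concrete, here is a rough Python sketch of how such a three-hop estimate could be computed. It assumes that each hop multiplies the network by the number of friends who are not already connected to each other, i.e. 190 × (1 − 0.14). This is one reading of the figure, not a formula confirmed by the article, and the function name is hypothetical.

```python
def network_size_by_hop(friends=190, overlap=0.14, hops=3):
    """Estimate how many people are reachable at each hop.

    Assumption (not confirmed by the article): at every hop after the
    first, each person contributes only the share of their friends that
    is not already mutually connected, i.e. friends * (1 - overlap).
    """
    new_per_person = friends * (1 - overlap)
    sizes = [friends]
    for _ in range(hops - 1):
        sizes.append(sizes[-1] * new_per_person)
    return [round(size) for size in sizes]


for hop, size in enumerate(network_size_by_hop(), start=1):
    print(f"hop {hop}: about {size:,} people")
```

Under these assumptions, 190 friends grow to roughly 31,000 people at the second hop and about 5 million at the third. Changing the `friends` argument shows how strongly the final figure depends on the starting number.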

The information provided by Snowden clarifies and supports the focus of the article: what the leaked data means for ordinary people. One of the conclusions is that NSA data collection was so extensive partly because of the digital revolution of the 1990s and the amount of traffic routed through the US, which allowed the agency to tap into it. The Connected by Cables diagram displays the cable connections between countries, visualizing and supporting the written information. Other data sets, such as Shifting sentiment?, reinforce the conclusion that, like Snowden, most US citizens do not support the degree to which the NSA surveils them.

Additional interpretations of the NSA surveillance data are fairly difficult to ascertain, and even more problematic to justify. It is hard to escape from the facts that the data shows: millions of civil liberty infringements, clearly laid out in solid, verifiable black and white. One could potentially argue that Snowden’s revelations may actually do more to harm the endeavour of investigative journalists in the long run. Potential whistle-blowers may be understandably reluctant to break cover, given how explicitly the data details the level of control, surveillance and power that the government has over its populace. How can any investigative journalist feel secure that their sources’ identities will remain anonymous?

Alternative statistics relating to the NSA data collection scandal are widely available. This chart, right, came from statistics portal ‘Statista’, and details the three-fold increase in US phone record collation in a single year. Other charts and data sets are also broadly accessible online.

This is data journalism at its finest. A story of serious public interest and particular relevance in the social media world, steered and supported using huge amounts of data, detailed in a manageable way in order to make sense and be relevant to all people, not just statisticians and the politically aware.

Posted by Andreas, Natalie, Morris, Aishwarya and Isa