group 1.2

Filter or Flop?

In last week’s blog post, ‘Meet Mercedes’, we mentioned Facebook Tracking Exposed (fbTREX), a browser extension that tracks and saves the public posts, ads and events on a person’s news feed. The saved data can later be downloaded as a spreadsheet containing all kinds of trackable information about these posts, ranging from the type of post to the link to the author’s page. In this blog post, we share the results of the 202 posts that were tracked on Mercedes’ timeline over the last two weeks. We then compare them to data from the timeline of one of our group members, to visualise the influence that different uses of and perspectives on Facebook can have on the content shown on a timeline. The experiment’s overall aim is to look for evidence of the filter bubble phenomenon.

First of all, we needed to distinguish between the types of content we wanted to collect. When setting up our bot, we focused on three major characteristics: a right-wing political affiliation, a love for her country of origin, Venezuela, and memes. These were the three main types of content that were public and got tracked by fbTREX. The aim is to make sense of all the aggregated data and, finally, to visualise it.

Surprisingly, Mercedes only got around two ads over the whole time span, whereas our chosen personal profile, which served as a control, gets more than ten targeted ads at every log-in. On reflection, this is hardly surprising given the short duration of the project. Moreover, comparing the data scraped from our personal profile with the data scraped from Mercedes’, the only ‘real’ thing we can tell is that the former’s posts are a lot more varied and come from far more authors, whereas Mercedes’ posts lack variety and mostly come from Hispanic cooking and Donald Trump fan pages. Again, however, that was to be expected. As for the exploration of a possible filter bubble: within Media Studies, the filter bubble is by now treated as a proven phenomenon. Our project, however, is not able to contribute adequately to this existing body of research.

Now for the fun part: we used Microsoft Excel to create the following visuals as a simplified illustration. See the graphics below:

Following on from the limitations mentioned last week, aggregating the raw data with the fbTREX extension, making sense of it and visualising it led to some further complications. First off, several data traces were unusable due to ‘inexplicable’ error messages:

Our explanation for this is last Wednesday’s Facebook outage, which left Facebook (and Instagram) either partly unusable or completely down for several hours, on some servers until late Thursday, making it Facebook’s longest outage in its history. More importantly, however, the programme seems to have recorded some links multiple times: a post by Fox News, for instance, was counted in three separate instances, and another one was recorded up to seven times. In such a large data set, it would take too much time to meticulously keep track of every single recorded link by hand. This was complicated further by the fact that several posts seem to have been misclassified, for example videos as text posts.
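For anyone who wants to check a similar export for repeated links without going through it by hand, here is a minimal sketch in Python. It is not what we did in this project, and it rests on assumptions: the column names `permaLink` and `type` are hypothetical placeholders and may not match the actual fbTREX spreadsheet, and the sample rows are made up for illustration.

```python
from collections import Counter

def count_duplicates(rows, key="permaLink"):
    """Return a dict of link -> number of times it was recorded, for repeated links only."""
    counts = Counter(row[key] for row in rows)
    return {link: n for link, n in counts.items() if n > 1}

def deduplicate(rows, key="permaLink"):
    """Keep only the first recorded row for every link, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

# Made-up sample rows; "permaLink" and "type" are assumed column names.
sample = [
    {"permaLink": "/foxnews/posts/1", "type": "video"},
    {"permaLink": "/hispanickitchen/posts/7", "type": "photo"},
    {"permaLink": "/foxnews/posts/1", "type": "text"},
    {"permaLink": "/foxnews/posts/1", "type": "video"},
]

repeated = count_duplicates(sample)
cleaned = deduplicate(sample)
print(repeated)
print(len(cleaned), "unique posts out of", len(sample))
```

Rows read from a real spreadsheet (for example with `csv.DictReader`) could be passed in the same way. Note that this only handles repeated links; it does not fix posts that were misclassified.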

Generally, we as a group remain critical of this project, and not only because of the growing list of complications and the limited time and resources overall. The process of creating a fake profile is potentially unethical to begin with. Then being asked to use our own personal profiles with the fbTREX programme, so that our own data is now being collected, demands a certain level of transparency about what exactly it will be used for, not only by fbTREX but also by DataJLab. Most importantly, the idea of comparing a new fake profile to a years-old real one still seems flawed to us, especially given the very short time frame of the experiment. In our opinion, there won’t be any ‘real’ fruitful results without more specific variables and a more detailed research guideline to follow. It seems like the only reason fbTREX is used in this course is to collect data for God knows what. Which leaves us with the question: who’s the real guinea pig here, Mercedes or us?

Written by: Elviira Luoma, Jana Franck, Sadhbh Ward, Choy Travers, Ioulia Kaliatza and Niels Willemsen

Meet Mercedes

May we introduce to you: Mercedes Luciana, born on October 31st, 1984 in Venezuela and currently residing in Amsterdam. She is divorced and a strong Trump supporter. Her interests include traditional Hispanic cooking, alternative and Latin American music, as well as the Emo culture she identifies with.

Cute, right? Now, what is special about Mercedes? She does not exist.

She, i.e. her Facebook profile, was created last week, on March 7th, as part of our experiment using the Facebook Tracking Exposed browser extension. fbTREX wants to explore Facebook’s concept of algorithm personalization, which aims to “reduce information overload, increase its scope and usability, and support users autonomy of making informed choices” (fbTREX). fbTREX shows users all the data Facebook is able to collect about them, wanting to illustrate “the social and individual negative effect of filtering” (fbTREX). With that said, Mercedes started creating an online life for herself, liking pages and sharing posts according to her interests. This went on for six days, with us keeping track of all developments such as advertisements, page and event suggestions, friend requests and messages. In order to grasp the big picture, we compared the data collected from Mercedes’ profile to that from our personal ones.

Here are our most important and interesting findings:

In general, the political side of Mercedes’ news feed consists mainly of articles by Fox News and CNBC, and posts by Donald Trump and Ivanka Trump, since we followed some of their fan pages. However, we also followed a lot of music pages and Hispanic food groups, and because these were more active in general, most of the content in her feed came from them. This resulted in ads that corresponded with this sector of her interests. The advertisements we got were for a hardstyle festival, a genre that does not match the alternative rock and Hispanic music pages that we liked. This is explained by the fact that we marked her as ‘interested’ in an upcoming hardstyle event while expanding her interests.

A lot of the content on Mercedes’ news feed consists of pictures posted by the (surprisingly large) group of people, most of whom are of Indian origin, that sent her friend requests. Overnight, she got 62 friend requests, some being mutual friends. Most of these people seemed to be bots themselves, judging by how recently their profiles were created and their lack of personal information. Some of Mercedes’ new friends sent her messages, to which we responded in the hope of appearing legitimate. There were no in-depth conversations from which we could draw any conclusions.

During this project, we made an error which might have influenced the algorithm. Besides the various pro-Trump pages we liked, we followed pages like ‘Venezuela Solidarity Campaign’ and ‘Republican Family Values’, which contradicted the other pages, since they turned out to be anti-Trump. It is possible that this accident caused a lack of political advertisements and posts, because the Facebook algorithm did not ‘know’ what Mercedes wanted to see. The political content we did receive came from both left- and right-wing news sources.

The main conclusion we can draw from this project is that the pages we follow and the posts we like and share influence the advertisements and search results we get. Even though she gathered a specific group of friends, this was not necessarily visible in the ads she received. The posts on her news feed mostly corresponded to her likes and interests, and we noticed that the pages Mercedes most frequently shared content from, such as ‘Hispanic Kitchen’, showed up most often in her feed.

Overall, as fbTREX puts it, “filtering results in the creation of a bubble, a bundle of only that information that match her preferences” (fbTREX), which is clearly illustrated by Mercedes’ exposure to content. This can have serious consequences: for instance, as a Trump supporter, her “ability to critically evaluate and deal with contrary opinions, especially in the context of public debates, is potentially damaged” (fbTREX), which is quite scary when you start thinking about it.

Lastly, our experiment does not come without certain limitations, the largest one being that we only had a week to create this profile and shape her interests and personality. This resulted in a limited amount of data on which Facebook could base its personalized content. However, this project did show us that it doesn’t take a lot of data to create a feed completely centered around this content. The main factors (taste in music, political affiliation, personal relations, etc.) are clearly reflected in Mercedes’ feed just as they are in our own Facebook feeds, even though we took a couple of years to form our shadow bodies.


Written by: Elviira Luoma, Jana Franck, Sadhbh Ward, Choy Travers, Ioulia Kaliatza and Niels Willemsen

Statistics Behind Snowden

The Guardian’s article NSA Files: Decoded – What the revelations mean for you, published on November 1st, 2013 by Ewen MacAskill and Gabriel Dance, revolves around Edward Snowden’s disclosures about the NSA, which resulted in a societal debate around mass surveillance. With data journalism projects being rooted in data sources of various kinds, The Guardian uses statistics to support its arguments and the reader’s understanding, underlining the power of numbers. The article, as the name suggests, focuses on ‘what the revelations mean for you’, providing the reader with numerous statistics that illustrate topics such as the extent of the NSA’s access to users’ data or the influence of politics. Most importantly, the article presents multiple interactive graphics that allow the user to narrow down results by preferred demographic factors, countries or personal information. In this text, we will critically analyze the six different descriptive statistics used in the article, discussing their content and their functions within the text.

Figure A, Three degrees of separation, was created by The Guardian’s journalists based on calculations provided by Facebook. The chart shows the extent of the access the NSA has to users’ data, as it is allowed to explore “three degrees of separation”. Readers can select a number of friends and are then confronted with calculations of their own second- and third-degree network.

Figure A
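To get a feeling for how quickly such a network grows, here is a rough back-of-the-envelope sketch in Python. It is not the calculation behind The Guardian’s chart, which is based on Facebook’s own figures. It simply assumes that every person has the same number of friends and that no friendships overlap, so the numbers are a naive upper bound; the function name is our own.

```python
def naive_network_sizes(friends):
    """Naive upper bound on network size per degree of separation.

    Assumes everyone has the same number of friends and that no two
    people share a friend, which is never true in a real network.
    """
    first = friends          # your own friends
    second = friends ** 2    # friends of your friends
    third = friends ** 3     # friends of friends of your friends
    return first, second, third

first, second, third = naive_network_sizes(100)
print(first, second, third)
```

Real networks overlap heavily, so actual second- and third-degree networks are smaller than these figures, which is why the chart relies on Facebook’s calculations rather than simple multiplication.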

Connected by cables, provided by TeleGeography (Figure B), depicts the connections between countries via fiber optic cables. In this graph, the reader can choose from a list of countries, uncovering that the cables carry a considerable amount of information about users’ phone and online communication and showing the amount of data that agencies access without users’ consent.

Figure B

The Intelligence reports by company bar chart (Figure C), created by the NSA, shows the data that various Internet companies provided to the NSA between June and July 2010. The graphic is used as strong evidence, as well as a reliable source of information, to present the involvement of the major companies and the extent of their data access.

Figure C

Judges of the Fisa court (Figure D), provided by the FAS and the FJC, lists the US court’s members in chronological order and offers several demographic characteristics, such as ethnicity and gender, that the reader can select to narrow down the results. The chart highlights the race and gender issues within US politics.

Figure D

The graph Shifting Sentiment? (Figure E), by The Associated Press-NORC Center for Public Affairs Research, shows a jump in concern over privacy around the time of the Snowden revelations. US citizens were asked about their expectations of the situation in 2020, and their answers illustrate growing public dissatisfaction.

Figure E

Lastly, Figure F, A Bipartisan Congress, shows each vote on an amendment to curtail funding for the NSA’s collection of the phone records of millions of Americans in July 2013. The statistics include the voting results of both the majority and the minority among Republicans and Democrats, depicting an extremely high rate of coalitions formed between centrist Republicans and centrist Democrats, who joined forces against both libertarian Republicans and liberal Democrats.

Figure F

We conclude that the article’s use of data formats is very extensive and well suited to providing background to its arguments. The graphs sometimes lack a proper introduction and leave readers to make sense of them by exploring the statistics themselves, which is possible, since most graphs contain all the information necessary to understand their content and context. However, leaving the graphs open to interpretation could well be the journalists’ intention. It is important to keep in mind that statistics are sources that have to be questioned. The article’s graphics, however, hardly leave room for questioning, as everything is based on facts and calculations. Overall, The Guardian uses very strong visualisations supported by an appealing format (aka embodying the sexy job of a statistician), and the interactive graphs in particular make the article’s arguments more accessible to the reader: what the revelations mean for YOU.


Written by: Elviira Luoma, Jana Franck, Sadhbh Ward, Choy Travers, Ioulia Kaliatza and Niels Willemsen