Group 2.1



Data Project – Group 2.1 Final Report

Authors: Vera, My, Amanda, Maria, Aino, Jula


Visualizations: making datasets fun and clear

After the second week of tracking both our group member’s Facebook account and our fake Facebook account, we have gathered considerably more of the data needed for this study. The data from the real account shows quite a clear pattern. There are no posts or suggestions that would make us think the adverts are misplaced. The data tracked from this account mostly consists of adverts that any twenty-year-old would enjoy and that, quite frankly, fit well with how the real account is used on Facebook. The tracked posts consist of events, advertisements, liked pages, and the posts shared on these pages and in Facebook groups.

The tracked posts consist largely of pieces of popular culture. The page that seems to dominate the real account is the meme and video page 9GAG, which the real account has liked on Facebook. It is a public page that posts content multiple times a day, so the tracker is constantly saving 9GAG’s posts. There are also posts related to fashion and online stores such as Caliroots, Adidas, Gucci, Reebok, Calvin Klein, and Zalando. In addition, we detected many events in Amsterdam attended by the user’s Facebook friends, as well as event suggestions in the same area. The events varied from big music festivals such as Awakenings to a creative writing workshop offered by the University of Amsterdam. Smaller events in Amsterdam bars like Skatecafe and Club Exit were also tracked. Some of these events, however, are recorded not under the event name but under the name of the Facebook friend who reacted to the event, which explains the large number of names in the data visualization below (image 1).

Image 1: The classification in terms of authorship in the real Facebook account

Image 1: That’s a lot of bubbles, but surely 9GAG is drawing HUGE attention in our team member’s Newsfeed

For the fake Facebook account, Wouter Atsma, the visualized data looks fairly different from that of the real account. Many of these differences stem from the fact that the fake account is new, so the amount of data that could be gathered is significantly smaller. The fact that Wouter Atsma has no friends on Facebook, unlike our real account, also affects the content of the data considerably. As can be seen from the visualizations (image 1 and image 2), the authors are far more numerous and varied for the real account than for the fake one. The fake account has been active on Facebook by liking new pages, joining new groups, sharing posts, attending events, and liking and reacting to the posts on his timeline. This has resulted in a fair amount of tracked data on his Facebook feed, making it more comparable with the real account of our group member, which has been used multiple times a day for years. Even though the amount of data collected from the fake account is much smaller than from the real one, the differences in the nature of the data are notable: not a single author of the posts retrieved from one data set appears in the other (image 1 and image 2), as Atsma’s interests consist of the likes of Fox News, right-wing politicians, cuisine, different alcoholic beverages and so on.

Image 2: The classification in terms of authorship in the fake Facebook account

Image 2: The Facebook posts seem to be more decentralized in our bot’s Newsfeed, with Fox News at the center of the bubbles
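The author comparison between the two accounts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `author` field name and the sample records are hypothetical and do not reflect the tracker's actual export format.

```python
def author_set(posts):
    """Collect the distinct authors that appear in a list of tracked posts."""
    return {post["author"] for post in posts}


# Hypothetical sample records; the real export may use different field names.
real_posts = [{"author": "9GAG"}, {"author": "Zalando"}, {"author": "9GAG"}]
fake_posts = [{"author": "Fox News"}, {"author": "Fox News"}]

real_authors = author_set(real_posts)
fake_authors = author_set(fake_posts)

# Authors that show up in both data sets (empty if the feeds do not overlap).
shared = real_authors & fake_authors

print(len(real_authors), len(fake_authors), sorted(shared))
```

An empty intersection corresponds to the observation that no author from one data set appears in the other.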

What we found interesting is that the way both accounts interact on Facebook can clearly be seen in the visualizations. Visualizing the data simply and concisely highlights the main factors affecting the accounts’ Facebook feeds. To make sure we received enough data from Atsma’s account, joining many different groups and liking several pages that match Atsma’s interests has been the main goal of this project. As the visualization shows (image 3), almost 3/4 of the data gathered from the fake account so far comes from groups, which corresponds with the actions taken on the account. The data gathered from the real account, by contrast, shows that 50% of the interaction on Facebook is with videos (image 3). This difference can be explained by the authenticity of the accounts. The real person whose account we used for this research genuinely spends time on Facebook watching a lot of videos and tagging friends in posts, so the data gathered has been genuine and natural, so to speak. The data gathered from Atsma’s account, on the contrary, has been forced and originates from assembly-line-like actions, with no time spent watching videos or interacting with people. Likewise, the lack of a real network and friends on Atsma’s account corresponds with the negligible amount of data gathered from events, which make up only 1% of the total data (image 3).

Image 3: The percentage distribution of post format

Image 3: While our team member receives a lot of video content, Facebook groups are taking over our bot’s Newsfeed
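A percentage distribution like the one in image 3 can be computed by counting posts per format and dividing by the total. The following is a minimal sketch, assuming each tracked post is a record with a `format` field; the field names and the sample data are hypothetical and are not taken from our actual data sets.

```python
from collections import Counter


def format_distribution(posts):
    """Return the percentage share of each post format, rounded to one decimal."""
    counts = Counter(post["format"] for post in posts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {fmt: round(100 * n / total, 1) for fmt, n in counts.items()}


# Hypothetical sample; a real export would contain many more posts.
sample_posts = [
    {"author": "9GAG", "format": "video"},
    {"author": "9GAG", "format": "video"},
    {"author": "Zalando", "format": "photo"},
    {"author": "Awakenings", "format": "event"},
]

distribution = format_distribution(sample_posts)
print(distribution)
```

Running the same function on each account's posts gives two distributions that can be compared format by format.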

In the future of this research, it will be interesting to examine whether it is possible to manipulate the algorithm further and develop the filter bubble on Wouter Atsma’s account by taking actions more similar to those of the real account. The interests are going to stay the same, but the way the fake account interacts on Facebook could be altered to affect, in particular, the percentage distribution of the post formats it receives (image 3).


Comparing Facebook data between a real and a fake account

In recent years, there has been rising concern about selective exposure to information due to the personalization of content on social media such as Facebook. This tendency toward personalized content is often viewed as being caused by Facebook’s algorithm. Therefore, this project investigates to what extent Facebook tracks users’ data and the effect of pre-selected personalization. For this blog post, we compared the data from two different Facebook accounts. The data was gathered using a computer programme called facebook tracking exposed. This tracker saves all the public posts on the Facebook timelines and sorts them into videos, posts, photos, groups, and events. The comparison of the two accounts proved especially interesting because of the vast differences between the two users. The accounts have nothing in common: they have completely different interests and backgrounds, and the user activity is also completely different, making these accounts excellent examples for this study.


Personal wall of our bot – Wouter Atsma

Our research group created a persona: a 45-year-old Dutch man living in Groningen. Our constructed bot is a right-wing supporter and a writer working on a critical novel. That is to say, our bot has completely different interests, age group, status and background from ours. As we kept track of the bot throughout the week, we found that Facebook’s friend suggestions were remarkably accurate. Although the bot’s personal information is different from that of our personal accounts, the Facebook algorithm still manages to suggest people that we know in real life. This observation suggests that Facebook is actively tracking users’ location, IP address, browser history and possibly many other indicators that we are not aware of. The Facebook algorithm is able to trace us back to our personal accounts, even though we tried to leave out our information when creating the bot.

In the following, we compare the account we created with that of one of our group members on the basis of the data gathered throughout the last week. The data of our group member shows the user’s interest in the event “Awakenings”, an electronic music festival in Amsterdam. In contrast, a set menu at the “Eetbar” in Amsterdam is suggested to our bot. One can assume that the user of the first account is interested in events that include music and dancing, while the second user seems to prefer a more settled environment. Our group member’s data also shows that the user watched a video from the page “Lady Gaga Facts”, which suggests that the user is interested in modern artists and music. In contrast, the user of the second account received an invitation to the theater Pathe Tuschinski, which suggests that this user is more interested in traditional music and culture.

As our research continues, we will track both Facebook accounts regularly throughout the whole tracking period and gather the retrieved data for our final, exhaustive analysis. We can already see a significant difference in the content these two completely different personas receive on their timelines based on their interests and activities on Facebook, so we hypothesize that the two data sets will be radically different. The IP address and the browser history do indeed play their own role in how the algorithm works, since Facebook also offers advertisements and events based on your whereabouts. As the research proceeds, however, we may treat that data separately and concentrate more on the data that originates from the choices and preferences of our two Facebook users and the information entered on the platform. We will further examine the existence of filter bubbles on social media and try to understand how they affect what each user sees on their personal feed.


The UK riots in numbers? Outline of the statistics on Reading the Riots

This blog post provides an outline of the riot research conducted by The Guardian and the London School of Economics. The main findings of phase one can be summarized as follows. Of the 270 people interviewed, 85% said policing was an “important” or “very important” factor in why the riots happened. Under half of those interviewed were students, and of those not in education, 59% were unemployed. The rioters were generally poorer than the country at large. 81% of those interviewed said they thought that riots would happen again, and over one-third (35%) said that they would get involved if there were riots. 63% said they thought more riots would occur within three years.


In the first phase of the research, the focus was on the people involved in the riots, whether they were engaged in violence, arson, attacks on the police or looting. Interviewees were recruited through local contacts. People accepted the interviews because they wanted their stories to be heard. Even the Ministry of Justice allowed the researchers to enter prisons to interview arrested rioters. In the end, 270 people were interviewed, the large majority of whom had not been arrested. The interviews took place in homes, youth clubs, cafes and fast food restaurants. Each interview lasted at least 45 minutes. During the interviews, survey-style questions were asked, but there was also room for extended discussion. The emphasis was put on people’s experiences and perspectives.
In the end, 1.3 million words were turned into data. The data analysis was done by the LSE. They coded the interviews thoroughly so that particular themes could be identified and evidenced. Coding labels such as injustice, police and riot motivation were used. The relationships between themes were recorded and displayed in a thematic map document. This gave a larger picture and allowed the reliability and validity of the interviews to be checked.

Anger at and conflict with the police was identified as one of the main narratives behind the UK riots. Of the 270 rioters interviewed, 85% indicated that policing was an “important” or “very important” factor in causing the riots. This finding points to a negative attitude towards the police, by whom the rioters did not feel respectfully treated. It was widely believed among the respondents that “the police is the biggest gang out there”. One figure shows that 73% of interviewees had been stopped and searched at least once in the previous year, with black people being disproportionately targeted. A 22-year-old man who was present at the Manchester and Salford riots explained that he participated because he wanted to send a warning message to the police and the government. One interpretation of this finding is that the negative view of the police as a corrupting influence left many rioters untroubled by the morality of their actions.


The main reasons that pushed people to riot were the inequalities and injustices taking place in the country. When interviewed, individuals claimed that the factors that encouraged them to participate in the riots were varied, and all had to do with the lack of opportunities, the bad economic circumstances, unemployment, or the bad treatment by the police, who unjustly ended the life of Mark Duggan. The reasons were very diverse, but they all led to a feeling of discontent and disappointment with the system, and people decided to protest.
After the research, it was revealed that some of these protesters were gang members, who were responsible for most of the violent attacks that took place during the riots. These gang members teamed up and decided to use violence without justification or reason. It was estimated that 13% of the rioters arrested in the whole country and 28% of those arrested in London were gang members. They took advantage of the chaotic situation and used the opportunity to fight the authorities.


Approximately 2500 businesses were looted during the riots, and the overall cost of the looting to the economy was up to 300 million pounds in insurance costs. The rioters had different motivations for the looting, and some explained them as simple greed, excitement or anger. Data were available for 2278 premises hit in the riots, of which 61% were retail premises, 12% sold electrical goods, 10% clothing and sportswear, 10% were restaurants and 9% small independent retailers, which were the easiest to hit according to the rioters. Some looters admitted feeling a sense of euphoria from their actions, whereas others had feelings of regret and shame.
Of the 2.5 million riot-related tweets studied, few showed any sign of incitement. The platform was used more for a good cause: gathering people for community clean-ups with tweets, some of which were retweeted over 1000 times. BlackBerry Messenger and the traditional media played a bigger part in the riots. The rioters informed each other about their whereabouts and the police’s actions through pings, and more than a hundred of the 270 interviewees admitted being inspired to join the riots by the dramatic news coverage on television.

The second phase of this study will focus on the effects that the riots had. It will investigate how local communities were impacted as well as the criminal justice response to the riots. The idea is to take the responses from the first phase and hold public discussions and debates in local communities with the people who suffered the consequences of the riots. These events will include focus groups and in-depth interviews that will hopefully provide a deep understanding of the effects the riots had on local communities and help with future research. In addition to the research on the communities, the criminal justice side of the events will be explored. The way the police handled the riots and their aftermath has been criticized, and therefore the researchers want to hear from the police force themselves. They also intend to understand the work of the courts through interviews with judges, defense lawyers, prosecutors and court staff who worked on the riot cases.

Roberts, Dan, ed. 2011. “Reading the Riots: Investigating England’s Summer of Disorder Ebook.” Foreword by Paul Lewis and Tim Newburn. The Guardian, December 15, 2011, sec. Info.