

ASSIGNMENT 1: Reverse engineered statistics in Reading the Riots

Data Journalism

Group 4

K. Barabash, C. Escudero Castro, E. Loukopoulou, U. Uygun, E. Olthof

Instructor: Emilija Jokubauskaite

March 6th, 2019




In their analysis and news reports on the 2011 London riots, a journalistic project called Reading the Riots, the Guardian collected data through interviews with people directly involved in the riots and people affected by them, as well as from local authorities. Together with data analysts from the London School of Economics, the Guardian analyzed and visualized this data to make it accessible to a larger audience. Data journalism has been praised for its objectiveness (Rogers), and its rise has been accompanied by a strong belief in letting the facts tell the story (Anderson). However, others critique the reliability of the data and call for a more critical approach to it (Gray). It is not the data that tells the story; the story is told by the way the data is selected, collected and presented. The following paragraphs critically evaluate the fashion in which the Guardian has chosen to tell this story.


Ethnicity of the Interviewed

Analyzing the ethnicity of the individuals affected by the riots who were interviewed by the researchers at the London School of Economics and the University of Manchester, we found that almost half of the interviewed people were black. The decision to interview black people to find answers to questions like "Is policing an important factor in the riots?" or "Are riots going to happen again?" was not arbitrary: the majority of the rioters were black, as the riots that started in Tottenham were sparked by the killing of Mark Duggan by a police officer. Mark Duggan was a suspect for drug and weapon possession, and his death was widely attributed to police brutality.

“Originally, it started off, it was like, yes, it was

a group of black people, the family members and friends … “

  • Man who set fire to police car in Tottenham

This explains the ethnicity statistics of the interviewed people.


The Ethnicity of the Rioters per City

Ethnicity   London   West Midlands   Nottingham   Greater Manchester   Merseyside   Other   Total
White           33              38           37                   78           73      76      41
Black           46              34           33                   11           13      13      38
Asian            7              14            0                    1            3       2       7
Mixed           12              13           29                    9            8       8      12
Other            3               1            2                    1            3       1       2
Total          101             100          101                  100          100     100     100

Defendants brought before the courts for offences relating to the public disorder between 6th and 9th August 2011, by ethnicity and region – data as of 12th October 2011 (The Guardian, 2011). Figures are percentages; columns may not sum to exactly 100 due to rounding.

As the table above and the graph on the right show, the Guardian broke down the demographic data collected on the rioters by the ethnicity of the arrested rioters per city. The data was collected from defendants brought before the courts, and their ethnicity was stated by the defendants themselves. This can have implications for the data that is used in the graphs (Stray 20). Whoever states his or her ethnicity could feel that the available categories do not fit. When does someone define themselves as white, and when does someone have a mixed ethnicity? It is a field open to wide interpretation, since ethnicity is by definition a socio-cultural identity. The Guardian clustered the collected data to make it usable for visualization: for example, White and Black Caribbean were defined as Black, and Indian and Pakistani as Asian, while Chinese was defined as Other. People could also have their own reasons for providing the interviewer with a different ethnicity, for example when they do not know their ethnicity or feel that it could fit into more than one category.
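The clustering described above can be sketched in code. This is only an illustration of the general technique: the first three mappings follow the examples given in the text, while the fallback to "Other" and the category names are our assumptions, not the Guardian's actual scheme. The sketch also shows why rounded percentages can make a column sum to 101 or 99.

```python
from collections import Counter

# Mapping from self-reported ethnicity to the broad chart categories.
# The entries below follow the examples in the text; anything not
# listed falls back to "Other" (an assumption on our part).
CLUSTER = {
    "White and Black Caribbean": "Black",
    "Indian": "Asian",
    "Pakistani": "Asian",
    "Chinese": "Other",
}

def cluster(ethnicity: str) -> str:
    """Map a self-reported ethnicity to one of the broad categories."""
    return CLUSTER.get(ethnicity, "Other")

def distribution(responses):
    """Percentage share of each broad category, rounded to whole
    numbers -- the rounding is why column totals need not equal 100."""
    counts = Counter(cluster(r) for r in responses)
    total = sum(counts.values())
    return {cat: round(100 * n / total) for cat, n in counts.items()}
```

Any real pipeline would of course need the full list of self-reported categories, which the article does not publish.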

Beyond the data on ethnicity alone, it is interesting to look at the cities and the way the ethnicity categories are divided between them. What does "Other" mean in this context? How many other cities were taken into account? The table does not tell you whether any of these cities contained big outliers. Also, the number of arrested people in London is far greater than in the other cities: 1,386 arrestees in London compared to 42 in Merseyside.
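The percentages in the table hide these very different sample sizes. Using the two arrestee counts mentioned above, a quick calculation shows that a single person shifts a Merseyside percentage by more than 2 points, but a London percentage by less than 0.1, so the small-city columns are far noisier than they look:

```python
def pct(count: int, total: int) -> float:
    """Percentage share of a category, as reported in the table."""
    return 100 * count / total

# How many percentage points one arrestee is worth in each city,
# using the counts quoted in the text (1,386 vs. 42).
per_person_london = pct(1, 1386)      # roughly 0.07 percentage points
per_person_merseyside = pct(1, 42)    # roughly 2.38 percentage points
```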


Critique on Visualization

The official report and the Guardian's coverage of it both make substantial use of supporting visuals. The structure of the visual language of the official report deserves praise. Almost every page is divided into two parts, with the visuals positioned on the right-hand side of the page. On the one hand, this clear distinction in the layout makes it easier for audiences to make the connection between the data and the meaning it represents. On the other hand, the split layout leaves limited space for the visuals, which take up roughly one third of each page. This limited space in turn results in the use of somewhat ambiguous visuals. For instance, in the graph on the right-hand side of page 24, it is not clear whether the three answers given by the rioters refer to the same questions, as question 1 and question 3 seem to overlap (strongly agree, all agree). Because the space limitation does not allow complex or extensive visualizations, the point this graph claims to make should instead have been made in the text positioned on the left-hand side of the page.

A further point related to the issue of limited space can be made as follows: had a different layout been preferred, there would have been sufficient space for a dedicated visuals section. A dedicated visuals section would have made the use of multi-axis charts possible. Such charts might be difficult to grasp quickly, which is a fair reason not to include them in the current layout of the report, but a dedicated visuals section would have worked perfectly with them.
Lastly, the choice of visualization techniques deserves mention. The overall point of the report is to develop a holistic perspective on the riots that took place. However, the graphs used do not reflect this holistic principle. Pie charts, bar charts, graphs and maps alternate on different pages of the report, which does not allow a seamless reading experience for the average reader, who is, after all, the intended audience of the report.



Stray, J. The Curious Journalist's Guide to Data. GitBook, 2013.

Rogers, S. "John Snow's data journalism: the cholera map that changed the world". The Guardian, 2013.

Anderson, C.W. "Genealogies of data journalism". The Data Journalism Handbook 2, eds. J. Gray and L. Bounegru, 2019.



ASSIGNMENT 2: The Facebook Algorithm Analyzed

Data Journalism

Group 4

K. Barabash, C. Escudero Castro, E. Loukopoulou, U. Uygun, E. Olthof

Instructor: Emilija Jokubauskaite

March 13th, 2019



1. Problematization of the Research

Who conducted this research?

This research has been conducted as a part of the course Data Journalism (2019) taught at the University of Amsterdam Media and Information bachelor’s programme.

What is the goal of this research?

The main goal of the research is to assess whether or not the content curation algorithm used by Facebook encourages so-called filter bubbles. The fake Facebook profile created to test this question portrays a fictitious Dutch white supremacist male who is strictly against Islam. The research therefore intends to test whether or not the algorithm feeds this profile anti-Islam content.

How will that goal be achieved?

This test will not only draw conclusions from the mere possibility of the algorithm suggesting anti-Islam content, but will instead focus on a recent Islam-related case that sparked a public debate in the Netherlands: Dutch authorities have been struggling to take action against an Islamic school in Amsterdam, some members of which have recently been found to have shown support for ISIS online (Pieters 2019). The research aims to test whether or not the algorithm will feed content that spreads misinformation on this sensitive topic. It also aims to see whether or not the content that covers this topic has dramatic undertones that manipulate audiences.

For whom is this research intended?

The conclusions drawn from the research are intended to be discussed primarily by the research team and the whole class of the course. They are also intended for any audience interested in the adverse consequences of online content curation algorithms.

Whom does this goal benefit?

Academics, data journalists, students, teachers and, hopefully, general social media audiences.

Whom does it harm?

From an ethical perspective, there are stakeholders who might be affected by the methodology of the research. For instance, the fake Facebook profile needed for the project was created to look as real as possible so that it would not be taken down by Facebook. For this reason, a credible profile picture had to be found for it. This picture was taken from a royalty-free photography website and can therefore be freely used. However, the fake profile created with this picture demonstrates negative undertones with which the real person in the photo might or might not agree. This creates an ethical burden.

Furthermore, there is another consideration that might not address one particular stakeholder but is still considered harmful by the research team: even if it was only for the purposes of this research, the team did contribute to the ever-increasing number of online personas showing support for supremacist ideas.


2. The Fake Account

The profile created portrays a male named Jan de Vries, a 46-year-old from Friesland who owns a family business in agriculture. He is an ardent supporter of the political party PVV and its leader Geert Wilders. He also shows online support for Trump and the values he stands for. He is married with kids. He likes to go fishing and hunting in his spare time, and he tries to attend church regularly. He prefers to shop at Albert Heijn. He mostly drinks milk in the morning and likes to drink beer while watching TV in the afternoon. He particularly likes Frans Bauer. His diet consists mainly of meat and meat products. He likes to read De Telegraaf and the Leeuwarder Courant. He smokes shag rolling tobacco. He is quite active on Facebook and does not hold back when it comes to his political views.


3. Conclusions

Below are the conclusions drawn so far, together with some journal entries.

  • Facebook did not take down any of Jan’s posts.
  • When anti-Islam content was searched for in the search bar, Facebook kept suggesting pro-Islam pages.
  • Friend suggestions started dropping in from the first day; however, they do not seem to be in line with our profile.
  • It seems that more activity, spread consistently over a longer time, is required for the algorithm to place the profile in a meaningful bubble.

Journal Entries

09/03/2019           18:03

Logged in to check whether there is new stuff popping up on my news feed.

All the friend proposals are the friends I have on my real profile.

Liked Albert Heijn.

Liked Golden Virginia hand rolling tobacco.

11/03/2019           19:46

Liked MAGA 2020 and shared a video.

Liked “Bikers against radical Islam Europe”.

Liked “United against Islam” and shared one of their posts.

Liked the official page of Geert Wilders.

Marked as interested in the event: Provinciale Statenverkiezingen 20 maart 2019.

So far there are no ads or pages suggested to Jan.

12/03/2019           12:13

New friend suggestions are finally popping up. They are all people from West African countries.

Liked Frans Bauer and shared one of his pictures.

Also liked some fishing pages.



Pieters, Janene. 2019. “Amsterdam School Threatened after Counterterrorism Warning.” NL Times. March 8, 2019.




ASSIGNMENT 3: The Facebook Algorithm Analyzed – Part II

Data Journalism

Group 4

K. Barabash, C. Escudero Castro, E. Loukopoulou, U. Uygun, E. Olthof

Instructor: Emilija Jokubauskaite

March 20th, 2019


Findings in Relation to the Filter Bubble

1. Jan could not fulfill his motivation

When the fake character Jan was created for this project, a motivation was created for him. In other words, a realistic reason why he would decide to open a Facebook account was devised. That reason, his motivation, was to actively voice his support for the alt-right community online and to possibly connect with those with whom his ideas resonate. However, the more time he spent on the platform, the less engaging his feed became. Instead of feeding Jan the type of content that would encourage his engagement, such as events or discussion groups, Facebook largely showed him content that could be passively consumed, like posts or videos. Although the research team kept pushing Jan to engage actively by performing interactive actions on the platform, such as commenting on posts, Jan's home feed still remained mostly composed of news items and dull entertainment content that were not really engaging.

2. Amsterdam School Case did not show up

The main goal of the research was to see whether the Facebook algorithm would feed Jan misleading news on the Amsterdam Islamic school case mentioned in our previous blog post. Although Jan liked and actively engaged with the Facebook pages of various right-wing and some alt-right movements and news outlets, the Amsterdam school case did not make it to his feed. Interestingly, though, other recent events did: the attacks in Utrecht and New Zealand received extensive coverage in Jan's feed. The research team attributed this to an immediacy factor: the Facebook algorithm is likely to favor content that is not only fresh but also more controversial.


Findings in relation to Facebook Content

1. Home Feed

Even though Jan followed and interacted with a variety of pages with different content types, most of the content in Jan's feed was political in nature. The content suggestions that appeared on the side of Jan's feed often had entertainment themes, while the content that appeared in the feed itself remained mainly political. This was the case regardless of whether Jan liked these suggestions or not.

Jan's feed was mostly made up of posts, but the type of the feed content did not yield meaningful results. It was rather the authorship and the political nature of the content that produced meaningful explanations. As the charts below show, Jan's feed was dominated by right-wing authors even though he had interacted with other pages equally.


Distribution of Post Types


Distribution of Authors

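The author distribution shown in the charts can be reproduced from the tool's export with a few lines of code. This is a minimal sketch under stated assumptions: the column name "author" and the sample rows are hypothetical, since the tool's actual export format is not documented here.

```python
import csv
from collections import Counter
from io import StringIO

def author_distribution(rows):
    """Count feed posts per author; `rows` is an iterable of dicts such
    as csv.DictReader produces. The 'author' column is an assumption
    about the export format."""
    return Counter(row["author"] for row in rows)

# Usage with a hypothetical two-column export:
sample = StringIO("author,type\nPVV,post\nPVV,video\nDe Telegraaf,post\n")
counts = author_distribution(csv.DictReader(sample))
```

`counts.most_common()` then gives the ranking that a bar chart like the one above would display.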


2. Friendship Requests

The time frame of the research was not sufficient for the algorithm to fully demonstrate its capabilities. Although the algorithm showed improvement, the research team believes that much more consistent results could have been achieved with a longer time frame. This could be observed in the friendship requests: at first, random people from different regions of the world came up, and people in the Netherlands only started showing up after the first week. However, this could also have another explanation. Could it be that Facebook predicted Jan to be an alt-right supporter and tried to normalize him by suggesting a multicultural environment? The research team highly doubts it, but is not in a position to make a definitive case, lacking objective proof in either direction.

Also, the contacts of the different Facebook accounts that had been logged in on the same computer were shown as friend suggestions. This served as hard proof that the data left in the browser is employed by the Facebook algorithm.


Distribution of Friendship Requests



Findings in Relation to the Tool

The results given by the tool for the type of content contained some ambiguous information, particularly regarding dates. Some logs in the Excel file appeared to have been created on a date in 1970, which is technically not possible. This led the team to question the legitimacy of the data, but it was not feasible to check all the logs in the file.
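One plausible explanation for the 1970 dates, although not confirmed by the tool's documentation: many systems store time as seconds since the Unix epoch (1 January 1970), so a missing timestamp stored as 0 renders as a date in 1970 rather than as a genuine log entry. A minimal demonstration:

```python
from datetime import datetime, timezone

# A timestamp of 0 seconds renders as the Unix epoch, 1 January 1970 --
# the likely origin of the "impossible" dates in the export.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.isoformat())  # 1970-01-01T00:00:00+00:00
```

If this is the cause, the 1970 rows would indicate missing timestamps rather than corrupted data.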

The tool kept crashing, particularly while trying to retrieve the CSV file, even though we tried alternative browsers. After many attempts it started working again, but the team could not determine the reason.