ASSIGNMENT 1: Reverse engineered statistics in Reading the Riots
K. Barabash, C. Escudero Castro, E. Loukopoulou, U. Uygun, E. Olthof
Instructor: Emilija Jokubauskaite
March, 6th 2019
In their analysis and news reports of the 2011 London riots, a journalistic project called Reading the Riots, the Guardian collected data through interviews with people directly involved in or affected by the riots, and from local authorities. Together with data analysts from the London School of Economics, the Guardian analyzed and visualized this data to make it accessible to a larger audience. Data journalism has been praised for its objectivity (Rogers), and its rise has been accompanied by a strong belief in letting the facts tell the story (Anderson). However, others critique the reliability of the data and call for a more critical approach to it (Gray). It is not the data that tells the story; the story is told by the way the data is selected, collected and presented. The following paragraphs critically evaluate the way the Guardian has chosen to tell this story.
Ethnicity of the interviewed
Analyzing the ethnicity of the individuals affected by the riots who were interviewed by the researchers at the London School of Economics and the University of Manchester, we found that almost half of the interviewees were black. The decision to interview black people for questions like “Is policing an important factor in the riots?” or “Are riots going to happen again?” was not arbitrary: the majority of the rioters were black, as the riots that started in Tottenham were sparked by the shooting of Mark Duggan by a police officer. Mark Duggan, a suspect for drug and weapon possession, was shot in what many saw as an act of police brutality.
“Originally, it started off, it was like, yes, it was
a group of black people, the family members and friends … “
- Man who set fire to police car in Tottenham
This explains the statistics of the ethnicity of interviewed people.
The Ethnicity of the Rioters per City
| Ethnicity | London | West Midlands | Nottingham | Greater Manchester | Merseyside | Other | Total |
Defendants brought before the courts for offences relating to the public disorder between 6th and 9th August 2011, by ethnicity (1) and region – data as of 12th October 2011. The Guardian, 2011.
As the above table and the graph on the right show, the Guardian divided the demographic data collected on the rioters by the ethnicity of the arrested rioters per city. This data was collected from defendants brought before the courts, whose ethnicity was stated by the defendants themselves. This can have implications for the data that is used in the graphs (Stray 20). Whoever states his or her ethnicity could feel that the available categories do not fit. When does someone define themselves as white, and when as having a mixed ethnicity? Ethnicity allows a wide range of interpretation, since it is by definition a socio-cultural identity. The Guardian clustered the collected data to make it usable for visualization: for example, mixed white and black Caribbean was defined as Black, and Indian and Pakistani as Asian, while Chinese was defined as Other. People could also have their own reasons for providing the interviewer with a different ethnicity, for example when they do not know their ethnicity or feel that it could fit into more than one category.
Beyond the data on ethnicity alone, it is interesting to look at the cities and the way the ethnicity categories are distributed between them. What does Other mean in this context? How many other cities were taken into account? The table does not tell you whether any cities contained big outliers. Also, the number of people arrested in London is far greater than in the other cities: 1,386 arrestees in London compared to 42 in Merseyside.
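The point about raw counts can be made concrete with a small sketch: absolute arrest numbers say little without a denominator such as city population. The London and Merseyside counts come from the table discussed above; the population figures below are rough, hypothetical placeholders used only for illustration.

```python
# Illustrative sketch: raw arrest counts vs. a population-normalized rate.
# Arrest counts are from the Guardian's table; populations are hypothetical
# round figures for illustration, not official statistics.
arrests = {"London": 1386, "Merseyside": 42}
population_millions = {"London": 8.2, "Merseyside": 1.4}  # hypothetical

for city, count in arrests.items():
    # population in millions * 10 gives population in units of 100,000
    per_100k = count / (population_millions[city] * 10)
    print(f"{city}: {count} arrests, {per_100k:.1f} per 100,000 residents")
```

Even with made-up denominators, the exercise shows why a table of raw counts invites misreading: the gap between cities shrinks considerably once size is taken into account.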
Critique on Visualization
The official report and its Guardian coverage both made substantial use of supporting visuals. The structure of the visual language of the official report deserves praise. Almost every page is divided into two parts, with the visuals positioned on the right-hand side of the page. On the one hand, this clear distinction in the layout makes it easier for the audience to make the connection between the data and the meaning it represents. On the other hand, the split layout left limited space for the visuals, which took up roughly one third of each page. The limited space in turn resulted in the use of somewhat ambiguous visuals. For instance, in the graph on the right-hand side of page 24, it is not clear whether the three answers given by the rioters refer to the same question, as questions 1 and 3 seem to overlap (strongly agree, all agree). Because the space limitation does not allow complex or extensive visualizations, the point this graph claims to make should instead have been made in the text on the left-hand side of the page.
A further point in relation to the issue of limited space can be made as follows: had a different layout been preferred, there would have been sufficient space for a dedicated visuals section. A dedicated visuals section would have made the use of multi-axis charts possible. Such charts can be difficult to grasp quickly, which is a fair reason not to include them in the current layout of the report, but a dedicated visuals section would have suited them perfectly.
Lastly, the choice of visualization techniques deserves mention. The overall point of the report is to develop a holistic perspective on the riots that took place. However, the graphs used do not reflect this holistic principle. Pie charts, bar charts, graphs and maps alternate on different pages of the report, which does not allow a seamless reading experience for the average reader, who is, after all, the intended audience of the report.
Stray, J. The Curious Journalist’s Guide to Data. GitBook. 2013.
Rogers, S. “John Snow’s data journalism: the cholera map that changed the world”. The Guardian. 2013. https://www.theguardian.com/news/datablog/2013/mar/15/john-snow-cholera-map
Anderson, C.W. “Genealogies of data journalism”. The Data Journalism Handbook 2. Eds. J. Gray and L. Bounegru. 2019.
ASSIGNMENT 2: The Facebook Algorithm Analyzed
K. Barabash, C. Escudero Castro, E. Loukopoulou, U. Uygun, E. Olthof
Instructor: Emilija Jokubauskaite
March, 13th 2019
1. Problemization of the Research
Who conducted this research?
This research has been conducted as a part of the course Data Journalism (2019) taught at the University of Amsterdam Media and Information bachelor’s programme.
What is the goal of this research?
The main goal of the research is to assess whether or not the content curation algorithm used by Facebook encourages so-called filter bubbles. The fake Facebook profile created to test this question portrays a fictitious Dutch white male supremacist who is strictly against Islam. The research therefore tests whether or not the algorithm feeds this profile anti-Islam-related content.
How will that goal be achieved?
This test will not draw conclusions merely from the possibility of the algorithm suggesting anti-Islam content; instead, it will focus on a recent Islam-related case that sparked a public debate in the Netherlands: Dutch authorities have been struggling to take action against an Islamic school in Amsterdam, some members of which were recently found to have shown support for ISIS online (Pieters 2019). The research aims to test whether or not the algorithm will feed content that spreads misinformation on this sensitive topic. It also aims to see whether or not the content covering this topic has dramatic undertones that manipulate the audience.
For whom is this research intended?
The conclusions to be drawn from the research are intended to be discussed primarily by the research team and the whole class of the course. The conclusions are also intended for any audiences interested in the adverse consequences of online content curation algorithms.
Whom does this goal benefit?
Academics, data journalists, students, teachers and hopefully general social media audiences.
Whom does it harm?
From an ethical perspective, there are stakeholders who might be affected by the methodology of the research. For instance, the fake Facebook profile needed for the project was created so as to look as real as possible and not be taken down by Facebook. For this reason, a credible profile picture had to be found. This picture was taken from a royalty-free photography website and can therefore be freely used. However, the fake profile created with this picture demonstrates negative undertones with which the real person in the photo might or might not agree. This creates an ethical burden.
Furthermore, there is another consideration that might not address one particular stakeholder but is still considered harmful by the research team. Even if only for the purposes of this research, the team did contribute to the ever-increasing number of online personas showing support for supremacist ideas.
2. The Fake Account
The profile created portrays a male named Jan de Vries, a 46-year-old from Friesland who owns a family business in agriculture. He is an ardent supporter of the political party PVV and its leader Geert Wilders. He also shows online support for Trump and the values he stands for. He is married with kids. He likes to go fishing and hunting in his spare time and tries to attend church regularly. He prefers to shop at Albert Heijn. He mostly drinks milk in the morning and likes to drink beer while watching TV in the afternoon; he particularly likes Frans Bauer. His diet consists mainly of meat and meat products. He likes to read De Telegraaf and the Leeuwarder Courant. He smokes hand-rolling tobacco (shag). He is quite active on Facebook and does not hold back when it comes to his political views.
Below are the conclusions drawn so far, together with some journal entries.
- Facebook did not take down any of Jan’s posts.
- When anti-Islam content was searched for in the search bar, Facebook kept suggesting pro-Islam pages.
- Friend suggestions started coming in from the first day; however, they do not seem to be in line with our profile.
- It seems that more activity, spread consistently over a longer time, is required for the algorithm to place the profile in a meaningful bubble.
Logged in to check whether I have new stuff popping on my news feed.
All the friend proposals are the friends I have on my real profile.
Liked Albert Heijn.
Liked Golden Virginia hand-rolling tobacco.
Liked MAGA 2020 and shared a video.
Liked “Bikers against radical Islam Europe”.
Liked “United against Islam” and shared one of their posts.
Liked the official page of Geert Wilders.
Marked interest in the event Provinciale Statenverkiezingen 20 maart 2019.
So far there are no ads or pages proposed for Jan.
New friend proposals are finally popping up. They are all people from Western African countries.
Liked Frans Bauer and shared one of his pictures.
Also liked some fishing pages.
Pieters, Janene. 2019. “Amsterdam School Threatened after Counterterrorism Warning.” NL Times. March 8, 2019. https://nltimes.nl/2019/03/08/amsterdam-school-threatened-counterterrorism-warning.
ASSIGNMENT 3: The Facebook Algorithm Analyzed – Part II
K. Barabash, C. Escudero Castro, E. Loukopoulou, U. Uygun, E. Olthof
Instructor: Emilija Jokubauskaite
March, 20th 2019
Findings in Relation to the Filter Bubble
1. Jan could not achieve his motivation
When the fake character Jan was created for this project, a motivation was created for him. In other words, a realistic reason why he would decide to open a Facebook account was devised. That reason, his motivation, was to actively voice his support for the alt-right community online and possibly connect with those with whom his ideas resonate. However, the more time he spent on the platform, the less engaging his feed became. Instead of feeding Jan the type of content that would encourage his engagement, such as events or discussion groups, Facebook largely showed him content that could be passively consumed, like posts or videos. Although the research team kept pushing Jan to actively engage through interactive actions on the platform, such as commenting on posts, Jan’s home feed still remained mostly composed of news items and dull entertainment content that was not really engaging.
2. Amsterdam School Case did not show up
The main goal of the research was to see whether the Facebook algorithm would feed Jan misleading news on the Amsterdam Islamic school case mentioned in our previous blog post. Although Jan liked and actively engaged with the Facebook pages of various right-wing and some alt-right movements and news outlets, the Amsterdam school case did not make it to his feed. Interestingly, though, other recent events, notably the attacks in Utrecht and New Zealand, found great coverage in Jan’s feed. The research team attributed this to an immediacy factor: the Facebook algorithm is likely to favor content that is not only fresh but also more controversial.
Findings in relation to Facebook Content
1. Home Feed
Even though Jan followed and interacted with a variety of pages with different content types, most of the content on Jan’s feed was political in nature. The content suggestions that appeared on the side of Jan’s feed often had entertainment themes, while the content that appeared on Jan’s page remained mainly political. This was the case regardless of whether Jan liked these suggestions or not.
Jan’s feed was mostly made up of posts, but the type of feed content did not yield meaningful results. It was rather the authorship and the political nature of the content that produced meaningful explanations. As the charts below show, Jan’s feed was dominated by right-wing authors even though he had interacted with other pages equally.
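A tally of this kind can be sketched in a few lines: label each feed item by the political leaning of its author and count. The feed rows and the leaning lookup below are hypothetical placeholders, not the actual FBtrex export or its field names.

```python
# Hedged sketch of the kind of count behind the feed-authorship charts.
# Both the feed rows and the leaning labels are hypothetical examples.
from collections import Counter

feed_rows = [
    {"author": "PVV", "type": "post"},
    {"author": "De Telegraaf", "type": "post"},
    {"author": "Albert Heijn", "type": "post"},
    {"author": "PVV", "type": "video"},
]
leaning = {  # hypothetical labels assigned by the researcher
    "PVV": "right-wing",
    "De Telegraaf": "right-leaning press",
    "Albert Heijn": "non-political",
}

# Count feed items per leaning; unknown authors fall into their own bucket.
counts = Counter(leaning.get(row["author"], "unknown") for row in feed_rows)
print(counts.most_common())
```

Grouping by author leaning rather than by content type (post vs. video) is exactly the distinction the paragraph above draws: the former produced meaningful patterns, the latter did not.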
2. Friendship Requests
The time frame for the research was not sufficient for the algorithm to fully demonstrate its capabilities. Although the algorithm showed improvement, the research team believes that much more consistent results could have been achieved with a longer time frame. This could be observed in the friendship requests: at first, random people from different regions of the world came up, and people in the Netherlands only started showing up after the first week. However, this could also have another explanation. Could it be the case that Facebook predicted Jan to be an alt-right supporter and tried to normalize him by suggesting a multicultural environment? The research team highly doubts it, but is not in a position to make a definitive case because of the lack of objective proof in either direction.
Also, the contacts of the different Facebook accounts that had been logged in on the same computer were shown as friend suggestions. This strongly suggests that the data left in the browser is employed in the workings of the Facebook algorithm.
Findings in Relation to the Tool
The results given by the tool for the type of content contained some ambiguous information, particularly regarding dates. Some logs in the Excel file appeared to have been created in 1970; such dates almost certainly indicate missing or zero timestamps rendered as the Unix epoch (1 January 1970) rather than real creation dates. This led the team to question the legitimacy of the data, but it was not feasible to check all the logs in the file.
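The mysterious 1970 dates have a standard explanation, which a two-line sketch demonstrates: computers commonly store dates as seconds since the Unix epoch, so a log row whose timestamp is missing or zero renders as 1 January 1970.

```python
# A timestamp of 0 (e.g. a missing value exported as 0) renders as the
# Unix epoch, 1 January 1970 — the likely source of the "impossible" dates.
from datetime import datetime, timezone

missing_timestamp = 0  # what a log row with no real timestamp may contain
epoch_date = datetime.fromtimestamp(missing_timestamp, tz=timezone.utc)
print(epoch_date)  # 1970-01-01 00:00:00+00:00
```

In practice this means the 1970 rows can be treated as rows with unknown dates rather than as corrupted data, and filtered out before any time-based analysis.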
The tool kept crashing, particularly while trying to retrieve the CSV file, even though we tried alternative browsers. After many attempts it started working again, but the team could not determine the cause.
Final Research Report
Kateryna Barabash: 11656697
Chavelii Escudero Castro: 11253614
Eirini Loukopoulou: 11256559
Ulas Uygun: 11801654
Elise Olthof: 10821317
Instructor: Emilija Jokubauskaite
April, 3rd 2019
Word count: 2715
This study was conducted to test a popular hypothesis regarding the algorithms of online social platforms: the hypothesis that platforms such as Facebook use a type of algorithm that encourages misinformation and polarization. This hypothesis has been voiced frequently in recent years. This study responds to these claims through the following fundamental principles:
· Goal: The main goal of this research project is to answer the question of whether the Facebook algorithm encourages extreme right-wing ideas through the spread of misinformation or not.
· Focus: The test of the above-mentioned hypothesis can produce meaningful results only on condition that a tight focus is devised. Because of its topical relevance, the focus of this study was limited to anti-Islam discourse.
· Case: To further specify the focus of the study, a recent Islam-related case that sparked a public debate in the Netherlands was chosen: the Amsterdam school case. Dutch authorities have been struggling to take action against an Islamic school in Amsterdam, some members of which were recently found to have shown support for ISIS online (Pieters 2019). This topic garnered attention from all segments of Dutch society, and it is the type of case that could be expected to be favored by the Facebook algorithm because of its content and vulnerability to manipulation. For these reasons, this case was picked.
· Methodology: To test whether or not the Facebook algorithm would feed Facebook users misinformation about the Amsterdam school case, a fake Facebook profile was created. The profile portrayed a 40+ white male Dutchman from Friesland (in the north of the Netherlands) who actively supported the discourse of the extreme right-wing Dutch political party, the PVV. Three members of the research team (a limited number, given the possibility of the account being taken down) managed the activities of the account, cultivating engagement consistent with the demographic profile of the character: posting under Jan’s name, liking content, commenting, and following pages. The Facebook extension tool FBtrex was used to collect the data generated by the activities of the account, and the data was analyzed to glean meaningful information.
· Findings: Although the algorithm was found to further feed the profile with right-wing content, the Amsterdam School Case was not suggested by the algorithm.
2. STORY: THE ALGORITHM MISCONCEPTIONS
The data collected for the purposes of this research answered the central question posed by the research team, which is discussed further in the findings section of this report. In addition, it shed light on other aspects of the workings of the algorithm. It turns out that some assumptions about such algorithms are not altogether true. Together these assumptions make up a set of common misconceptions about the Facebook algorithm. The story that came out of this journalistic research elaborates on these misconceptions.
Facebook is supposed to encourage socializing and critical thinking
Facebook positions itself as a social platform where you can voice your opinions. But as seen through Jan de Vries’ profile, it is more of a platform that pushes you to consume, whether content or just advertisements. Content that is easily digestible by the user takes priority, while group suggestions and events, which would make the user socialize, are not often visible in the forefront. Users are not encouraged to engage in debates or current discussions but rather to be passive consumers, while their feed turns into an ongoing promotion of different goods and services. Contrary to the official narrative, Facebook trains consumers and reinforces consumerism rather than critical thinking, encouraging its users to blindly consume the content given to them.
There are no attempts to erode the filter bubbles
Facebook does, in fact, confirm your preferences further, creating the so-called filter bubbles. Rather than being exposed to a diverse feed, users fall into a loop of never-ending content that targets only specific beliefs and points of view coinciding with their present attitudes. They learn to accept their own news feed as a representation of the world, which encourages radicalization of ideas and fanaticism. Surprisingly, however, Facebook also appears to attempt to expose users who seem already radicalized, such as Jan, to content that is out of their comfort zone. In Jan’s case, this took the form of the numerous friend suggestions his profile received from very different people. These friend suggestions from various foreign countries, ranging from Indonesia to Belgium, could be due to a normalization Facebook attempts to achieve by exposing its users to content (friend suggestions, group suggestions, events, etc.) that is not necessarily directly linked to the user’s preferences. This normalization remains only a possibility, but the data our team obtained after thorough analysis is consistent with it.
Facebook personalization settings are applied directly
It takes time for the Facebook algorithm to place an account in a certain category. It demands constant use and active participation to provide the algorithm with the ‘food’ it needs to categorize the user. Jan’s profile, even though constantly active, seemed to need more time for Jan to be put in a certain box by the algorithm. Had Jan’s profile been active longer, we would probably have been able to understand and visualize in more detail how the algorithm works: after approximately two months of use, for example, our team would have been able to generate more data, making the results of this report more valid.
All pages you have liked have an equal position on your news feed
The algorithm suggests non-political topics as group suggestions, but the home feed is dominated by political content. Even though Jan had various interests apart from politics, like music, food and sports, none of these would appear on his news feed when his profile was active. Various other pages, such as that of the singer Frans Bauer and the supermarket chain Albert Heijn, were liked, and even though most of them produce a lot of content daily, none of it appeared on Jan’s news feed, which was instead dominated by posts and visual content related to current political issues.
In conclusion, our team notes that Jan did not achieve his motivation. When the fake character Jan was created for this project, a motivation was created for him: a realistic reason why he would decide to open a Facebook account. That reason was to actively voice his support for the alt-right community online and possibly connect with those with whom his ideas resonate. However, the more time he spent on the platform, the less engaging his feed became. Instead of feeding Jan the type of content that would encourage his engagement, such as events or discussion groups, Facebook largely showed him content that could be passively consumed, like posts or videos. Although the research team kept pushing Jan to actively engage through interactive actions on the platform, such as commenting on posts, Jan’s home feed still remained mostly composed of news items and dull entertainment content that was not really engaging.
Additionally, the Amsterdam school case did not appear. The main goal of the research was to see whether the Facebook algorithm would feed Jan misleading news on the Amsterdam Islamic school case mentioned in our previous blog post. Although Jan liked and actively engaged with the Facebook pages of various right-wing and some alt-right movements and news outlets, the Amsterdam school case did not make it to his feed. Interestingly, though, other recent events, notably the attacks in Utrecht and New Zealand, found great coverage in Jan’s feed. The research team attributed this to an immediacy factor: the Facebook algorithm is likely to favor content that is not only fresh but also more controversial.
However, other political content did appear on Jan’s feed, and indeed most of the content was political. The content suggestions were not in line with the actual content of the feed: for example, one of Jan’s group suggestions was a cricket group, while most of his feed was extremely political.
Also, the time frame prevented the algorithm from realizing its potential. More time would generally be needed for this research to gather more valuable and valid data about the case.
As a side note, we wanted to find out what would happen if Jan’s extreme-right profile were compared to one of the team members’ profiles, which could be considered rather left-wing, with more interest in environmental issues and left-wing cultural events and news outlets, and less interest in politics. The following table shows the results of comparing the first ten posts from the same news outlets, RTL and NOS, on both accounts:
This table is consistent with our previous findings on the highly political content of Jan’s feed. However, it also shows Facebook’s potential gatekeeping role in news distribution itself. The table shows how Facebook presented Jan with posts on Islam and political issues, while the team member barely got to see any political content and saw more on environment-related issues, which aligned with the team member’s interests.
4. LESSONS LEARNED
In order to achieve consistent results, more time was needed. With more time, the algorithm might have suggested the Amsterdam Haga Lyceum case, and we would have had material to compare with other data, like our own Facebook timelines. Not only would the algorithm have had more time to adjust, but more information could also have been provided to create a more complete picture of Jan himself. This way the algorithm would have had much more information, and we could have explored more options and research subjects for our project.

We needed many news outlets that Jan followed, together with material to compare them with, and this turned out to be more difficult than expected. First of all, facebook.tracking.exposed did not always work on every account. To follow mainly right-wing news outlets and pages that suited Jan’s personal interests, more research was needed than we initially thought. To find out how politically laden his news feed would be, Jan’s likes needed to include not only pages with politically loaded content but also pages matching his personal interests, like farming, tractors, family life, and music styles. This turned out to be more difficult than initially thought. As Jan is part of a cultural group that differs in many aspects from our team’s own cultural backgrounds, many things Jan would like were things we did not know existed. Only when liking pages did we start to find out more about these cultural specificities. For future research in this field, more research into the persona should be done up front, as well as during the research, by looking at other pages of like-minded people. There are still many unknowns for us concerning farming life, for example, which makes it difficult to create a persona that is as close to reality as possible.
To compare the news feeds of two accounts for two different news outlets, we needed the post IDs of both accounts; however, for one of the accounts facebook.tracking.exposed did not provide this data, which meant that the material had to be compared manually. This takes time and also caused us to compare only a small portion, since we could not compare hundreds of posts with the same ease as FBtrex, making the research less reliable and less complete. In addition, our team hopes that the extension will soon be released for the Safari browser, as this was the most used browser across the groups; availability of FBtrex on Safari would make it easier to access our own profiles and generate data to use for the comparison. Additionally, a lot of posts in the CSV file generated by the extension were dated 1 January 1970 (the Unix epoch, indicating missing timestamps), which made it less convenient to analyze the data and draw reliable conclusions.
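The comparison that had to be done by hand is, when post IDs are available for both accounts, a simple set operation. The IDs below are hypothetical placeholders, not real Facebook post IDs.

```python
# Sketch of the feed comparison done manually in the study: with post IDs
# from both accounts, set operations find overlap and differences at once.
# All IDs here are hypothetical placeholders.
jan_posts = {"rtl_101", "rtl_102", "nos_201", "nos_202"}
member_posts = {"rtl_102", "nos_203", "nos_204"}

shown_to_both = jan_posts & member_posts   # intersection: same post on both feeds
only_jan = jan_posts - member_posts        # difference: posts only Jan was shown

print(f"shown to both: {sorted(shown_to_both)}")
print(f"only on Jan's feed: {sorted(only_jan)}")
```

This scales to hundreds of posts as easily as to four, which is exactly what the manual comparison could not do.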
There is always the possibility that, although the profile was created for research purposes, it still contributed to general right-wing ideas online. This created an ongoing discussion in the team about whether to stay true to the role played as Jan’s persona or to downplay it a little in order not to create something people would build upon or use to discriminate. How reliable is the research if the researcher does not completely follow the script? Our fear of contributing to extremist ideas online meant that we tried to keep our comments in a somewhat more moderate tone than Jan might have used had he been real. We sometimes used biblical quotes and remained somewhat general in our commentary so as not to create or contribute to any extremist discussions. In order not to contribute to, or have any influence on, future debates, the profile and Jan’s activity will be archived and removed. Where our persona left insulting comments on posts, a message stating that the comment was part of a research project will be posted and remain for 24 hours before the comment is removed.
Also, it is worth considering that Jan de Vries was, in the end, a fake persona. He is based on our conceptualization of what a white male supremacist would look like and how he would act online. Especially considering the short time frame in which the persona had to be created, this meant that Jan was based mostly on our own ideas of white supremacy rather than on carefully conducted research.
Group Work Challenges
Managing an account in a limited time frame was one thing; different people managing the same account was another important difficulty. All group members had to be careful about when and from which device they logged in to produce content. There was always the danger of Jan’s account being shut down if multiple people logged in from different devices, as this is an indication to Facebook that the account is problematic. At the same time, sharing one persona among different people means that each has a different image of this persona; it therefore demands coordination to agree on certain common points, which eventually become the main characteristics of the avatar.
Another problem was the language difference. Two out of five people in our team are native Dutch speakers, which meant that only they could post comments on behalf of Jan. The non-native speakers were able to give likes, but it was still difficult for them to always know what exactly the posts were about or what Jan’s likes meant. On the one hand, this could be helpful in not having too many people interfering with the profile, but it also created room for misunderstandings, less knowledge sharing and the subsequent potential of missing out on important findings. Jan’s Dutch background, such as typical Dutch parties and customs, was also more recognizable for the native Dutch speakers. On the other hand, the difference worked to our advantage, since the non-Dutch team members noticed cultural specificities that the Dutch members were too accustomed to see. The conversation between the group members on these issues was an important part of our meetings and led to new insights.
The profile that was created was very loaded, in the sense that it was a stereotyped version of a white male supremacist. A more neutral profile, with everything else kept the same, might very well produce different results. Since Jan was already mostly based on our own conceptualization of him, we could bias ourselves into thinking that this is what a right-wing person does on the internet, while the actual picture could be much more nuanced.
“Jan De Vries.” n.d. Accessed April 3, 2019. https://www.facebook.com/ian.devries.7121.
“Narrating a Number and Staying with the Trouble of Value- Handbook 2 – Data Journalism Handbook.” n.d. Accessed April 3, 2019. https://datajournalismhandbook.org/handbook/two/assembling-data/a-number-object-lesson-staying-with-the-trouble-of-value.
Footnote about group work contributions
Though every member contributed more or less to every step of the way, here are a few distinctive contributions.
Ulas: Structure of the report, Introduction, Findings
Eirini: Structure of the report, Group Challenges
Elise: Lessons learned, Data Collection, Visualization
Chavelii: Dividing tasks, Management of the Facebook account
Kateryna: Findings, Critique of the Tool