A Bot’s Life
It has been two weeks since we created our Facebook bot, the infamous ‘Corey Govender’. We have finally collected enough data to compare his profile with another profile, and so to understand the differences created by filter bubbles and the extent to which Facebook profiles are isolated from opposing content.
And so, there appear to be different dimensions to Corey’s identity, depending on whether you look at his profile, with all the personal information we built into his persona, or scroll through his news feed. The “about” dimension portrays him as an old banjo enthusiast with a remarkable interest in wildlife, which stems from the past we created for him: a retired South African safari leader, currently living in Amsterdam. Pages and groups he likes include “Chasing Trophy Whitetails”, “Hunting Club” and a number of taxidermy supporters. When it comes to the present of Corey’s life, the “news feed” dimension exposes content from Corey’s past, offering him the opportunity to dive into his interests to an extent we didn’t anticipate when choosing them. There was a lot of explicit and pretty grim animal content… but what surprised us even more was when it disappeared and was replaced by the Trump invasion. The first week of Corey’s life on Facebook took us all back to the memorable November 2016. Scrolling through the feed, we were overwhelmed by the number of pro-Trump posts, pictures and memes, given that we had only liked two pages that would generate such content. For reasons we cannot really define, there was almost no content about Corey’s passion, playing the banjo, even though we posted a jolly country song and joined a few local groups.
As we befriended more and more people, who seemed pretty appreciative that we had accepted their friend requests (we got 48 messages overnight, along with pictures of flowers and chocolate), the news feed became more colorful. Now other people were posting on our wall and their own posts were mixing into our feed, making the algorithm generate more and more.
After two weeks of scrolling through the feed, Corey was drowning in his pre-set interests, and one would conclude that he had no chance to take a breath and explore different content. It was obvious we had locked him into a dense filter bubble…
Fbtrex helped us collect the data, which we organized with a simple Excel function and presented clearly in a pie chart. Compared with the data gathered from our personal profile, it also let us see the differences in the forms of content, and in the content itself, that the Facebook algorithm generates. In a way, it made the algorithm visible and accessible to us. Of course, it was our intention to create a radical identity, entirely different from ours, and to challenge the creation of a filter bubble. With this visibility, questions about our own Facebook profiles appeared. Are we, like Corey, drowning in a filter bubble?
When comparing the types of posts on the real Facebook feed with those of the persona we created (Figures 1 and 2), one of the main things that stands out is the number of ads. On Corey’s profile, out of the 369 public posts which appeared on his feed, only 9 were ads (2.4%). This is a surprisingly small share compared to the personal account, where more than a quarter (25.2%) were ads. Apart from that, little surprised us or varied noticeably between the two profiles.
Figure 1: Corey’s Feed Figure 2: Personal Feed
Overall, the results were a direct reflection of everything we fed into Corey’s persona. For the project, the data collected was made up of a total of 369 public timeline posts (Figure 3). We found that most posts appearing on Corey’s timeline were group-related and mainly composed of pictures, videos and similar media.
Figure 3: Percentage Distribution
Although Corey’s profile attracted a lot of attention through the many friend requests and his active participation online, it surprisingly received few targeted ads: only 9, compared to 41 for the other profile.
The main finding of the project was that Corey’s filter bubble only ever showed content relevant to his pre-set interests and current online activity. His timeline never showed content reflecting views and opinions different from his own; everything matched his political and religious views.
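The ad shares quoted above are easy to double-check. Here is a minimal Python sketch, not part of our original Excel workflow. The helper name `ad_share` is our own invention, and the total size of the personal feed is not stated anywhere in this post, so the value below is only inferred from the 41 ads and the 25.2% share.

```python
def ad_share(ads: int, total_posts: int) -> float:
    """Return ads as a percentage of all posts, rounded to one decimal."""
    return round(100 * ads / total_posts, 1)


# Corey's feed: 9 ads out of 369 public posts (figures from the post).
corey_share = ad_share(9, 369)

# Personal feed: 41 ads making up 25.2% of the feed (figures from the post).
# The total is NOT reported, so this is an inferred estimate, an assumption.
implied_personal_total = round(41 / 0.252)

print(f"Corey's ad share: {corey_share}%")
print(f"Implied personal feed size: about {implied_personal_total} posts")
print(f"Check: {ad_share(41, implied_personal_total)}% ads on the personal feed")
```

If the inferred total is roughly right, the personal feed sample was less than half the size of Corey’s, which is worth keeping in mind when comparing the two pie charts.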
The ethical boundaries in creating a bot for the purpose of gathering data on how Facebook’s algorithm operates are explicit, especially in Corey Govender’s case due to his online “popularity”. Although we did not send out any friend requests, we accepted all that came our way. We believe that some of these friend requests were probably sent out by other bots, judging from their corresponding profiles, many of which were of young attractive women. However, we are also sure that some were from real people who shared the same interests and were part of the same groups as Corey, based on the fact that they left relevant comments on posts he shared.
In that sense our project can be viewed as unethical, since we were deceiving real people into believing Corey Govender was real (although it cannot truly be considered our project, as it was mandatory for us to carry it out in order to pass the course, which coincidentally can also be seen as unethical).
Corey Govender. A Bot Blogpost
This project is an experiment to expose Facebook’s algorithm. As a group, we created the bot Corey Govender and set up a Facebook profile which we used to like and share pages in order to create engagement and shape a persona, making sure it was very different from ourselves in politics, interests, etc. For a week, we took care of his Facebook activity, making sure he posted and liked content regularly. In terms of time frame and commitment, the project was sort of like that South Park episode where they have to take care of an egg for a week for a school assignment and not drop it. The purpose of this project was to examine Facebook’s algorithm. What kind of content showed up on Corey’s feed based on his engagement, and how did this compare to one of our own Facebook profiles?
Corey Govender, originally born in South Africa, moved to the Netherlands to be with his kids. Corey speaks Afrikaans, English and a bit of Dutch, but prefers posting in English on Facebook to connect with his international friends. Corey is divorced from his Dutch wife, and is hoping to find love again with a woman who shares his Evangelist beliefs. His main passions are hunting, taxidermy and banjo playing. His passion for hunting stems from his previous job with Trophy Hunting Safaris in Cape Town, from which he is now retired. Corey’s political views can be described as conservative and traditional, even going as far as believing that apartheid wasn’t all that bad.
In order to give credibility to the account, the first step was to create an engaging and convincing profile. A critical part of this was filling in information about Corey, from his origin, current residence, relationship status and music taste to more important subjects like his political and religious views. The next step, to generate engagement, was to join Facebook groups embodying his interests and views. Groups like “Jesus Christ Family” and “Chasing Trophy Whitetails” were joined to solidify his strong opinions, but engagement didn’t stop there: it extended to sharing content and commenting on public posts to generate even more data.
When we first created the profile, we noticed that there was not much interaction, as Corey had no friends at all. But once we started joining groups and liking pages, many people sent friend requests and even messaged him, thanking him for his friendship or spamming him with bot requests to like profiles. We found it quite remarkable that the majority of the friend requests were from women, which comes as a surprise considering his atrocious sexist shares and posts. Once we had nurtured his online identity and built its basis by generating content, we received the same back. Notifications and friend requests came flowing in, generating even more information for us to inspect.
After actively running our bot for six days and using the facebook.tracking.exposed extension tool to scrape all of the public data Corey received on his timeline, it was time to analyse the general results. When we opened up our data summary, a sea of names appeared: groups we joined, friends we made and people posting in these groups. The results were a direct reflection of everything we fed into Corey’s persona, with the most saved posts coming from a religious group we joined called “Genesis Holdings”. We also saw many posts from “IOL News”, a page we liked that claims to post “News that connect South Africans”. Most of the other recorded data is just associated with names of people who are part of the same groups as us, and we got a lot of posts in Afrikaans in a group titled “Stop white genocide in South Africa”. Overall, our timeline reflected the character we created very well, and we generated enough momentum and engagement simply by liking a few pages and joining groups.
Considering how different the input into Facebook’s algorithm was for our bot and for our own Facebook profiles, it is no surprise that the content appearing on the two news feeds contrasts considerably. Corey’s news feed is extensively political: articles about current affairs pop up among a sea of nationalistic and sexist memes, creating a dense filter bubble. We directed Corey’s interests (the pages he liked) towards more radical beliefs and ideas in order to challenge the algorithm, and the result suggests that it generates a lot more political information than content on the less attractive ‘personal interest’ we gave him (e.g. banjo content). What stands out, piercingly, is the predominance of Donald Trump content, generated by the fact that we liked pages supporting him. In contrast to this explicit news feed, our own is a lot more expansive, judging by the data collected by the scraping tool. It gathered posts about places and events (based on our location), current world stories and news (appearing as most liked/shared top stories), as well as sponsored advertising.
On the whole, it was interesting to manipulate Facebook’s black box in such an extreme way and to see how it contrasted with our own profiles. We were surprised that the recommendations we received deviated from the pages we liked, which were mainly hunting/safari ones. Instead, we received a lot of politically charged content.
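For anyone who wants to repeat this kind of tally without a spreadsheet, here is a minimal Python sketch of counting posts per source. It is only an illustration under stated assumptions: the record layout (a `source` field per post) and the sample rows are hypothetical, and they are not the real export format of facebook.tracking.exposed or our actual counts.

```python
from collections import Counter

# Hypothetical sample rows: the field name "source" and these rows are
# invented for illustration; they are NOT the project's real data.
sample_posts = [
    {"source": "Genesis Holdings"},
    {"source": "IOL News"},
    {"source": "Genesis Holdings"},
    {"source": "Chasing Trophy Whitetails"},
    {"source": "IOL News"},
    {"source": "Genesis Holdings"},
]


def top_sources(posts, n=2):
    """Return the n most frequent post sources as (name, count) pairs."""
    return Counter(post["source"] for post in posts).most_common(n)


for name, count in top_sources(sample_posts):
    print(f"{name}: {count} posts")
```

The same counts could then feed a pie chart, which is roughly what we did by hand in Excel.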
The Power of Statistics: The Panama Papers Uncovered
Daniell, Sarah, Veronika, Elisa, Anna, Sarah
An anonymous man, who has been named John Doe, leaked the documents of the Panamanian law firm Mossack Fonseca to a German journalist named Bastian Obermayer, who works for the Süddeutsche Zeitung newspaper. At first glance, the anonymity of the whistleblower might make the leak seem unreliable and rather dubious, considering the huge number of files and the fact that they were leaked. The information that had to be processed could not be handled by the German newspaper alone; hence, they sought help from the International Consortium of Investigative Journalists (ICIJ), which offered an international platform of journalists and a wide range of resources capable of handling the data. Over 400 journalists and 100 news organisations in 80 countries worked together for 2 years, researching and analysing the leak, decrypting and encrypting it to keep it confidential. Eventually they compiled all the data, making it ready for publication in a way that would be informative yet understandable, and exposing many names and crimes.
After organizing and sampling the data, the ICIJ investigated its content further before presenting the figures to a wide public. The specific cases break down the big numbers into particular people, places and companies, and show their internal and local links. On the whole, the figures are published by the Süddeutsche Zeitung in collaboration with the ICIJ. The different cases and investigations carry the names of journalists from over 80 countries under the presented tables of sampled figures. It can be said that the trust and validity given to the figures is rooted in trust in authority: the authority of journalism as reporting the truth, based on its code of conduct.
The 2.6 TB of data provided to the Süddeutsche Zeitung by the whistleblower (John Doe) contains enormous lists of names and shell companies that used the law firm Mossack Fonseca as a tax evasion tool in the tax haven of Panama. Figure 1 shows the scale of the data, compared to previous leaks. The data has brought to light the names of 140 politicians from more than 50 countries connected to offshore companies in 21 tax havens. The ICIJ has dedicated an entire section of its Panama Papers investigative page to its data methodology, as well as various sets of graphs and charts. These offer an in-depth analysis of the original, unorganised, raw data. Examples include graphs depicting the growth and decline of Mossack Fonseca’s offshore companies, or the countries with the most active intermediaries. In this way, the data is able to tell a story, and different aspects of it can be analysed successfully and efficiently.
Figure 2 Figure 3
The raw data of this investigation almost speaks for itself; most of the work of the investigation was related to organising it, indexing it, verifying it, and putting it into words for the public to understand. There are always limitations to data: it can be manipulated, and it can be picked and chosen to tell the story that the journalist wants to write. Considering the amount of data in this leak, however, and the number of countries, journalists and organisations involved, who all found the same incriminating evidence against Mossack Fonseca and its shell companies (in other words, all had the same “interpretation”), it seems that manipulating this data would have required organisation along the lines of the mafia, as well as a passionately shared hidden agenda.
The conclusions drawn from the leaked documents of Mossack Fonseca only serve to highlight the already existing incriminating data, and are therefore accurate and fact-based. The data, which among other important figures revealed the involvement of 140 politicians worldwide, has been presented by the International Consortium of Investigative Journalists in the most transparent way possible, giving public access to names and infographics. Essentially, the conclusions are not only in line with the data, but the data is also presented in such a way that readers can reach certain conclusions themselves, for instance about the implications of a political figure’s family member being involved but not the politician directly.
ICIJ. 2018. “Explore the Panama Papers Key Figures.” ICIJ. February 1. https://www.icij.org/investigations/panama-papers/explore-panama-papers-key-figures/.
ICIJ. 2018. “About the Investigation.” ICIJ. https://www.icij.org/investigations/panama-papers/pages/panama-papers-about-the-investigation/.