group 1.1

Roses are Red, Violets are Blue, how much does your filter bubble know about you?

 

Roses are Red, Violets are Blue, is the filter bubble suffocating you?

On the 7th of March, 2019, Violet Kracht was brought into this world by five scholars on a quest to understand the phenomenon commonly known as the filter bubble. The filter bubble is essentially what most social media platforms nowadays use to keep users engaged, by bombarding them with the same (or similar) type of content as what they have already shown a mild interest in. Whilst Mark Zuckerberg, founder and CEO of Facebook (one of the most popular social networking websites), vehemently denies the existence of the filter bubble, calling it “a good sounding theory, and I can get why people repeat it, but it’s not true” (UK 2016), Violet Kracht could potentially demonstrate that Zuckerberg’s claim that “social media in general are the most diverse forms of media that are out there” is a good sounding claim too, but not a true one.

To prove the claim that the phenomenon of the filter bubble does in fact exist, we created Violet Kracht. But who exactly is Violet Kracht? In a nutshell, she is a 42-year-old self-employed transgender woman of African-Moroccan descent, who resides in Amsterdam. Her religion is somewhat ambiguous; raised as a Christian, she struggled to be accepted as a transgender woman and thus contemplated converting to Buddhism. Politically, she’s probably more left-leaning than anything else, although politics aren’t her biggest concern. Afraid of commitment and somewhat stuck in the past, she enjoys the occasional evening drink (or two), though she does not consider it problematic just yet. Her hobbies include playing tennis, contemporary dance, reggae music and going to cat cafes. As a transgender woman herself, she’s also highly immersed in the LGBTQ community. All of Violet’s hobbies and interests have been reflected on her Facebook profile via likes, shares and her affiliation with various public groups. Ethically, it was decided from the beginning that Violet would not accept any friend requests or join any private groups, in order to not deceive real people.

Once Violet was created, we installed a beta version of the tracking software Fbtrex, which would track her data and Facebook activity, and gathered all the public data available. However, to prove the existence of the filter bubble, we were required to install the same software on one of our personal Facebook profiles and then compare the findings, as that would (apparently) help us understand how Facebook’s algorithm works, and how the filter bubble acts on different profiles.

This is where Rose enters the game. To preserve our anonymity, we named our generous contributor ‘Rose’, whose data from Fbtrex was also collected and analyzed to support our investigation into the existence of the filter bubble. Rose’s digital footprint was analyzed to evaluate whether there were any similarities in content and advertisements between the two profiles, which reflect completely different interests (“What Is Digital Footprint? - Definition from WhatIs.Com” n.d.). However, it should be noted that a comparison of a three-week-old profile with a 10-year-old profile is by default flawed. Violet’s profile was engaged with following a strict regime (which will be elaborated upon further on), and the data collected only reflects the filter bubble’s interference over the course of three weeks. Rose, on the other hand, has a longstanding Facebook profile, with hundreds of friends, a vast repertoire of liked and shared content, and affiliations with numerous different groups and events. As such, the filter bubble will certainly look different on Violet’s and Rose’s profiles. Nevertheless, looking at the data quantitatively yielded insightful results regarding the existence of the filter bubble.

But why conduct such an experiment to begin with? Other than to prove the existence of the filter bubble, it is of utmost importance and relevance for anyone with an online presence, be it on social media or simply on Google, to understand how sites choose the criteria by which they filter your information. As Eli Pariser said of the filter bubble, “you don’t choose to enter the bubble…they come to you, and because they drive up profits for the Web sites that use them, they’ll become harder and harder to avoid” (UK 2016). So, since we can’t avoid the filter bubble (whether we believe in its existence or not), it is a socially relevant matter that affects everyone in the online environment. As such, understanding how the filter bubble works may illuminate what is actually happening on the sites that people so impetuously engage with.

How did Rose become Red, and Violet so Blue?

To assess the potential presence of the filter bubble on Facebook, this research focused on conducting a comparative data analysis of the activities performed by Violet Kracht and our contributor Rose on Facebook. In order to collect and analyze the information on the interaction of both personas on Facebook, we installed the tracking tool Fbtrex, which accumulated and identified all the public posts that both accounts interacted with. In addition to amassing the data digitally, we manually collated the observations of the two accounts in a daily-updated journal over the course of 10 days.

During the data gathering process, Violet and Rose logged into their Facebook profiles using Internet Protocol (IP) addresses from different locations. One of the reasons for this was to avoid the identification of Violet’s fake identity by Facebook’s algorithm. More importantly, eliminating any personal bias or signal that could conceivably originate from the contributor’s regular IP address was crucial to maintaining the accuracy of the findings during the entire process.

Throughout the 10 days we conducted our research (between the 7th and 17th of March 2019), both personas had to log into their accounts five times a day at random time slots. The main task of each login consisted of liking five posts and sharing one, which ideally reflected the interests of the individual personalities on their timeline. After each entry, both personas noted their observations in the daily journal covering the following areas:

  1. Time: The exact duration in hours and minutes of the log-in.
  2. First Sight Observation: What were the first things you saw on your timeline?
  3. Liked: What posts did you like? Why?
  4. Shared: What posts did you share? Why?
  5. Connection: Were you able to identify a connection between the information posted on your timeline and the posts you liked and shared the day before and the present day? If so, describe the connection. If it is the first day of your entry, this connection is non-applicable.
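The logging protocol above can be sketched as a small data structure. This is only an illustration of how one journal entry and the planned activity totals might be represented; the class, field and function names are our own assumptions and do not come from the actual journal or from Fbtrex. The totals assume that every planned login actually took place.

```python
from dataclasses import dataclass, field

# Protocol constants taken from the description above.
DAYS = 10
LOGINS_PER_DAY = 5
LIKES_PER_LOGIN = 5
SHARES_PER_LOGIN = 1


@dataclass
class JournalEntry:
    """One login session. Field names are illustrative assumptions."""
    persona: str                 # "Violet" or "Rose"
    day: int                     # 1..DAYS
    duration_minutes: int        # 1. Time
    first_sight: str             # 2. First Sight Observation
    liked: list = field(default_factory=list)    # 3. Liked: (source, reason) pairs
    shared: list = field(default_factory=list)   # 4. Shared: (source, reason) pairs
    connection: str = "n/a"      # 5. Connection (non-applicable on the first day)

    def follows_protocol(self) -> bool:
        """Check that the session contains five likes and one share."""
        return (len(self.liked) == LIKES_PER_LOGIN
                and len(self.shared) == SHARES_PER_LOGIN)


def planned_totals() -> dict:
    """Planned activity per persona if every login happens as scheduled."""
    logins = DAYS * LOGINS_PER_DAY
    return {
        "logins": logins,
        "likes": logins * LIKES_PER_LOGIN,
        "shares": logins * SHARES_PER_LOGIN,
    }
```

Under these assumptions, each persona would log in 50 times and produce 250 likes and 50 shares over the 10 days.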

Apart from liking and sharing content, Violet joined some public groups in order to further enrich the type of data that was fed to Facebook’s algorithm. Despite the friend suggestions that perpetually appeared on her profile, Violet intentionally refrained from accepting friend requests on Facebook due to potential hindrances in the research process that could be caused through biased and ethically dubious activities. Since the main goal of the research was to test if Facebook’s algorithm de facto acted upon Violet’s interests and perspectives, Violet liked posts that reflected her individual preferences in particular. Amongst others, these posts included LGBTQ-related news, TV shows, and human rights advocating content. After collecting the required data from both profiles, we used the data visualization tool Rawgraph to quantitatively visualize our findings, and show the correlating or varying patterns between Facebook’s data on Violet, and on Rose.

How many shades of Reds and Blues?

After conducting the research in the aforementioned way, the three major findings were converted into visual representations, in order to clarify the prior assumptions. During the research analysis, an error occurred in Fbtrex: the category ‘groups’ (the prime type of content in Rose’s analysis) was missing. This may have influenced the results and visuals we attained, which is elaborated upon in the limitations section. The following details our results.

Firstly, the data extracted from the spreadsheet generated by Fbtrex provided valuable insight into Violet’s Facebook activity, and the content or source-type that she primarily interacted with on her newsfeed. As the circle packing visual in Figure 1.1 illustrates, Violet’s most common content-type was posts, which made up 72.7% of the content appearing on her Facebook newsfeed. This can be explained on the basis of her Facebook activity, as it was decided from the beginning that she would focus on engaging with posts, and neither join any private groups nor accept any friend requests. In comparison, Figure 1.2 illustrates how Rose primarily engaged with posts too, which amounted to 52.2%, and with events, at 36.5%. However, Figure 1.2 shows a 0.0% engagement with groups on Rose’s part (due to a missing category in Fbtrex), whereas Figure 2.2 reveals a large number of groups that Rose engaged with. A possible explanation for Rose’s high percentage of events could be that she has more “interpersonal networks on Facebook” (Bakshy, Messing and Adamic, 2015): her 10-year digital footprint connects her to the events of many friends and groups with overlapping interests.
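The percentages above are simple shares of each content-type in the collected rows. A minimal sketch of that calculation follows, assuming a .csv export with one row per item and a column naming its content-type. The column name `type` and the sample rows are hypothetical; they are not taken from the actual Fbtrex export or from our data.

```python
import csv
import io
from collections import Counter


def content_type_shares(csv_text: str, column: str = "type") -> dict:
    """Return each content-type's share of all rows, in percent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Count rows per content-type, skipping rows where the column is empty.
    counts = Counter(row[column] for row in reader if row.get(column))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: round(100 * n / total, 1) for kind, n in counts.most_common()}


# Made-up rows purely to show the calculation; this is not the study's data.
sample = """type,source
posts,Page A
posts,Page B
posts,Page A
events,Page C
"""

shares = content_type_shares(sample)
```

A category that is missing from the export, as ‘groups’ was for Rose, simply never appears in the counts, which inflates the share of every other category.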

 

Figure 1.1 The dominance of the content presented on Violet’s Facebook Newsfeed

Figure 1.2 The dominance of the content presented on Rose’s Facebook Newsfeed

 

Secondly, in Figure 2.1, the sunburst distribution outlines the most common sources of content on Violet’s timeline from the gathered data. The highest frequency of content on Violet’s Facebook feed came from the Facebook page LGBTQ Nation (an online magazine) at 19.7%, and the Washington Post (a politically right-center oriented newspaper) at 9.09%. The next-highest distributors were RTL Nieuws at 6.97%, which can be considered a moderately neutral news source, and Het Parool at 6.57%. Yet these numbers only somewhat reflect Violet’s politically left-leaning preference, mostly in terms of content, and thus do not entirely align with her political perspective.

Furthermore, the content provided by the Washington Post was rather reflective of her interests, since the posts fell into categories like sport, music and entertainment. The prominence of this source may be explained either by her religious background as a Christian (which may have influenced the way in which Facebook’s algorithm read her political leaning), or simply by the fact that Violet is not a very politically active persona. In comparison, the vast majority of the content on Rose’s feed originated from public groups, which is also where most of her Facebook interaction took place. What the data underlines is that because Rose has a longstanding Facebook network (her account has been active for over 10 years now), it is reasonable that the majority of her posts involve groups and people she voluntarily associates with (through liking, sharing and joining). Because Facebook’s algorithm has much more data on Rose, constituting a larger digital footprint, it can make better inferences about her. As a result, her filter bubble is much more disguised: it has formed over years of interaction with a wide variety of posts, drawn from diverse forms and sources of content.

 

Figure 2.1 The distribution of the content sources (Violet)

Figure 2.2 The distribution of the content sources (Rose)

 

Finally, the dendrograms in Figures 3.1 and 3.2 reveal that “groups” account for the highest distribution of content for both accounts, which also points to an error in the Fbtrex software. The second-highest distribution of sources was “posts”. Although there are more groups, the most common form of content presented to Violet on her Facebook news feed was posts. Consequently, this illustrated a significant correlation between the most dominant form of content and the most frequent sources that appeared on Violet’s newsfeed. Likewise, a comparison between the journal observations and the data collected by Fbtrex pointed out a prominent correlation between the two datasets. The content posted by LGBTQ Nation became the most dominant source on Violet’s profile, since she liked more LGBTQ Nation posts than posts from any other source. From the results gathered in the daily journal entries, it is evident that LGBTQ Nation posts were liked 18 times, whereas RTL Nieuws, the second most dominant source, only had four likes.
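The comparison between journal likes and feed shares described above can be sketched as follows. The helper names are our own, and the example uses only the two like counts and the two percentages reported in the text; like counts for the other sources are not given here and are left out rather than guessed. Two data points can show that the ordering matches, but they are far too few to measure a correlation.

```python
def likes_vs_share(likes: dict, feed_share: dict) -> list:
    """Pair each source's journal like count with its share of the feed (percent).

    Only sources present in both datasets are returned, ordered by feed share.
    """
    common = sorted(set(likes) & set(feed_share),
                    key=lambda s: feed_share[s], reverse=True)
    return [(s, likes[s], feed_share[s]) for s in common]


def same_ordering(pairs: list) -> bool:
    """True if ranking the sources by likes gives the same order as by feed share."""
    by_likes = sorted(pairs, key=lambda p: p[1], reverse=True)
    return [p[0] for p in by_likes] == [p[0] for p in pairs]


# Figures reported in the text for Violet's profile.
likes = {"LGBTQ Nation": 18, "RTL Nieuws": 4}
feed_share = {"LGBTQ Nation": 19.7, "RTL Nieuws": 6.97}

pairs = likes_vs_share(likes, feed_share)
consistent = same_ordering(pairs)
```

With more sources and their like counts filled in from the journal, the same pairs could be fed into a rank correlation to test the pattern more rigorously.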

 

Figure 3.1 The categorization of content on Violet’s Facebook Newsfeed

Figure 3.2 The categorization of content on Rose’s Facebook Newsfeed

 

Roses are Red and Violet is definitely more Blue

Through the course of this research and analysis, we came to the conclusion that Violet’s digital footprint is small compared to the almost 10-year-old footprint of Rose’s profile, which consists of her interests, preferences, the locations she visited, and the multitude of other 200+ signals collected by the algorithm (Pariser, 2011) to frame and customize the profile. Rose’s existence in the digital world, not just on Facebook, had a significant influence on the newsfeed curated for her by the algorithm. This history is largely lacking in Violet’s profile, which made it easier for us to recognize the influence of the filter bubble on the new persona.

However, due to time limitations and the lack of access to the actual algorithm used by Facebook, our results regarding the filter bubble are not definitive, although they do allow us to infer the above-stated conclusion. While the limitations of this research constrain the validity of our findings to some extent, the findings might constitute a preliminary direction for future research, which could be conducted over a longer period of time with a more controlled and in-depth structure and set of research variables.

Roses being Red and Violet being Blue, why do we say so?

The results of the analysis conclusively show that the information Violet Kracht was presented with on her Facebook timeline was well aligned with the kind of content she chose to like and share on a constant basis. The influence and importance of a network of friends can also be seen clearly when comparing the data graphs of Violet’s profile with those of Rose’s, who has a significant digital footprint, making it easier for the Facebook algorithm to customize and personalize the content presented on her Facebook profile.

While these results are conclusive and can be inferred from the data without doubt, the research still faced limitations in terms of using the Fbtrex software, and the applied methodology.

The data collection process of the Fbtrex software as a whole remains unclear to us due to the lack of transparency in how it works, which makes the methodology and the foundation of our research questionable. That the software is open source and the data is visible does not resolve this predicament, given the limited coding skills and programming literacy within our group. The software itself is currently in its beta version and contains bugs and issues which need to be fixed to yield better answers and details for a more in-depth analysis of the working of Facebook’s algorithm.

Furthermore, the software detects only the data which has been made public, yet Facebook’s algorithm is also largely influenced by the personal profiles and groups relating to the profile’s interests. Propaganda is oftentimes spread through private forums, and could potentially be sponsored or paid promotion that makes efficient use of the algorithm. Because the software does not record this, it tells us only limited information about the workings of the algorithm. That is, the software only helps in auditing the algorithm for the content available publicly on Facebook. The data may be pseudonymized due to privacy concerns; however, because each profile is so heavily curated and personalized, it is still not very private, making it easy to track the data of each profile.

Lastly, the .csv files make the data easy to read in a spreadsheet. However, this format re-organizes the data, and also restricts the amount of data collected in a usable format in comparison to a .json file or other more efficient formats. The software used for visualizing and analyzing the data also brought up new limitations, due to glitches in the way it works as well as our limited knowledge of data visualization platforms and their usability.

Note

Our team:

  • Elodie Behravan: Storyteller
  • Naina Parasher: Finder
  • Paula-Lilli Stahmer: Shaper
  • Dogasu Sitil: Visualizer
  • Maya Spangenberg: Team Worker

Whilst these were the roles each individual primarily fulfilled, we also took on other roles, helping each other with the various tasks so that the bulk of the work was spread out equally.

References

Appendix

Link to Daily Journal Entries for Violet Kracht and Rose

https://docs.google.com/document/d/1JAZ4NH1drtE0UFATpSMq9TczUC00TgUUuKb850xzrRo