Mining for gold in yesterday’s news

Post by Tilburg pre-master student Josh Boissevain

When most people use the term “data journalism,” they use it in the context of how journalists can use data to produce better and more in-depth content.

This is great, but does it go far enough? How else can the principles of data processing be used to improve journalism?  Are there any corners of the reader’s experience or the news producer’s experience where these tools have yet to be applied?

A team of master’s and pre-master’s students at Tilburg recently tackled that question as part of a semester-long project in Professor Suleman Shahid’s Human Media Interaction course.

The team, composed of Dhiratara Widhya, Alice Frain, Onur Kasımoğulları and myself, decided to tackle one area we found to be relatively untouched by the advances of data science: the news archive.

In our opinion, even some of the globe’s top news sites still treat their archives like a dusty old filing system: a place where yesterday’s news is stored and forgotten. But why does it have to be like that? Every news story is filled with data, from the actors and locations to the events themselves (and their causes and effects).

The Newsqry team


Instead of treating a search as a simple string match (i.e. a search for “Putin” returns only the articles that contain that exact five-character string), what if the archive system were aware of who Putin is and of the context of the stories about him, using semantic networks? All of a sudden, a search for Putin becomes much more meaningful, because you don’t have to rely on the luck of a string match but can rest assured that the search algorithm also knows you are probably interested in articles about Russia, the Russian presidency, Moscow, the Kremlin and Putin’s right-hand man Dmitri Medvedev.
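To make the difference concrete, here is a minimal sketch in Python. It is not how Newsqry or the Neptuno Project is implemented; the tiny `RELATED` network, the sample headlines and both function names are made up for illustration, and a real system would use a proper ontology rather than a hand-written dictionary.

```python
# Hypothetical toy "semantic network": each entity maps to related entities.
RELATED = {
    "Putin": ["Russia", "Kremlin", "Moscow", "Medvedev"],
}

# Invented sample archive; a real archive would hold full articles.
ARTICLES = [
    {"id": 1, "title": "Putin addresses parliament"},
    {"id": 2, "title": "Kremlin announces new budget"},
    {"id": 3, "title": "Medvedev visits Kazan"},
    {"id": 4, "title": "Prague hosts film festival"},
]


def string_search(query, articles):
    """Plain string matching: only titles containing the query itself."""
    q = query.lower()
    return [a["id"] for a in articles if q in a["title"].lower()]


def semantic_search(query, articles, related=RELATED):
    """Expand the query with related entities before matching."""
    terms = [query] + related.get(query, [])
    hits = []
    for a in articles:
        title = a["title"].lower()
        if any(t.lower() in title for t in terms):
            hits.append(a["id"])
    return hits


print(string_search("Putin", ARTICLES))    # [1]
print(semantic_search("Putin", ARTICLES))  # [1, 2, 3]
```

The plain search finds only the headline that mentions Putin by name, while the expanded search also surfaces the Kremlin and Medvedev stories.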

This idea was inspired by the work of two projects exploring similar ground. Credit goes first and foremost to The Neptuno Project, a team of researchers from Spain applying semantic web technologies to news archives. But while their project focuses more on the implementation of these technologies, our goal (for the class anyway) was to focus primarily on design and interface issues.

For this project, our team took the online news website of Transitions Online (TOL) and tested just how functional its archive search was for average users. After running tests with users, seeing the problems they encountered and hearing their feedback, we came up with a completely new interface: one that takes advantage of the capabilities of data and the relationships an entire news archive contains.

The result of our project is called Newsqry, a wholly new way to think about how yesterday’s news can be presented so that readers get the most out of it. To that end, we’ve tried to implement a much simpler search interface that takes advantage of everything semantic network analysis can offer, as well as completely new ways to visualize search results.

For example, in these screenshots from our prototype, you’ll notice that while we have kept the traditional list view, we’ve also included two additional ways of displaying results.

The traditional "list view"


The first is a map view, which ties each news item to a particular geographic location. For a news organization like TOL, which covers a specific but broad region, this would be a useful way to browse stories and events. It is especially useful when searching for a figure who shows up in the news in connection with many places (such as Putin). Instead of opening each result to see what it’s about, a user can simply switch over to the map interface and see exactly where each item is located.
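As a rough sketch of the idea (again, not the prototype’s actual code), the Python below groups search results into one map marker per location. The sample results, the approximate coordinates and the `group_by_place` helper are all invented for illustration, and it assumes each article has already been tagged with a place.

```python
from collections import defaultdict

# Invented search results for "Putin"; coordinates are approximate.
RESULTS = [
    {"title": "Putin addresses parliament", "place": "Moscow", "lat": 55.75, "lon": 37.62},
    {"title": "Putin opens stadium", "place": "Sochi", "lat": 43.59, "lon": 39.72},
    {"title": "Kremlin announces new budget", "place": "Moscow", "lat": 55.75, "lon": 37.62},
]


def group_by_place(results):
    """Return one marker per location, each holding its list of headlines."""
    markers = defaultdict(list)
    for r in results:
        key = (r["place"], r["lat"], r["lon"])
        markers[key].append(r["title"])
    return dict(markers)


for (place, lat, lon), titles in group_by_place(RESULTS).items():
    print(f"{place} ({lat}, {lon}): {len(titles)} article(s)")
```

Each key could then be handed to whatever mapping library the site uses to draw a pin, with the headlines shown when the pin is clicked.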


The map view


We have also implemented a chronological view of results. This interface was inspired by an amazing platform developed by the Explainer Project at the Australian Broadcasting Corporation. (I highly recommend reading the project’s blog post about the platform.) If it looks familiar, that’s because the platform has also been used by The Guardian on several occasions to create amazing story visualizations (such as a fantastic example about the 2011 London riots). But as far as we know, the idea has never been deployed across an entire news organization’s website.

The timeline view


Here you can see a poster presentation of our final results presented May 21.

NewsQry poster

And here you can see a more dynamic presentation of the platform.

 For more information or to contact the team, feel free to contact me at
