I have been a bit lax about writing here in the past couple of weeks, but it’s not for lack of motivation. Since commencement a few weeks ago, I’ve spent just about every day flinging myself at the wall that is my thesis, trying to get it to crack. My thesis is by far my top priority (I have less than 3 weeks to finish up), so I have a hard time letting myself work on anything else. But maybe changing gears for a little bit will be helpful, and this post won’t be entirely off topic.

In doing research for my thesis, I have had more than a few run-ins with Google’s academic paper search tool, Google Scholar. While it is both familiar (in interface) and fast, it lacks some of the more advanced search features that any researcher would inevitably want: sorting by date comes to mind, and maybe the ability to comment on papers (as Google provides for businesses).

Thinking bigger, what I would really like to see from Google Scholar is a citation visualization tool that would let a researcher follow citations visually, jumping from paper to paper. This would make it much easier to get up to speed on a given topic. Keeping track of the connections between a group of papers by hand is tedious, and it is made worse by the various ways different authors format their citations (First letter of each author’s last name, followed by the last 2 digits of the publishing year? How about just a citation number!). The citation information already exists, so visualize it!
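To make the "jumping from paper to paper" idea a little more concrete, here is a minimal sketch in Python. It assumes the citation data is already available as a plain dictionary; the paper IDs, the `citations` mapping, and the `follow_citations` helper are all hypothetical and made up for illustration, and no Google Scholar API is involved.

```python
from collections import deque

# Hypothetical toy data: each paper ID maps to the list of papers it cites.
citations = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def follow_citations(graph, start, max_depth=2):
    """Breadth-first walk from `start`; returns {paper: hops from start}."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if depth[paper] == max_depth:
            continue  # do not expand papers at the depth limit
        for cited in graph.get(paper, []):
            if cited not in depth:  # skip papers we have already reached
                depth[cited] = depth[paper] + 1
                queue.append(cited)
    return depth

reachable = follow_citations(citations, "A")
print(reachable)
```

This only computes which papers sit within a few hops of a starting paper. A real tool would still need a front-end to draw the edges, plus a source for the citation data itself.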

Google’s lack of this kind of tool for wading through research papers is perhaps made even more surprising when you consider that the original idea behind Google’s search algorithm draws heavily from a technique called citation analysis, which is all about considering the connections between academic papers.

It doesn’t actually surprise me that Google doesn’t provide this kind of visualization; they tend to be stronger with back-end problems than with front-end, user-experience-type projects. But what they could (and should!) do is provide a Google Scholar API. In fact, considering the ridiculous number of APIs they do provide, it’s a wonder one doesn’t exist! There is even a thread in the Google API issue tracker requesting a Google Scholar API that has been active since September 2008.

To be fair, the task of keeping track of academic research papers is made painful by the various paywalls that many papers sit behind, and perhaps Google isn’t confident enough in their data set (or, the data set is too inconsistent) to provide a robust API. Still, this application would not only be very cool, but could also make research in general more efficient.

There are already tools that basically do what I’ve described (e.g. paperscope for astrophysics papers), but for smaller sets of papers. So once Google finally does provide a Google Scholar API to its awesome collection of papers across many topics, it won’t be long before someone creates a citation visualization tool from it, and probably plenty of other cool mashups too.

P.S. Hey Google, feel free to hire me to work on this :)

  1. Sounds like a good addition to Google Scholar to me, and I don’t have a problem with the article overall, but there is a lot of Google hating going on, ha. I understand that Google is the search engine of 99% of the educated world. But instead of singling out just them, how about pointing out that Bing doesn’t really have what you are looking for either? Do you know if they provide anything similar to Google Scholar? It seems like when you talk about visualization tools for articles, it should be right up Bing’s alley, considering it is a “Decision Engine” (their words, not mine) and not just a search engine. With all that said, I agree with everything that you said; I just felt like I needed to play devil’s advocate for a bit.

    1. Wow, papercube is an awesome front-end. Very smooth and flexible. I hadn’t seen it before, thanks!

      It looks like papercube has been sitting on GitHub since 2009. It’s crazy that nobody (e.g. IEEE) has bothered to at least pick up this technology for their own database.

