How SPEAR Identifies Domain Experts within Delicious
At the SIGIR 2009 conference, we had the great fortune of learning about a new academic research project that aims to discover the top authoritative users and links in social networking services like delicious. We were so impressed by the work and its wide array of applications that we asked the researchers to write a guest post here describing their findings. — Vik Singh
By Michael G. Noll and Ching-man Au Yeung
A major problem of the Internet today is that finding high quality information is not easy nor fast. The steady increase of spam and junk content on the Web further complicates this challenge. Another related issue is that finding knowledgeable and trustworthy users on social platforms like Delicious is much more difficult than it should be. Wouldn’t it be nice if Delicious recommended “good” users with similar interests? Or wouldn’t it be helpful if you could get a selection of great websites on jewelry or mortgage without being overwhelmed by spam?
To tackle this problem, we created the SPEAR algorithm. SPEAR (Spamming-resistant Expertise Analysis and Ranking) is a new technique to measure the expertise of users by analyzing their public activities on platforms like Delicious. In the case of the latter, this means analyzing the timeline of the bookmarking and tagging activities of users. The focus of SPEAR is on the ability of users to find new, high quality information on the Internet. A great benefit of SPEAR is that it returns two very useful sets of results: first, a list of users ranked by their expertise; and second, a list of websites ranked by their quality. So, whether you are looking for experts on Delicious for the programming language JavaScript or want to find the best websites on photography, SPEAR can help.
On top of that, the algorithm has been shown to be very resistant to spamming attacks. We tested the SPEAR algorithm using data from Delicious - over 71,000 Web documents, 0.5 million users, and 2 million shared bookmarks. We set the algorithm to find JavaScript experts, for example, and it produced a list of users; the top two were professional software developers, and not a single spammer was ranked in the Top 200.
Technically, SPEAR is based on the well-known information retrieval algorithm HITS, a technique presented in 1999 that is used by search engines to rank Web pages. We came up with SPEAR by modifying HITS so that it fits to the characteristics of open and shared systems like Delicious and extended it with a new component that integrates the timeline of user activities into its analysis. This resulted in further performance improvements of the algorithm (refer to Figure 1 below).
The two main elements of the new SPEAR algorithm are:
1. Mutual reinforcement of user expertise and document quality: A user’s expertise in a particular topic depends on the quality of the documents she or he has found, and the quality of documents in turn depends on the expertise of the users who have found them.
2. Discoverers vs. followers: Expert users should be discoverers – they tend to be faster than others to identify new and high quality documents. In other words, “the early bird catches the worm” (see also Figure 1). SPEAR gives more credit to users the earlier they find high quality documents.
The combination of both these elements has the effect that SPEAR favors quality over quantity of user actions, and that the algorithm is quite resistant to today’s spamming attacks.
We believe SPEAR is very useful in the context of open systems, particularly, social networks. That said, we are already researching the next version of the algorithm – the popularity of online services like Delicious is rising, and so is the spam threat. Whether we want to improve the user experience on Delicious or win the arms race against spammers, there’s still a lot of work left to do!
Figure 1: The SPEAR algorithm gives more credit to early discoverers of new information. How much credit each user receives depends on a so-called credit score function, which is supplied as a parameter to the algorithm.

Figure 2: The main technical components of the actual SPEAR algorithm are a weighted adjacency matrix and two score vectors. The vectors keep track of the expertise score of users and quality scores of documents, respectively.

About Michael Noll
Michael is a researcher and bi-national Ph.D. candidate in Computer Science at the Hasso Plattner Institute, Germany, and the University of Luxembourg. His research interests are mainly within the fields of the Social Web, information retrieval and information security. He enjoys tackling difficult problems and solving them in practice, particularly with free and open source software.
About Albert Au Yeung
Albert is originally from Hong Kong, and is now a final year PhD candidate in Computer Science at the University of Southampton, UK. His PhD research project focuses on how implicit semantics and qualities of entities on the Web can be uncovered by analyzing the collective user behaviors on social Websites such as collaborative tagging systems. His research interests also include online social network analysis and linked data on the Semantic Web.
Related Links
SPEAR homepage, http://www.spear-algorithm.org/
Michael G. Noll, http://www.michael-noll.com/
Ching-man Au Yeung, http://www.ecs.soton.ac.uk/~cmay06r/
“A Better Way to Rank Expertise Online”, Technology Review, 07/09, http://www.technologyreview.com/web/23100/
24 comments August 31st, 2009

