Aug 31 2009

How SPEAR Identifies Domain Experts within Delicious

At the SIGIR 2009 conference, we had the great fortune of learning about a new academic research project that aims to discover the top authoritative users and links in social networking services like delicious. We were so impressed by the work and its wide array of applications that we asked the researchers to write a guest post here describing their findings. — Vik Singh

By Michael G. Noll and Ching-man Au Yeung

A major problem of the Internet today is that finding high quality information is neither easy nor fast. The steady increase of spam and junk content on the Web further complicates this challenge. A related issue is that finding knowledgeable and trustworthy users on social platforms like Delicious is much more difficult than it should be. Wouldn’t it be nice if Delicious recommended “good” users with similar interests? Or wouldn’t it be helpful if you could get a selection of great websites on jewelry or mortgages without being overwhelmed by spam?

To tackle this problem, we created the SPEAR algorithm. SPEAR (Spamming-resistant Expertise Analysis and Ranking) is a new technique to measure the expertise of users by analyzing their public activities on platforms like Delicious. In the case of Delicious, this means analyzing the timeline of the bookmarking and tagging activities of users. The focus of SPEAR is on the ability of users to find new, high quality information on the Internet. A great benefit of SPEAR is that it returns two very useful sets of results: first, a list of users ranked by their expertise; and second, a list of websites ranked by their quality. So, whether you are looking for experts on Delicious for the programming language JavaScript or want to find the best websites on photography, SPEAR can help.

On top of that, the algorithm has been shown to be very resistant to spamming attacks. We tested the SPEAR algorithm using data from Delicious - over 71,000 Web documents, 0.5 million users, and 2 million shared bookmarks. We set the algorithm to find JavaScript experts, for example, and it produced a list of users; the top two were professional software developers, and not a single spammer was ranked in the Top 200.

Technically, SPEAR is based on the well-known information retrieval algorithm HITS, a technique presented in 1999 that is used by search engines to rank Web pages. We came up with SPEAR by modifying HITS so that it fits the characteristics of open and shared systems like Delicious, and by extending it with a new component that integrates the timeline of user activities into its analysis. This resulted in further performance improvements of the algorithm (refer to Figure 1 below).

The two main elements of the new SPEAR algorithm are:

1. Mutual reinforcement of user expertise and document quality: A user’s expertise in a particular topic depends on the quality of the documents she or he has found, and the quality of documents in turn depends on the expertise of the users who have found them.

2. Discoverers vs. followers: Expert users should be discoverers – they tend to be faster than others to identify new and high quality documents. In other words, “the early bird catches the worm” (see also Figure 1). SPEAR gives more credit to users the earlier they find high quality documents.

The combination of both these elements has the effect that SPEAR favors quality over quantity of user actions, and that the algorithm is quite resistant to today’s spamming attacks.
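
The mutual reinforcement described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the fixed iteration count, the sum-to-one normalization and the sample credits are all assumptions made for the example.

```python
def normalize(scores):
    """Scale scores so that they sum to 1 (an assumed normalization)."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


def rank(weights, iterations=50):
    """Return (expertise, quality) for weights[user][document] = credit."""
    users = sorted(weights)
    documents = sorted({doc for row in weights.values() for doc in row})
    expertise = {user: 1.0 for user in users}
    quality = {doc: 1.0 for doc in documents}

    for _ in range(iterations):
        # Expertise: credit-weighted sum of the quality of a user's documents.
        expertise = normalize({
            user: sum(credit * quality[doc] for doc, credit in weights[user].items())
            for user in users
        })
        # Quality: credit-weighted sum of the expertise of a document's users.
        quality = normalize({
            doc: sum(weights[user].get(doc, 0.0) * expertise[user] for user in users)
            for doc in documents
        })
    return expertise, quality


# Hypothetical credits: alice found both documents first, bob followed,
# and "bulk" bookmarked three documents that nobody else picked up.
example = {
    "alice": {"doc1": 2.0, "doc2": 2.0},
    "bob": {"doc1": 1.0, "doc2": 1.0},
    "bulk": {"junk1": 1.0, "junk2": 1.0, "junk3": 1.0},
}
expertise, quality = rank(example)
for user in sorted(expertise, key=expertise.get, reverse=True):
    print(user, round(expertise[user], 3))
```

In this toy input, "bulk" has the most bookmarks but ends up last, because none of its documents were bookmarked by anyone whose expertise could vouch for them. Real data is far messier, so treat this only as a picture of the idea.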

We believe SPEAR is very useful in the context of open systems, particularly social networks. That said, we are already researching the next version of the algorithm: the popularity of online services like Delicious is rising, and so is the spam threat. Whether we want to improve the user experience on Delicious or win the arms race against spammers, there’s still a lot of work left to do!

Figure 1: The SPEAR algorithm gives more credit to early discoverers of new information. How much credit each user receives depends on a so-called credit score function, which is supplied as a parameter to the algorithm.
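
To make the idea of a credit score function concrete, here is a hedged Python sketch that turns a bookmarking timeline into per-user credits. The record layout, the function name `credit_weights`, the "one plus the number of later bookmarkers" raw score and the square-root default are assumptions chosen for illustration; the post only states that the credit function is a parameter.

```python
import bisect
import math


def credit_weights(bookmarks, credit=math.sqrt):
    """Map (user, document, timestamp) records to weights[user][document]."""
    times_by_doc = {}
    for user, doc, timestamp in bookmarks:
        times_by_doc.setdefault(doc, []).append(timestamp)
    for times in times_by_doc.values():
        times.sort()

    weights = {}
    for user, doc, timestamp in bookmarks:
        times = times_by_doc[doc]
        # Count the users who bookmarked this document strictly later.
        followers = len(times) - bisect.bisect_right(times, timestamp)
        # The discoverer gets the largest raw score, the last follower gets 1.
        weights.setdefault(user, {})[doc] = credit(1 + followers)
    return weights


# Hypothetical timeline: alice bookmarks doc1 first, carol last.
timeline = [("alice", "doc1", 1), ("bob", "doc1", 5), ("carol", "doc1", 9)]
weights = credit_weights(timeline)
print(weights)
```

A concave function such as the square root still rewards discoverers, but it dampens the advantage of being first among thousands of later bookmarkers. Passing `credit=float` instead would give plain linear credit.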

Figure 2: The main technical components of the actual SPEAR algorithm are a weighted adjacency matrix and two score vectors. The vectors keep track of the expertise score of users and quality scores of documents, respectively.
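
One way to write these components down, using notation that is assumed here rather than taken from the post: let $A$ be the weighted adjacency matrix with one row per user and one column per document, where $A_{ij}$ is the credit user $i$ earns for document $j$ (and $0$ if user $i$ never bookmarked it), let $\mathbf{e}$ be the vector of expertise scores, and let $\mathbf{q}$ be the vector of quality scores. A HITS-style iteration then repeats the following updates until the scores stabilize:

```latex
\[
\mathbf{e} \leftarrow A\,\mathbf{q}, \qquad
\mathbf{q} \leftarrow A^{\top}\mathbf{e}, \qquad
\mathbf{e} \leftarrow \frac{\mathbf{e}}{\lVert \mathbf{e} \rVert_1}, \qquad
\mathbf{q} \leftarrow \frac{\mathbf{q}}{\lVert \mathbf{q} \rVert_1}
\]
```

The choice of the 1-norm for normalization is an assumption; any fixed rescaling keeps the scores bounded without changing the resulting rankings.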

About Michael Noll

Michael is a researcher and bi-national Ph.D. candidate in Computer Science at the Hasso Plattner Institute, Germany, and the University of Luxembourg. His research interests are mainly within the fields of the Social Web, information retrieval and information security. He enjoys tackling difficult problems and solving them in practice, particularly with free and open source software.

About Albert Au Yeung

Albert is originally from Hong Kong, and is now a final year PhD candidate in Computer Science at the University of Southampton, UK. His PhD research project focuses on how implicit semantics and qualities of entities on the Web can be uncovered by analyzing the collective user behaviors on social Websites such as collaborative tagging systems. His research interests also include online social network analysis and linked data on the Semantic Web.

Related Links

SPEAR homepage, http://www.spear-algorithm.org/

Michael G. Noll, http://www.michael-noll.com/

Ching-man Au Yeung, http://www.ecs.soton.ac.uk/~cmay06r/

“A Better Way to Rank Expertise Online”, Technology Review, 07/09, http://www.technologyreview.com/web/23100/

Posted by Vik Singh · Tags: announcements, conferences, showcase

25 Comments

  • Martin  |  Aug 31 2009 at 2:53 pm

    Interesting, but neither the post nor the homepage addresses the current availability of the algorithm. Is it Open Source? Do you license it? At what cost? Where can it be tested? And so on and so forth…

  • Todd Troxell  |  Sep 1 2009 at 1:17 pm

    Quite interesting, though one point I’d contend with is calling early discoverers domain experts. I’ve noticed a lot of folks linking stuff early on who are not necessarily experts, but they follow news and RSS closely.

  • Peter Murray  |  Sep 2 2009 at 12:42 pm

    Are there plans to integrate this into the Delicious service?

  • dk  |  Sep 3 2009 at 5:16 pm

    “The combination of both these elements has the effect that SPEAR favors quality over quantity of user actions”

    I think there is a flaw here, because SPEAR gives trust to a user who acts fast on “indexing”, and the more actions the user takes, the more trust the user will gain.

    Anyway, this is great work.

  • Plastic Sleeves  |  Oct 19 2009 at 8:28 am

    Having spent about 12 years on the Internet, often as an early adopter, I can honestly say that I would be concerned that the content rating would be unfairly weighted against average people. For example, almost all of my friends are geeks and we follow most things Internet. When we like something, we send it out to each other and then we all might bookmark it. As long as the world sees the world, and abortion or gay rights, the way that we do, then all is good. However, where our opinion is on the fringe, we would get weighted values for our entries and thus begin to shape the influences for those who unwittingly rely on our prowess.

    I am sure that this can be resolved though.

    Thanks,
    Neil

  • mens dress shirts  |  Nov 6 2009 at 1:46 am

    Thanks a lot. But shouldn’t this be kept secret? I mean, spammers will figure out a way to deal with it. For example, if experts are more active, spammers will act active too. Anyway, thanks for sharing.

  • Blank Label  |  Nov 7 2009 at 11:39 pm

    I’m fairly confident that spammers are smart enough to figure out the way things work.
    The experts / programmers just need to be a bit smarter =)
