Generating unambiguous URL clusters from web search

Hdl Handle:
http://hdl.handle.net/10149/94119
Title:
Generating unambiguous URL clusters from web search
Book Title:
Proceedings of the 2009 workshop on web search click data
Authors:
Smith, G. D. (George); Brailsford, T. (Tim); Donner, C.; Hooijmaijers, D. (Dennis); Truran, M. (Mark); Goulding, J. (James); Ashman, H. (Helen)
Editors:
Craswell, N. (Nick); Jones, R. (Rosie); Dupret, G. (Georges); Viegas, E. (Evelyne)
Affiliation:
University of Teesside
Citation:
Smith, G. D. et al. (2009) 'Generating unambiguous URL clusters from web search', 2009 workshop on Web Search Click Data, Barcelona, Spain, February 9, in Craswell, N. et al. (eds) Proceedings of the 2009 workshop on web search click data. New York: ACM, pp.28-34.
Publisher:
ACM
Conference:
2009 workshop on Web Search Click Data, 2009, Barcelona, Spain, February 9, 2009
Issue Date:
2009
URI:
http://hdl.handle.net/10149/94119
DOI:
10.1145/1507509.1507514
Abstract:
This paper reports on the generation of unambiguous clusters of URLs from clickthrough data from the MSN search query log excerpt (the RFP 2006 dataset). Selections (clickthroughs) by a single user from a single query can be assumed to have some mutual semantic relevance, and the URLs coselected in this way can be aggregated to form single-sense clusters. When the graphs for a single term separate into distinct clusters, the semantics of the distinct clusters can be interpreted as disambiguated aggregations of URLs. This principle had been tested on smaller and more constrained datasets previously, and this paper reports on findings from applying a method based on the principle to the RFP 2006 dataset. This paper evaluates the proposed coselection method for generating single-sense clusters against two other methods, with varying parameters. The evaluation is done both with a human evaluation to determine the quality of the clusters generated by the different methods, and by a simple "edit distance" analysis to determine the content difference of the methods. The main questions addressed are i) whether it is feasible to generate single-sense / sense-coherent clusters, and ii) whether, in a closed world, it would be feasible to discover ambiguous terms. The experimentation showed that sense-coherent clusters were found and further indicated that ambiguous terms could be detected from observing small overlap between large clusters.
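The coselection principle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout `(user_id, query, url)`, the function name `coselection_clusters`, and the use of union-find to extract connected components are all assumptions made for illustration, and the example URLs are invented placeholders.

```python
from collections import defaultdict


def coselection_clusters(clicks):
    """Cluster URLs per query from (user_id, query, url) click records.

    Hypothetical record layout; URLs selected by the same user for the
    same query are treated as coselected and linked into one cluster.
    """
    # Collect the set of URLs each user selected for each query.
    selections = defaultdict(lambda: defaultdict(set))
    for user_id, query, url in clicks:
        selections[query][user_id].add(url)

    result = {}
    for query, by_user in selections.items():
        parent = {}  # union-find forest over this query's URLs

        def find(u):
            parent.setdefault(u, u)
            while parent[u] != u:
                parent[u] = parent[parent[u]]  # path halving
                u = parent[u]
            return u

        # Link every URL a user selected to the first one they selected.
        for urls in by_user.values():
            first, *rest = sorted(urls)
            root = find(first)
            for other in rest:
                parent[find(other)] = root

        # Each connected component is one candidate single-sense cluster.
        groups = defaultdict(set)
        for url in list(parent):
            groups[find(url)].add(url)
        result[query] = sorted(sorted(g) for g in groups.values())
    return result


clicks = [
    ("u1", "jaguar", "cars.example/jaguar"),
    ("u1", "jaguar", "dealer.example/jaguar-xk"),
    ("u2", "jaguar", "dealer.example/jaguar-xk"),
    ("u2", "jaguar", "reviews.example/xk"),
    ("u3", "jaguar", "wildlife.example/big-cats"),
    ("u3", "jaguar", "zoo.example/jaguar"),
]
print(coselection_clusters(clicks))
```

In this toy input the query "jaguar" separates into two clusters with no shared URLs, which is the kind of separation the abstract interprets as distinct senses of a term.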
Type:
Meetings and Proceedings; Book Chapter
Language:
en
Keywords:
unambiguous; clusters; URLs; clickthrough; coselection
ISBN:
9781605584348
Rights:
ACM allows authors' version of their own ACM-copyrighted work on their personal server or on servers belonging to their employers. For full details see http://www.acm.org/publications/policies/RightsResponsibilities
Citation Count:
0 [Web of Science and Scopus, 11/03/2010]

Full metadata record

DC Field | Value | Language
dc.contributor.author | Smith, G. D. (George) | en
dc.contributor.author | Brailsford, T. (Tim) | en
dc.contributor.author | Donner, C. | en
dc.contributor.author | Hooijmaijers, D. (Dennis) | en
dc.contributor.author | Truran, M. (Mark) | en
dc.contributor.author | Goulding, J. (James) | en
dc.contributor.author | Ashman, H. (Helen) | en
dc.contributor.editor | Craswell, N. (Nick) | en
dc.contributor.editor | Jones, R. (Rosie) | en
dc.contributor.editor | Dupret, G. (Georges) | en
dc.contributor.editor | Viegas, E. (Evelyne) | en
dc.date.accessioned | 2010-03-11T14:15:26Z | -
dc.date.available | 2010-03-11T14:15:26Z | -
dc.date.issued | 2009 | -
dc.identifier.isbn | 9781605584348 | -
dc.identifier.doi | 10.1145/1507509.1507514 | -
dc.identifier.uri | http://hdl.handle.net/10149/94119 | -
dc.description.abstract | This paper reports on the generation of unambiguous clusters of URLs from clickthrough data from the MSN search query log excerpt (the RFP 2006 dataset). Selections (clickthroughs) by a single user from a single query can be assumed to have some mutual semantic relevance, and the URLs coselected in this way can be aggregated to form single-sense clusters. When the graphs for a single term separate into distinct clusters, the semantics of the distinct clusters can be interpreted as disambiguated aggregations of URLs. This principle had been tested on smaller and more constrained datasets previously, and this paper reports on findings from applying a method based on the principle to the RFP 2006 dataset. This paper evaluates the proposed coselection method for generating single-sense clusters against two other methods, with varying parameters. The evaluation is done both with a human evaluation to determine the quality of the clusters generated by the different methods, and by a simple "edit distance" analysis to determine the content difference of the methods. The main questions addressed are i) whether it is feasible to generate single-sense / sense-coherent clusters, and ii) whether, in a closed world, it would be feasible to discover ambiguous terms. The experimentation showed that sense-coherent clusters were found and further indicated that ambiguous terms could be detected from observing small overlap between large clusters. | en
dc.language.iso | en | en
dc.publisher | ACM | en
dc.rights | ACM allows authors' version of their own ACM-copyrighted work on their personal server or on servers belonging to their employers. For full details see http://www.acm.org/publications/policies/RightsResponsibilities | en
dc.subject | unambiguous | en
dc.subject | clusters | en
dc.subject | URLs | en
dc.subject | clickthrough | en
dc.subject | coselection | en
dc.title | Generating unambiguous URL clusters from web search | en
dc.type | Meetings and Proceedings | en
dc.type | Book Chapter | en
dc.contributor.department | University of Teesside | en
dc.title.book | Proceedings of the 2009 workshop on web search click data | en
dc.identifier.conference | 2009 workshop on Web Search Click Data, 2009, Barcelona, Spain, February 9, 2009 | en
ref.citationcount | 0 [Web of Science and Scopus, 11/03/2010] | en
or.citation.harvard | Smith, G. D. et al. (2009) 'Generating unambiguous URL clusters from web search', 2009 workshop on Web Search Click Data, Barcelona, Spain, February 9, in Craswell, N. et al. (eds) Proceedings of the 2009 workshop on web search click data. New York: ACM, pp.28-34. | -
prism.startingPage | 28 | -
prism.endingPage | 34 | -
All Items in TeesRep are protected by copyright, with all rights reserved, unless otherwise indicated.