ABSTRACT
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
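The abstract does not spell out the update rules, so the following is only a minimal sketch of the hub/authority iteration as it is commonly described, not the authors' exact procedure. It assumes a page's authority score is the sum of the hub scores of pages linking to it, a page's hub score is the sum of the authority scores of the pages it links to, and both vectors are rescaled to unit length after each round. The function names, the dictionary-based graph, and the fixed iteration count are illustrative assumptions. Under these assumptions the scores approach the principal eigenvectors of the matrices A^T A and A A^T, where A is the adjacency matrix of the link graph.

```python
import math


def normalize(scores):
    """Rescale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hubs_and_authorities(links, iterations=50):
    """Iteratively compute hub and authority scores.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    Returns a pair of dicts (hub, auth), each of unit Euclidean length.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority update: add up the hub scores of in-linking pages.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: add up the (fresh) authority scores of linked pages.
        new_hub = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Toy graph: three hub-like pages pointing at two candidate authorities.
example_links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
example_hub, example_auth = hubs_and_authorities(example_links)
```

In the toy graph, a1 is linked from all three hub-like pages and so ends with a higher authority score than a2, while h1 and h2 end with higher hub scores than h3 because they point to both authorities.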