Study in CACM August Issue Finds Wikipedia Faces No Limits to Growth
Although Wikipedia's Scope Is Increasing, Its Coverage Is Not Deteriorating
The Association for Computing Machinery
Advancing Computing as a Science & Profession
Contact: Virginia Gold
NEW YORK, July 31, 2008 -- A new study published in the August 2008 issue of Communications of the ACM, the flagship magazine of the Association for Computing Machinery (ACM), shows that Wikipedia is likely to keep growing and remain usable. The authors, Diomidis Spinellis and Panagiotis Louridas, identified a growth pattern called preferential attachment, marking the first time this pattern has been studied live on a structure the size of Wikipedia. Wikipedia is the multilingual, Web-based, free content encyclopedia project that is written collaboratively by volunteers from all around the world.
Although there have been many studies on Wikipedia, little attention has been given to the limits to its growth. The authors, from the Athens University of Economics and Business, studied the entire Wikipedia corpus, which includes 485 Gbytes of data, adding up to 1.9 million pages and 28.2 million revisions.
The study considered two possible growth patterns behind Wikipedia's expansion. Either new concepts are added faster than articles are written to define them, or the number of new concepts grows more slowly than the number of articles. In the first case, Wikipedia's coverage will deteriorate as articles are drowned in an increasing number of undefined concepts. In the second case, Wikipedia's growth may stall. The study’s authors found that Wikipedia sits comfortably between these two extremes.
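The two extremes can be framed as a ratio of undefined concepts (entries that articles link to but that do not yet exist) to defined ones (existing articles). The Python sketch below is only an illustration on a made-up three-article link table; the function name, the data layout, and the toy titles are assumptions, not the authors' tools or data.

```python
def undefined_to_defined_ratio(links):
    """Return the ratio of undefined to defined concepts.

    `links` maps each existing article title to the titles it links to.
    A title that is linked to but has no article of its own counts as
    an undefined concept. (Hypothetical data layout, for illustration.)
    """
    defined = set(links)
    if not defined:
        raise ValueError("at least one article is required")
    referenced = {target for targets in links.values() for target in targets}
    undefined = referenced - defined
    return len(undefined) / len(defined)


# Toy snapshot: three articles exist; three linked titles have no article yet.
toy_wiki = {
    "Athens": ["Greece", "Acropolis"],
    "Greece": ["Athens", "Europe"],
    "Borges": ["Argentina"],
}
ratio = undefined_to_defined_ratio(toy_wiki)
```

A ratio that keeps rising across successive snapshots would correspond to the first case, and one that falls toward zero to the second.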
Using a suite of tools they developed, the authors showed that the ratio of undefined to defined concepts in Wikipedia has been stable over time. Furthermore, they found that articles are added to Wikipedia in a collaborative fashion: Wikipedians often add a new article when they encounter a missing entry. Finally, the two researchers established that Wikipedia grows in a manner similar to that witnessed in a number of different areas, by having new articles linked to the most popular existing articles.
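The growth pattern described above, in which new articles link to already popular ones, can be imitated with a small simulation. The following Python sketch is a generic textbook-style model under simplifying assumptions (each new article makes exactly one link, and an article's chance of being chosen is proportional to its incoming links plus one); it is not the authors' method, and all names in it are hypothetical.

```python
import random


def grow_by_preferential_attachment(n_articles, seed=0):
    """Simulate link growth and return the incoming-link count per article.

    Assumption: every new article links to one existing article, picked
    with probability proportional to (incoming links + 1).
    """
    if n_articles < 1:
        raise ValueError("n_articles must be at least 1")
    rng = random.Random(seed)
    in_links = [0]   # article 0 starts with no incoming links
    pool = [0]       # each article appears once, plus once per incoming link
    for new_article in range(1, n_articles):
        target = rng.choice(pool)   # popular articles appear more often
        in_links[target] += 1
        pool.append(target)         # the target is now more likely to be picked
        in_links.append(0)
        pool.append(new_article)    # baseline chance for the newcomer
    return in_links


counts = grow_by_preferential_attachment(1000)
most_linked = max(counts)
never_linked = sum(1 for c in counts if c == 0)
```

In runs of this kind a few early articles typically collect a large share of the links while many articles receive none, which is the skewed distribution usually associated with preferential attachment.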
This preferential attachment growth pattern has been used to explain the number of species per genus, the Internet, the World Wide Web, scientific citations, and collaboration networks between people, among others. As to how long the process may continue, the authors conclude by citing Jorge Luis Borges's 1946 short story "On Exactitude in Science." In it, the wise men of the empire undertake to create a complete map of it; upon finishing, they realize that the map is so big that it coincides with the empire itself.
The article, "The Collaborative Organization of Knowledge," appears in Communications of the ACM Volume 51, Number 8 (August 2008), pp. 68-73. CACM launched its expanded editorial scope and redesigned format in July, offering readers access to this generation’s most significant leaders and innovators in computing and information technology. The current issue is available online in digital format at http://mags.acm.org/communications/current.
ACM, the Association for Computing Machinery (www.acm.org), is the world’s largest educational and scientific computing society, uniting computing educators, researchers and professionals to inspire dialogue, share resources and address the field’s challenges. ACM strengthens the computing profession’s collective voice through strong leadership, promotion of the highest standards, and recognition of technical excellence. ACM supports the professional growth of its members by providing opportunities for life-long learning, career development, and professional networking.