The Size of Wikipedia. (Part 1.)
created Oct 3rd 2014, 06:17 by Nehemiah Thomas
This Wikipedia:Statistics page measures the size of the English-language edition of Wikipedia, mostly by page and article count. As of October 3, 2014, there are 4,614,530 articles in the English Wikipedia.
Most of the earlier entries were extracted from Wikipedia:Announcements. Later entries are taken from observations of the new software's built-in article count features. For information on what Wikipedia's software counts as an article, see Wikipedia:What is an article#Lists of articles and statistics. For more details and discussion of other models, see Wikipedia:Modelling Wikipedia's growth.
Before 2012 Wikipedia's growth approximately followed a Gompertz growth model. This model was created in June 2010, and it is determined by the Gompertz function,
y(t)=ae^{be^{ct}}
with parameters:
a = 4378449
b = −15.42677
c = −0.384124
e = 2.71828 (Euler's number)
t is the time in years since 1/1/2000 (so 1/1/2010 is t = 10.00).
Some characteristics of this model are:
a pivot point (the curve's inflection point) at which growth is at its peak. For en.wikipedia.org this might have been in August 2006, with 60,000 new articles per month.
an upper limit on the number of articles of about 4.4 million (as determined by parameter a of the model). The model does not account for the new events and people that there will always be to describe in the future.
This model describes only the quantity (number of articles); the quality might still increase independently. The two graph images show, in the first graph, the historical and expected total number of articles and, in the second graph, the monthly growth rate, which has been slowing since late 2006 (line sloping downward).
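As a minimal sketch, the Gompertz function and the two characteristics listed above can be evaluated directly from the stated parameters. The function names (`gompertz`, `growth_rate`, `peak_growth_time`) are illustrative choices, not part of the original page. The peak is found where the second derivative vanishes, which for this function is where b·e^{ct} = −1. The fitted curve need not reproduce the August 2006 estimate exactly.

```python
import math

# Parameters as listed above for the June 2010 fit.
A = 4_378_449   # upper limit on the number of articles
B = -15.42677
C = -0.384124


def gompertz(t: float) -> float:
    """Modelled article count at t years after 2000-01-01."""
    return A * math.exp(B * math.exp(C * t))


def growth_rate(t: float) -> float:
    """First derivative dy/dt, in articles per year."""
    return B * C * math.exp(C * t) * gompertz(t)


def peak_growth_time() -> float:
    """Time at which y'' = 0, i.e. where B * exp(C * t) = -1."""
    return math.log(-1.0 / B) / C


if __name__ == "__main__":
    t_peak = peak_growth_time()
    print(f"modelled count at t = 10: {gompertz(10.0):,.0f}")
    print(f"growth peaks at t = {t_peak:.2f}")
    print(f"peak rate: {growth_rate(t_peak) / 12:,.0f} articles per month")
```

At the peak the modelled count equals a/e, a general property of the Gompertz curve, and for large t the count approaches a from below.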
Detailed analysis of the data shows that from 2006 to 2009 the article growth rate followed a six-monthly cycle, with faster growth in February and August than in May and November. This cycle does not appear in the growth-rate graph here, because the values shown in the graph have been averaged over periods of six months.
Date         Article count   Increase during preceding year   % increase   Average daily increase
2002-01-01   19,700          19,700                           n/a          54
2003-01-01   96,500          76,800                           390%         210
2004-01-01   188,800         92,300                           96%          253
2005-01-01   438,500         249,700                          132%         682
2006-01-01   895,000         456,500                          104%         1251
2007-01-01   1,560,000       665,000                          74%          1822
2008-01-01   2,153,000       593,000                          38%          1625
2009-01-01   2,679,000       526,000                          24%          1437
2010-01-01   3,144,000       465,000                          17%          1274
2011-01-01   3,518,000       374,000                          12%          1025
2012-01-01   3,835,000       317,000                          9%           868
2013-01-01   4,133,000       298,000                          8%           814
2014-01-01   4,413,000       280,000                          7%           767
2014-10-03   4,614,530       201,530[a]                       n/a          733[a]
[a] Calculated live for the year so far, as this row covers only a partial year.
This graph is based on data from http://www.stats.wikimedia.org/EN/TablesArticlesTotal.htm as of June 2, 2007, with recent values for the English Wikipedia taken from the data below. The sum includes all 270+ Wikipedia languages. See the front page at http://www.wikipedia.org for a recent article count for the 10 largest Wikipedias.
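The derived columns of the yearly table above (increase, percentage increase, average daily increase) can be recomputed from the dates and article counts alone. The sketch below assumes that the percentage is taken relative to the preceding count and that the daily average divides by the number of calendar days between samples. Both assumptions are inferred from the figures rather than stated on the page, and the names `SAMPLES` and `derived_columns` are illustrative.

```python
from datetime import date

# (sample date, article count) pairs copied from the yearly table.
SAMPLES = [
    (date(2002, 1, 1), 19_700),
    (date(2003, 1, 1), 96_500),
    (date(2004, 1, 1), 188_800),
    (date(2005, 1, 1), 438_500),
    (date(2006, 1, 1), 895_000),
    (date(2007, 1, 1), 1_560_000),
    (date(2008, 1, 1), 2_153_000),
    (date(2009, 1, 1), 2_679_000),
    (date(2010, 1, 1), 3_144_000),
    (date(2011, 1, 1), 3_518_000),
    (date(2012, 1, 1), 3_835_000),
    (date(2013, 1, 1), 4_133_000),
    (date(2014, 1, 1), 4_413_000),
]


def derived_columns(samples):
    """Return (date, increase, percent increase, average daily increase) rows."""
    rows = []
    for (prev_day, prev_count), (day, count) in zip(samples, samples[1:]):
        increase = count - prev_count
        percent = round(100 * increase / prev_count)     # relative to the preceding count
        per_day = round(increase / (day - prev_day).days)  # leap years give 366 days
        rows.append((day, increase, percent, per_day))
    return rows


if __name__ == "__main__":
    for day, increase, percent, per_day in derived_columns(SAMPLES):
        print(f"{day}  +{increase:,}  {percent}%  {per_day}/day")
```

Using the actual day difference rather than a fixed 365 matters for the rows that follow a leap year, such as 2005-01-01.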
The English edition remains the largest Wikipedia, almost three times as large as the second largest edition, the German Wikipedia. Many other editions shared the quasi-exponential growth of the English edition, though lagging one to three years behind. As these other Wikipedias have grown, the overall percentage of articles in English has been steadily decreasing, and it fell below 25% in March 2007. The percentage of articles in the ten largest Wikipedias has also been decreasing, although these top ten still accounted for about 67% of all Wikipedia articles as of June 2007.
The figures in this data set are drawn from multiple data sources and different estimates (see the key below for details) and are presented as a spreadsheet-ready table for graphing. The original data sets are archived: see the links below. Note also that the figures are sampled at random times of day.
The following tries to illustrate how big the English-language Wikipedia might be if the articles (without images and other multimedia content) were to be printed and bound in book form. Each volume is assumed to be 25 cm tall and 5 cm thick, and to contain 1,600,000 words or 8,000,000 characters. The size of this illustration is based upon the live article count.
Key to the data below:
approx: this figure is an approximation
lowerbound: there were at least this many pages
mpac3.1: main page article count from the Phase III software since May 25, 2003: article namespace, not redirects, containing at least one internal wiki link
mpacIII: main page article count from the Phase III software up to May 22, 2003: article namespace, contains a comma, not a redirect
mpacII: main page article count from the Phase II software
spII: stats page article count from the Phase II software
all: total of all pages of any sort
commapp: pages which include a comma, a crude way of finding "real" articles
conscnt: "conservative count" taken by removing the count of various types of non-article from the comma page count
MF: Malcolm Farmer
LMS: Larry Sanger
WA: Wikipedia:Announcements
The data is now extended and annotated with (somewhat gnomic) source information. Note that sampling times are only recorded to the day given by the user recording the entry, and that there is no clear time-zone information for that day.
Note: The current mpac3.1 article count for the English-language Wikipedia is 4,614,530 articles.
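The printed-volume illustration described before the key reduces to simple arithmetic once a total word count is known. The page gives the per-volume figures (1,600,000 words, 5 cm thick) but no total word count, so the total used in the sketch below is a placeholder assumption, and `printed_size` is an illustrative name.

```python
import math

WORDS_PER_VOLUME = 1_600_000   # per-volume capacity stated in the text
VOLUME_THICKNESS_CM = 5        # per-volume thickness stated in the text


def printed_size(total_words: int) -> tuple[int, float]:
    """Return (number of volumes, shelf length in metres) for a word count."""
    volumes = math.ceil(total_words / WORDS_PER_VOLUME)  # a partial volume still needs binding
    shelf_m = volumes * VOLUME_THICKNESS_CM / 100
    return volumes, shelf_m


if __name__ == "__main__":
    # Placeholder total, NOT a figure from the page: 2 billion words.
    volumes, shelf_m = printed_size(2_000_000_000)
    print(f"{volumes} volumes, about {shelf_m} m of shelf space")
```

Rounding up with `math.ceil` reflects that the last, partly filled volume still occupies a full 5 cm on the shelf.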
