[Corpora-List] The size of Internet in words

From: Serge Sharoff (s.sharoff@leeds.ac.uk)
Date: Tue Jan 20 2004 - 17:22:42 MET

    Does anyone know the size of the Internet in terms of words, broken down
    by language? Google shows the number of documents on its front page
    (3,307,998,701 at the time of writing this), and there is a comparative
    analysis of the databases used by various search engines at:

    Two things cannot be learned from these statistics: the number of words
    of real text per page and the amount of text in a given language.

    The first question is partly addressed by an older statistical survey:
    Can we estimate that 6 terabytes per 800 million pages gives an average
    page length of 7.5 KB, or about 1000 words (in English)? If so, the size of
    the modern Internet would be about 3 terawords, if it were English only. But
    can we trust this, and how is it distributed over different languages?
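
    The arithmetic behind that estimate can be sketched as follows. This is
    only a rough check under stated assumptions: decimal terabytes, about 7.5
    bytes per English word (the ratio implied by 7.5 KB being about 1000
    words), and every byte of a page counted as text, which overstates the
    amount of real text because markup is included. The names are illustrative.

```python
# Back-of-envelope estimate using the figures quoted in the message.
SURVEY_BYTES = 6 * 10**12      # 6 terabytes (assumption: decimal TB)
SURVEY_PAGES = 800 * 10**6     # 800 million pages in the older survey
GOOGLE_PAGES = 3_307_998_701   # Google's front-page count quoted above
BYTES_PER_WORD = 7.5           # assumption: implied by 7.5 KB ~ 1000 words


def average_page_bytes(total_bytes, pages):
    """Average page size in bytes."""
    return total_bytes / pages


def words_per_page(page_bytes, bytes_per_word=BYTES_PER_WORD):
    """Rough word count for a page, treating all bytes as text."""
    return page_bytes / bytes_per_word


def total_words(pages, words_each):
    """Total words if every page had the same average length."""
    return pages * words_each


page_bytes = average_page_bytes(SURVEY_BYTES, SURVEY_PAGES)
words = words_per_page(page_bytes)
estimate = total_words(GOOGLE_PAGES, words)

print(f"{page_bytes:.0f} bytes/page, {words:.0f} words/page, "
      f"{estimate:.2e} words in total")
```

    With these inputs the average comes out at 7,500 bytes and 1,000 words
    per page, and the total at a little over 3 terawords.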


