Google Indexed One Trillion Unique URLs

Source article

The first Google index in 1998 already held 26 million pages, and by 2000 it had reached the one billion mark. Over the last eight years, the company has seen a lot of big numbers about how much content is really out there. Recently, even its search engineers stopped in awe at just how big the web is these days, when the systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!

How did Google find all those pages? It starts at a set of well-connected initial pages and follows each of their links to new pages. Then it follows the links on those new pages to even more pages, and so on, until it has a huge list of links. In fact, Google found even more than 1 trillion individual links, but not all of them lead to unique web pages. Many pages have multiple URLs with exactly the same content, or URLs that are auto-generated copies of each other. Even after removing those exact duplicates, Google saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day.
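
The link-following process described above can be sketched in a few lines. This is only a toy illustration, not Google's actual crawler: the `TOY_WEB` dictionary, the `crawl` function, and the example URLs are all made up for this sketch, and exact duplicates are detected by hashing page content, which is just one plausible way to do it.

```python
from collections import deque
from hashlib import sha256

# Hypothetical toy web: URL -> (page content, outgoing links).
# "a.example/" and "a.example/index.html" serve exactly the same content.
TOY_WEB = {
    "a.example/": ("home", ["a.example/news", "a.example/index.html"]),
    "a.example/index.html": ("home", ["a.example/news"]),
    "a.example/news": ("news", ["a.example/", "b.example/"]),
    "b.example/": ("other site", []),
}


def crawl(seeds, web):
    """Follow links breadth-first from the seed pages.

    Returns (number of links seen, set of distinct URLs,
    list of URLs whose content was not an exact duplicate).
    """
    seen_urls = set(seeds)      # every URL discovered so far
    seen_content = set()        # hashes of page content already seen
    unique_pages = []           # first URL found for each distinct content
    links_found = 0             # counts every link, including repeats
    queue = deque(seeds)

    while queue:
        url = queue.popleft()
        if url not in web:      # dead link in the toy web
            continue
        content, links = web[url]
        links_found += len(links)

        # Exact-duplicate check: identical content gives an identical hash.
        digest = sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen_content:
            seen_content.add(digest)
            unique_pages.append(url)

        # Queue only the links that lead to URLs not seen before.
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)

    return links_found, seen_urls, unique_pages


links_found, urls, pages = crawl(["a.example/"], TOY_WEB)
print(links_found, len(urls), len(pages))
```

Even in this tiny example the three counts differ: there are more links than distinct URLs, and more distinct URLs than distinct pages, which is the same gap the article describes at web scale.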

So how many unique pages does the web really contain? Google says it doesn't know, because it doesn't have time to look at them all! :-) Strictly speaking, the number of pages out there is infinite. For example, a web calendar may have a "next day" link, and a crawler could follow that link forever, each time finding a "new" page. Google doesn't do that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what's a useful page, and there is no exact answer.
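
The calendar example can be made concrete with a small sketch. Everything here is hypothetical: `cal.example` is an invented site, and `calendar_links` and `crawl_with_budget` are illustrative names. The page budget is one simple way a crawler might avoid such a trap, not a description of what Google does.

```python
from datetime import date, timedelta


def calendar_links(url):
    """Hypothetical calendar site: each day page links only to the next day."""
    day = date.fromisoformat(url.rsplit("/", 1)[1])
    next_day = day + timedelta(days=1)
    return ["cal.example/" + next_day.isoformat()]


def crawl_with_budget(start_url, get_links, max_pages):
    """Follow links from start_url, stopping after max_pages pages."""
    visited = []
    seen = {start_url}
    frontier = [start_url]
    while frontier and len(visited) < max_pages:
        url = frontier.pop(0)
        visited.append(url)
        for link in get_links(url):
            # Every "next day" URL is new, so this check never stops the crawl.
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


pages = crawl_with_budget("cal.example/2008-01-01", calendar_links, max_pages=5)
print(pages)
```

Duplicate-URL detection alone does not help here, because every generated URL really is different. Only the `max_pages` limit ends the crawl, which illustrates why the count of "pages" depends on where a crawler decides to stop.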
