kottke.org - home of fine hypertext products

Technorati is now tracking 1,000,000 weblogs

Technorati is now tracking 1,000,000 weblogs.

Reader Comments
16 comments
Matt Haughey says:

Why hasn't anyone ever really looked at how Technorati determines what is a blog? I don't believe the Technorati numbers myself, I think it's greatly inflated.

Why? Sometimes in technorati results I see every category of a Radio weblog counted as a separate blog. I just dug around about 30 URLs until I found an example. Check out Merlin's cosmos. It says that 64 blogs are pointing 71 links to him, so there should be very few repeat listings, right?

Look down for the person running a Radio blog that is pointing at Merlin's site, called monkinetic weblog. The guy must have a sidebar link to kungfugrippe and by the looks of it he has 13 categories on his Radio blog, which show up as 13 blogs. In this entire list, there should only be 7 doubly listed blogs (Jish is one), but here we have 13 links from a single blog and the numbers don't add up.

This isn't the fault of Radio's design, it's how Sifry coded his algorithm to determine the difference between a blog and other pages of the same blog. For some reason it's not quite right for Radio blogs hosted on their own domains (personally, I've never seen the problem on the userland hosted radio sites).

I bet the counting of livejournal sites may also be wonky, since the URLs aren't that predictable and other pages might be showing up as other blogs.

» by Matt Haughey on Oct 01, 2003 at 02:29 AM
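The miscount Matt describes can be sketched roughly as follows, assuming a tracker that treats every distinct linking URL as its own blog. The URLs and the `blog_key` helper are made up for illustration; this is not Technorati's actual algorithm. Grouping by hostname is only one possible heuristic, and it would undercount on hosts where many separate blogs share one domain.

```python
from urllib.parse import urlsplit

def blog_key(url):
    """Hypothetical heuristic: treat every page on one host as one blog."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host

def count_blogs(linking_urls):
    """Return (naive count, collapsed count) of linking blogs."""
    naive = len(set(linking_urls))                       # every URL is a "blog"
    collapsed = len({blog_key(u) for u in linking_urls})  # one blog per host
    return naive, collapsed

# Made-up data: one Radio blog with 13 category pages, plus one other blog.
links = [
    "http://radio.example.com/categories/cat%d/" % i for i in range(1, 14)
] + ["http://other.example.org/"]

naive, collapsed = count_blogs(links)
print(naive, collapsed)
```

With this sample the naive count reports 14 linking blogs where the host-based count reports 2, which is the kind of gap that would inflate a total.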
Swami Prem says:

Who was the lucky winner to own the one millionth blog?

» by Swami Prem on Oct 01, 2003 at 02:32 AM
Matt Haughey says:

I think it's greatly inflated.

Actually, I'm probably overdoing it a bit here by saying "greatly" but it could be off by a lot, if there are enough sites with weird URL storage schemes being miscounted (and I don't see why a MT blog couldn't trip the algorithm). I would say it's got to be at least 10% from my personal result tracking, and could be higher depending on how widespread the problem is.

» by Matt Haughey on Oct 01, 2003 at 02:35 AM
David Sifry says:

You're right, Radio is somewhat messed up in that it attempts to count each "category" as a separate blog. We go through and cull the database regularly to pull that crap out. If you continue to see any results that look funky, please send an email to feedback@technorati.com and let us know.

I'm pretty sure the LJ stuff is accurate though, you'd be amazed at how many people are posting over there.

I'm working really hard to make sure that the Technorati database is accurate and clean, but wacky things happen all the time, and to expect 100% accuracy is, of course, impossible. But I really believe that the numbers are pretty accurate.

» by David Sifry on Oct 01, 2003 at 03:45 AM
Matt Haughey says:

Radio is somewhat messed up in that it attempts to count each "category" as a separate blog

How does Radio do the separate blog stuff, does Radio ping weblogs.com for each category? When you make a post?

I'm pretty sure the LJ stuff is accurate though

When I was going through a bunch of cosmos looking for good examples of the previous problem, I found some results with a single LJ post listed 5-10 times, but there were so many results I couldn't make out whether they were treated as one blog with many links or as many blogs (they all seemed to point at the same URL).

wacky things happen all the time

I noticed that Typepad blogs are counted twice, once for the root URL of foo.typepad.com, then again for the default blog directory, foo.typepad.com/bar (it's the same files in both places).

» by Matt Haughey on Oct 01, 2003 at 04:36 AM
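One way to catch the Typepad double count, sketched under the assumption that both URLs really do serve the same files: fingerprint the page content and group URLs that share a fingerprint. The page bodies and the `fingerprint` and `group_by_content` helpers below are hypothetical, and nothing here is known to be how Technorati handles it.

```python
import hashlib

def fingerprint(html):
    """Hash the page body after collapsing whitespace differences."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def group_by_content(pages):
    """Map each content fingerprint to the list of URLs serving it."""
    groups = {}
    for url, html in pages.items():
        groups.setdefault(fingerprint(html), []).append(url)
    return groups

# Made-up page bodies; the first two stand in for the same files at two URLs.
pages = {
    "http://foo.typepad.com/": "<html><body>Hello  world</body></html>",
    "http://foo.typepad.com/bar/": "<html><body>Hello world</body></html>",
    "http://baz.typepad.com/": "<html><body>Another blog</body></html>",
}

groups = group_by_content(pages)
distinct_blogs = len(groups)
print(distinct_blogs)
```

An exact hash only works when the copies are byte-for-byte alike after whitespace cleanup; pages that differ by a timestamp or a sidebar would need a fuzzier comparison.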
Swami Prem says:

Oh, what about the Typepad blog having a domain name? Does that mean the blog will be counted three times?

» by Swami Prem on Oct 01, 2003 at 06:44 AM
Nick says:

Dave's right that it can be very difficult to filter out "false" Radio weblogs, we've had that problem ourselves. I'm not doubting his assessment of the number of LJ sites either, but that's an area we scratched our heads trying to figure out for some time. The problem with LJ or any of the blog hosting groups is that "failures" of those central servers will often cause a few thousand sites to simultaneously point to some default list of links (I distinctly remember a day when a page from the PHP manual jumped to the top of our 4 hour trends list). I think we've got them under control now however.

Our site is currently tracking around 150,000 weblogs -- nowhere near the million Technorati's got. I wonder if one of the differences in number is that we delete URLs that don't respond to our robots after a certain number of tries. Typically if a site comes back it finds a way to get added back into the system. This keeps our database leaner, and keeps our robots reading "actual" pages instead of waiting for errors.

Either way, "about one million" is a nice round number to point to for those of us trying to show how quickly blogging is growing around the world.

» by Nick on Oct 01, 2003 at 09:31 AM
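The pruning Nick mentions might look something like this minimal sketch. The `Tracker` class and the threshold of 3 are assumptions for illustration; the comment only says "a certain number of tries" and gives no details of the real system.

```python
from dataclasses import dataclass, field

MAX_FAILURES = 3  # assumed threshold, not a figure from the comment

@dataclass
class Tracker:
    # url -> number of consecutive failed fetches
    failures: dict = field(default_factory=dict)

    def add(self, url):
        """Start (or resume) tracking a URL, e.g. after it pings again."""
        self.failures.setdefault(url, 0)

    def record(self, url, ok):
        """Record one fetch attempt; drop the URL after repeated failures."""
        if url not in self.failures:
            return
        if ok:
            self.failures[url] = 0
            return
        self.failures[url] += 1
        if self.failures[url] >= MAX_FAILURES:
            del self.failures[url]

tracker = Tracker()
tracker.add("http://dead.example.com/")
tracker.add("http://alive.example.com/")
for _ in range(MAX_FAILURES):
    tracker.record("http://dead.example.com/", ok=False)
tracker.record("http://alive.example.com/", ok=True)
print(sorted(tracker.failures))
```

A tracker that never drops unresponsive URLs would keep counting them, which is one plausible reason two services could report very different totals.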
jkottke says:

The million number seems fairly accurate. Maciej's Blog Census puts the number at 1.35 million with an estimate of ~890,000 that are active.

» by jkottke on Oct 01, 2003 at 10:03 AM
megnut says:

I think active is key here. I've noticed a lot of totally dead blogs (I mean ones that haven't been updated since 2000/2001) appearing in Technorati lately.

» by megnut on Oct 01, 2003 at 10:15 AM
megnut says:

Wait, a lot is too strong. A fair amount would be a more accurate statement.

» by megnut on Oct 01, 2003 at 10:15 AM
jim winstead says:

just as another few data points: according to blo.gs, 136,955 blogs have updated in the last week, 272,764 have updated in the last month, 391,042 have updated in the last two months, and 34,753 new blogs have been added in the last week (unfortunately, i haven't been keeping track of that for long).

this includes all blogs that ping weblogs.com, and that show up in the blogger.com changes feed, and a few other sources (and that ping blo.gs directly, of course).

this does almost totally exclude livejournal.com users.

» by jim winstead on Oct 01, 2003 at 11:03 AM
Gene says:

Speaking of active, this Marlow post on churn rate and this Blogcensus follow up have some good information about blog activity. The Blogcensus post shows 5% of their sample had been abandoned (> 52 weeks since the last post). I wonder how Technorati's numbers would compare.

» by Gene on Oct 02, 2003 at 11:50 AM
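For anyone wanting to run the same comparison on their own sample, here is a rough sketch of the "abandoned" test Gene cites (more than 52 weeks since the last post). The dates are made up and are not Blogcensus or Technorati data; `abandoned_share` is a hypothetical helper.

```python
from datetime import date, timedelta

ABANDONED_AFTER = timedelta(weeks=52)  # cutoff taken from the definition above

def abandoned_share(last_post_dates, today):
    """Fraction of blogs whose last post is more than 52 weeks old."""
    if not last_post_dates:
        return 0.0
    abandoned = sum(1 for d in last_post_dates if today - d > ABANDONED_AFTER)
    return abandoned / len(last_post_dates)

# Made-up last-post dates for four blogs.
today = date(2003, 10, 2)
sample = [
    date(2003, 9, 30),
    date(2003, 6, 1),
    date(2002, 12, 15),
    date(2001, 3, 4),
]

share = abandoned_share(sample, today)
print(share)
```

Only the 2001 blog falls past the cutoff here, so the sample's abandoned share is one in four.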
Blum Valerie says:

Unusual ideas can make enemies.

» by Blum Valerie on Dec 09, 2003 at 07:46 PM
Peterson Lee says:

'May you live all the days of your life.' - Swift

» by Peterson Lee on Dec 10, 2003 at 12:27 PM
Good Heidi says:

The important thing isn't doing, but knowing how you do it.

» by Good Heidi on Dec 10, 2003 at 12:27 PM
Fields Lesley says:

Just because there's a pattern doesn't mean there's a purpose.

» by Fields Lesley on Dec 20, 2003 at 04:10 PM

This thread is closed to new comments. Thanks to everyone who responded.

More about this page

This entry was published on September 30, 2003 at 10:59 pm.

kottke.org is a weblog about the liberal arts 2.0 edited by Jason Kottke since March 1998. You can read about me and kottke.org here. If you've got questions, concerns, or an interesting link for me, send them along. Here's the kottke.org RSS feed.

You're visiting kottke.org. All content by Jason Kottke (contact me) unless otherwise noted, with some restrictions on its use. Good luck will come to those who dig around in the archives. If you've reached this point by accident, I suggest panic.