kottke.org - home of fine hypertext products

Blog search still sucks (a little)

Update: I fucked up on this post and you should reread it if you've read it before. After reading this post by Niall Kennedy, I checked and found that I have mentioned or linked to the site for Freakonomics 5 times (1 2 3 4 5), not 13. The other 8 times, I either linked to a post on the Freakonomics blog that was unrelated to the book, had the entry tagged with "freakonomics" (tags are not yet exposed on my site and can't be crawled by search engines), or I used the word "Freakonomists", not "Freakonomics". Bottom line: the NY Times listing is still incorrect, Google and Yahoo picked up all the posts where I actually mentioned "Freakonomics" in the text of the post but missed the 2 links to freakonomics.com, Google Blog Search got 2/3 (& missed the 2 links), Technorati got 1/3 (& missed the 2 links), and IceRocket, Yahoo Blog Search, BlogPulse, & Bloglines whiffed entirely. Steven Levitt would be very disappointed in my statistical fact-checking skills right now. :(
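To make the distinction above concrete, here is a minimal sketch of how a count can be split into posts that use the word in the text, posts that only link to the site, and posts that are only tagged. It is an illustration under stated assumptions, not how Movable Type or any of the search engines actually work: the sample posts, the `classify` function, and the category names are all hypothetical. It also cannot tell a link to the book's site from a link to an unrelated post on the Freakonomics blog, which is one of the ways the original count went wrong.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample posts for illustration; not actual kottke.org entries.
POSTS = [
    {"body": "An interview with the author of Freakonomics.", "links": [], "tags": []},
    {"body": "Great new site for the book.", "links": ["http://www.freakonomics.com/"], "tags": []},
    {"body": "The Freakonomists weigh in on real estate.", "links": [], "tags": []},
    {"body": "An unrelated item.", "links": [], "tags": ["freakonomics"]},
]

# Whole-word match, so "Freakonomists" does not count as "Freakonomics".
WORD = re.compile(r"\bfreakonomics\b", re.IGNORECASE)


def links_to_site(url):
    """True if the URL's host is freakonomics.com or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == "freakonomics.com" or host.endswith(".freakonomics.com")


def classify(post):
    """Return the strongest kind of mention found in a post."""
    if WORD.search(post["body"]):
        return "text"      # the word appears in the visible post text
    if any(links_to_site(url) for url in post["links"]):
        return "link"      # only a link to the site, no word in the text
    if "freakonomics" in post["tags"]:
        return "tag_only"  # tags are not exposed to crawlers
    return "none"          # includes near misses like "Freakonomists"


counts = Counter(classify(post) for post in POSTS)
print(dict(counts))
```

A naive substring or tag search would lump several of these categories together, which is roughly how 5 turned into 13.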

I wish Niall had emailed me about this instead of posting it on his site, but I guess that's how weblogs work, airing dirty laundry instead of trying to get it clean. Fair enough...I've publicly complained about the company he works for (Technorati) instead of emailing someone at the company about my concerns, so maybe he had a right to hit back. Perhaps a little juvenile on both our parts, I'd say. (Oh, and I turned off the MT search thing that Niall used to check my work. I'm not upset he used it, but I'm irritated that it seems to be on by default in MT...I never intended for that search interface to be public.)

------

The NY Times recently released their list of the most blogged about books of 2005. Their methodology in compiling the list:

This list links to a selection of Web posts that discuss some of the books most frequently mentioned by bloggers in 2005. The books were selected by conducting an automated survey of 5,000 of the most-trafficked blogs.

Unsurprisingly, the top spot on the list went to Freakonomics. I remembered mentioning the book several times on my site (including this interview with author Steven Levitt around the release of the book), so I checked out the citations they had listed for it. According to the Times, Freakonomics was cited by 125 blogs, but not once by kottke.org, a site that by any measure is one of the most-visited blogs out there.[1] A quick search in my installation of Movable Type yielded 5 mentions of the book on kottke.org in the last 9 months (not the 13 I first reported). I had also mentioned Blink, Harry Potter, Getting Things Done, Collapse, The Wisdom of Crowds, The Singularity is Near, and State of Fear, all of which appear in the top 20 of the Times' list and none of which are cited by the Times as having been mentioned on kottke.org in 2005.

I chalked this up to a simple error of omission, but then I started checking around some more. Google's main index returned only three distinct mentions of Freakonomics on kottke.org. Google Blog Search returned two results. Yahoo: 3 results (0 results on Yahoo's blog search). Technorati only found one result (I'm not surprised). Many of the blog search services don't even let you search by site, so IceRocket, BlogPulse, and Bloglines were of no help. (See above for corrections.) I don't know where the Times got their book statistics from, but it was probably from one of these sites (or a similar service).

Granted, this is just one weblog[2], which I only checked into because I'm the author, but it's not like kottke.org is hard to find or crawl. The markup is pretty good[3], fairly semantic, and hasn't changed too much for the past two years. The subject in question is not off-topic...I post about books all the time. And it's one of the more visible weblogs out there...lots of links in to the front page and specific posts and a Google PR of 8. So, my point here is not "how dare the Times ignore my popular and important site!!!" but that the continuing overall suckiness of searching blogs is kind of amazing and embarrassing given the seemingly monumental resources being applied to the task. It's forgivable that the Times would not have it exactly right (especially if they're doing the crawling themselves), but when companies like Technorati and Google are setting themselves up as authorities on how large the blogosphere is, what books and movies people are reading/watching, and what the hot topics online are but can't properly catalogue the most obvious information out there, you've got to wonder a) how good their data really is, and b) if what they are telling us is actually true.

[1] Full disclosure: I am the author of kottke.org.

[2] This is an important point...these observations are obviously a starting point for more research about this. But this one hole is pretty gaping and fits well with what I've observed over the past several months trying to find information on blogs using search engines.

[3] I say only pretty good because it's not validating right now because of entity and illegal character errors, which I obviously need to wrestle with MT to correct at some point. But the underlying markup is solid.

More about this page

This entry was published on December 21, 2005 at 09:26 pm.

Tags for this entry:  weblogs  nytimes  technorati  google  search  freakonomics  movabletype 

kottke.org is a weblog about the liberal arts 2.0 edited by Jason Kottke since March 1998. You can read about me and kottke.org here. If you've got questions, concerns, or an interesting link for me, send them along. Here's the kottke.org RSS feed.

