I'm intrigued by Marc Hedlund's differentiation of Impressionist bloggers from Realist bloggers. My interpretation of this difference (which might not be what Marc meant by it) is that Realist blog posts are self-contained, -explanatory, and -evident entities while a post on an Impressionist blog serves to complement the whole, much like the dots making up a Seurat painting aren't that interesting until you stand back to see the whole thing.
The downside for Impressionist blogs is that their individual posts don't work that well outside of their intended context. If you run across a single post from an Impressionist blog in your River of News, a remixed Yahoo Pipes RSS feed, in del.icio.us, or an item in a Google search results set, it might not make a whole lot of sense. Impressionist blog posts are less likely to get Dugg or bookmarked in del.icio.us or linked around much at all. Fewer incoming links, big or small, to individual pages means fewer pageviews, which makes it more difficult to run an Impressionist blog as a business that relies on advertising revenue. If you look at most of the big blog sites, they're all non-Impressionist blogs. All the sites whose posts are featured on the front page of Digg are non-Impressionist...those posts/articles are designed to float self-contained around the web. The blogosphere is dominated by non-Impressionist blogs and the sort of content they produce...which is sad for me because, like Marc, I value Impressionism in a weblog.
After working on this -- on again and off again, mostly off -- for much too long, I'm pleased to say that a significant chunk of kottke.org now has tags (around 5,100 entries are tagged, out of ~13,000). Right now, the only way to access them is through individual tag pages, but after all the bugs are ironed out, I'll be putting them in different places around the site (front page, main archive page, etc.).
Each tag page lists all the entries[1] on the site that are tagged with that particular word...some good examples to start you off are: photography, economics, lists, infoviz, food, nyc, cities, restaurants, video, timelapse, interviews, language, maps, and fashion. Each page also has a list of tags related to that particular tag and further down in the sidebar, you'll find lists of recently popular tags, all-time popular tags, a few favorite tags of mine, and some random tags...lots of stuff to explore.
I've tweaked the design as well: the main column is a little wider, the post metadata look/feel is consistent among short posts and long posts, faint dotted lines now separate all entries, and per-entry tags were added to the post metadata. I'm testing all that out for eventual site-wide use. Questions, comments, bug reports, etc. are welcome...send them on in.
Update: I almost forgot, the nsfw tag.
[1] Not all the entries exactly. Until I figure out how to do some pagination, I've limited the number of entries to 100 for each tag page. The movies page was more than 1 MB when all the entries were listed.
Heather Armstrong, on meeting her new neighbors and having to explain what she does for a living:
Over the last few weeks several neighbors have stopped by to introduce themselves, and invariably they are older than we are, more established, and have careers in medicine or law. And when they ask what we do, both Jon and I sort of flinch and exchange a quick look that says IT'S YOUR TURN TO LIE. We're web developers, we say, and that is never enough, they just can't leave it alone, and one of us will try to explain that I have a website. This thing. That I do. And because we're being all coy about it I just know, from the very worried expressions on their faces, that these neighbors think that we run a porn site.
This is the exact interaction I have with most people that I've met in the past couple of years, right down to the "we're web developers, we say, and that is never enough, they just can't leave it alone" part. I imagine professional mimes, phone sex operators, and people who make a living selling other people's stuff on eBay have the same sorts of awkward conversations with their new neighbors.
Regarding the Twitter vs. Blogger thing from earlier in the week, I took another stab at the faulty Twitter data. Using some educated guesses and fitting some curves, I'm 80-90% sure that this is what the Twitter message growth looks like:


These graphs cover the following time periods: 8/23/1999 - 3/7/2002 for Blogger and 3/21/2006 - 5/7/2007 for Twitter. It's important to note that the Twitter trend is not composed of actual data points but is rather a best-guess line, an estimate based on the data. Take it as fact at your own risk. (More specifically, I'm more sure of the general shape of the curve than of its steepness. My gut tells me that the curve is probably a little flatter than depicted rather than steeper.)
That said, most of what I wrote in the original post still holds, as do the comments in the subsequent thread. Twitter did not grow as fast as the faulty data indicated, but it did get to ~6,000,000 messages in about half the time of Blogger. Here are the reasons I offered for the difference in growth:
1. Twitter is easier to use than Blogger was and had a lower barrier to entry.
2. Twitter has more ways to update (web, phone, IM, Twitterrific) than did Blogger.
3. Blogger's growth was limited by a lack of funding.
4. Twitter had a larger pool of potential users to draw on.
5. Twitter has a built-in social aspect that Blogger did not.
And commenters in the thread noted that:
6. Twitter's 140-character limit encourages more messages.
7. More people are using Twitter for conversations than was the case with Blogger.
What's interesting is that these seeming advantages (in terms of message growth potential) for Twitter didn't result in higher message growth than Blogger over the first 9-10 months. But then the social and network effects (#5 and #7 above) kicked in and Twitter took off.
Since swearing off Technorati a couple of years ago, I've been checking back every few months to see if the situation has improved. The site is definitely more responsive but their data problems seemingly remain, at least with regard to kottke.org; Google Blog Search gives consistently better results and easy access to RSS feeds of searches.
Technorati recently introduced something called the Technorati Authority number, which is a fancy name for the number of blogs linking to a site in the last six months. Curious as to where kottke.org fell on the authority scale, I checked out the top 100 blogs list. Not there, so I proceeded to the "Everything in the known universe about kottke.org" page where a portion of that huge cache of kottke.org knowledge was the authority number: 5,094. Looking at the top 100 list, that should put the site at #47, nestled between The Superficial and fishki.net, but it's not there. Technorati also currently states that kottke.org hasn't been updated in the last day, despite several updates since then and my copy of MT pinging Technorati after each update.
Maybe kottke.org has been intentionally excluded because I've been so hard on them in the past. Or maybe it's just a glitch (or two) in their system. Or maybe it's an indication of larger problems with their service. Either way, as the company is attempting to offer an authentic picture of the blogosphere, this doesn't seem like the type of rigor and accuracy that should send reputable media sources like the BBC, Washington Post, NY Times, and the Wall Street Journal scurrying to their door looking for reliable data about blogs.
Update: As of 3:45pm EST, the top 100 list has been updated to include kottke.org. The site also picked up this post right away, but failed to note a subsequent post published a few minutes later.
This morning I posted a comparison of the growth in messages with both Blogger and Twitter. The Twitter data was based on information collected by Andy Baio in a post that was widely read in the blogosphere. In the course of looking at the Twitter data, neither of us noticed that from Nov 21, 2006 to Feb 4, 2007 and March 9, 2007 to the present, the Twitter post IDs had the same last digit, indicating that the data is not strictly sequential. If you look at Twitter's public timeline, the Twitter post IDs skip around by multiples of 10.
Anil suggested via email that this could be an artifact of database sharding and lo and behold, if you take off the last digit of the post ID, they seem to become sequential again, more or less. He's going to ask the Twitter gang about it.
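A minimal sketch of that last-digit check in Python, using made-up IDs rather than real Twitter data. The function names and sample values are hypothetical, and real IDs would only come out sequential "more or less", so a strict test like this one is stricter than the data deserves.

```python
def strip_last_digit(post_ids):
    """Drop the final digit of each ID (integer division by 10)."""
    return [post_id // 10 for post_id in post_ids]


def is_sequential(ids):
    """True if each ID is exactly one more than the one before it."""
    return all(b - a == 1 for a, b in zip(ids, ids[1:]))


# Hypothetical IDs that share the same last digit and skip by 10,
# the pattern described above. These are NOT real Twitter IDs.
sample_ids = [5000012, 5000022, 5000032, 5000042]

stripped = strip_last_digit(sample_ids)
print(is_sequential(sample_ids))  # False: the raw IDs jump by 10
print(is_sequential(stripped))    # True: sequential once the last digit is gone
```

If the raw IDs are treated as a message count, every message in the affected periods gets counted roughly ten times over, which would explain growth that looked impossible.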
For right now though, the parts of this morning's post that rely on Twitter data from the above dates are incorrect. Basically, all of it. Here it is in all caps: WRONG WRONG WRONG ERROR ERROR, F-----, WOULD NOT BUY DATA ANALYSIS FROM AGAIN. In hindsight, it seems obvious that the data was incorrect...that sort of growth seems impossible, especially when Twitter was having all sorts of scaling problems. Anyway, good thing this is just a blog and not a refereed journal, eh? Big thanks to the commenters in the other post for pointing me toward the error. More as I have it.
Update: Email from Biz Stone, who works for Twitter. He says:
There's truth in the essence of what you're talking about here -- Twitter updates *are* coming in faster and furiouser than Blogger updates. However, the way we number Twitter updates has switched back and forth a few times which pretty much screws up the exactness of your analysis.
We have been doubling the number of active users about every three weeks for a sustained period of months now which is definitely contributing significantly to more and more updates. Also, active users of Twitter are measured by how many times they update per day (at Blogger it was per month). So activity in general at Twitter is crazy by comparison.
We're going to start digging in to more data visualization, user patterns, etc in the coming weeks so if there's anything you think we should be looking at specifically please let us know!
So we'll have to wait a few weeks for an accurate look at this stuff. (thx, biz)
Important update: I've re-evaluated the Twitter data and came up with what I think is a much more accurate representation of what's going on.
Further update: The Twitter data is bad, bad, bad, rendering Andy's post and most of this here post useless. Both jumps in Twitter activity in Nov 2006 and March 2007 are artificial in nature. See here for an update.
Update: A commenter noted that sometime in mid-March, Twitter stopped using sequential IDs. So that big upswing that the below graphs currently show is partially artificial. I'm attempting to correct now. This is the danger of doing this type of analysis with "data" instead of data.
--
In mid-March, Andy Baio noted that Twitter uses publicly available sequential message IDs and employed Twitter co-founder Evan Williams' messages to graph the growth of the service over the first year of its existence. Williams co-founded Blogger back in 1999, a service that, as it happens, also exposed its sequential post IDs to the public. Itching to compare the growth of the two services from their inception, I emailed Matt Webb about a script he'd written a few years ago that tracked the daily growth of Blogger. His stats didn't go back far enough so I borrowed Andy's idea and used Williams' own blog to get his Blogger post IDs and corresponding dates. Here are the resulting graphs of that data.[1]
The first one covers the first 253 days of each service. The second graph shows the Twitter data through May 7, 2007 and the Blogger data through March 7, 2002. [Some notes about the data are contained in this footnote.]


As you can see, the two services grew at a similar pace until around 240 days in, with Blogger posts increasing faster than Twitter messages. Then around November 21, 2006, Twitter took off and never looked back. At last count, Twitter has amassed five times as many messages as Blogger did in just under half the time period. But Blogger was not the slouch that the graph makes it out to be. Plotting the service by itself reveals a healthy growth curve:

From late 2001 to early 2002, Blogger doubled the number of messages in its database from 5M to 10M in under 200 days. Of course, it took Twitter just over 40 days to do the same and under 20 days to double again to 20M. The curious thing about Blogger's message growth is that large events like 9/11, SXSW 2000 & 2001, new versions of Blogger, and the launch of blog*spot didn't affect the growth at all. I expected to see a huge message spike on 9/11/01 but there was barely a blip.
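To put those doubling times on a common footing, here is a rough Python sketch that converts a doubling time into an equivalent constant daily growth rate. It assumes steady exponential growth within each interval, rounds "under 200", "just over 40", and "under 20" days to 200, 40, and 20, and takes the Twitter figures at face value even though the updates at the top of this post say that data turned out to be inflated. The function name is hypothetical.

```python
def daily_growth_rate(doubling_days):
    """Constant daily growth rate that doubles a total in `doubling_days` days."""
    return 2 ** (1 / doubling_days) - 1


# Doubling times rounded from the figures quoted above.
intervals = [
    ("Blogger, 5M to 10M", 200),
    ("Twitter, 5M to 10M", 40),
    ("Twitter, 10M to 20M", 20),
]

for label, days in intervals:
    print(f"{label}: about {daily_growth_rate(days):.2%} per day")
```

Halving the doubling time roughly doubles the daily rate, which is why the Twitter line looks nearly vertical next to Blogger's on a linear scale.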
The second graph also shows that Twitter's post-SXSW 2007 growth is real and not just a temporary bump...a bunch of people came to check it out, stayed on, and everyone messaged like crazy. However, it does look like growth is slowing just a bit if you look at the data on a logarithmic scale:

Actually, as the graph shows, the biggest rate of growth for Twitter didn't occur following SXSW 2007 but after November 21.
As for why Twitter took off so much faster than Blogger, I came up with five possible reasons (there are likely more):
1. Twitter is easier to use than Blogger was. All you need is a web browser or mobile phone. Before blog*spot came along in August 2000, you needed web space with FTP access to set up a Blogger blog, not something that everyone had.
2. Twitter has more ways to create a new message than Blogger did at that point. With Blogger, you needed to use the form on the web site to create a post. To post to Twitter, you can use the web, your phone, an IM client, Twitterrific, etc. It's also far easier to send data to Twitter programmatically...the NY Times account alone sends a couple dozen new messages into the Twitter database every day without anyone having to sit there and type them in.
3. Blogger was more strapped for cash and resources than Twitter is. The company that built Blogger ran out of money in early 2001 and nearly out of employees shortly after that. Hard to say how Blogger might have grown if the dot com crash and other factors hadn't led to the severe limitation of its resources for several key months.
4. Twitter has a much larger pool of available users than Blogger did. Blogger launched in August 1999 and Twitter almost 7 years later in March 2006. In the intervening time, hundreds of millions of people, the media, and technology & media companies have become familiar and comfortable with services like YouTube, Friendster, MySpace, Typepad, Blogger, Facebook, and GMail. Hundreds of millions more now have internet access and mobile phones. The potential user base for the two probably differed by an order of magnitude or two, if not more.
5. But the biggest factor is that the social aspect of Twitter is built in and that's where the super-fast growth comes from. With Blogger, reading, writing, and creating social ties were decoupled from each other but they're all integrated into Twitter. Essentially, the top graph shows the difference between a site with social networking and one largely without. Those steep parts of the Twitter trend on Nov 21 and mid-March? That's crazy insane viral growth[2], very contagious, users attracting more users, messages resulting in more messages, multiplying rapidly. With the way Blogger worked, it just didn't have the capability for that kind of growth.
A few miscellaneous thoughts:
It's important to keep in mind that these graphs depict the growth in messages, not users or web traffic. It would be great to have user growth data, but that's not publicly available in either case (I don't think). It's tempting to look at the growth and think of it in terms of new users because the two are obviously related. More users = more messages. But that's not a static relationship...perhaps Twitter's userbase is not increasing all that much and the message growth is due to the existing users increasing their messaging output. So, grain of salt and all that.
What impact does Twitter's API have on its message growth? As I said above, the NY Times is pumping dozens of messages into Twitter daily and hundreds of other sites do the same. This is where it would be nice to have data for the number of active users and/or readers. The usual caveats apply, but if you look at the Alexa trends for Twitter, pageviews and traffic seem to be leveling out. Compete, which only offers data as recently as March 2007, still shows traffic growing quickly for Twitter.
Just for comparison, here's a graph showing the adoption of various technologies ranging from the automobile to the internet. Here's another graph showing the adoption of four internet-based applications: Skype, Hotmail, ICQ, and Kazaa (source: a Tim Draper presentation from April 2006).
[Thanks to Andy, Matt, Anil, Meg, and Jonah for their data and thoughts.]
[1] Some notes and caveats about the data. The Blogger post IDs were taken from archived versions of Evhead and Anil Dash's site stored at the Internet Archive and from a short-lived early collaborative blog called Mezzazine. For posts prior to the introduction of the permalink in March 2000, most pages output by Blogger didn't publish the post IDs. Luckily, both Ev and Anil republished their old archives with permalinks at a later time, which allowed me to record the IDs.
The earliest Blogger post ID I could find was 9871 on November 23, 1999. Posts from before that date had higher post IDs because they were re-imported into the database at a later time so an accurate trend from before 11/23/99 is impossible. According to an archived version of the Blogger site, Blogger was released to the public on August 23, 1999, so for the purposes of the graph, I assumed that post #1 happened on that day. (As you can see, Anil was one of the first 2-3 users of Blogger who didn't work at Pyra. That's some old school flavor right there.)
Regarding the re-importing of the early posts, that happened right around mid-December 1999...the post ID numbers jumped from ~13,000 to ~25,000 in one day. In addition to the early posts, I imagine some other posts were imported from various Pyra weblogs that weren't published with Blogger at the time. I adjusted the numbers subsequent to this discontinuity and the resulting numbers are not precise but are within 100-200 of the actual values, an error of less than 1% at that point and becoming significantly smaller as the number of posts grows large. The last usable Blogger post ID is from March 7, 2002. After that, the database numbering scheme changed and I was unable to correct for it. A few months later, Blogger switched to a post numbering system that wasn't strictly sequential.
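As a rough illustration of that adjustment (not the actual correction used for the graphs), the Python sketch below subtracts the size of the import jump from any ID recorded on or after the jump date. The exact jump date, the jump size of 12,000, and two of the three sample rows are assumptions made up for the example; only the 9871 ID on November 23, 1999 comes from the notes above.

```python
from datetime import date

# Assumptions for illustration: the notes only say the IDs jumped from
# ~13,000 to ~25,000 in one day around mid-December 1999.
JUMP_DATE = date(1999, 12, 15)   # placeholder for "mid-December 1999"
JUMP_SIZE = 25_000 - 13_000      # approximate number of imported posts


def adjusted_id(sample_date, raw_id):
    """Subtract the import jump from IDs recorded on or after the jump date."""
    if sample_date >= JUMP_DATE:
        return raw_id - JUMP_SIZE
    return raw_id


# (date, raw post ID) samples. Only the first row is from the notes;
# the other two are hypothetical.
samples = [
    (date(1999, 11, 23), 9_871),
    (date(1999, 12, 14), 13_000),
    (date(1999, 12, 16), 25_300),
]

for day, raw in samples:
    print(day.isoformat(), raw, adjusted_id(day, raw))
```

Because the offset is a single constant, any error in estimating the jump carries through to every later point unchanged, so it shrinks as a share of the total as the post count grows.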
The data for Twitter from March 21, 2006 to March 15, 2007 is from Andy Baio. Twitter data subsequent to 3/15/07 was collected by me.
[2] "Crazy insane viral growth" is a very technical epidemiological term. I don't expect you to understand its precise meaning.
I'm working on a longish post for later today (or early tomorrow) about this graph:

More soon.
Update: The long post is done...the above graph compares (roughly) the growth of Blogger (in orange) to the growth of Twitter (in blue).
The NY Times published an article this morning on the efforts to develop a code of conduct for online discourse. The code is a reaction to recent comments made about blogger Kathy Sierra. Three things bother me about the proposed rules.
We take responsibility for our own words and for the comments we allow on our blog.
I don't want to take one bit of responsibility for someone else's words. A person's words are their own. By taking responsibility for them, you open yourself up to all sorts of problems, mostly legal in nature. Why should you get sued for slander or libel because someone else posts something on your site? Of course, I also believe that Google isn't responsible for people posting copyrighted videos to YouTube, that Napster wasn't responsible for people trading copyrighted material via its service, and that ISPs aren't responsible for what their customers publish to the web.
We do not allow anonymous comments.
There has to be a mechanism for anonymous comments, even if they need to be approved before being posted. As the EFF says, "anonymous communications have an important place in our political and social discourse".
The missing piece in this discussion so far is: who's going to police all this misconduct? Punishing the offenders and erasing the graffiti is the easy part...fostering "a culture that encourages both personal expression and constructive conversation" is much more difficult. Really fucking hard, in fact...it requires near-constant vigilance. If I opened up comments on everything on kottke.org, I could easily employ someone for 8-10 hours per week to keep things clean, facilitate constructive conversation, coax troublemakers into becoming productive members of the community, etc. Both MetaFilter and Flickr have dedicated staff to perform such duties...I imagine other community sites do as well. If you've been ignoring all of the incivility on your site for the past 2 years, it's going to be difficult to clean it up. The social patterns of your community's participants, once set down, are difficult to modify in a significant way.
For now, my blogger code remains "B9 d+ t+ k++ s u= f++ i o x+ e++ l- c--".
Marc Hedlund, founder of the intriguing Wesabe, recently made this interesting observation:
One of my favorite business model suggestions for entrepreneurs is, find an old UNIX command that hasn't yet been implemented on the web, and fix that. talk and finger became ICQ, LISTSERV became Yahoo! Groups, ls became (the original) Yahoo!, find and grep became Google, rn became Bloglines, pine became Gmail, mount is becoming S3, and bash is becoming Yahoo! Pipes. I didn't get until tonight that Twitter is wall for the web. I love that.
A slightly related way of thinking about how to choose web projects is to take something that everyone does with their friends and make it public and permanent. (Permanent as in permalinked.) Examples:
- Blogger, 1999. Blog posts = public email messages. Instead of "Dear Bob, Check out this movie." it's "Dear People I May or May Not Know Who Are Interested in Film Noir, Check out this movie and if you like it, maybe we can be friends."
- Twitter, 2006. Twitter = public IM. I don't think it's any coincidence that one of the people responsible for Blogger is also responsible for Twitter.
- Flickr, 2004. Flickr = public photo sharing. Flickr co-founder Caterina Fake said in a recent interview: "When we started the company, there were dozens of other photosharing companies such as Shutterfly, but on those sites there was no such thing as a public photograph -- it didn't even exist as a concept -- so the idea of something 'public' changed the whole idea of Flickr."
- YouTube, 2005. YouTube = public home videos. Bob Saget was onto something.
Not that this approach leads naturally to success. Several companies are exploring music sharing (and musical opinion sharing), but no one's gotten it just right yet, due in no small measure to the rights issues around much recorded music.
As I mentioned the other day, I recently joined Twitter. I've been poking around its nooks and crannies ever since. Here are some observations, presented in Twitter-sized chunks:
Playing with Twitter reminds me of blogging circa 2000. Back then, all weblogs were personal in nature and most people used them to communicate with their friends and family. If I wanted to know what my friends were up to back then, I read their blogs. Now I follow Twitter (and Flickr and Vox).
The reaction to Twitter mirrors the initial reaction to weblogs...the same tired "this is going to ruin the web" and "who cares what you ate for dinner" arguments.
Also like blogs, everyone has their own unique definition of what Twitter is (stripped down blogs, public IM, Dodgeball++, etc.), and to some extent, everyone is correct. Maybe that's how you know you've got a winner: when people use it like mad but can't fully explain the appeal of it to others. See also: weblogs, Flickr.
For people with little time, Twitter functions like an extremely stripped-down version of MySpace. Instead of customized pages, animated badges, custom music, top 8 friends, and all that crap, Twitter is just-the-facts-ma'am: where are my friends and what are they up to?
Twitter's like Flickr without the images.
When one thing (i.e. Twitter) is easier than something else (i.e. blogging) and offers almost the same benefits, people will use it.
Twitter brings back the "type words in one box and press submit" thing that made Blogger so popular back in the day. Compare with current blogging systems. To publish a post in MT, I've got to fiddle with 7-9 different text boxes and options. See immediately above.
Let's not forget Dodgeball here, which was used extensively at SXSW in 2006. (In other words, all the Twittering at SXSW 2007 was not unprecedented. Chill.) It's more focused on location and SMS though...by allowing updates in more ways and being more flexible about the type of message allowed, Twitter is attractive to a wider group of people.
If your friends are not on Twitter, I can't imagine it would be that interesting.
Twitterholic tracks the top 100 Twitter users in terms of followers. I know, let's not turn absolutely everything on the web into a popularity contest!! We already know Scoble is a big blowhard and has weak ties to lots of people...let's move on, shall we?
I wonder what the average number of followers per person is? The folks with 5 zillion followers get all the attention, but as with blogging, those posting updates for their 20 friends form the bulk of the activity.
Lists of friends and followers are presented alphabetically. Does Anil attract more friends, on average, than Veen because he always shows up near the top of the listings?
I can see why Obvious dropped Odeo for Twitter. With podcasts, you've got all that data locked up in binary format (no easy cut-and-paste) and it takes you 20 listening minutes before you can react to it (by commenting, by linking, etc.). With blogs, the reaction time to a post is 1-2 minutes, with Flickr it's 5 seconds, and Twitter is 2-3 seconds. The barrier to entry for reacting to and remixing podcasts is just so much higher.
Twitter is the first thing on the web that I've been excited about in ages. Like years. The last thing was probably Flickr. (Talk about burying the lede.) It's just so damn simple but useful. Again, reminds me of weblogs in that way.
If you're on a Mac and using Twitter, download Twitterrific, a little app that sits on your desktop and displays updates from your friends. My only complaint: it doesn't completely show updates, forcing you to the web to read the last 2-3 words of a longish message. Come on...it's only 140 characters, show them all!
Twittermap displays recent Twitter messages on Google Maps. All you do is send Twitter a message with your location -- like so...the "L:10003" is the important part -- and Twittermap will pick it up.
Even more mesmerizing is Twittervision...a world tour of recent Twitter messages. Just sit back and watch the updates come in one at a time, displayed on a world map. (This is in beta and Twitter's having some downtime issues right now, so the data may be less than fresh when you go.)
Twitter seems to work equally well for busy people and not-busy people. It allows folks with little time to keep up with what their friends are up to without having to email and IM with them all day. Those with a lot of time on their hands can spend a lot of time finding new people to follow, having back-and-forths with friends all day, and updating their status 40 times a day. Too many web apps fail because they only appeal to those with abundant free time.
I'm fascinated to see where Obvious takes this app once they get their scaling issues under control.
The default display of recent messages plus your own messages is genius. Makes it feel more like a conversation. The "with friends" display is great too...perfect for discovering other people to follow.
"Friends" still isn't the right word.
Tyler Cowen:
Blogging makes us more oriented toward an intellectual bottom line, more interested in the directly empirical, more tolerant of human differences, more analytical in the course of daily life, more interested in people who are interesting, and less patient with Continental philosophy.
The cover story of the December 9th issue of Science News, The Predator's Gaze, is about psychopathy. The whole article is worth a read, but the brief description of psychopathy at the beginning got me thinking about something that Anil Dash wrote the other day. He highlighted a review of a B&B made by a potential guest who was upset that his many attempts to persuade the owners to accept his expired gift certificate had failed. Anil labeled this person a sociopath:
As a public service, I offer you my analysis. This quote is how you can tell this guy is a sociopath. Not that he merely went online and vented to random strangers about his greediness. No, rather, that he was willing to concede his own willful ignorance (or illiteracy?) while complaining. The web is littered with these chuckleheads who point out their own sociopathic behavior while complaining about others.
At dinner the other night, a group of us were talking about a particularly irksome message board contributor and the subject of sociopathy came up again. This particular person seemed to be oblivious to the rules of the board, didn't pick up on the social cues of other participants or moderators to modify his behavior, and was making public personal attacks against others while complaining that others were doing the same to him, even though they were not. Anyone who runs a community site, has comments on their blog, or participates on a message board knows this guy -- and it usually is a guy. He's the fly in everyone else's ointment, screaming in the middle of a quiet conversation, and then says things like "if you hate me, I must be doing something right".
With that in mind, some quotes from the Science News article:
Psychopaths lack a conscience and are incapable of experiencing empathy, guilt, or loyalty.
People with psychopathy don't modify behaviors for which they're punished and don't learn to avoid actions that harm others, Blair proposes in the September Cognition. As a result, they fail to develop a moral sense, in his view. Blair's theory fits with previous observations that psychopaths have difficulty learning to avoid punishments, show weak physiological responses to threats, and don't often recognize sadness or fear in others.
He views psychopathic personalities as the product of an attention deficit. Psychopaths focus well on their explicit goals but ignore incidental information that provides perspective and guides behavior, Newman holds. Most other people, as they take action, unconsciously consult such information, for instance, rules of conduct in social settings and nonverbal signs of discomfort in those around them.
Sounds a lot like the fellow we were discussing at dinner. I don't think most of the people that demonstrate antisocial behavior in comment threads are actually psychopaths or sociopaths (there is a difference) in real life. Rather, interacting via text strips out so much social context and "incidental information" that it causes some people to display psychopathic behavior online and fail to develop an online moral sense.
Thinking about disruptive commenters in this way presents an interesting challenge. According to the article, psychopathy seems to be genetic in nature and curing people of this extreme antisocial behavior can be difficult. An Australian study cited in the article found that boys with behavioral problems reacted better to rewards for good behavior than to punishments for bad behavior. Maybe looking for ways to reward bad online community members for their good behavior as well as trying to replace some of the stripped away social context is the way forward. (A quick idea for replacing some social context: add a graphic of eyes to the text-posting interface?)
Jonah Peretti, late of Eyebeam and currently of Huffington Post, and his fine team have launched Buzzfeed. From the about page:
BuzzFeed distinguishes what is actually interesting from what is merely hyped. We only feature movies, music, fashion, ideas, technology, and culture that are on the rise and worth your time.
The content territory that Buzzfeed aims to fill is an interesting one. The site is not Digg with 125 new items to read on the front page every day. Neither is it an historical record of what people thought was interesting at a certain point in time. It's more like a water cooler conversation with velocity, a moving snapshot of what the media and blogosphere is talking about. As a result, the stuff you see on Buzzfeed is not the absolute newest, freshest thing...there's no truly breaking news on the site because to have buzz around something, people already need to be talking about it somewhere. But unless you're completely obsessive about keeping up with everything going on in all corners of the world, it's likely that Buzzfeed will show you something new and interesting every day, especially if it's in an area you don't normally pay attention to. That's the goal, anyway.
I think it's a great approach, an attempt to cut through a bit of the hype and look past the memes you might chuckle at and then completely forget about and instead, as the about page says, "aggregate authentic excitement that captures what real people are saying about the things they find most interesting". The Borat trend is an example of something that really works with this approach. Unlike most films released these days, there's a surprising number of different things around Borat to talk about. There's the movie itself. There's the surprise popularity of it. And the almost universal great reviews. Then came the lawsuits. Now there's a bit of a backlash. And there's the Snakes on a Plane angle...Borat is a movie that succeeded through viral marketing where SoaP largely failed. A bit of something for everyone there, even for the hardcore Borat fan.
Warning Disclosure: I am an advisor to Buzzfeed.
If you've ever used any of the various menu sites out there, you may have noticed that the menus are occasionally not as up-to-date or complete as they could be. A typical response in the blogosphere to a situation like this is to fire off a snarky missive about how menu sites suck, wish harm on the site's owners and their children, and why don't they just die already, those sucking bastards, and basically overreact in such a way as to make the writer feel temporarily better and all but ensure that nothing constructive comes of it.
Since its launch last year, I've admired the tone of Eater, a site about New York City food and dining. The site strikes the right balance between criticism, enthusiasm, insider knowledge, and detached reportage while covering a topic where too much of any one of these is deadly for the reader. Last week, Eater took note of the menu site situation, but instead of just complaining, they went looking for some evidence and reported the results:
Last week, Eater began an exhaustive investigative series called MenuGate. For those who think we'd forgotten about it, ten-hut. Tomorrow morning, we'll be conducting a SPOT INSPECTION of the major menu site players, then scoring them on how accurate (or inaccurate) their menus are. The benchmark will be the menu that's freely available, at this very moment, on the restaurant's official website.
In canning the snark, offering fair criticism, and letting the results speak for themselves, Eater made it possible for the menu sites to respond in a congenial fashion:
We saw you chose 11 Madison Park this morning to do a menu comparison and our menu was out of date. To be fair, we waited to let you investigate the differences before we updated the menu, even though we noticed the menu had changed. In any event, now that you've written your piece, we have updated the menu as we do for restaurants everyday. We have a team specifically assigned to update menus and we receive user submissions as well to let us know about restaurant changes.
The end result? The situation improved for everyone. A small improvement perhaps, but MenuGate is an ongoing Eater feature so we can expect future improvements. And perhaps when the menu sites get tired of taking their lumps each time around, MenuGate may lead them to think of better ways to keep their menus up-to-date and useful. Anil Dash wrote a post two years ago about how bloggers could take positive action against "Stuff That Sucks":
I'm proud of what [bloggers have] done in creating so many different weblog communities, and I don't want our legacy to be one of having the positives overshadowed by our frequent, though understandable, tendency to be unkind or uncivil to those we're communicating with.
The way Eater has approached the menu sites issue is certainly a good example of what Anil was talking about. Good show.
Six Apart recently launched a preview version of their new Vox blogging service. When you log in to Vox, one of the first things you notice on the front page is the Question of the Day followed by a quick posting box. Answer the question, press "continue", and you've got yourself a blog post. I asked Six Apart president Mena Trott how the feature came about.
Jason: Everyone loves the Question of the Day feature on Vox. The QotD cleverly formalizes the memes that travel through LiveJournal and the blogosphere at large, making it OK for the kind of people who hate email joke forwards to participate collectively in something on a regular basis. Who is responsible for generating these questions? Are they recycled memes from LJ or do you have some meme genius working for 6A?
Mena: Question of the Day actually started in a design comp I did -- meaning it hadn't been specified in any product requirements docs. I was creating the Vox dashboard and realized that the one thing really missing from the page was a call to action. So, I tried to think what would be the one thing that would make me want to post and the Question of the Day made total sense.
You're exactly correct in saying that we're wanting to legitimize the behavior we've seen in email (forwards). It's all about trying to figure out the behavior that would make my mom feel comfortable posting or make someone not feel overwhelmed by a big white posting box.
If you remember the Four Things meme that floated around a couple months ago, you'll recall that this simple meme got people (like me) to post on their blogs after significant absences. We wanted to capture that sort of motivator.
And of course, LiveJournal is the inspiration for all of this.
As far as who creates the questions, we have a scratchpad that is generated by various members of the staff as well as suggestions that come in from our feedback forms. We're still in such an early stage of Vox that these questions are evolving daily. One thing we've seen, however, is that the two topics that people most like to answer questions about are nostalgia (favorite childhood candy, childhood fears, etc...) and media-based (favorite movie, song that makes you happy, anything television).
Some questions, surprisingly bomb in an unexpected way. In April, I posed the question "If you had a time machine and could travel anywhere in time, where would you go and why?" It's a difficult question for those who don't obsess about time travel as much as I do. And, I have to admit, I made it question of the day since *I* had my own answer. Still, I'd love to try this one again now that more people are in Vox.
--
Thanks, Mena. Sometimes it's these little things, tiny addictive hooks, that make the difference between a product taking off and not, and Vox's QotD is a nice hook indeed. (Also, I'm totally with you on the time travel question.)
Update: Mena posted some more info about the QotD on Vox.
Some dreams deserve to be immortalized on tshirts:


He can feel the anger in my voice, so he immediately tries to calm me down. "I'm trying to explain the differences between MySQL and Perl to my friend," he answers as if that were the most logical thing to ever come out of his mouth.
"You're friends with Gisele Bundchen?" I ask.
"Well, yeah," he says. "I met her on a WordPress message board a few months ago."
My whole world does a sort of belly flop, and I start to get a little dizzy because what I used to think was right-side-up is now turned on its head. "That's not okay," I say to him.
"What do you mean it's not okay?" he asks. "We're talking about databases, for crying out loud."
Apologies to Mike for beating him to the punch.
ps. Sorry, you can't actually order the shirts. I've offered Heather the design if she wants to do so at some point.
Upon my return to civilization last week, Greg Knauss wrote up some thoughts he had after doing the remaindered links here for two weeks. His thoughts, reproduced in full:
Over the past two weeks, David Jacobs, Anil Dash and I have attempted to reproduce (in some halting way) Jason Kottke, while the actual Jason Kottke was in rehab on his honeymoon. The attempt, on my part at least, has been an abject failure. Or haven't you noticed all the crappy links with "GK" at the end of them? Go-kart magazines? What the hell?
Like most of the disasters I've had a hand in, I've got a theory that both explains what happened and exonerates me. Ducking responsibility sounds better if you put on academic airs about it.
The theory: There are two kinds of bloggers, referential and experiential. Kottke is one. I, now two weeks too late in realizing this, am another.
The referential blogger uses the link as his fundamental unit of currency, building posts around ideas and experiences spawned elsewhere: Look at this. Referential bloggers are reporters, delivering pointers to and snippets of information, insight or entertainment happening out there, on the Intraweb. They can, and do, add their own information, insight and entertainment to the links they unearth -- extrapolations, juxtapositions, even lengthy and personal anecdotes -- but the outward direction of their focus remains their distinguishing feature.
The experiential blogger is inwardly directed, drawing entries from personal experience and opinion: How about this. They are storytellers (and/or bores), drawing whatever they have to offer from their own perspective. They can, and do, add links to supporting or explanatory information, even unique and undercited external sources. But their motivation, their impetus, comes from a desire to supply narrative, not reference it.
There's nothing here to imply that one type of blogger is better than the other. There are literally thousands -- OK, hundreds... OK, at least a dozen -- of both kinds that are valuable additions to the on-going conversation/food-fight/furry-cuddle that is the Internet. My point is that Jason Kottke is a very, very good referential blogger and I am a very, very bad one. And I'm sure I wouldn't have trouble finding a link that expresses this sentiment (many, many times over, with varying degrees of vehemence), but I'd rather say it from my own experience:
Welcome back, Jason. You've been missed.
After reading Greg's thoughts, Meg reminded me that Rebecca Blood had made a distinction between filter-style and journal-style bloggers in Weblogs: A History and Perspective. If you want to generalize outside the realm of weblogs, they're both talking about the difference between writers and editors[1].
At a party a couple of years ago, I was talking to Nick Denton and he was puzzled by the number of bloggers who were getting book deals and told me that "the natural upgrade path for bloggers is from blogging to editing, not to writing". As Greg and Rebecca note, that doesn't apply to everyone, but it sure describes what I do here. kottke.org has always been more edited than written. I've never particularly thought of myself as a writer (I get by, but I wish I were better), but I do pay a lot of attention to how the writing is presented and contextualized...how the overall package "feels".
[1] And if you want to go even further out on the metaphorical gangplank here, the writer/editor dichotomy compares well to that of the musician/DJ. ↩
After linking to a particularly active thread on a politics blog, Chris asks:
What is the record for the most amount of comments left on a blog?
The Matrix Reloaded thread (it actually spans two threads because MT was beginning to buckle under the pressure) got 1767 comments in six months. MetaFilter's longest thread has 1729 comments. I've seen 1000+ comment threads on Dooce and political blogs like Daily Kos probably have 1000+ comment threads all the time. This Engadget thread has 3324 comments. Slashdot's thread on the end of the 2004 Presidential election garnered 5687 comments. (This SpyMac forum thread seems to have about 167,000 comments, but it's not a blog and seems like cheating because it was an attempt at the longest thread ever.)
Any other contenders? Digg? Huffington Post?
Glenn Reynolds makes an interesting analogy about journalism and beer making in his new book:
Without formal training and using cheap equipment, almost anyone can do it. The quality may be variable, but the best home-brews are tastier than the stuff you see advertised during the Super Bowl. This is because big brewers, particularly in America, have long aimed to reach the largest market by pushing bland brands that offend no one. The rise of home-brewing, however, has forced them to create "micro-brews" that actually taste of something. In the same way, argues Mr Reynolds, bloggers--individuals who publish their thoughts on the internet--have shaken up the mainstream media (or MSM, in blogger parlance).
What, no "drunk on power" quip? Curiously, the Economist piece fails to mention the name of Reynolds' book, An Army of Davids, although it appears over in the right sidebar, almost camouflaged as an ad.
Benford's Law describes a curious phenomenon about the counterintuitive distribution of numbers in sets of non-random data:
A phenomenological law also called the first digit law, first digit phenomenon, or leading digit phenomenon. Benford's law states that in listings, tables of statistics, etc., the digit 1 tends to occur with probability ~30%, much greater than the expected 11.1% (i.e., one digit out of 9). Benford's law can be observed, for instance, by examining tables of logarithms and noting that the first pages are much more worn and smudged than later pages (Newcomb 1881). While Benford's law unquestionably applies to many situations in the real world, a satisfactory explanation has been given only recently through the work of Hill (1996).
I first heard of Benford's Law in connection with the IRS using it to detect tax fraud. If you're cheating on your taxes, you might fill in amounts of money somewhat at random, the distribution of which would not match that of actual financial data. So if the digit "1" shows up on Al Capone's tax return about 15% of the time (as opposed to the expected 30%), the IRS can reasonably assume they should take a closer look at Mr. Capone's return.
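For reference, the "expected 30%" figure comes from the usual statement of Benford's Law, in which a leading digit d occurs with probability log10(1 + 1/d). The sketch below only tabulates that formula; the function name is made up for illustration and isn't taken from any of the sources quoted above.

```python
import math

def benford_expected(digit):
    """Expected share of leading digit `digit` (1-9) under Benford's Law."""
    if digit not in range(1, 10):
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Expected share of each leading digit, as a fraction of 1.
expected = {d: benford_expected(d) for d in range(1, 10)}

for d, share in expected.items():
    print(f"{d}: {share:.1%}")
```

The nine shares sum to 1, and the share for 1 works out to roughly 30%, which is the number the IRS example leans on.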
Since I installed Movable Type 3.15 back in March 2005, I have been using its "post to the future" option pretty regularly to post my remaindered links...and have been using it almost exclusively for the last few months[1]. That means I'm saving the entries in draft, manually changing the dates and times, and then setting the entries to post at some point in the future. For example, an entry with a timestamp like "2006-02-20 22:19:09" when I wrote the draft might get changed to something like "2006-02-21 08:41:09" for future posting at around 8:41 am the next morning. The point is, I'm choosing basically random numbers for the timestamps of my remaindered links, particularly for the hours and minutes digits. I'm "cheating"...committing post timestamp fraud.
That got me thinking...can I use the distribution of numbers in these post timestamps to detect my cheating? Hoping that I could (or this would be a lot of work wasted), I whipped up a MT template that produced two long strings of numbers: 1) one of all the hours and minutes digits from the post timestamps from May 2005 to the present (i.e. the cheating period), 2) and one of all the hours and minutes digits from Dec 2002 - Jan 2005 (i.e. the control group). Then I used a PHP script to count the numbers in each string, dumped the results into Excel, and graphed the two distributions together. And here's what they look like, followed by a table of the values used to produce the chart:

| Digit | 5/05-now | 12/02-1/05 | Difference |
| --- | --- | --- | --- |
| 1 | 31.76% | 33.46% | 1.70% |
| 2 | 11.76% | 14.65% | 2.89% |
| 3 | 10.30% | 9.96% | 0.34% |
| 4 | 10.44% | 9.58% | 0.86% |
| 5 | 10.02% | 10.52% | 0.51% |
| 6 | 4.83% | 5.40% | 0.57% |
| 7 | 5.66% | 4.96% | 0.70% |
| 8 | 7.62% | 4.65% | 2.97% |
| 9 | 7.60% | 6.81% | 0.79% |
As expected, 1 & 2 show up less than they should during the cheating period, but not overly so[2]. The real fingerprint of the crime lies with the 8s. The number 8 shows up during the cheating period ~64% more than expected. After thinking about it for a while, I came up with an explanation for the abundance of 8s. I often schedule posts between 8am-9am so that there's stuff on the site for the early-morning browse and I usually finish off the day with something between 6pm-7pm (18:00 - 19:00). Not exactly the glaring evidence I was expecting, but you can still tell.
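A rough sketch of the same counting and comparison in Python (the original used an MT template, a PHP script, and Excel, none of which are reproduced here). The function name and the sample timestamps are hypothetical; the two percentage columns are copied from the table above, which lists only digits 1-9, so zeros are ignored here too.

```python
from collections import Counter

def digit_shares(timestamps):
    """Percentage share of each digit 1-9 across "HH:MM" strings (zeros ignored)."""
    counts = Counter(ch for ts in timestamps for ch in ts if ch in "123456789")
    total = sum(counts.values())
    return {d: 100 * counts[str(d)] / total for d in range(1, 10)}

# Hypothetical timestamps, only to show the input format.
sample = digit_shares(["08:41", "18:27", "08:15", "12:03"])

# Percentages copied from the table above.
cheating = {1: 31.76, 2: 11.76, 3: 10.30, 4: 10.44, 5: 10.02,
            6: 4.83, 7: 5.66, 8: 7.62, 9: 7.60}
control = {1: 33.46, 2: 14.65, 3: 9.96, 4: 9.58, 5: 10.52,
           6: 5.40, 7: 4.96, 8: 4.65, 9: 6.81}

# Relative change of each digit's share versus the control period.
relative_change = {d: (cheating[d] - control[d]) / control[d] for d in control}

# The digit whose share moved the most, in either direction.
most_shifted = max(relative_change, key=lambda d: abs(relative_change[d]))
print(most_shifted, f"{relative_change[most_shifted]:+.0%}")
```

Comparing relative change rather than the raw percentage-point difference is what makes the 8s stand out: the 2s move by almost as many points, but from a much larger base.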
The obvious next question is, can this technique be utilized for anything useful? How about detecting comment, trackback, or ping spam? I imagine IPs and timestamps from these types of spam are forged to at least some extent. The difficulties are getting enough data to be statistically significant (one forged timestamp isn't enough to tell anything) and having "clean" data to compare it against. In my case, I knew when and where to look for the cheating...it's unclear if someone who didn't know about the timestamp tampering would have been able to detect it. I bet companies with services that deal with huge amounts of spam (Gmail, Yahoo Mail, Hotmail, TypePad, Technorati) could use this technique to filter out the unwanted emails, comments, trackbacks, or pings...although there are probably better methods for doing so.
[1] I've been doing this to achieve a more regular publishing schedule for kottke.org. I typically do a lot of work in the evening and at night and instead of posting all the links in a bunch from 10pm to 1am, I space them out over the course of the next day. Not a big deal because increasingly few of the links I feature are time-sensitive and it's better for readers who check back several times a day for updates...they've always got a little something new to read.
[2] You'll also notice that the distributions don't quite follow Benford's Law either. Because of the constraints on which digits can appear in timestamps (e.g. you can never have a timestamp of 71:95), some digits appear proportionally more or less than they would in statistical data. Here's the distribution of digits of every possible time from 00:00 to 23:59:
1 - 25.33
2 - 17.49
3 - 12.27
4 - 10.97
5 - 10.97
6 - 5.74
7 - 5.74
8 - 5.74
9 - 5.74
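The list above can be reproduced by brute force. This is a minimal sketch, assuming zeros are left out of the count (the list covers only 1 through 9) and that times are written as four digits, HHMM; the variable names are arbitrary.

```python
from collections import Counter

# Count every non-zero digit across all 1,440 times from 00:00 to 23:59.
counts = Counter(
    ch
    for hour in range(24)
    for minute in range(60)
    for ch in f"{hour:02d}{minute:02d}"
    if ch != "0"
)

total = sum(counts.values())

# Percentage share of each digit, rounded to two places like the list above.
shares = {d: round(100 * counts[str(d)] / total, 2) for d in range(1, 10)}

for d, share in shares.items():
    print(d, "-", share)
```

The 1s and 2s get their head start from the tens place of the hour, and nothing above 5 can appear in the tens place of the minutes, which is why 6 through 9 end up tied at the bottom.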
Wes Felter calls for the ass fact-checking of William Safire over the latter's article in the NY Times about blog jargon and he's not wrong. Wes correctly notes the etymology of "weblog" and "blog" and hopefully the people responsible for things like the AP Style Guide, English dictionaries, and influential columns like On Language will, at some point, do the 20 minutes of research necessary to convince them and the unwashed journalist masses that "blog" is not and was never short for "web log".
Safire also gets tripped up on where the word "blogosphere" came from. While William Quick's usage in 2002 popularized the term, Brad Graham first used the term in 1999.
In 2002, Dave Winer of Scripting News and Martin Nisenholtz of the New York Times made a Long Bet about the authority of weblogs versus that of the NY Times in Google:
In a Google search of five keywords or phrases representing the top five news stories of 2007, weblogs will rank higher than the New York Times' Web site.
I decided to see how well each side is doing by checking the results for the top news stories of 2005. Eight news stories were selected and an appropriate Google keyword search was chosen for each one of them. I went through the search results for each keyword and noted the positions of the top results from 1) "traditional" media, 2) citizen media, 3) blogs, and 4) nytimes.com. Finally, the scores were tallied and an "actual" winner (blogs vs. nytimes.com) and an "in-spirit" winner (any traditional media source vs. any citizen media source) were calculated. (For more on the methodology, definitions, and caveats, read the methodology section below.)
So how did the NY Times fare against blogs? Not very well. For eight top news stories of 2005, blogs were listed in Google search results before the Times six times, the Times only twice. The in-spirit winner was traditional media by a 6-2 score over citizen media. Here are the specific results:
1) Hurricane Katrina hits New Orleans.
Search term: "hurricane katrina"
3. Top citizen media result (Wikipedia)
13. Top media result (CNN)
56. Top NY Times mention (NY Times)
61. Top blog result (Kaye's Hurricane Blog)
Winner (in spirit): Citizen media
Winner (actual): NY Times
2) Big changes in the US Supreme Court (Rehnquist dies, O'Connor retires, Roberts appointed Chief Justice, Harriet Miers rejected).
Search term: "harriet miers"
4. Top media result (Washington Post)
5. Top citizen media result (Wikipedia)
8. Top NY Times mention (NY Times)
11. Top blog result (TalkLeft)
Winner (in spirit): Media
Winner (actual): NY Times
3) Terrorists bomb London, killing 52.
Search term: "london bombing"
1. Top media result (CNN)
2. Top citizen media result (Wikipedia)
21. Top blog result (Schneier on Security)
No NY Times article appears in the first 100 results.
Winner (in spirit): Media
Winner (actual): Blogs
4) First elections in Iraq after Saddam.
Search term: "iraq election"
1. Top media result (BBC News)
6. Top blog result (Iraq elections newswire)
6. Top citizen media result (Iraq elections newswire)
14. Top NY Times mention (NY Times)
Winner (in spirit): Media
Winner (actual): Blogs
5) Terri Schiavo legal fight and death.
Search term: "terri schiavo"
2. Top blog result (Abstract Appeal)
2. Top citizen media result (Abstract Appeal)
4. Top media result (CNN)
65. Top NY Times mention (NY Times)
Winner (in spirit): Citizen media
Winner (actual): Blogs
6) Pope John Paul II dies and Cardinal Joseph Ratzinger appointed Pope Benedict XVI.
Search term: "pope john paul ii death"
1. Top media result (CNN)
3. Top citizen media result (Wikipedia)
58. Top blog result (The Pope Blog: Pope Benedict XVI)
No NY Times article appears in the first 100 results.
Winner (in spirit): Media
Winner (actual): Blogs
7) The Israeli withdrawal from the Gaza Strip.
Search term: "gaza withdrawal"
1. Top media result (Worldpress.org)
31. Top blog result (Simply Appalling)
31. Top citizen media result (Simply Appalling)
No NY Times article appears in the first 100 results.
Winner (in spirit): Media
Winner (actual): Blogs
8) The investigation into the Valerie Plame affair, Judith Miller, Scooter Libby indicted, etc.
Search term: "scooter libby indicted"
1. Top media result (CNN)
15. Top blog result (Seven Generational Ruminations)
15. Top citizen media result (Seven Generational Ruminations)
43. Top NY Times mention (NY Times)
Winner (in spirit): Media
Winner (actual): Blogs
And just for fun here's a search for "judith miller jail" (not included in the final tally):
1. Top media result (Washington Post)
3. Top blog result (Gawker)
3. Top citizen media result (Gawker)
No NY Times article appears in the first 100 results (even though there are several matching articles on the Times site).
In covering the jailing of their own reporter, the Times lagged in the Google results behind such informational juggernauts as Drinking Liberally, GOP Vixen, and Feral Scholar.
Winner (in spirit): Media
Winner (actual): Blogs
Here are the overall results, excluding the Judith Miller search:
Overall winner (in spirit): Media (beating citizen media 6-2).
Overall winner (actual): Blogs (beating the NY Times 6-2).
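The tally can be checked mechanically. The sketch below is one way to score it, not necessarily how the bet will be judged: the ranks are copied from the eight results above, `None` stands for "not in the first 100 results", and the lower rank wins each head-to-head. The names and the tie rule are assumptions made for illustration (no ties actually occur in these pairings).

```python
import math

# Google rank of the top result of each type, per search term (None = not in top 100).
RESULTS = {
    "hurricane katrina":       {"media": 13, "citizen": 3,  "blog": 61, "nyt": 56},
    "harriet miers":           {"media": 4,  "citizen": 5,  "blog": 11, "nyt": 8},
    "london bombing":          {"media": 1,  "citizen": 2,  "blog": 21, "nyt": None},
    "iraq election":           {"media": 1,  "citizen": 6,  "blog": 6,  "nyt": 14},
    "terri schiavo":           {"media": 4,  "citizen": 2,  "blog": 2,  "nyt": 65},
    "pope john paul ii death": {"media": 1,  "citizen": 3,  "blog": 58, "nyt": None},
    "gaza withdrawal":         {"media": 1,  "citizen": 31, "blog": 31, "nyt": None},
    "scooter libby indicted":  {"media": 1,  "citizen": 15, "blog": 15, "nyt": 43},
}

def rank(value):
    """Treat a missing result as ranking below everything else."""
    return math.inf if value is None else value

def tally(results, first, second):
    """Count head-to-head wins; `first` wins only with a strictly better rank."""
    wins = {first: 0, second: 0}
    for ranks in results.values():
        winner = first if rank(ranks[first]) < rank(ranks[second]) else second
        wins[winner] += 1
    return wins

in_spirit = tally(RESULTS, "media", "citizen")
actual = tally(RESULTS, "blog", "nyt")
print(in_spirit, actual)
```

Run against the eight searches, this gives media 6, citizen media 2 for the in-spirit contest and blogs 6, NY Times 2 for the actual one, matching the scores above.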
Some observations:
- My feeling is that Mr. Nisenholtz will likely lose his bet come 2007. Even though nytimes.com fares very well in getting linked to by the blogosphere, it does very poorly in Google. This isn't exactly surprising given that most NY Times articles disappear behind a paywall after a week and some of their content (TimesSelect) isn't even publicly accessible at all. Also, I didn't look too closely at the HTML markup of the NY Times, but it could also be that it's not as well optimized for Google as that of some weblogs and other media outlets.
- "www.nytimes.com" has a PageRank of 10/10, higher than that of "www.cnn.com" (9/10), yet stories from CNN consistently appeared higher in the search results than those from the Times. The Times clearly has overall authority according to Google, but when it comes to specific instances, it falls short. In some cases, a NY Times story didn't even appear in the first 100 search results for these keyword searches.
- By 2007, it may be difficult to differentiate a blog from a traditional media source. All of the Gawker and Weblogs, Inc. sites are presented in a blog format and are referred to as blogs but otherwise how are they distinguishable from traditional media? Engadget paid to send 12 people to cover the CES technology conference, probably as many or more than the Times sent. The Sundance film festival was heavily covered by paid writers for both companies as well. In the spirit in which this bet was made, I'd have a hard time counting any of their sites as blogs. (And what about kottke.org? I get paid to write it. Am I still a member of the citizen media or have I crossed over?)
- Choosing appropriate news stories and keywords for those stories was difficult in some cases. Katrina was a no-brainer, but was the Terri Schiavo story really one of the top eight news stories of 2005? Resolving the methodology for this bet in 2007 will be tricky. I wonder how the Long Bets Foundation will handle its determination of the victory.
- Wikipedia does very well in Google results for topical search terms. Overall, traditional media still dominates (in first appearance as well as number of results), but blogs and Wikipedia do very well in some instances.
- What do these results mean? Probably not a whole lot. Nisenholtz asserts that "[news] organizations like the Times can provide that far more consistently than private parties can" while Winer says that "in five years, the publishing world will have changed so thoroughly that informed people will look to amateurs they trust for the information they want". It's difficult to draw any conclusions on this matter based on these results. Contrary to what most people believe, PageRank has a bias, a point of view. That POV is based largely (but not entirely) on what people are linking to. As someone said in the discussion of this bet, this bet is about Google more than influence or reputation, so these results probably tell us more about how Google determines influence on a keyword basis rather than how readers of online informational sources value or rate those sources. Do web users prefer the news coverage of blogs to that of the NY Times? I don't think you can even come close to answering that question based on these results.
Methodology and caveats
The eight news stories were culled from various sources (Lexis-Nexis, Wikipedia, NY Times) and narrowed down to the top stories that would have been prominently covered in both the NY Times and blogs.
The keyword phrase for each of the eight stories was selected by the trial and error discovery of the shortest possible phrase that yielded targeted search results about the subject in question. In some cases, the keyword phrase chosen only returned results for a part of a larger news story. For instance, the phrase "pope john paul" was not specific enough to get targeted results, so "pope john paul ii death" was used, but that didn't give results about the larger story of his death, the conclave to select a new pope, and the selection of Cardinal Joseph Ratzinger as Pope Benedict XVI. In the case of "katrina", that single keyword was enough to produce hundreds of targeted search results for both Hurricane Katrina and its aftermath. Keyword phrases were not tinkered with to promote or demote particular types of search results (i.e. those for blogs or nytimes.com); they were only adjusted for the relevance of overall results.
The searches were all done on January 27, 2006 with Google's main search engine, not their news specific search.
Since the spirit of the bet deals with the influence of traditional media versus that of citizen-produced media, I tracked the top traditional media (labeled just "media" above) results and the top citizen media results in addition to blog and nytimes.com results. For the purposes of this exercise, relevant results were those that linked to pages that an interested reader would use as a source of information about a news story. For citizen media, this meant pages on Wikipedia, Flickr (in some cases), weblogs, message boards, wikis, etc. were fair game. For traditional media, this meant articles, special news packages, photo essays, videos, etc.
In differentiating between "media" & citizen media and also between relevant and non-relevant results, in only one instance did this matter. Harriet Miers's Blog!!!, a fictional satire written as if the author were Harriet Miers, was the third result for this keyword phrase, but since the blog was not an informational resource, I excluded it. In all other cases, it was pretty clear-cut.
David Carr wrote an article for the NY Times about the Washington Post's recent decision to close down comments on their blog when one of their threads turned ugly. As the article points out, the issue of web sites having problems dealing with feedback (particularly published feedback like comments) is not localized to mainstream media publications:
Mickey Kaus of kausfiles.com, which does not carry comments, said that "the world is crying out for the jerk-zapper," although he added that he thought that The Washington Post's Web site overreacted. BoingBoing, a heavily trafficked "directory of wonderful things," shut down its comments section last year. "We took a lot of heat over it," said Xeni Jardin, a founder of the site. "But until we are able to come up with a better comments system - most of what is out there is too crude - it is not worth the trouble."
If you're wondering why the comments on kottke.org aren't on more often, this is the reason.[1] This site is a one-person operation and even though I work on it full-time, I don't have the throughput to manage a lot of threads. Comment gardening (as I call it) is hard work if you want to maintain an appropriate level of discourse. And as Xeni said, the current technological and user experience solutions suck. Approved commenting, sign-in to comment, Slashdot-like comment moderation...they all have their problems.
As an experiment back in October, I opened the comments on all threads on kottke.org for a little over a week. During that time, I kept track of my comment gardening duties, basically everything I did to keep those threads clear of trolling, flaming, off-topic comments, and the like. The only thing I didn't record was how many times per day I checked for activity in all the open threads -- every 15-30 minutes or so while I was awake (~8am to midnight) -- because I would have been too busy recording the checking to actually do the checking. At one point, I had almost 60 simultaneous threads open and was spending half my day keeping up with all of them.
After more than a week, I stopped recording everything...even though most of the threads were still open and the comments, flames, trolls, and spam kept pouring in. But the resulting document will still give you some idea of what's involved with opening comments on kottke.org. I would love better tools to deal with this because I enjoy having comments open on the site and so do my readers. But for now, I think it's a better use of my time to focus on other aspects of the site and open comments when I feel a particular post would benefit from them.
[1] You can't imagine the reasons I've heard about why comments are off on kottke.org. Most of them are variations on the theme of: "All the big bloggers have their comments turned off because they're too stuck-up and self-important to care what their readers have to say, those arrogant bastards. They can't stand people disagreeing with them." And so on.
Two weeks ago, I wrote:
In terms of editorial and quality, I am unconvinced that a voting system like Digg's can produce a quality editorial product.
Lloyd Shepherd, Deputy Director of Digital Publishing at Guardian Unlimited, has been thinking along similar lines:
Everything we do to "edit" the [Guardian Unlimited] site seeks to keep a balance between editorial instinct and the desires of the audience, and that, in doing that, we may be reflecting the "community" more fairly, both mathematically and ethically, than the likes of digg.
So how do you reflect the community more fairly? Paging Mr. Surowiecki:
In order for a crowd to be smart, [Surowiecki] says it needs to satisfy four conditions: 1. Diversity, 2. Independence, 3. Decentralization, and 4. Aggregation.
Much of the online media we're familiar with uses a mix of humans and automated systems to perform the aggregating task. Human editors choose the stories that will run in the newspaper (drawing from a number of sources of information as Lloyd illustrated), blog authors select what links and posts to put on their blog (by reading other blogs & media outlets, listening to reader feedback, and sifting through already aggregated sources like del.icio.us or Digg), and the editors of Slashdot filter through hundreds of reader submissions a day to create Slashdot's front page. Google News uses technology to decide which stories are important, based primarily on what the publishers are publishing. Digg and del.icio.us rely almost entirely on the crowd to submit and determine by a simple vote what stories go on their front pages.
Some of these methods work better than others for different tasks. The product of 50,000 diverse, independent, decentralized bloggers is probably more editorially interesting, fair, and complete than that of 50,000 diverse, independent, decentralized Digg users, but the Digg vote & tally approach is less time-intensive for all concerned and the information flows faster. A site like Slashdot sits in the middle...it's a little slower than Digg but offers a more consistent editorial product. A hybrid Digg+Slashdot approach (which is not unlike the one used by individual bloggers) would be for Digg to produce a "Digg digest", a human-selected (could use simple voting or let the most highly respected community members choose) collection of the best stories of the day that incorporates what was said in the comments and around the web as well. Actually, I think if you wanted to start a blog that did this, it would do very well.
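For the curious, here's a rough sketch of the difference between a pure vote & tally front page and the hybrid "digest" idea. This is not how Digg or Slashdot actually work; the function names, the `pool` parameter, and the sample story ids are all made up for illustration.

```python
from collections import Counter


def front_page(votes, size=3):
    """Pure vote & tally: return the `size` most-voted story ids.

    `votes` is a list of story ids, one entry per vote cast
    (a hypothetical data shape chosen for this sketch).
    """
    tally = Counter(votes)
    return [story for story, _ in tally.most_common(size)]


def digest(votes, editor_picks, size=3, pool=10):
    """Hybrid approach: the crowd narrows the field to the top `pool`
    stories, then human editors choose which of those make the digest.
    """
    candidates = front_page(votes, pool)
    return [story for story in candidates if story in editor_picks][:size]


if __name__ == "__main__":
    votes = ["a", "b", "a", "c", "a", "b"]
    print(front_page(votes, size=2))
    print(digest(votes, editor_picks={"b", "c"}, size=2))
```

The crowd does the fast, cheap part (narrowing the field) and the humans do the slow, expensive part (choosing), which is roughly the trade-off described above.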
If you're like me, you're waiting patiently for that day in early January when you can go more than 10 minutes without seeing a reference to some best of 2005 list. If you're also like me, you love lists so much that you can't get enough of them. So, with apologies to that first part of me, here's a final 2005 list from me: a few movies, weblogs, books, and musical selections that I enjoyed this past year (in no particular order).
Music (not necessarily released in 2005)
Ladytron, Witching Hour. This one grew on me a lot.
Kelly Clarkson, Since U Been Gone.
Fischerspooner, Odyssey.
Bloc Party, Silent Alarm.
Royksopp, The Understanding.
Diplo, Megatroid Mix. (download)
Boards of Canada, Campfire Headphase.
Mark Mothersbaugh (and others), The Life Aquatic soundtrack.
Stars, Set Yourself on Fire.
Clap Your Hands Say Yeah, Clap Your Hands Say Yeah.
Kanye West, Gold Digger.
Sigur Ros, Takk.
BBC Philharmonic, Beethoven's Symphonies.
Two disappointments: Franz Ferdinand, You Could Have It So Much Better and Broken Social Scene by the band of the same name. I enjoyed Franz's debut album and You Forgot It in People so much, but the follow-ups fell flat for me. Still trying though...
Movies (not necessarily released in 2005)
Primer.
Garden State.
Crash.
Revenge of the Sith.
Sideways.
Million Dollar Baby.
Deliverance.
Cinderella Man.
King Kong.
Didn't see a lot of movies this year, unfortunately.
Books
Wind-Up Bird Chronicle, Haruki Murakami.
The Corrections, Jonathan Franzen.
Snow Crash, Neal Stephenson.
Consider the Lobster, David Foster Wallace.
Jonathan Strange and Mr. Norrell, Susanna Clarke.
The Botany of Desire, Michael Pollan.
Pieces for the Left Hand, J. Robert Lennon.
Freakonomics, Steven Levitt, Stephen Dubner.
I read a ton of non-fiction but always enjoy the small amount of fiction I do read more.
Favorite weblogs. Compare with last year's list.
Waxy. Despite a year-end Yahoo! slowdown/hangover, still one of the absolute best.
Collision Detection. Enthusiasm about technology without the irrational exuberance or Web 2.0ness of other tech/tech culture blogs.
del.icio.us inbox. Not technically a blog, but I love this ever-fresh flow of my friends' favorites.
Robotwisdom. The original weblog was back this year after a 1.5 year hiatus. Jorn still has it.
The Morning News. Also not technically a blog, but TMN has been delivering high quality content on a daily basis for a long time now.
Flickr friends. Still the most fun on the web.
Cynical-C. Can't remember where or when I found this one, but almost every single thing on there is something I'm interested in.
Scripting News. I skim most of his opinion stuff, disagree with 90% of the rest of what I do read, but Dave has his finger on the pulse of the part of the web I care most about. He gets links so quickly sometimes that I think he's actually part RSS aggregator. "He's more machine than man now." "No, there is still good in him..."
Boing Boing. There's stuff I don't care about here, but the best of BB is really good.
3 Quarks Daily. The most accessible smart weblog out there.
Marginal Revolution. Quirky economics. Interesting every day.
Goldenfiddle. I dislike celebrity gossip, but gf makes it seem interesting somehow. Damn you!
Youngna. Rationally exuberant.
You may notice that there are few "pro" blogs on this list. The best stuff out there is still being generated by interested, enthusiastic amateurs. When you're producing media for a profit, there's a certain vitality that's lost, I think...a loss I've been struggling with on kottke.org for the past few months. kottke.org was on last year's list but doesn't appear this year...here's hoping for a better year for the site in 2006.
Update: I fucked up on this post and you should reread it if you've read it before. After reading this post by Niall Kennedy, I checked and found that I have mentioned or linked to the site for Freakonomics 5 times (1 2 3 4 5), not 13. The other 8 times, I either linked to a post on the Freakonomics blog that was unrelated to the book, had the entry tagged with "freakonomics" (tags are not yet exposed on my site and can't be crawled by search engines), or I used the word "Freakonomists", not "Freakonomics". Bottom line: the NY Times listing is still incorrect, Google and Yahoo picked up all the posts where I actually mentioned "Freakonomics" in the text of the post but missed the 2 links to freakonomics.com, Google Blog Search got 2/3 (& missed the 2 links), Technorati got 1/3 (& missed the 2 links), and IceRocket, Yahoo Blog Search, BlogPulse, & Bloglines whiffed entirely. Steven Levitt would be very disappointed in my statistical fact-checking skills right now. :(
I wish Niall had emailed me about this instead of posting it on his site, but I guess that's how weblogs work, airing dirty laundry instead of trying to get it clean. Fair enough...I've publicly complained about the company he works for (Technorati) instead of emailing someone at the company about my concerns, so maybe he had a right to hit back. Perhaps a little juvenile on both our parts, I'd say. (Oh, and I turned off the MT search thing that Niall used to check my work. I'm not upset he used it, but I'm irritated that it seems to be on by default in MT...I never intended for that search interface to be public.)
------
The NY Times recently released their list of the most blogged about books of 2005. Their methodology in compiling the list:
This list links to a selection of Web posts that discuss some of the books most frequently mentioned by bloggers in 2005. The books were selected by conducting an automated survey of 5,000 of the most-trafficked blogs.
Unsurprisingly, the top spot on the list went to Freakonomics. I remembered mentioning the book several times on my site (including this interview with author Steven Levitt around the release of the book), so I checked out the citations they had listed for it. According to the Times, Freakonomics was cited by 125 blogs, but not once by kottke.org, a site that by any measure is one of the most-visited blogs out there.[1] A quick search in my installation of Movable Type yielded 5 mentions of the book on kottke.org in the last 9 months (I originally reported 13; see the update above). I had also mentioned Blink, Harry Potter, Getting Things Done, Collapse, The Wisdom of Crowds, The Singularity is Near, and State of Fear, all of which appear in the top 20 of the Times' list and none of which are cited by the Times as having been mentioned on kottke.org in 2005.
I chalked this up to a simple error of omission, but then I started checking around some more. Google's main index returned only three distinct mentions of Freakonomics on kottke.org. Google Blog Search returned two results. Yahoo: 3 results (0 results on Yahoo's blog search). Technorati only found one result (I'm not surprised). Many of the blog search services don't even let you search by site, so IceRocket, BlogPulse, and Bloglines were of no help. (See above for corrections.) I don't know where the Times got their book statistics from, but it was probably from one of these sites (or a similar service).
Granted this is just one weblog[2], which I only checked into because I'm the author, but it's not like kottke.org is hard to find or crawl. The markup is pretty good [3], fairly semantic, and hasn't changed too much for the past two years. The subject in question is not off-topic...I post about books all the time. And it's one of the more visible weblogs out there...lots of links in to the front page and specific posts and a Google PR of 8. So, my point here is not "how dare the Times ignore my popular and important site!!!" but is that the continuing overall suckiness of searching blogs is kind of amazing and embarrassing given the seemingly monumental resources being applied to the task. It's forgivable that the Times would not have it exactly right (especially if they're doing the crawling themselves), but when companies like Technorati and Google are setting themselves up as authorities on how large the blogosphere is, what books and movies people are reading/watching, and what the hot topics online are but can't properly catalogue the most obvious information out there, you've got to wonder a) how good their data really is, and b) if what they are telling us is actually true.
[1] Full disclosure: I am the author of kottke.org.
[2] This is an important point...these observations are obviously a starting point for more research about this. But this one hole is pretty gaping and fits well with what I've observed over the past several months trying to find information on blogs using search engines.
[3] I say only pretty good because it's not validating right now because of entity and illegal character errors, which I obviously need to wrestle with MT to correct at some point. But the underlying markup is solid.
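To put the numbers from this post side by side, here's a quick back-of-the-envelope calculation of how much each service found. The result counts are the ones reported above; using the corrected total of 5 as the denominator (text mentions plus links) is my assumption, and the `coverage` helper is just an illustrative name. IceRocket, BlogPulse, and Bloglines are left out because no per-site count was given for them.

```python
# Corrected number of times kottke.org mentioned or linked to Freakonomics.
ACTUAL_MENTIONS = 5

# Results each service returned for Freakonomics on kottke.org, as reported above.
FOUND = {
    "Google": 3,
    "Google Blog Search": 2,
    "Yahoo": 3,
    "Yahoo Blog Search": 0,
    "Technorati": 1,
}


def coverage(found, actual):
    """Return the fraction of actual mentions that each service surfaced."""
    return {name: count / actual for name, count in found.items()}


if __name__ == "__main__":
    for name, share in coverage(FOUND, ACTUAL_MENTIONS).items():
        print(f"{name}: {share:.0%}")
```

It's one site and one search term, so treat this as a starting point rather than a measurement.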
One of the most interesting things to come out of the secret sites discussion is that people are keeping their private journals on the web instead of in a paper journal under their mattress or in a Word document on their computer. This sounds surprising, but there are a couple of good reasons for it:
- The tools for writing, organizing, and searching an online journal written with Typepad or LiveJournal are superior to those for writing a paper journal or an electronic diary (in Word or text format) stored locally. Hyperlinks, entries organized by date, mood, category...if you're used to using these things when writing a public site, you might have trouble going back to just text in a Word document for your important innermost thoughts.
- Your diary may actually be more private and secure on the web. A password protected online journal is more difficult for a parent, significant other, or parole officer to stumble upon and read than a document sitting on a hard drive of a shared computer or hidden on the top shelf of a closet, especially if you're careful with your cookies and browser history, choose a good password, and are more computer savvy than said parent/S.O./P.O.
I bet few would have predicted keeping personal diaries secret as a use of the public internet several years ago.
The decompression from my trip to Asia continues. I have read through ~8000 items in my newsreader and discarded almost all of them (despite much interest in solving the problem, no one has built a machine that has any idea about what content needles I want out of the media haystack).
However, one item caught my interest (although I can't remember where I saw it): someone asked their readers how many secret sites/blogs they maintained. That is, sites that no one knows you're the author of (written anonymously or with a nom de plume) or sites to which the general public does not have access. If I remember correctly, a large number of the respondents not only maintained a secret site, but had several. I have one secret blog, published under my own name, that only a small group of friends can read. I just started it recently (after learning that several friends have been doing this for a while) and don't update it very often. How about you...any secret sites? Why keep them on the down-low?
The day before yesterday, we went for dim sum for lunch again...can't get enough of those meat-stuffed buns and pastries. This time, we cleverly arranged to bring some locals along so we'd have a little better idea what was going on food-wise. Or rather, they cleverly arranged to meet up with us. A couple of days into the trip, we received an email from a couple of HK high school students, Denise and Christine. They just happened to be working on an article about blogging for a school magazine that gets published once a year, and wrote to see if they could interview us. We agreed -- on the condition that we treat them to dim sum -- and off we went on Saturday to the Chao Inn on Peking Road in TST.
We ordered a variety of dim sum, including a Chaozhou specialty dish (made of beef...it looked a little like headcheese), which after an initial taste by everyone at the table, was left for the wait staff to collect. We also had some shrimp dumplings, BBQ pork buns, sticky rice (and beef?) wrapped in lotus leaf, spring rolls, and some rice noodle dish I'm forgetting the exact ingredients of. We chatted about food, blogging, teen life in Hong Kong, movies, etc. They attend an English-speaking school, so their English was quite good and the conversation flowed easily. A favorite conversational tidbit was that when you buy fake electronics in Hong Kong, they ask you which logo you want on it (Sony, Panasonic, NEC, etc.) and then affix the proper sticker. Awesome.

Thanks for the nice lunch, girls. I hope you got what you needed for your interview.
If you happen to be in NYC on November 3rd, stop by Eyebeam in the evening and check out a panel that I'm on about criticism called "Everybody's A Critic, Or Are They?" Here's a description:
With 9 million blogs, umpteen online message boards, thousands of shows on hundreds of cable channels, and an increased number of magazines on the newsstand, the number of outlets for expressing criticism has never been higher and the barriers to would-be critics have never been lower. Is this devaluing evaluation or does the shotgun approach result in better criticism? YOU be the Judge!
Joining me on the panel are Emily Gordon, Village Voice film critic Michael Atkinson, and Columbia professor & author Duncan Watts. The wonderful Steven Heller will moderate and no doubt bring the conversation to a higher level. Details:
November 3, 2005
7:00 PM - 9:00 PM
Eyebeam (map)
540 W. 21st St.
New York, NY 10011
On my web travels the other day, I came across a new (to me) kind of weblog, the tumblelog. Here are a few examples to get the gist of what a tumblelog is: hit projectionist first and then Anarchaia (which seems to have been the first one), Church Burning tumblelog, Mikael's Tumblelog, and ones zeros majors and minors.
A tumblelog is a quick and dirty stream of consciousness, a bit like a remaindered links style linklog but with more than just links. They remind me of an older style of blogging, back when people did sites by hand, before Movable Type made post titles all but mandatory, blog entries turned into short magazine articles, and posts belonged to a conversation distributed throughout the entire blogosphere. Robot Wisdom and Bifurcated Rivets are two older style weblogs that feel very much like these tumblelogs with minimal commentary, little cross-blog chatter, the barest whiff of a finished published work, almost pure editing...really just a way to quickly publish the "stuff" that you run across every day on the web.
Many of the tumblelogs I ran across seem to be powered by Ruby on Rails, itself a quick and dirty programming framework that emphasizes fast prototyping. You can kind of see how tumblelogging is the blog equivalent of Rails. Christian Neukirchen describes how he edits his tumblelog using a templating language called Vooly.
I like the idea of tumblelogging a lot; I've been slowly moving kottke.org in a similar direction for a while. Different ways of displaying various types of content...remaindered links, regular posts, book reviews, and movie reviews are all displayed differently. I'm working on incorporating photo albums and perhaps a daily photolog...as well as a couple other different types of content. I've been focusing a lot more on the remaindered links (because they're more fun and closer to pure editing, which I enjoy a lot more than writing) and less on the magazine-like regular posts-with-titles. The further away from punditry I can get, the better it will be for all of us.
Boy, the scent of money is in the air these days. The latest report is that Dave Winer has sold weblogs.com to Verisign for $2.3 million (~$5 million was the figure first being bandied about). This is an interesting one because it seemed crazy (see below) when I first heard about it, but now that I've heard it from multiple sources, who knows?
Verisign is interested in blogs and RSS (another of their acquisitions in this space will be announced soon) and it's not hard to see why Dave would sell weblogs.com (the site needs some firm financial backing to keep from buckling under the ever-increasing strain of all those pings), but to Verisign? To me, Verisign embodies the idiocy and ineptitude of the BigCos Dave often rails against...the BigCo to end all BigCos. If true, those are some odd bedfellows indeed.
Update: Silicon Beat says they have confirmation that Verisign bought weblogs.com:
We're getting confirmation that the rumors about Verisign buying Dave Winer's Weblogs.com are true. The price is $2 million. What Verisign wants with Weblogs is another matter. Weblogs was one of the first, if not the first, centralized ping servers that blogs could use to alert the world to new content.
I like how when a weblog has two independent sources on something, it's a "rumor"...
Update #2: Verisign confirms the purchase.
Jeremy Heigh makes an interesting observation about a recent thread on kottke.org, which I think applies broadly across the blogosphere:
...we were trying to understand how to better leverage all the great, individual thinking being done on blogs because what Kottke hosted wasn't a conversation at all. It was nearly 80 people carrying on their own conversations with themselves while others watched. That's not a conversation -- that's philosophical voyeurism spiced with a hint of insanity.
I think choice of topic, the way in which the question is posed, and the pace of the commenting[1] have a lot to do with it. Despite the large number of comments, there are some threads on kottke.org which have been more