## Weblogs and power laws

FEB 09 2003

Many systems and phenomena follow a power law distribution. A power law applies to a system when large is rare and small is common: the frequency of a value falls off as a power of its size. The distribution of individual wealth is a good example of this: there are very few rich people and lots & lots of poor folks. A familiar way to think about power laws is the 80/20 rule: 80% of the wealth is controlled by 20% of the population.
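To make the 80/20 idea concrete, here is a minimal sketch that measures how much of a total the top 20% holds. The data is made up: `hypothetical_wealth` is a Zipf-like series (each value is 1000 divided by its rank), chosen purely for illustration, and `top_share` is a helper name invented for this example.

```python
def top_share(values, top_fraction=0.2):
    """Return the fraction of the total held by the largest `top_fraction` of values."""
    ranked = sorted(values, reverse=True)
    # Number of items in the top group (at least one).
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical data, not real wealth figures: value = 1000 / rank for 1000 people.
hypothetical_wealth = [1000 / rank for rank in range(1, 1001)]

share = top_share(hypothetical_wealth)
print(f"Top 20% hold {share:.0%} of the total")
```

With this particular made-up series the top fifth ends up holding roughly four fifths of the total, which is the shape the 80/20 rule describes. A different exponent or sample size would give a different split.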

It's been shown that the distribution of links on the web scales according to a power law, so it comes as no surprise that the distribution of links to weblogs does as well. Taking the top 100 most-linked-to weblogs on Technorati as a data set (specifically from 1/24/03), I used Excel to plot the data and fit a curve to it.

The data conforms quite well to a power law curve. The R-squared value, a measure of how well the curve fits the data (1.0 is a perfect fit), is 0.9918. I ran a similar analysis of the distribution of the top 200 inbound referrers to kottke.org and the data also fit a power law curve (R-squared of about 0.95). Clay Shirky showed that the distribution of the number of outbound links in the LiveJournal community follows a power law. Paul Hammond has observed a similar pattern with his outgoing links.**
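For anyone who wants to try this kind of fit without a spreadsheet, here is a rough sketch of one common approach: rank the values from largest to smallest, take logs of both rank and value, and fit a straight line by least squares. This is not a reproduction of the Excel analysis above. The function name `fit_power_law` is invented for this example, the R-squared is computed in log space, and the data is synthetic (an exact power law) rather than the Technorati numbers.

```python
import math


def fit_power_law(values):
    """Fit value = c * rank**b by least squares on log-log axes.

    `values` are positive counts (e.g. inbound links), in any order.
    Returns (c, b, r_squared), with r_squared measured in log space.
    """
    ranked = sorted(values, reverse=True)
    log_x = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    log_y = [math.log(v) for v in ranked]

    n = len(ranked)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    syy = sum((y - mean_y) ** 2 for y in log_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))

    b = sxy / sxx                       # slope of the log-log line = exponent
    log_c = mean_y - b * mean_x         # intercept of the log-log line
    r_squared = (sxy * sxy) / (sxx * syy)
    return math.exp(log_c), b, r_squared


# Synthetic stand-in data: 100 "weblogs" with links = 1000 * rank**-0.8.
synthetic_links = [1000 * rank ** -0.8 for rank in range(1, 101)]

c, b, r2 = fit_power_law(synthetic_links)
print(f"c={c:.1f}, exponent={b:.3f}, R^2={r2:.4f}")
```

Because the synthetic data is an exact power law, the fit should recover the exponent and scale it was built from, with an R-squared of essentially 1. Real link counts will be noisier, and the R-squared will drop accordingly.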

This NEC study reveals that the deviation of a set of data from the power law correlates with how much competition is present in the system. The better the fit, the more competitive the environment is. Again, no surprise that the system of weblogs is a highly competitive one.

But what are weblogs competing for? Matt Webb posits that power laws arise due to scarcity. Links themselves can't be scarce (a page can hold as many links as you like without the supply running out), but they are a measure of something that is: people.

More specifically, the time that people have for visiting sites and linking to sites is limited. Mary only has so much time for visiting weblogs; if she goes to BoingBoing, she doesn't have time for MetaFilter. Some visitors are linkers and they link what they visit. Similarly, linkers have only so much time for linking. Sam can link to 20 sites about airplanes, but he can't link to 5000. The scarcity of people's time results in the distribution of links that can be described using power laws.

** Other places you *might* find power laws in the weblog world if you took the time to look: Daypop Top 40, Blogdex top links, the Blogging Ecosystem (in both "most linked" and "most prolific linkers" data sets), average # of posts per weblog, average # of words per post, average # of smileys per post, # of visitors per weblog, # of comments per post per weblog, and so on...

Addendum: I wrote this post last week (Wednesday or Thursday, can't remember exactly), but didn't publish it until today. Clay Shirky published an article called Power Laws, Weblogs, and Inequality on similar issues over the weekend (probably prompted by these two threads, same as me). Rather than modify my post to include a discussion of Clay's findings, I decided to leave it the way it was. I added a link to his article under "further reading" and figure that the discussion here will apply to both. Have at it!