Power Laws, Blogs, Newspapers and Movies

Clay Shirky's Power Laws, Weblogs, and Inequality set off a lot of discussion about power laws and weblogs. Clay's paper is correct as far as it goes, but it makes a couple of classic mistakes. Despite saying that the numbers represent a continuum, Clay focuses mainly on the top few, and then sorts the blogs into 3 classes, propagating the same 'hubs' or 'connectors' myth that bedevils discussions of power laws post 'Tipping Point'. Clay also plots everything in linear space, making it very hard to judge how well the claimed power laws actually fit.

The easiest way to see a power law is to plot the data in log/log space. If there is a power law relationship, you get a straight line. Here's the Technorati top 100 plotted in this way:

Incoming Blog links by rank



Log(blogs) = 8.7554359 - 0.8350166 Log(rank)

R² = 0.93801
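For readers who want to try this on their own data, here is a minimal sketch of a straight-line fit in log/log space. It assumes an ordinary unweighted least-squares fit, which may not be exactly how the figures above were produced, and it runs on synthetic, made-up numbers rather than the Technorati data. The function name `fit_power_law` is hypothetical.

```python
import math

def fit_power_law(values):
    """Least-squares fit of log(value) = a + b * log(rank).

    `values` must be positive and sorted in descending order, so that
    the first entry has rank 1. Returns (a, b, r_squared).
    """
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared

# Synthetic, made-up data: an exact power law, value = 5000 * rank ** -0.8
synthetic = [5000 * rank ** -0.8 for rank in range(1, 101)]
intercept, slope, r_squared = fit_power_law(synthetic)
print(intercept, slope, r_squared)
```

On real data the points scatter around the line, and R² drops below 1 accordingly.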

Clay suggests we compare this with the top 100 newspapers by sales:

Newspaper sales by rank



Log(sales) = 14.728982 - 0.6479379 Log(rank)

R² = 0.984953
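One way to compare the two fitted slopes is to ask how far the top-ranked item sits above the item at rank 100. Under a fit of the form Log(value) = a + b Log(rank), that ratio is rank to the power -b, whatever base the logarithm uses. The sketch below works this out for the two slopes quoted above; the helper name `top_to_rank_ratio` is hypothetical, and the result describes the fitted lines only, not the underlying data.

```python
def top_to_rank_ratio(slope, rank):
    """Ratio value(1) / value(rank) implied by log(value) = a + slope * log(rank)."""
    return rank ** (-slope)

blog_slope = -0.8350166       # from the Technorati fit above
newspaper_slope = -0.6479379  # from the newspaper fit above

for name, slope in [("blogs", blog_slope), ("newspapers", newspaper_slope)]:
    print(name, round(top_to_rank_ratio(slope, 100), 1))
```

The steeper blog slope means the fitted line falls further between rank 1 and rank 100 than the newspaper line does.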

There is a superficial similarity, but note the trend at the right-hand side in each case - the blogs trend above the line, the newspapers below. Let's look at movie box office gross by rank (data from the-numbers.com):

Box Office gross By rank

Weight: gross


Log(gross) = 21.385097 - 0.6423828 Log(rank)

R² = 0.813102

Here we can clearly see a saturation effect - the power law breaks down after a while, as there are feedback mechanisms in place for movies (they won't show for a second week if they gross poorly in the first). Also, the cost of entry to movie making is rather high. Let's stop and think about this a little, though. There were many more movies made between 1991 and 2002 than are shown in this data, because most of them didn't get a cinematic release - all those student films, wedding videos and family vacations that were made don't show up in these figures. There is a sampling bias here, as there is in the other figures too.

Let's look at the original data Clay cited - the Blogging Ecosystem - as this provides more detailed data than just the top 100. I took a dump of the Python data there, and ran a script over it to give me the number of blogs with each number of links, then assigned a rank. If we plot this, we see a similar effect.

Incoming links By rank
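The counting step can be sketched roughly as follows. This is not the original script: the input format (a dict of blog name to incoming-link count), the function names, and the sample numbers are all assumptions for illustration, and "rank" is taken to mean the position of each distinct link count when sorted from most-linked to least-linked.

```python
from collections import Counter

def blogs_per_link_count(incoming_links):
    """Map each incoming-link count to the number of blogs that have it."""
    return Counter(incoming_links.values())

def rank_link_counts(histogram):
    """Rank the distinct link counts, most-linked first (rank 1).

    Returns a list of (rank, links, number_of_blogs) tuples.
    """
    ordered = sorted(histogram, reverse=True)
    return [(rank, links, histogram[links])
            for rank, links in enumerate(ordered, start=1)]

# Hypothetical sample: blog name -> number of incoming links
sample = {"a": 120, "b": 40, "c": 40, "d": 3, "e": 1, "f": 1, "g": 1}
histogram = blogs_per_link_count(sample)
for rank, links, blogs in rank_link_counts(histogram):
    print(rank, links, blogs)
```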



However, there is obviously sampling bias at work here too - Ev claims over a million blogs on Blogger, with 200,000 active, so the total of a few thousand here is obviously missing most blogs, and most likely missing those at the lower end. Even so, we still get about 4000 blogs with one incoming link each. Let's plot the total number of incoming links by rank - multiply the incoming links by the number of blogs listed with that many incoming links:

Total blog links By rank
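The multiplication is simple enough to sketch. The histogram below is hypothetical apart from the roughly 4000 single-link blogs mentioned above, and `total_links_by_count` is a made-up name for illustration.

```python
def total_links_by_count(histogram):
    """For each incoming-link count, total links held by all blogs at that count."""
    return {links: links * blogs for links, blogs in histogram.items()}

# Hypothetical histogram: incoming links -> number of blogs with that many.
# Only the 4000 single-link blogs echo a figure from the text.
histogram = {500: 1, 100: 3, 10: 60, 1: 4000}
totals = total_links_by_count(histogram)
for links in sorted(totals, reverse=True):
    print(links, histogram[links], totals[links])
```

Even in this toy example, the single-link blogs together hold more links than the one blog at the top.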



Suddenly, the picture is not so clear - we have a great many links in these 'low ranked' blogs. In fact we have more there than in the high-ranked ones. And that's just with 10,000 or so blogs, not the full million. Let's look at this another way. I binned the data in powers of 2 (blogs with 1, 2, 4, 8, 16 etc. links) and plotted that:

sites and links by incoming links




This is the other way round - the 'low ranked' blogs by links are at the left, the high-ranked ones at the right. See how many more links the low-ranked ones account for (remember, this sample is missing over 90% of low-ranked blogs).
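The binning in powers of 2 might look something like this. The exact bin edges used for the chart are not stated, so this sketch assumes each bin starts at a power of 2 and runs up to just below the next one (1, 2-3, 4-7, 8-15 and so on); the function name and the sample histogram are hypothetical.

```python
from collections import defaultdict

def bin_by_powers_of_two(histogram):
    """Group link counts into bins 1, 2-3, 4-7, 8-15, ...

    `histogram` maps incoming links -> number of blogs with that many.
    Returns {bin_start: (number_of_blogs, total_links)}.
    """
    bins = defaultdict(lambda: [0, 0])
    for links, blogs in histogram.items():
        if links < 1:
            continue  # blogs with no incoming links fall outside every bin
        start = 1 << (links.bit_length() - 1)  # largest power of 2 <= links
        bins[start][0] += blogs
        bins[start][1] += links * blogs
    return {start: tuple(bins[start]) for start in sorted(bins)}

# Hypothetical histogram for illustration only
histogram = {1: 4000, 2: 900, 3: 500, 5: 200, 9: 40, 300: 1}
binned = bin_by_powers_of_two(histogram)
for start, (blogs, links) in binned.items():
    print(start, blogs, links)
```

Plotting both columns against the bin start gives the sites and the links per bin side by side.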

Another way of looking at this, showing all the data points:

sites and links by incoming links




The green line sloping from top right to bottom left is Clay's power law; the red line from top left to bottom right is the number of blogs with that many incoming links. They roughly balance out the total number of links.

So what conclusions can we draw from all these graphs?

  1. Weblog links do follow a power law
  2. This saturates less quickly than other media, due to low barriers to entry
  3. Therefore the many lightly linked weblogs outnumber the few heavily linked ones

Kevin Marks, February 17th 2003