This morning I posted a comparison of the growth in messages on Blogger and Twitter. The Twitter data was based on information collected by Andy Baio in a post that was widely read in the blogosphere. While looking at the Twitter data, neither of us noticed that from Nov 21, 2006 to Feb 4, 2007, and from March 9, 2007 to the present, the Twitter post IDs all had the same last digit, indicating that the data is not strictly sequential. If you look at Twitter's public timeline, the post IDs skip around by multiples of 10.
Anil suggested via email that this could be an artifact of database sharding, and lo and behold, if you take off the last digit of the post ID, the IDs become sequential again, more or less. He's going to ask the Twitter gang about it.
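To make the sharding idea concrete, here's a small Python sketch. The IDs are made up, and the assumption that the last digit marks which database shard issued an ID is just Anil's hypothesis, not a confirmed description of how Twitter numbers updates. It shows why dropping the last digit restores a (roughly) sequential count, and why subtracting raw IDs overstates the number of posts by about 10x.

```python
# Hypothetical post IDs that all end in the same digit (3), as if one of
# several database shards hands out IDs ending in its own digit.
ids = [1234563, 1234573, 1234583, 1234603, 1234613]

def strip_shard_digit(post_id: int) -> int:
    """Drop the last digit, which we ASSUME encodes the shard."""
    return post_id // 10

def estimated_posts(first_id: int, last_id: int) -> int:
    """Naive volume estimate: treat IDs as a sequential counter."""
    return last_id - first_id

trimmed = [strip_shard_digit(i) for i in ids]
print(trimmed)  # [123456, 123457, 123458, 123460, 123461]

raw = estimated_posts(ids[0], ids[-1])              # 50: inflated ~10x
fixed = estimated_posts(trimmed[0], trimmed[-1])    # 5: closer to reality
print(raw, fixed)
```

The gap between 123458 and 123460 is the "more or less" part: even after trimming, the sequence isn't perfectly contiguous, so any count based on IDs is an estimate.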
For right now, though, the parts of this morning's post that rely on Twitter data from the above dates are incorrect. Basically, all of it. Here it is in all caps: WRONG WRONG WRONG ERROR ERROR, F——-, WOULD NOT BUY DATA ANALYSIS FROM AGAIN. In hindsight, it seems obvious that the data was incorrect... that sort of growth seems impossible, especially when Twitter was having all sorts of scaling problems. Anyway, good thing this is just a blog and not a refereed journal, eh? Big thanks to the commenters on the other post for pointing me toward the error. More as I have it.
Update: Email from Biz Stone, who works for Twitter. He says:
There’s truth in the essence of what you’re talking about here — Twitter updates *are* coming in faster and furiouser than Blogger updates. However, the way we number Twitter updates has switched back and forth a few times which pretty much screws up the exactness of your analysis.
We have been doubling the number of active users about every three weeks for a sustained period of months now, which is definitely contributing significantly to more and more updates. Also, active users of Twitter are measured by how many times they update per day (at Blogger it was per month). So activity in general at Twitter is crazy by comparison.
We're going to start digging into more data visualization, user patterns, etc. in the coming weeks, so if there's anything you think we should be looking at specifically please let us know!
So we’ll have to wait a few weeks for an accurate look at this stuff. (thx, biz)
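For a sense of what Biz's "doubling about every three weeks" implies, here's the compounding arithmetic as a quick Python sketch. The starting figure of 1,000 users and the week counts are hypothetical numbers picked for illustration, not Twitter data.

```python
def growth_factor(weeks: float, doubling_weeks: float = 3.0) -> float:
    """Multiplier after `weeks` if the count doubles every `doubling_weeks`."""
    return 2 ** (weeks / doubling_weeks)

start_users = 1_000  # hypothetical starting count, not a real Twitter figure
for weeks in (3, 6, 12, 24):
    # e.g. 12 weeks = 4 doublings = 16x the starting count
    print(weeks, round(start_users * growth_factor(weeks)))
```

Four doublings in roughly three months is already a 16x jump, which helps explain why the update numbers look so steep even without the ID-numbering problem.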
Important update: I've re-evaluated the Twitter data and come up with what I think is a much more accurate representation of what's going on.