kottke.org - home of fine hypertext products

Google

Google Apps

The NY Times today:

On Thursday, Google, the Internet search giant, will unveil a package of communications and productivity software aimed at businesses, which overwhelmingly rely on Microsoft products for those functions.

The package, called Google Apps, combines two sets of previously available software bundles. One included programs for e-mail, instant messaging, calendars and Web page creation; the other, called Docs and Spreadsheets, included programs to read and edit documents created with Microsoft Word and Excel, the mainstays of Microsoft Office, an $11 billion annual franchise.

kottke.org from April 2004:

Google isn't worried about Yahoo! or Microsoft's search efforts...although the media's focus on that is probably to their advantage. Their real target is Windows. Who needs Windows when anyone can have free unlimited access to the world's fastest computer running the smartest operating system? Mobile devices don't need big, bloated OSes...they'll be perfect platforms for accessing the GooOS. Using Gnome and Linux as a starting point, Google should design an OS for desktop computers that's modified to use the GooOS and sell it right alongside Windows ($200) at CompUSA for $10/apiece (available free online of course). Google Office (Goffice?) will be built in, with all your data stored locally, backed up remotely, and available to whomever it needs to be (SubEthaEdit-style collaboration on Word/Excel/PowerPoint-esque documents is only the beginning). Email, shopping, games, music, news, personal publishing, etc.; all the stuff that people use their computers for, it's all there.

When you swing a hammer in the vicinity of so many nails, you're bound to hit one on the head every once in a while. Well, I got it in the general area of the nail, anyway.

New Google Maps features

Not sure when these features were added, but Google Maps now displays public transportation stops (NYC subway, the T in Boston, the L in Chicago) and building outlines for metropolitan areas. Here's a shot of the West Village in NYC:

Google Maps subway stops and buildings

Tiny but useful improvements. (thx, meg)

No nofollow

All links on Wikipedia now automatically use the "nofollow" attribute, which means that when Google crawls the site, none of the links it comes across get any PageRank from appearing on Wikipedia. SEO contest concerns aside, this also has the effect of consolidating Wikipedia's power. Now it gets all the Google juice and doesn't pass any of it along to the sources from which it gets information. Links are currency on the web and Wikipedia just stopped paying it forward, so to speak.

It's also unclear how effective nofollow is in curbing spam. It's too hard for spammers to filter out which sites use nofollow and which do not and much easier & cheaper just to spam everyone and everywhere. Plus there's a not-insignificant echo effect of links in Wikipedia articles getting posted elsewhere so the effort is still worth it for spammers.
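For anyone who wants to see which links on a page carry the attribute, here's a minimal sketch using Python's standard-library HTML parser. The sample markup is made up for illustration (it is not copied from Wikipedia), and the helper names are my own.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect (href, is_nofollow) for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel holds space-separated tokens, e.g. rel="external nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html):
    """Return a list of (href, is_nofollow) pairs found in the markup."""
    parser = LinkAudit()
    parser.feed(html)
    return parser.links


# Hypothetical markup, for illustration only.
sample = (
    '<p><a href="http://example.com/source" rel="nofollow">a cited source</a> '
    'and <a href="/wiki/Internal">an internal link</a></p>'
)
print(audit_links(sample))
# → [('http://example.com/source', True), ('/wiki/Internal', False)]
```

A link flagged True here is one that a crawler honoring nofollow would not count toward the target's ranking.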

Googling from the future

A few years ago, I wrote about the potential hazards of watching time-shifted entertainment. Meg and I were watching a Red Sox-Yankees playoff game on TiVo and were about 20 minutes behind realtime events when Meg's phone rang:

She picked it up and looked at it, distracted by the game and unsure of what to do with it. I immediately realized it was her parents, calling with word of the completed game.

"No, no, don't answer it!" I yelled. "It's your parents! They're calling from the future!"

In promoting season four of The Wire, HBO sent out screener DVDs of the entire season to reviewers. By mid-October, some enterprising person ripped those DVDs and made all season 4 episodes available online, more than a month before the final episode was to be shown on TV. Unfortunately, those early viewers did some Googling about upcoming plot points which ended up in the referer logs of Heaven and Here, a popular blog about The Wire. (Note: if you haven't watched all of season 4, DON'T CLICK THROUGH to Heaven and Here...major spoilers!!) A spoiler-free excerpt:

Finally, I would like to say a few words on spoilers, On-Demand, and the concept of the collective. My big spoiler moment came about halfway through the season, which is rather a lucky break for me considering how much material I have been traversing each week related to the show. It was in the search terms for this very site, and it came in just three words: "[redacted]" It's the image you see for a second, recognize that you don't want to see, and quickly turn away from but can never even hope to forget. [...] I was able to avoid other spoilers, which again is kind of miraculous, but that note rang in my head all season, and it also had to be this ugly secret i kept while discussing the show here and with friends.

Who says time travel hasn't been invented yet?

Historical maps on Google Earth

Google Earth recently added some maps from the David Rumsey Historical Map Collection to their software, so you can just click them on and off on the globe. Included are a US map from 1833, a 1680 map of Tokyo, Paris from 1716, and a world map from 1790. I spent some time exploring the map of New York from 1836. Here's a screenshot of the southern tip of Manhattan with the present-day buildings turned on:

[Screenshot: the 1836 map of New York in Google Earth, with present-day buildings turned on]

A larger version is available on Flickr. Google Earth continues to be a fantastic software product. It's almost more of a game than an atlas or educational program...so much fun.

Related: I did a project using Google Earth called Manhattan Elsewhere and made a scrollable, zoomable version of Viele's Map of Manhattan.

Search, always dead

Via Tim O'Reilly comes this comment from Bill Burnham:

A couple of months ago I had the pleasure of moderating a panel at TIECon on the Search Industry. Peter Norvig, Google's Director of Research, made one comment in particular that stood out in my mind at the time. In response to a question about the prospects for the myriad of search start-ups looking for funding Peter basically said, and I am paraphrasing somewhat, that search start-ups, in the vein of Google, Yahoo Ask, etc. are dead. Not because search isn't a great place to be or because they can't create innovative technologies, but because the investment required to build and operate an Internet-scale, high performance crawling, indexing, and query serving farm were now so great that only the largest Internet companies had a chance of competing.

For Norvig to say what he did seems a little crazy, given the company he works for. The first time that search died was back in 1998. Yahoo, Altavista, Hotbot, Webcrawler, and other sites had the search game all sewn up. They were all about the same in terms of quality and people found what they were looking for much of the time. No one needed another search engine, and starting a search company in such a mature market seemed like folly. Around that time, Google became a company and eventually the world figured out it really did need another search engine.

Google code search

Google launched a new code search feature today. At least two sites already offer this functionality, but a great deal of attention follows Google wherever they go.

Code search is a great resource for web developers and programmers, but like the making available of all previously unsearched bodies of information, it's given lots of flashlights to people interested in exploring dark corners. Here are some things that people have uncovered already:

Got any other Google code search goodies? Send them along. If you find this interesting, Digg this story.

I, for one, welcome our pixelated Google overlords

Pixelated Google

Portraits of Larry, Sergey, and Eric Schmidt courtesy of eboy.

Blogs versus the NY Times in Google

In 2002, Dave Winer of Scripting News and Martin Nisenholtz of the New York Times made a Long Bet about the authority of weblogs versus that of NY Times in Google:

In a Google search of five keywords or phrases representing the top five news stories of 2007, weblogs will rank higher than the New York Times' Web site.

I decided to see how well each side is doing by checking the results for the top news stories of 2005. Eight news stories were selected and an appropriate Google keyword search was chosen for each one of them. I went through the search results for each keyword and noted the positions of the top results from 1) "traditional" media, 2) citizen media, 3) blogs, and 4) nytimes.com. Finally, the scores were tallied and an "actual" winner (blogs vs. nytimes.com) and an "in-spirit" winner (any traditional media source vs. any citizen media source) were calculated. (For more on the methodology, definitions, and caveats, read the methodology section below.)

So how did the NY Times fare against blogs? Not very well. For the eight top news stories of 2005, blogs ranked ahead of the Times in Google search results six times; the Times came out ahead only twice. The in-spirit winner was traditional media, by a 6-2 score over citizen media. Here are the specific results:

1) Hurricane Katrina hits New Orleans.
Search term: "hurricane katrina"

3. Top citizen media result (Wikipedia)
13. Top media result (CNN)
56. Top NY Times mention (NY Times).
61. Top blog result (Kaye's Hurricane Blog)

Winner (in spirit): Citizen media
Winner (actual): NY Times

2) Big changes in the US Supreme Court (Rehnquist dies, O'Connor retires, Roberts appointed Chief Justice, Harriet Miers rejected).
Search term: "harriet miers"

4. Top media result (Washington Post)
5. Top citizen media result (Wikipedia)
8. Top NY Times mention (NY Times)
11. Top blog result (TalkLeft)

Winner (in spirit): Media
Winner (actual): NY Times

3) Terrorists bomb London, killing 52.
Search term: "london bombing"

1. Top media result (CNN)
2. Top citizen media result (Wikipedia)
21. Top blog result (Schneier on Security)
No NY Times article appears in the first 100 results.

Winner (in spirit): Media
Winner (actual): Blogs

4) First elections in Iraq after Saddam.
Search term: "iraq election"

1. Top media result (BBC News)
6. Top blog result (Iraq elections newswire)
6. Top citizen media result (Iraq elections newswire)
14. Top NY Times mention (NY Times)

Winner (in spirit): Media
Winner (actual): Blogs

5) Terri Schiavo legal fight and death.
Search term: "terri schiavo"

2. Top blog result (Abstract Appeal)
2. Top citizen media result (Abstract Appeal)
4. Top media result (CNN)
65. Top NY Times mention (NY Times)

Winner (in spirit): Citizen media
Winner (actual): Blogs

6) Pope John Paul II dies and Cardinal Joseph Ratzinger appointed Pope Benedict XVI.
Search term: "pope john paul ii death"

1. Top media result (CNN)
3. Top citizen media result (Wikipedia)
58. Top blog result (The Pope Blog: Pope Benedict XVI)
No NY Times article appears in the first 100 results.

Winner (in spirit): Media
Winner (actual): Blogs

7) The Israeli withdrawal from the Gaza Strip.
Search term: "gaza withdrawal"

1. Top media result (Worldpress.org)
31. Top blog result (Simply Appalling)
31. Top citizen media result (Simply Appalling)
No NY Times article appears in the first 100 results.

Winner (in spirit): Media
Winner (actual): Blogs

8) The investigation into the Valerie Plame affair, Judith Miller, Scooter Libby indicted, etc.
Search term: "scooter libby indicted"

1. Top media result (CNN)
15. Top blog result (Seven Generational Ruminations)
15. Top citizen media result (Seven Generational Ruminations)
43. Top NY Times mention (NY Times)

Winner (in spirit): Media
Winner (actual): Blogs

And just for fun here's a search for "judith miller jail" (not included in the final tally):

1. Top media result (Washington Post)
3. Top blog result (Gawker)
3. Top citizen media result (Gawker)
No NY Times article appears in the first 100 results (even though there are several matching articles on the Times site).

In covering the jailing of their own reporter, the Times lagged in the Google results behind such informational juggernauts as Drinking Liberally, GOP Vixen, and Feral Scholar.

Winner (in spirit): Media
Winner (actual): Blogs

Here are the overall results, excluding the Judith Miller search:

Overall winner (in spirit): Media (beating citizen media 6-2).
Overall winner (actual): Blogs (beating the NY Times 6-2).
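The tallying is simple enough to check mechanically. Here's a minimal sketch that re-derives both scores from the per-search ranks listed above. The dictionary keys ("media", "citizen", "blog", "nyt") are my own labels, None stands for "not in the first 100 results", and the lower rank is assumed to win.

```python
from collections import Counter

# Ranks transcribed from the eight searches above.
# None means no result appeared in the first 100.
RESULTS = {
    "hurricane katrina":       {"media": 13, "citizen": 3,  "blog": 61, "nyt": 56},
    "harriet miers":           {"media": 4,  "citizen": 5,  "blog": 11, "nyt": 8},
    "london bombing":          {"media": 1,  "citizen": 2,  "blog": 21, "nyt": None},
    "iraq election":           {"media": 1,  "citizen": 6,  "blog": 6,  "nyt": 14},
    "terri schiavo":           {"media": 4,  "citizen": 2,  "blog": 2,  "nyt": 65},
    "pope john paul ii death": {"media": 1,  "citizen": 3,  "blog": 58, "nyt": None},
    "gaza withdrawal":         {"media": 1,  "citizen": 31, "blog": 31, "nyt": None},
    "scooter libby indicted":  {"media": 1,  "citizen": 15, "blog": 15, "nyt": 43},
}


def winner(ranks, a, b):
    """Return whichever of a or b has the better (lower) rank, or 'tie'."""
    rank_a = ranks[a] if ranks[a] is not None else float("inf")
    rank_b = ranks[b] if ranks[b] is not None else float("inf")
    if rank_a == rank_b:
        return "tie"
    return a if rank_a < rank_b else b


def tally(results):
    """Count per-search winners for the in-spirit and actual comparisons."""
    in_spirit = Counter(winner(r, "media", "citizen") for r in results.values())
    actual = Counter(winner(r, "blog", "nyt") for r in results.values())
    return in_spirit, actual


in_spirit, actual = tally(RESULTS)
print(in_spirit["media"], in_spirit["citizen"])  # → 6 2
print(actual["blog"], actual["nyt"])             # → 6 2
```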

Some observations:

  • My feeling is that Mr. Nisenholtz will likely lose his bet come 2007. Even though nytimes.com fares very well in getting linked to by the blogosphere, it does very poorly in Google. This isn't exactly surprising given that most NY Times articles disappear behind a paywall after a week and some of their content (TimesSelect) isn't even publicly accessible at all. Also, I didn't look too closely at the HTML markup of the NY Times, but it could also be that it's not as well optimized for Google as that of some weblogs and other media outlets.
  • "www.nytimes.com" has a PageRank of 10/10, higher than that of "www.cnn.com" (9/10), yet stories from CNN consistently appeared higher in the search results than those from the Times. The Times clearly has overall authority according to Google, but when it comes to specific instances, it falls short. In some cases, a NY Times story didn't even appear in the first 100 search results for these keyword searches.
  • By 2007, it may be difficult to differentiate a blog from a traditional media source. All of the Gawker and Weblogs, Inc. sites are presented in a blog format and are referred to as blogs but otherwise how are they distinguishable from traditional media? Engadget paid to send 12 people to cover the CES technology conference, probably as many as, or more than, the Times sent. The Sundance film festival was heavily covered by paid writers for both companies as well. In the spirit in which this bet was made, I'd have a hard time counting any of their sites as blogs. (And what about kottke.org? I get paid to write it. Am I still a member of the citizen media or have I crossed over?)
  • Choosing appropriate news stories and keywords for those stories was difficult in some cases. Katrina was a no-brainer, but was the Terri Schiavo story really one of the top eight news stories of 2005? Resolving the methodology for this bet in 2007 will be tricky. I wonder how the Long Bets Foundation will handle its determination of the victory.
  • Wikipedia does very well in Google results for topical search terms. Overall, traditional media still dominates (in first appearance as well as number of results), but blogs and Wikipedia do very well in some instances.
  • What do these results mean? Probably not a whole lot. Nisenholtz asserts that "[news] organizations like the Times can provide that far more consistently than private parties can" while Winer says that "in five years, the publishing world will have changed so thoroughly that informed people will look to amateurs they trust for the information they want". It's difficult to draw any conclusions on this matter based on these results. Contrary to what most people believe, PageRank has a bias, a point of view. That POV is based largely (but not entirely) on what people are linking to. As someone said in the discussion of this bet, this bet is about Google more than influence or reputation, so these results probably tell us more about how Google determines influence on a keyword basis rather than how readers of online informational sources value or rate those sources. Do web users prefer the news coverage of blogs to that of the NY Times? I don't think you can even come close to answering that question based on these results.

Methodology and caveats

The eight news stories were culled from various sources (Lexis-Nexis, Wikipedia, NY Times) and narrowed down to the top stories that would have been prominently covered in both the NY Times and blogs.

The keyword phrase for each of the eight stories was selected by the trial and error discovery of the shortest possible phrase that yielded targeted search results about the subject in question. In some cases, the keyword phrase chosen only returned results for a part of a larger news story. For instance, the phrase "pope john paul" was not specific enough to get targeted results, so "pope john paul ii death" was used, but that didn't give results about the larger story of his death, the conclave to select a new pope, and the selection of Cardinal Joseph Ratzinger as Pope Benedict XVI. In the case of "katrina", that single keyword was enough to produce hundreds of targeted search results for both Hurricane Katrina and its aftermath. Keyword phrases were not tinkered with to promote or demote particular types of search results (i.e. those for blogs or nytimes.com); they were only adjusted for the relevance of overall results.

The searches were all done on January 27, 2006 with Google's main search engine, not their news specific search.

Since the spirit of the bet deals with the influence of traditional media versus that of citizen-produced media, I tracked the top traditional media (labeled just "media" above) results and the top citizen media results in addition to blog and nytimes.com results. For the purposes of this exercise, relevant results were those that linked to pages that an interested reader would use as a source of information about a news story. For citizen media, this meant pages on Wikipedia, Flickr (in some cases), weblogs, message boards, wikis, etc. were fair game. For traditional media, this meant articles, special news packages, photo essays, videos, etc.

In differentiating between "media" and citizen media, and between relevant and non-relevant results, the distinction mattered in only one instance. Harriet Miers's Blog!!!, a fictional satire written as if the author were Harriet Miers, was the third result for this keyword phrase, but since the blog was not an informational resource, I excluded it. In all other cases, it was pretty clear-cut.

Book author to her publishing company: your lawsuit is not helping me or my book

I got an email this morning from a kottke.org reader, Meghann Marco. She's an author and struggling to get her book out into the hands of people who might be interested in reading it. To that end, she asked her publisher, Simon & Schuster, to put her book up on Google Print so it could be found, and they refused. Now they're suing Google over Google Print, claiming copyright infringement. Meghann is not too happy with this development:

Kinda sucks for me, because not that many people know about my book and this might help them find out about it. I fail to see what the harm is in Google indexing a book and helping people find it. Anyone can read my book for free by going to the library anyway.

In case you guys haven't noticed, books don't have marketing like TV and Movies do. There are no commercials for books, this website isn't produced by my publisher. Books are driven by word of mouth. A book that doesn't get good word of mouth will fail and go out of print.

Personally, I hope that won't happen to my book, but there is a chance that it will. I think the majority of authors would benefit from something like Google Print.

She has also sent a letter of support to Google which includes this great anecdote:

Someone asked me recently, "Meghann, how can you say you don't mind people reading parts of your book for free? What if someone xeroxed your book and was handing it out for free on street corners?"

I replied, "Well, it seems to be working for Jesus."

And here's an excerpt of the email that Meghann sent me (edited very slightly):

I'm a book author. My publisher is suing Google Print and that bothers me. I'd asked for my book to be included, because gosh it's so hard to get people to read a book.

Getting people to read a book is like putting a cat in a box. Especially for someone like me, who was an intern when she got her book deal. It's not like I have money for groceries, let alone a publicist.

I feel like I'm yelling and no one is listening. Being an author can really suck sometimes. For all I know speaking up is going to get me blacklisted and no one will ever want to publish another one of my books again. I hope not though.

[My book is] called 'Field Guide to the Apocalypse' It's very funny and doesn't suck. I worked really hard on it. It would be nice if people read it before it went out of print.

As Tim O'Reilly, Eric Schmidt, and Google have argued, I think these lawsuits against Google are a stupid (and legally untenable) move on the part of the publishing industry. I know a fair number of kottke.org readers have published books...what's your take on the situation? Does Google Print (as well as Amazon's "Search Inside the Book" feature) hurt or help you as an author? Do you want your publishing company suing Google on your behalf?

Investing is risky?

From a Washington Post article about google.org, Google's philanthropic effort:

Shareholder activists said Google's charitable commitment raises questions about whether this is an appropriate use of company cash or whether company founders Sergey Brin and Larry Page ought to make donations to their favorite causes personally. The foundation of Bill Gates, the founder and chairman of Microsoft Corp. and the nation's richest person according to Forbes, gave away more than a billion dollars last year to fight poverty, hunger and disease around the world. But Gates donates through a personal foundation, rather than through Microsoft itself.

"The board of directors should make it clear to the company's founders what should be personal and what should be corporate," said Patrick S. McGurn, special counsel to Institutional Shareholder Services Inc. "Google is spending shareholders' money, and it raises questions if there is not a valid corporate purpose."

Shareholder activists? You've got to be kidding me. You'd think that stock shareholders are a bunch of babies that need their noses wiped and hands held to go potty or something. If you don't want to support Google's philanthropic efforts and think that they're throwing your money away by doing so, there's an easy way to opt out: DON'T BUY GOOGLE STOCK. It's a free country and open market...vote with your money on what you think is a "valid corporate purpose". There are thousands of other companies to invest in that are doing other things, many of which operate exactly the same...nice and safe and by the book. The information on what these companies are doing with their shareholders' money is freely available...get informed about what you're buying. Given their P/E ratio, unique corporate approach, and incredible rate of growth, Google might just be the riskiest large-cap stock opportunity out there, but the potential upside (as well as the downside) is a lot greater than all of those companies playing it safe. As long as it's stated (and I believe Google certainly has made their views very clear), risk isn't something from which shareholders should be warned away.

Working offline

Back when I wrote about how a WebOS might work (basically XHTML/JS web apps that run on the desktop as well), I got a lot of responses along the lines of: with internet access becoming more ubiquitous (broadband, wifi, wireless broadband, WiMax, etc.), there will be less and less need for applications that don't need a connection to the network to function. When you can literally get a fast, cheap internet connection anywhere, you don't need a version of Gmail that works offline and so that's not going to drive the development of this WebOS thing you're talking about.

I've been thinking for several weeks about why I think that's wrong and I've come up with a couple ideas.

1. Fast, cheap internet everywhere? Hoo boy, wake me when that happens...you'll likely find me driving my hydrogen-powered hovercar with ESP to my paperless office.

2. For many people, the more you get used to having access to your applications/data/etc., the more important that access becomes. Let's say 98% of the applications you use are entirely on the web (with no offline capabilities) and you're online almost all the time wherever you go. Then the network winks out for 1/2 an hour. Or Salesforce.com is down for a couple hours. That last little inch is going to be painful. And no use telling me that sounds insane because I've seen the madness and fear in people's eyes while they clutch their Crackberries, furiously reading email mere minutes away from the office and the full-speed, full-screen experience.

3. The offline thing is a good way for companies to bootstrap the WebOS. I think most people have a sense that the apps they use in their browser are more alive, more social, more connected, even if they can't articulate that feeling. And whether it's true or not (Gmail isn't actually more "connected" than Outlook), companies can market the "aliveness" of their web apps (even when they run offline) versus the "deadness" of desktop apps.

These are the people in my (Web) neighborhood

In reaction to some ads of questionable value being placed on some of O'Reilly's sites (response from Tim O'Reilly), Greg Yardley has written a thoughtful piece on selling PageRank called I am not responsible for making Google better:

Google, Yahoo, Microsoft and the other big search engine companies aren't public utilities - they're money-making, for-profit enterprises. It's time to stop thinking of search engines as a common resource to be nurtured, and start thinking of them as just another business to compete with or cooperate with as best suits your individual needs.

I love the idea that, after more than 10 years of serious corporate interest in the Web, it's still up to all of us and our individual decisions. The search engines in particular are based on our collective action; they watch and record the trails left as we scatter the Web with our thoughts, commerce, conversations, and connections.

Me? I tend to think I need Google to be as good a search engine as it can be and if I can help in some small way, I'm going to. As corny as it sounds, I tend to think of the sites I frequent as my neighborhood. If the barista at Starbucks is sick for a day, I'm not going to jump behind the counter and start making lattes, but if there's a bit of litter on the stoop of the restaurant on the corner, I might stop to pick it up. Or if I see some punk slipping a candy bar into his pocket at the deli, I may alert the owner because, well, why should I be paying for that guy's free candy bar every time I stop in for a soda?

Sure those small actions help those particular businesses, but they also benefit the neighborhood as a whole and, more importantly, the neighborhood residents. If I were the owner of a business like O'Reilly Media, I'd be concerned about making Google or Yahoo less useful because that would make it harder for my employees and customers to find what they're looking for (including, perhaps, O'Reilly products and services). As Greg said, the Web is still largely what we make of it, so why not make it a good Web?

Google attempting to patent RSS advertising?

John Battelle points to news of Google (the author is Nelson Minar) attempting to patent the idea of automating the incorporation of targeted ads into RSS files. Here's the application on the USPTO site. I've got a few questions and concerns:

Is this a joke?

Ok, bad first question since it seems unlikely that Nelson and Google would write up this application just to have a few laughs. So here's a better question: where's the prior art on this? The patent was filed on 12/31/2003. I floated the idea of embedding advertising into RSS feeds in October 2002 and there was prior art then. But Google's patent application covers "targeted ads" in a "syndicated, e.g., RSS, presentation format in an automated manner". Curiously, I believe this is already covered by an older Google patent, filed in 12/2002:

The relevance of advertisements to a user's interests is improved. In one implementation, the content of a web page is analyzed to determine a list of one or more topics associated with that web page. An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics. One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages.

That's Google AdSense in a nutshell: inserting targeted ads into web documents in an automated manner. So what is it about RSS/Atom files that makes them different from plain old web pages and hence not covered under the 2002 AdSense patent? Nothing. This vocabulary of "feeds" and "syndication" is still misleading. RSS/Atom files, especially as they are described in the 12/2003 patent application, are XML files that sit on a web server waiting for someone with a web browser to come along to read them, just like XHTML files:

So, people access documents written in a markup language that have been published on a Web server with a software application. If this seems familiar to you, it should. It's called Web browsing and has nothing to do with syndication. RSS readers and newsreaders are just specialized Web browsers...

The 12/2003 application tries to explain the difference between HTML pages and "syndicated content formats" thusly:

Syndicated content, unlike web pages which are normally stored in an HTML format, are often stored and presented in what may be described as a syndicated content format. Syndicated content formats are often XML (eXtended Markup Language) based and include structured representations of content such as news articles, search results, and web log entries. Syndicated content formats are primarily intended for providing syndicated information, e.g., news headlines, weblogs, etc. in a structured format such as a list of items, with another device, e.g., a user device, usually controlling the ultimate presentation format of the items in the list. This is in contrast to HTML which usually includes a fair amount of presentation and formatting information within an HTML document such as a web page.

That's a pretty weak explanation and sounds a lot like what a web browser (the "user device" that controls the presentation) does with XHTML files (XML-based files without a "fair amount of presentation and formatting information"). It sounds to me like Google already has this covered with their previous patent.

[Long aside: Does the prior art of embedding AdSense ads in XHTML files invalidate this patent? Patents are tricky because they don't cover ideas, they cover specific implementations of ideas. While the 12/2003 application states that "said syndicated format is an XML compliant format" it also specifies that "said syndicated format is a format for listing items corresponding to a channel, said received information including a listing of at least two items and including for each item, a title and a link". That is, the XML files they're talking about have to be RSS/Atom-ish in nature. This doesn't rule out XHTML files in theory, but it does rule out many of them in practice.

But the really tricky part with these software patents is that the implementations of ideas are written so broadly that they might as well be patents of the ideas themselves. If you look at it that way (the patent-holding companies certainly seem willing to litigate on that basis), Google has already embedded automated, targeted advertising into XML-based files. According to news.com, Google launched their AdSense service in June 2003. When the first AdSense advertisement was embedded in an XHTML file soon after that, well, there's your prior art on the very thing that Google attempted to patent 6 months later.]
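To make the "channel with at least two items, each with a title and a link" language concrete, here's a rough sketch that checks whether an XML document has that shape. It's only my reading of the quoted claim text, not a legal test, and it only handles plain RSS 2.0-style element names (Atom uses different, namespaced elements). The sample documents and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def titled_linked_items(xml_text):
    """Return (title, link) pairs for <item> elements inside any <channel>."""
    root = ET.fromstring(xml_text)
    items = []
    for channel in root.iter("channel"):
        for item in channel.findall("item"):
            title = item.findtext("title")
            link = item.findtext("link")
            if title and link:
                items.append((title.strip(), link.strip()))
    return items


def matches_claim_shape(xml_text):
    """True if the document lists at least two items with a title and a link."""
    return len(titled_linked_items(xml_text)) >= 2


# Hypothetical documents, for illustration only.
rss_sample = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""

xhtml_sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><p><a href="http://example.com/1">First post</a></p></body>
</html>"""

print(matches_claim_shape(rss_sample))    # → True
print(matches_claim_shape(xhtml_sample))  # → False
```

An ordinary XHTML page fails the check simply because it has no channel or item elements, which is the "rules out many of them in practice" point from the aside.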

NewsRank but not particularly new

I missed this April article in New Scientist about Google's plans to rank news stories according to quality and credibility of the sources:

Now Google, whose name has become synonymous with internet searching, plans to build a database that will compare the track record and credibility of all news sources around the world, and adjust the ranking of any search results accordingly.

The database will be built by continually monitoring the number of stories from all news sources, along with average story length, number with bylines, and number of the bureaux cited, along with how long they have been in business. Google's database will also keep track of the number of staff a news source employs, the volume of internet traffic to its website and the number of countries accessing the site.

Google will take all these parameters, weight them according to formulae it is constructing, and distil them down to create a single value. This number will then be used to rank the results of any news search.

The second paragraph of the story mentions that this system has been patented by Google, but I don't see how it's much different than what PageRank does or what Metacritic has been doing with film, game, and book reviews:

This overall score, or METASCORE, is a weighted average of the individual critic scores. Why a weighted average? When selecting our source publications, we noticed that some critics consistently write better (more detailed, more insightful, more articulate) reviews than others. In addition, some critics and/or publications typically have more prestige and weight in the industry than others. To reflect these factors, we have assigned weights to each publication (and, in the case of film, to individual critics as well), thus making some publications count more in the METASCORE calculations than others.

I wonder if these systems will eventually let their users tweak the credibility algorithms to their liking. For instance, it won't take long for conservatives to start complaining about the liberal bias of Google News. In the case of Metacritic, I'd like them to ignore Anthony Lane's rating when he writes about summer blockbusters and put greater emphasis on whatever Ebert has to say. In the meantime, I'm readying my patent applications for RecipeRank, PhotoRank, ModernFurnitureRank, SoftDrinkRank, and, oooh, PatentRank. I'm sure they're brilliantly unique enough to be recognized by the US Patent Office as new inventions.
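Letting users tweak the weights is not much code, at least in principle. Here's a rough sketch in Python of a weighted average with swappable weights; the critic names, scores, and weights are invented for illustration, and this is not how Metacritic or Google actually compute anything.

```python
def weighted_score(scores, weights):
    """Weighted average of per-source scores.

    scores:  source name -> score
    weights: source name -> weight (0 means "ignore this source")
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical review scores for one summer blockbuster.
scores = {"Ebert": 90, "Lane": 40, "Other": 70}

# Everyone counts the same...
default_weights = {"Ebert": 1.0, "Lane": 1.0, "Other": 1.0}

# ...or ignore Lane and count Ebert double.
my_weights = {"Ebert": 2.0, "Lane": 0.0, "Other": 1.0}

default_result = weighted_score(scores, default_weights)
my_result = weighted_score(scores, my_weights)
```

With these made-up numbers the score moves from roughly 66.7 to roughly 83.3, just by changing whose opinion counts.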

Google Wallet

Word on the street (via waxy) is that Google is set to release a PayPal competitor called Google Wallet. A thread at Techdirt notes that Yahoo!, Microsoft, and eBay have all tried to launch similar services that met with little or no success in the face of competition with PayPal.

I doubt Google is focused on competing with PayPal, at least in the short term. This move, if true, makes a lot of sense for Google. They already have an internal payment system set up to collect and distribute AdSense revenues, they run a store selling t-shirts, bean bags, and search hardware, they sell software, and they've indicated that with Google Video, people will be able to charge others to view videos uploaded to Google's servers (with Google taking a small cut). Taking the core of that internal payment system, it would probably be technologically trivial** for them to open it up for anyone to pay money to anyone else (instead of just individual --> Google or Google --> individual). The line above about their Google Video plans -- "people will be able to charge others to view videos uploaded to Google's servers (with Google taking a small cut)" -- already sounds a lot like what PayPal does. This is the Andre Torrez school of product development...build something that solves a problem you're having and it'll probably be useful to a bunch of other people if you let them use it too.

Plus it leverages their existing user base. If you've already got an AdSense account or are going to charge for your video through Google Video, you're already a GWallet user...and signing people up through their GMail/Orkut/Blogger accounts would probably be pretty easy as well. This move may also indicate that Google is planning to charge a wider range of people for products/services -- maybe a "pro" version of Gmail, a robust, commercial API to their search results, or even a music store? GWallet would be needed infrastructure for ramping up from paying relatively few AdSense users to (potentially) anyone who uses Google. It makes sense for them beyond trying to gain a foothold in the online payments space.

** Getting the banking stuff sorted out is another story though...but as PayPal has shown, if you can get that set up, there's plenty of revenue to be had.

How to clean up maple syrup

Since my post about the maple syrup spill, my inbox has been buzzing with a number of different techniques that people have sent in for cleaning up maple syrup. As a service to future Google searchers or those of you that may have just spilled maple syrup all over the place, here are several of those techniques:

  • if you spill syrup, or drop a raw egg, the trick is to pour a bunch of salt all over the spill (kosher or table, whatever you have), then leave it for a minute or so. the salt will soak up most of the spill, keep it from spreading and make it much more manageable for cleanup.
  • pour a bag of flour onto it and walk away for a while. come back with the dustpan and a spatula, scoop up the non-liquid mixture, and all you have left is a small sticky patch. buff to a shine.
  • Freeze it, and then you can break it off. Alternately (and easier, heh), you can take a wet dish cloth, put it on the syrup and use an iron on the dish towel (the maple effectively gets "sucked" right up).
  • before you grab the broom (uh, you didn't really sweep syrup into a dustpan, did you?) find two hard, flat items (i.e. a dough scraper and a spatula) and put them at opposite 45 degree angles to your catastrophe, sweeping together until they are flush. voila, syrup squeegee. also works as a party trick with two cardboard coasters when someone spills honey mustard dressing on the table.
  • Place a large piece of wax paper directly on top of the puddle of syrup so you get a good stick. With a spatula or paint scraper or whatever you've got handy, begin to flip the wax paper over by spooning the syrup onto it. Pretty soon you will have transferred all the syrup onto the wax paper and you can dispose of it quickly *and* easily. (You might even be able to use a paper grocery bag or newspaper for this.)
  • Liquid nitrogen, and a chisel. Quickness depends on whether you keep liquid nitrogen around. Easy depends on how brittle the floor is. Fun however is pretty much assured. Or at least as much fun as cleaning ever gets.
  • For future reference, pour some diet coke on it - the carbonation cuts right through and is a great cleaning aid (make sure you're getting the liquid carbed part and not the fizzy, useless head). In all seriousness, a pour from a can yields less fizz than from a bottle.
  • the easiest way to clean up syrup (or anything sticky, for that matter) is to freeze it. take out some ice cube trays, or drop a bunch of cubes into a metal container along the lines of an 8x8 cake pan, then cover the pan with a dish towel to insulate. come back in an hour, and the syrup, while not frozen, should be much more manageable.
  • Your mention of the sticky maple syrup kind of taking over reminded me of "Curious George gets a Medal" which I've been reading to kids here lately. Trying to pen a letter, George spills ink on the floor and in the process of trying to clean it up ends up with a room full of suds. Next time you find yourself in the situation be sure to borrow the pump from the farmer down the street (and of course you'll need a cow to help you pull it home).

Thanks to Sarah, Tim, Jeremy, Eric, Josh, Yi, Rachel, Samuel, and Jack for sending in their tips. Who knew that my readers knew so much about cleaning up spilled maple syrup?

Google Toolbar AutoLink

I'm a bit wary about throwing myself in the middle of the whole Google Toolbar AutoLink business (Dan Gillmor has a good summary and lots of trackbacks to opinions, pro and con), but I'm sort of dumbfounded that so many people are so vehemently against it...at least for the reasons being given. The three main points I've heard articulated by those opposed to the feature are:

1. Browsers and toolbars should not modify the content or layout of Web pages...they should render them only as stored on the Web server.

2. Microsoft tried to do this with Smart Tags in Windows XP and everyone hated it so why are we willing to give Google a pass with a similar feature?

3. Google can unfairly use their growing clout to exploit AutoLink users.

I'll address the second point first because it's sort of beside the point and not an argument at all. One of the big reasons why people were so upset about Smart Tags is that Smart Tags were on by default in early preview releases of IE. The browser was automatically rewriting every single page you loaded, adding links here and there. I agree that this sucks (although users may become used to things like this in the future and not think it's such a big deal), but AutoLink is not on by default. It's optional...you have to specifically push a button to make something happen.

But the main reason people seem to be up in arms about AutoLink is that Google is modifying the content and display of other people's content and that browsers and toolbars should not be allowed to do that. Aside from the first part of that statement being factually incorrect (more on that below), browsers and toolbars already modify other people's content and no one really complains about it. In fact, people love it:

  • Firefox, Safari, Google Toolbar, IE, and several other browsers/toolbars all give end users the option to block JavaScript popups, which typically contain ads. This very much goes against the intention of the content provider and is a clear example of software that modifies a site from how it was intended to be displayed. But users love it so browser/toolbar makers include the feature.
  • Browsers allow users to use custom stylesheets when browsing sites, turn off JavaScript on pages, and browse without viewing images or other multimedia files.
  • There are tons of bookmarklets and browser extensions that let people modify the page they're viewing in interesting ways (this one inserts links to Feedster on NY Times and WaPo article pages).
  • Since the early days of the web right on up to the present, browsers have purposely misrendered badly written HTML so that people could view the pages instead of getting junk or a blank page.

All of these features break the supposedly cardinal rule of "thou shalt not modify the content provider's content from the way it was intended by them to be viewed" and I don't hear anyone complaining about it. The fact is, once a user downloads a copy of a content provider's web page from their server, the page becomes just that, a copy. As a user, I should be able to use whatever software is available to me to manipulate, modify, or otherwise remix that copy which I've downloaded for my own personal use. If I can, for my own personal use, photocopy magazine articles, rip my CDs to mp3, make backup copies of my DVDs, and scribble in the margins of books, surely I can do the same with copies of web pages I've downloaded.

Now, if you're against AutoLink because you think Google is becoming too big, they're evil, they're abusing their power, or they bought another blog company instead of yours, then that's fine. Just be up front about why you're upset. It's a trust issue. Do you trust Google's software to do what it says it's going to do and not take advantage of you? If the answer is no, don't use it. But if you're saying that Google should not provide this feature at all and that consenting adults in the privacy of their own homes can't choose to use the feature themselves, I don't think that's a good deal for the users. As content providers, let's not try and reach into our readers' computers and dictate what they can or can't do with the copies of our content that they've downloaded for their personal use...let's leave that sort of wishful thinking to the nutballs in Hollywood.

Google's switch to answers.com was driven by user experience

Last week, I wondered aloud whether Google's switch from dictionary.com to answers.com for their "definition" links was driven by concern for their users or was just a business deal:

The cynic in me feels like money had to have changed hands in order for this to have happened (maybe Google is an investor in GuruNet, maybe GuruNet paid for that placement), but the optimist in me says that Google is still a weird little company where the members of project teams can stumble across a better resource that will make their users happier and more productive and implement it on the live site quickly, even if the company that provides that resource could be considered a competitor.

Marissa Mayer, Product Manager for Google, was kind enough to respond to my query about it:

This decision was driven off of concern for our user experience. We are not paying answers.com for this service nor are they paying us. They were willing to work with us and design a website that we felt represented an improvement for our users over what was offered on dictionary.com (no pop-ups, dense information presentation).

That a $50 billion American company is so focused on the experience presented to its users, well, it's pretty impressive.

They've got answers

Just noticed the other day that Google switched their "definition" link from dictionary.com to answers.com. When I first saw it, I was irritated about the switch, but after I realized that answers.com is a better resource, my irritation lasted about two seconds. On one page, they list not only the definition of a word, but also the thesaurus entry, the Wikipedia entry (if applicable), translations into more than 10 languages (including Greek, Arabic, Chinese, and Hebrew), and some related topics. They've even got pages for terms that dictionary.com doesn't have, like "snoop dogg". And for a word like "reaganomics", answers.com brings in info from Investopedia, a useful-looking financial information site.

Walter Mossberg recently profiled answers.com in the WSJ. I wonder if the folks from Google read this (or had seen answers.com previously...more likely) and thought, hey, we should be linking to these guys instead of dictionary.com. The cynic in me feels like money had to have changed hands in order for this to have happened (maybe Google is an investor in GuruNet, maybe GuruNet paid for that placement), but the optimist in me says that Google is still a weird little company where the members of project teams can stumble across a better resource that will make their users happier and more productive and implement it on the live site quickly, even if the company that provides that resource could be considered a competitor. Would love to know which it is.

Update: it's the user experience, by a landslide.

60 Minutes wrong again!

This is a developing scandal folks...it threatens to bring down not just a bit player like Dan Rather, but all of network television. On the Jan 2, 2005 episode of 60 Minutes, internet search pundit John Battelle commented on Google employees not taking advantage of their newfound wealth because it's against Google's ethic:

If anybody got a Porsche or a Ferrari right now at Google, they’d probably be drummed out of the company

My sources deep inside Google (who shall, given recent legal jeopardy, remain anonymous) tell me that at least one employee has purchased a Porsche with the IPO monies and has not, repeat, has *not* been drummed, tubaed, celloed, or otherwise musically extricated from the company. If true, who knows what this could mean for the future of journalism as we know it!! The implications on podcasting alone are unfathomable at this time. More as it develops...

Update: Is this really Ben Affleck's Bentley in a Google parking space or is it some IPO bling? Who knows how deep what the press has dubbed "Googlegate" will go before we get to the truth?

Update #2: The car pictured in the photograph above may be a Rolls Royce instead of a Bentley. It's hard to sort through all the misinformation here...it's staggering.

Update #3: Confirmed: the car is a Rolls Royce, not a Bentley. But forget the car, I've heard rumors that both RR and Bentley are owned/manufactured by German car companies (VW and BMW). I'm working to track these rumors down, but if true, Germany's heavy investment in Google would be a bombshell.

Update #4: Matt, prominent media pirate, has video of the 60 Minutes episode in question. You can see their lies for yourself. No official denial as of yet from the German government on their outfitting of all Google employees with luxury motor coaches.

Google Desktop

(Rambling ahead...) Google Desktop beta. Early impressions anyone? I think it's pretty damn cool...a baby step towards the GooOS. Do a regular Google search and GD results are inlined right at the top (see screenshots for how it all works). How are they doing that technically?

I've cranked up the size of my browser cache...now that GD can index every page I've ever viewed in my browser, can I afford to throw any of it away? This one-ups what A9 is doing in caching visited sites and searching past search results.

Could this be Google's portal play? If they've got info on all the files on my computer, why not display my latest calendar items, emails, online buddies, etc. right on Google's home page?

But then there are the privacy issues. Is Google using information from my local drive to improve my search results? Should they? "Mr. Kottke, I see you've mentioned 'President Bush' in a recent email. Here are some Google News stories on that topic." Useful, but well, you know.

A co-worker wants to put Google Desktop on a Web server and use that as a search engine for a Web site. Not sure if that would work, but it's an interesting idea. I'm sure some smart hacker will soon figure out how to expose his/her GD search results to the outside world.

More evidence of a Google browser

Following up on last month's speculation on Google building their own Web browser:

Last summer, Anil Dash suggested that it would be a good move for Google to develop a Google browser based on Mozilla. Give that kid a gold star because it looks more than plausible. Mozilla Developer Day 2004 was recently held at the Google Campus. Google is investing heavily in JavaScript-powered desktop-like web apps like Gmail and Blogger (the posting interface is now WYSIWYG). Google could use their JavaScript expertise (in the form of Gmail ubercoder Chris Wetherell) to build Mozilla applications. Built-in blogging tools. Built-in Gmail tools. Built-in search tools. A search pane that watches what you're browsing and suggests related pages and search queries or watches what you're blogging and suggests related pages, news items, or emails you've written. Google Toolbar++. You get the idea.

On April 26, 2004, Google registered gbrowser.com. Here's the relevant bit of the WHOIS for gbrowser.com:

Registrant:
Google Inc.
(DOM-1278108)
1600 Amphitheatre Parkway Mountain View
CA
94043 US

Domain Name: gbrowser.com

Created on..............: 2004-Apr-26.
Expires on..............: 2006-Apr-26.
Record last updated on..: 2004-Apr-26 16:46:39.

Thanks to Dave for the tip. Additionally, this NY Post article notes that Google is hiring folks formerly of Microsoft's IE team as well as other people that would be good bets to work on a browser.

Update: There was a bug in Mozilla's bug tracking system that was closed because "this is a duplicate of a private bug about working with Google. So closing this one." More info at Blogzilla. Thx, Phil.

The Google Browser

Last summer, Anil Dash suggested that it would be a good move for Google to develop a Google browser based on Mozilla. Give that kid a gold star because it looks more than plausible. Mozilla Developer Day 2004 was recently held at the Google Campus. Google is investing heavily in JavaScript-powered desktop-like web apps like Gmail and Blogger (the posting interface is now WYSIWYG). Google could use their JavaScript expertise (in the form of Gmail ubercoder Chris Wetherell) to build Mozilla applications. Built-in blogging tools. Built-in Gmail tools. Built-in search tools. A search pane that watches what you're browsing and suggests related pages and search queries or watches what you're blogging and suggests related pages, news items, or emails you've written. Google Toolbar++. You get the idea.

Mozilla is currently getting some good press due to Microsoft's continuing troubles with their browser and the uptick in usage compared to IE is encouraging. But it's nothing compared to what could happen if Google decides to release a Mozilla-based browser. A Google Browser would give the Mozilla platform instant credibility and would be a big hit. The peerless Google brand & reputation and their huge reach are the keys here. Mom and Dad know about Google...if Google offered a browser that was as powerful and easy to use as their search engine and didn't scum up their system, they'd download it. IT departments wanting to switch away from IE would have some formidable firepower when pitching to upper management..."Mozilla? What? Oh, it's Google? Go for it!" Get good reasons in front of enough Google users and millions would switch from IE.

A Google Browser is a no-brainer for them and they have to be thinking about it. It's been obvious for awhile now that Google isn't a search company, nor are they an advertising company, despite what the experts have to say. Sorry to sound like a broken record, but I'm convinced they're building an operating system (of sorts) from which they will dispense all sorts of applications and data (as well as allow other people/companies to do the same in this fashion). What we could see is the next generation of office suite. Not Word, Excel, PowerPoint, and Outlook of Microsoft's Office or iPhoto, iDVD, iMovie, iTunes, and GarageBand of Apple's iLife suite, but Google search, Gmail, Google Browser, Blogger, and perhaps even GIM. It'll be interesting to watch whether this happens or not.

Update: John Rhodes floated the idea of a Google Browser back in September 2001.

Google IPO price per share

I've seen several references to the price per share of Google stock being priced too high ($108-135) for people to buy it, most notably in this article by Dan Gillmor:

At least, I wouldn't consider [investing in Google] at the nosebleed-altitude prices that Google suggested to the world Monday. This is starting to feel frothy.

This is silly. If you buy 74 shares of Google stock for $10,000 and 352 shares of Microsoft stock for $10,000, your stake in each company is worth $10,000 even though you've got nearly five times as many MSFT shares. A more relevant question to ask is whether or not a company is worth its overall valuation...or better yet, whether it will be worth such and such a value in X number of years. There's a psychological factor involved here. People think they're somehow getting more with a lower share price. Reminds me of stock splits...if a stock splits 2-for-1, you get twice the number of shares (woo hoo, I'm rich!) but at half the price per share (hey, wait a minute...).
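The arithmetic is easy to check. Here's a small sketch in Python, working in cents to avoid rounding trouble. The $135 Google price is the top of the range mentioned above; the $28.40 Microsoft price is my own assumption, picked because it makes $10,000 buy 352 shares.

```python
def shares_for(budget_cents, price_cents):
    """Whole shares a budget buys at a given price."""
    return budget_cents // price_cents


def stake_cents(shares, price_cents):
    """Value of a holding: shares times price."""
    return shares * price_cents


def split_two_for_one(shares, price_cents):
    """2-for-1 split: twice the shares at half the price.

    Assumes an even price in cents so the halving is exact.
    """
    return shares * 2, price_cents // 2


BUDGET = 1_000_000   # $10,000 in cents
GOOG_PRICE = 13_500  # $135, top of the suggested IPO range
MSFT_PRICE = 2_840   # $28.40, an assumed price for illustration

goog_shares = shares_for(BUDGET, GOOG_PRICE)  # 74
msft_shares = shares_for(BUDGET, MSFT_PRICE)  # 352

# Both stakes are worth just under $10,000 despite the share counts.
goog_stake = stake_cents(goog_shares, GOOG_PRICE)
msft_stake = stake_cents(msft_shares, MSFT_PRICE)

# A split changes the share count and the price, not the stake.
split_shares, split_price = split_two_for_one(goog_shares, GOOG_PRICE)
split_stake = stake_cents(split_shares, split_price)
```

The Google stake is the same before and after the split; only the number printed on each share changes.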

The state of the report

Wired News has an article on the guerilla conversion of the 9/11 Commission Report (published by the government online only in PDF format) into various formats, including HTML, text, audio, and a more accessible PDF format. The HTML version I did of the executive summary is mentioned. On a whim yesterday afternoon, I googled for several variations of "9/11 commission report" and my site came up in the top 5-6 results for most of them (for instance). Usually a high ranking on such a hot topic means lots of traffic from Google, but when I checked my stats this morning, there was almost nothing coming from Google for any of those search terms. So even though the book version is a bestseller, few seem to be looking for the online version.

Also, it looks as though Dave Winer might handily win his bet with Martin Nisenholtz of the NY Times...both kottke.org and Boing Boing rank above CNN, MSNBC, Time, and the Times in the search results for "9/11 commission report".

Gmail and fuzzy whitelisting

Many feel that Google's invite-only distribution of Gmail accounts is a shrewd marketing move designed to create artificial demand. More likely, they're just rolling the service out slowly; it is still in beta testing after all and 40 million people at once would probably have been a nightmare for them to deal with.

But wouldn't it be fun if the real reason that Google is distributing accounts the way they are is to build whitelisting into their system? With the Gmail economy that's sprung up to facilitate the trafficking of invites (now somewhat curtailed), not all Gmail users are known by the people who invited them, but certainly some fuzzy whitelisting could be utilized to improve Gmail's spam filtering.

NY Times duped by Google bomb

When I read this recap of Google's amended S-1 in the NY Times yesterday, the last two paragraphs struck me as a bit strange:

Separately, there was an indication yesterday that Google's vaunted corporate culture may be under stress as a result of competition and the stock offering. As of yesterday afternoon, typing the words "out of touch management" into Google caused the search engine to list as its first result a page describing the company's top management.

A person close to the company said that Google employees had engaged in the practice of "Google bombing." A Google bomb is an attempt by a group of people to cause a particular Web page to become the first result for a search phrase. The Google spokeswoman declined to comment.

The "out of touch management" search indeed works as stated, but how they got from that to "Google's vaunted corporate culture may be under stress as a result of competition and the stock offering" left me baffled. I knew that I'd seen this particular Google bomb before, but couldn't recall where. Chris Sherman, in a thread about the article on John Battelle's site notes that the Google bomb was initiated by Daniel Brandt back in March. It would seem that the "person close to the company" was not as close as the Times thought they were. If this were a sensationalistic news site, I might wonder why the New York Times is "press bombing" Google. But that would be silly, like tacking some ill-conceived speculation onto the end of a story about boring financial statements to juice it up a little. It's a forgivable error, but one that needs correcting. Paging Daniel Okrent.

File format searching

Andy notes that Google is now indexing Flash files. Search for "skip intro" to try it out. Upon seeing this, the gray-bearded conspiracy theorist in me wondered if Google was unfairly promoting the Flash format over Adobe's competing SVG format in order to crush Adobe into dust. I needn't have worried...you can search Google for SVG files just fine (because they're text files).

Of course, you can search Google for all kinds of filetypes, text and otherwise: .rdf (RSS, FOAF, etc.), .xml (RSS, Atom, etc.), .torrent (BitTorrent), .aspx (.NET), .php (PHP), .csv (comma-delimited data file), .vcf (vCard...look, global address book made easy!), etc.

Google, Yahoo fight it out for ping and porn exit page supremacy! HOTTT!!

Rael recently noted a shift among Unix folk in pinging google.com instead of yahoo.com when checking for network availability:

It's not, mind you, that yahoo.com has become unreachable or unstable; just the Google has so come to represent the very essence of stability and reachability that it's made its way to our every ping.

While Google rules the ping space, Yahoo is still tops of the Web in terms of porn alternatives with a solid #1 ranking in a search for "exit" and a #1 ranking to Google's #2 for "leave" (that is, when porn sites give you a chance to "exit" or "leave", they're still linking that word to Yahoo more often than Google).

A9

A9, a new search service from Amazon, has launched in beta. Amazon chose to break the story through John Battelle so that, in his words, "[the news would] move from the blogosphere out, as opposed the WSJ in". Battelle's got some good thoughts on it in his post. They're using Google's search results, displaying book search results alongside, offering a search toolbar, keeping track of your past search results and what you've visited already, and more. The toolbar includes a diary feature with which you can annotate any Web page you visit (a la E-Quill). My first thought: how about some contrast? The cream background and gray text ain't working for me.

A9 has a generic version of their search service that doesn't track you via cookies or use your data in their analysis.

Steven has whipped up a Firefox search plugin for A9.

Erik Benson, an Amazon employee, has some thoughts on A9.

John Battelle's interview with Udi Manber, head honcho at A9, is now up at Business 2.0 and he has more thoughts on A9 on his site, including:

As an aside, I have to say the idea of a complete, lifetime record of a person's searches and browsing history - which by the way that person can edit - is an extraordinary concept. It's taking the idea of the database of intentions to the utmost granular level of history - the individual. What, I wonder, happens to a person's search history when they die? Do they have a right to own it? Does it get passed down as a keepsake to his or her children?

GooOS, the Google Operating System

Great post about what Google is up to by Rich Skrenta. He argues that Google is building a huge computer with a custom operating system that everyone on earth can have an account on. His last few paragraphs are so much more perceptive than anything that's been written about Google by anyone; Skrenta nails the company exactly:

Google is a company that has built a single very large, custom computer. It's running their own cluster operating system. They make their big computer even bigger and faster each month, while lowering the cost of CPU cycles. It's looking more like a general purpose platform than a cluster optimized for a single application.

While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.

This computer is running the world's top search engine, a social networking service, a shopping price comparison engine, a new email service, and a local search/yellow pages engine. What will they do next with the world's biggest computer and most advanced operating system?

I was thrilled reading this today because I had been thinking along the same lines as I wondered about Gmail (and the 1GB of storage in particular)...and because Skrenta had made the argument so well. This weekend, as I hacked through a bunch of XHTML and CSS for an upcoming site redesign, I jotted down a few notes for a follow-up on a post I made over a year ago called Google is not a search company. I was going to call it "GooOS, the Google Operating System".

My notes contained two of Skrenta's main points: the importance of the supercomputer and the scores of Ph.Ds being Google's main assets. A third key asset for Google is the data that they're storing on those 100,000 computers. As I said in that post:

Google's money won't be made with search...that's small peanuts compared to selling access to the world's biggest, best, and most cleverly-utilized map of the web.

So. They have this huge map of the Web and are aware of how people move around in the virtual space it represents. They have the perfect place to store this map (one of the world's largest computers that's all but incapable of crashing). And they are clever at reading this map. Google knows what people write about, what they search for, what they shop for, they know who wants to advertise and how effective those advertisements are, and they're about to know how we communicate with friends and loved ones. What can they do with all that? Just about anything that collection of Ph.Ds can dream up.

Tim O'Reilly has talked about various bits from the Web morphing into "the emergent Internet operating system"; the small pieces loosely joining, if you will. Google seems to be heading there already, all by themselves. By building and then joining a bunch of the small pieces by themselves, Google can take full advantage of the economies of scale and avoid the difficulties of interop.

Google isn't worried about Yahoo! or Microsoft's search efforts...although the media's focus on that is probably to their advantage. Their real target is Windows. Who needs Windows when anyone can have free unlimited access to the world's fastest computer running the smartest operating system? Mobile devices don't need big, bloated OSes...they'll be perfect platforms for accessing the GooOS. Using Gnome and Linux as a starting point, Google should design an OS for desktop computers that's modified to use the GooOS and sell it right alongside Windows ($200) at CompUSA for $10/apiece (available free online of course). Google Office (Goffice?) will be built in, with all your data stored locally, backed up remotely, and available to whomever it needs to be (SubEthaEdit-style collaboration on Word/Excel/PowerPoint-esque documents is only the beginning). Email, shopping, games, music, news, personal publishing, etc.; all the stuff that people use their computers for, it's all there.

Even though everyone's down on Google these days, they remain the most interesting company in the world and I'm optimistic about their potential and success (while also apprehensive about the prospect of using Google for absolutely everything someday...I'll be cursing the Google monopoly in 5 years' time). If they stay on target with their plans to leverage their three core assets (which, if Gmail is any indication, they will), I predict Google will be the biggest and most important company in the world in 5-8 years.

Battle search engine

Intrigued by a stat that John Battelle pulled out of a Wired News story on search, that "the number of unique visitors to Yahoo Search trailed Google by a mere 10 percent", I checked my search referers to kottke.org for December 2003 and found a somewhat different story:

Google 60%
Yahoo 22%
AOL 14%
MSN 3%
Earthlink 0.5%

Now, inferring the market share of a search engine from the referers is tricky because you can't account for algorithm and display differences** (that is, Google may just love my site 3X more than Yahoo! does), so, you know, grain of salt and all that.

** Yahoo!, AOL, and Earthlink search are all currently powered by Google (making Google's effective search market share 97%), although they may determine and display the results in different ways.
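
As a quick check on the arithmetic in the list and footnote above, here is a minimal Python sketch. The variable names are made up for illustration; the percentages are the ones listed in the post, and the set of Google-powered engines is taken from the footnote.

```python
# Search referer shares for kottke.org, December 2003, as listed above (percent).
referer_share = {
    "Google": 60.0,
    "Yahoo": 22.0,
    "AOL": 14.0,
    "MSN": 3.0,
    "Earthlink": 0.5,
}

# Engines the footnote describes as powered by Google at the time.
google_powered = ["Google", "Yahoo", "AOL", "Earthlink"]

# Combined share of referers coming from Google-powered engines.
effective_google_share = sum(referer_share[name] for name in google_powered)

# How many times more referers came from Google than from Yahoo.
google_to_yahoo = referer_share["Google"] / referer_share["Yahoo"]

print(f"Google-powered share: {effective_google_share}%")
print(f"Google vs. Yahoo: {google_to_yahoo:.1f}x")
```

The combined figure comes to 96.5%, which rounds to the 97% in the footnote, and the Google-to-Yahoo ratio is a bit under the "3X" used in the text.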

Google Print

Looks like Google is branching out into searching more than just web sites. The Google Print FAQ says they're experimenting with "publications" (books? magazines?):

Google's mission is to provide access to all the world's information and make it universally useful and accessible. It turns out that not all the world's information is already on the Internet, so Google has been experimenting with a number of publishers to test their content online. During this trial, publishers' content is hosted by Google and is ranked in our search results according to the same technology we use to evaluate websites.

Google Print isn't referenced anywhere else on their web site so it's unclear as to whether it's a planned beta, an ongoing effort, or already over, but it sounds like an effort to counter Amazon's full-text book search efforts.

Update: Reader Xavier writes that Google Print is still working. A search for "1,000 knock knock jokes for kids" (with the results restricted to the print.google.com domain) yields this page for the book. A search for a common word like "the" reveals that around 8000 books are available, including Tolkien's Fellowship of the Ring, David Foster Wallace's Infinite Jest, Crime and Punishment, and Kurzweil's The Age of Spiritual Machines.

Changed terms and conditions for Google AdSense

In response to some pressure from their customers and potential customers, Google has changed the terms and conditions for their AdSense program. I only took a quick peek at it, but it looks like they've dialed back some of the more draconian provisions. Nice to see Google trying to do the right thing here.

German interview on Google

I did an interview about Google for netzeitung.de, a German news web site. If you don't read German, I've included an English version of what I sent them:

Q: Mr. Kottke, how far away is Google from "being evil" in general in your opinion?

Google is doing better in the corporate morality department than a lot of other companies. From all accounts, their leadership wants to not be evil, they treat their employees well, they take great pride in the usefulness and relevancy of the results of their free search service. The terms of service for their AdSense program are definitely a step in the wrong direction, like they're letting marketing and legal determine their approach to business instead of the other way around. But Google is a long way from a Verisign/De Beers/Enron level of evil.

Q: But that was Google's rule number one, wasn't it?

I think every corporation's real #1 rule is "make money". That Google wants to make not being evil an equally important priority is commendable.

Q: How effective is Google still, given the springing up of Google Spam (lots of doorway pages leading to sponsored links) and the "noise" coming from weblogs etc.?

Google's search results are at least as relevant as they have always been, if not more so. I can almost always find what I'm looking for in the first 10 results or so. I think part of the problem is perception. The perception is that their results should continue to improve as they refine their search methods and algorithms, but there's an inherent limitation in their approach that caps the maximum possible utility. There's only so much information about how pages relate to each other that you can glean from scraping web pages, and if Google is close to reaching that limit, any changes they make will only result in small changes in usefulness.

Q: Can Google do anything about this?

Perhaps they might want to start grouping web pages and sites into groups and analyzing the sites in each group in a unique way to improve the overall database. Weblogs are a good example of a group that could be analyzed differently. Weblogs consist of separate posts, which should be treated individually to get the best possible data from them. Weblog posts often contain more metadata than a typical web page, things like date and time of publication, categories, backlinks, etc. Google can use that post-level metadata to get better information about the sites that weblogs point to without having to pump the weblogs up in the overall rankings -- as many people have complained is not so good. Many weblogs also have RSS feeds with structured metadata that could be analyzed to improve general search results.

Q: Does Google need more competition?

A little competition for Google would be a good idea. Microsoft and Yahoo have both announced efforts to improve their search engines, but I don't see them developing anything to threaten Google's search.

Q: What will the Google of the future look like?

Given that the look of their site hasn't changed significantly since the beta version, I wouldn't look for it to change much in the near future. The biggest change will probably be more personalized search results where my results for a given search would be different than yours based upon our usage of the site.

[Hopefully that all makes more sense in German.]

The first rule of Google AdSense is, don't talk about Google AdSense

Have you noticed that Google is acting more and more like a stupid marketing/advertising company lately? It's one of the side effects of not really being a search engine company and seems to fly in the face of Sergey Brin's Google rule #1: "Don't be evil".

According to this post on Russell Beattie's site, Google recently changed their Terms and Conditions to prohibit criticism of their AdSense "service" terms and conditions on participating sites. Yuck. This move follows Russell's analysis of the AdSense T&C as a result of Erik Thauvin's removal from the program.

Since when is Google providing a service by paying people for advertising placed on their sites? This seems backwards; people are providing a service by placing Google's ads on their sites. Google has every right to place whatever limits they wish on people who use their "service", but terminating said service without recourse when money is potentially owed by Google *and then* not allowing any site using Google AdSense (which may eventually include media sites like Salon, NY Times, MetaFilter, Slashdot, and even kottke.org) to comment on the Terms and Conditions that brought about the termination is just plain bad (evil?) and should give serious pause to anyone considering using any Google service.

You Google employees out there in weblog land, take a look at these links and see if it's worth taking this issue to someone internally who can do something about it. I might run into Larry Page at a retreat next weekend...we'll see what he thinks about it.

Update: Lest you think I'm aimlessly Google-bashing here, Cory Doctorow's comments on this matter sum up my feelings very well:

But that doesn't mean that they should get a free ride. Google wants to be a company that makes money without being evil, and I support that goal! Being not-evil is good, and so's making some dough. But part of being not-evil is that you have to incur liability over and above that which your counsel recommends as the safest path -- just as a shop-owner can't reasonably ask all her customers to submit to a strip-search to contain shoplifting liability, Google shouldn't ask all its users to submit to an unreasonable restriction on their speech in order to contain the spread of negative information about its service.

Derek Powazek got the boot from AdSense for "inappropriate clicks" as did Kathy Shaidle. Kathy writes:

When I complained [about the ads that were showing up on the site], they explained: my blog, which deals with religion, politics and other non-dinner-table topics, was 'potentially negative'. I asked (on the blog) if there was gonna be a 0th Amendment drawn up to protect 'potentially negative' speech.

We back and forth'd a bit, my readers complained to them on my behalf, but Google wanted me to go through my archives, delete everything I'd ever said about them, good and bad, then republish. You can guess my response.

Free as in stagnation

According to news.com, Google is discontinuing Blogger Pro and folding the Pro features back into their free version of the software:

Google-owned Web log-creation site Blogger is eliminating its paid version and folding premium functions into its free service, bucking a trend toward making people pay for Web site extras.

The creation of Blogger Pro, which cost subscribers a yearly fee of $35, came about as a result of financial necessity, Blogger co-founder Evan Williams wrote in an e-mail to subscribers. Now that Google owns the service, that need has passed.

It's a good move...Pro never offered significant improvement over the free version and the proliferation of Blogger's various options (Blogger, Blogger Pro, Blog*Spot, ad-free Blog*Spot, etc.) was confusing.

But as I mentioned back in May, it makes me nervous when a big company gives away software for which other, smaller companies are charging. Just as Microsoft buried Netscape with a free browser (resulting in stagnation in overall browser development), Google could give away blogging tools and services (to what end?), make it difficult for Six Apart, UserLand, etc. to sell their products & services, and in two years' time, we've got a single dominant blogging platform and innovation in blogging software goes to zero. Fortunately, the general excellence and feature-richness of TypePad and Movable Type in particular and Blogger's continuing uptime and support problems will probably override any advantage Blogger has in price.

Fun with the Google calculator

Instead of replying to my endless queue of unanswered email, I spent some time last night playing with Google's newest toy, the Google Calculator. Maybe if people would email back solutions to arithmetic problems included in my email replies to them I would more readily respond to my backlog. But I digress.

After verifying that 2+2=4 (contrary to popular belief), I tried to figure out the largest difference between the smallest and largest units of measurement on a given scale, finally ending up with ~3.08 x 10^26 angstroms in a parsec (26 orders of magnitude difference). If you delve into the world of obscure metric prefixes, you can get up to 64 orders of magnitude difference....there are ~3.08 x 10^64 yoctometers in a yottaparsec. If you want to get really ridiculous, you can find out how many yoctometers there are in one vigintillion parsecs (~3.08 x 10^103 if you're curious).
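
For anyone who wants to reproduce those ratios without the Google Calculator, here is a rough Python sketch. The constant and function names are my own, the parsec length is the standard value of about 3.0857 x 10^16 meters, and "vigintillion" is assumed to mean the short-scale 10^63.

```python
import math

# Lengths in meters. Names are illustrative, not from any library.
PARSEC_M = 3.0856775814913673e16   # one parsec
ANGSTROM_M = 1e-10                 # one angstrom
YOCTOMETER_M = 1e-24               # one yoctometer
YOTTA = 1e24                       # the "yotta" prefix
VIGINTILLION = 1e63                # short-scale vigintillion (assumption)


def ratio_and_magnitude(big_m, small_m):
    """Return how many small units fit in the big one, plus the power of ten."""
    ratio = big_m / small_m
    return ratio, math.floor(math.log10(ratio))


angstroms_per_parsec = ratio_and_magnitude(PARSEC_M, ANGSTROM_M)
yoctometers_per_yottaparsec = ratio_and_magnitude(PARSEC_M * YOTTA, YOCTOMETER_M)
yoctometers_per_vigintillion_parsecs = ratio_and_magnitude(
    PARSEC_M * VIGINTILLION, YOCTOMETER_M
)

for label, (ratio, magnitude) in [
    ("angstroms per parsec", angstroms_per_parsec),
    ("yoctometers per yottaparsec", yoctometers_per_yottaparsec),
    ("yoctometers per vigintillion parsecs", yoctometers_per_vigintillion_parsecs),
]:
    print(f"{label}: {ratio:.3e} (10^{magnitude})")
```

The three powers of ten come out as 26, 64, and 103, matching the figures above.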

That got me thinking...what's the limit of the Google Calculator's computational ability? 170! (170! = 1*2*3*4* ... *168*169*170) is equal to ~7.26 x 10^306, but 171! doesn't work. 2^1023 = ~8.99 x 10^307, but 2^1024 doesn't work. After some trial and error, the upper limit of the calculator is ~1.797 x 10^308...or basically anything less than 2^1024. My binary math is a little rusty, but that limit seems to correspond to 64-bit double precision real arithmetic. Which makes sense, but it would have been more fun if the limit had been a googol (1.0 x 10^100). (Regarding other large numbers, neither googolplex nor infinity return calculator results.)
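
The same ceiling shows up in any language that uses the IEEE 754 double-precision format, which is 64 bits wide. Here is a small Python sketch of the trial and error described above. It says nothing about how the Google Calculator is actually implemented; `fits_in_double` is a helper invented for this example.

```python
import math
import sys


def fits_in_double(n: int) -> bool:
    """Return True if the integer n converts to a finite Python float."""
    try:
        float(n)  # Python floats are IEEE 754 doubles on common platforms
        return True
    except OverflowError:
        return False


# The largest finite double, roughly 1.797 x 10^308.
largest_double = sys.float_info.max

checks = {
    "170!": fits_in_double(math.factorial(170)),
    "171!": fits_in_double(math.factorial(171)),
    "2^1023": fits_in_double(2**1023),
    "2^1024": fits_in_double(2**1024),
    "googol": fits_in_double(10**100),
}

print(f"largest double: {largest_double:.4e}")
for label, ok in checks.items():
    print(f"{label}: {'works' if ok else 'too big'}")
```

170!, 2^1023, and a googol all fit; 171! and 2^1024 do not, which is the same boundary found by hand.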

In addition to playing with big numbers, the calculator can help you finally figure out the number of drams in a pennyweight (~0.878 drams/pennyweight), rods in a fathom (~0.364 rods/fathom), or the speed of light in knots (582,749,918 knots)...but unfortunately not the mileage of your automobile in rods/hogshead.
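
Those conversions can be checked by hand too. This is a minimal Python sketch using standard definitions that I'm assuming match the calculator's: a pennyweight is 24 grains, an avoirdupois dram is 1/256 of a 7,000-grain pound, a fathom is 6 feet, a rod is 16.5 feet, and a knot is 1,852 meters per hour. The constant names are illustrative.

```python
# Mass units, in grains.
PENNYWEIGHT_GRAINS = 24.0
DRAM_GRAINS = 7000.0 / 256.0   # avoirdupois dram (assumed to be the dram meant here)

# Length units, in feet.
FATHOM_FEET = 6.0
ROD_FEET = 16.5

# Speed of light in meters per second, and the knot in meters per hour.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
KNOT_M_PER_HOUR = 1852

drams_per_pennyweight = PENNYWEIGHT_GRAINS / DRAM_GRAINS
rods_per_fathom = FATHOM_FEET / ROD_FEET
# Meters per second -> meters per hour -> knots.
speed_of_light_knots = SPEED_OF_LIGHT_M_PER_S * 3600 / KNOT_M_PER_HOUR

print(f"drams per pennyweight: {drams_per_pennyweight:.3f}")
print(f"rods per fathom: {rods_per_fathom:.3f}")
print(f"speed of light: {speed_of_light_knots:,.0f} knots")
```

Under those definitions the results agree with the calculator's figures quoted above.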

Andy's got some more calculator fun going.

Google and the Fabulous Googlettes

This Craigslist job posting (via Anil) describes a new initiative within Google called Googlettes:

What is a Googlette? It's a new business inside of Google that is just getting started – the start-up within the start-up. We're looking for an experienced, entrepreneurial manager capable of offering direction to a team of PMs working on a wide array of Googlettes. You will define Google's innovation engine and grow the leaders of our next generation of businesses.

From the description, it looks like Google is building a little Skunkworks to generate business ideas and leaders internally instead of relying so heavily on outside hires and ideas. Which is a fine idea.

But when I first read the description, I thought they might be doing something else that is potentially more interesting. Instead of generating ideas and people for internal use, what if they're incubating start-ups to spin off into companies of their own? Fast forward five years and instead of being a big huge company, Google is a big huge company at the center of a network of 10-20 large to medium-sized companies with similar goals, values, and business practices. Most of these spin-offs would be engaged in businesses similar (and probably complementary) to each other and the Google Mother Ship, some of them maybe even directly competing with each other.

With the right balance of mutual effort and competition, the Google collective would be a formidable adversary for its competition -- a team of companies against single companies -- and would at the same time create an open business environment (say, the opposite of the current business environment in film, music, television, and radio) where competition creates more opportunities, value for customers, jobs, business, and innovation for everyone in that environment.

Again, I don't think that's Google's plan, but maybe it should be.

Topics on which Google thinks I am an expert

Matrix Reloaded (#14)
Matrix Reloaded discussion (#1)
addicting games (#2)
Christopher Guest (#7)
The Cognitive Style of Powerpoint (#2)
Gaetan Dugas (#1)
The Incredibles (#6)
September 11th photos (#3)
earthquake in Japan (#1)
Tom Hanks filmography (#2)
Calvin Klein dinnerware (#1)
NYC subway (#5)