Not your father's PageRank  FEB 26 2010

Steven Levy on how Google's search algorithm has changed over the years.

Take, for instance, the way Google's engine learns which words are synonyms. "We discovered a nifty thing very early on," Singhal says. "People change words in their queries. So someone would say, 'pictures of dogs,' and then they'd say, 'pictures of puppies.' So that told us that maybe 'dogs' and 'puppies' were interchangeable. We also learned that when you boil water, it's hot water. We were relearning semantics from humans, and that was a great advance."
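As a rough illustration of the query-rewriting signal Singhal describes, here is a minimal Python sketch. It is not Google's method: the `substitution_counts` function, the session format, and the sample queries are all assumptions made up for this example. It counts how often one word is swapped for another between back-to-back queries that are otherwise identical.

```python
from collections import Counter


def substitution_counts(sessions):
    """Count word pairs swapped between consecutive queries in a session.

    `sessions` is a hypothetical input format: a list of sessions, each a
    list of query strings in the order the user typed them.
    """
    counts = Counter()
    for queries in sessions:
        for before, after in zip(queries, queries[1:]):
            a, b = before.lower().split(), after.lower().split()
            if len(a) != len(b):
                continue  # only compare queries with the same number of words
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                # Exactly one word changed: treat the pair as a synonym candidate.
                counts[frozenset(diffs[0])] += 1
    return counts


# Toy sessions, invented for illustration only.
sessions = [
    ["pictures of dogs", "pictures of puppies"],
    ["cute dogs", "cute puppies"],
    ["boiling water", "hot water"],
]
counts = substitution_counts(sessions)
```

Pairs that get swapped often, like "dogs" and "puppies" here, become candidates for interchangeable terms. A real system would need far more data and some way to filter out swaps that change the meaning.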

But there were obstacles. Google's synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein's theories about how words are defined by context. As Google crawled and archived billions of documents and Web pages, it analyzed what words were close to each other. "Hot dog" would be found in searches that also contained "bread" and "mustard" and "baseball games" -- not poached pooches. That helped the algorithm understand what "hot dog" -- and millions of other terms -- meant. "Today, if you type 'Gandhi bio,' we know that bio means biography," Singhal says. "And if you type 'bio warfare,' it means biological."

Or in simpler terms, here's a snippet of a conversation that Google might have with itself:

A rock is a rock. It's also a stone, and it could be a boulder. Spell it "rokc" and it's still a rock. But put "little" in front of it and it's the capital of Arkansas. Which is not an ark. Unless Noah is around.
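The context-based fix from the Levy excerpt can be sketched in a few lines of Python too. This is only a toy under stated assumptions: the `context_vector` and `cosine` helpers, the window size, and the four sample sentences (including the word "frankfurter") are invented here, and nothing about it reflects how Google actually implemented the idea. It represents each phrase by the words that appear near it and compares those neighborhoods.

```python
import math
from collections import Counter


def context_vector(phrase, documents, window=3):
    """Count words found within `window` tokens of each occurrence of `phrase`."""
    target = phrase.lower().split()
    n = len(target)
    vec = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                vec.update(left + right)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


# Toy corpus, invented for illustration only.
documents = [
    "grilled hot dog with mustard on bread",
    "ate a hot dog with mustard at the baseball game",
    "a frankfurter with mustard on bread",
    "the boiling water scalded the pot",
]
hot_dog = context_vector("hot dog", documents)
frank = context_vector("frankfurter", documents)
boiling = context_vector("boiling water", documents)

food_similarity = cosine(hot_dog, frank)
water_similarity = cosine(hot_dog, boiling)
```

In this tiny corpus, "hot dog" shares neighbors like "mustard" with "frankfurter" and shares none with "boiling water", so `food_similarity` comes out higher than `water_similarity`. The sketch does not filter out common words like "with" and "on", which a more careful version would.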

