InsideGoogle
Friday, December 24, 2004
Google Suggests Guts Disassembled - Part 2
Slashdot introduces us to this analysis of Google Suggest, which goes even deeper than previous dissections of the Google Suggest engine. Some of the interesting discoveries: The most startling thing is that Google Suggest is actually based more on searches than on results. To explain: Google Suggest returns suggestions for terms that are not in Google's index, or that Google's crawler can never reach, because it draws on the searches people make as well as on the pages Google finds. What does this mean? If you have typed a UPS tracking number into Google (something typical, because Google has tracking-number search built in), it can find its way into Google Suggest. Just go there and type in "1ze" and watch the numbers pop up (all from packages delivered in the last six weeks). Does this mean credit card numbers could be in there as well? Less likely, but possible. Ironically, if you've ever searched for your credit card number to make sure it wasn't publicly available, you may have inadvertently added it to Google Suggest. Oy.
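
To make the idea concrete, here is a minimal Python sketch of suggestions driven by a log of past queries rather than by a web index. It is a toy under stated assumptions, not Google's implementation: the names build_suggester and suggest, the ranking by raw frequency, and the lowercasing step are all made up for illustration.

```python
from collections import Counter


def build_suggester(query_log):
    """Return a suggest(prefix) function backed only by past queries.

    Hypothetical helper: nothing here reflects how Google Suggest is
    really built. It only shows why anything people type can resurface.
    """
    # Count how often each (normalized) query was typed.
    counts = Counter(q.strip().lower() for q in query_log if q.strip())

    def suggest(prefix, limit=10):
        prefix = prefix.lower()
        matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
        # Most frequent first; ties broken alphabetically.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:limit]]

    return suggest
```

Because the suggester only ever looks at the log, a tracking number or card number that someone typed once is just another string that can match a prefix.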

Related posts:
Google Suggest - 12/10
Google Suggests Goooooooooooooooogle - 12/10
Google Suggest Tools - 12/11
The Google Suggest Complete My Sentence Game - 12/15
Google Suggests Guts Disassembled - 12/18
Google Suggest Poetry Generator - 12/20

Google Suggest aside, I remember that a while ago, probably in /. as well, someone pointed out that if you want to search for credit card numbers, you can try searches like 1000000000000000..9999999999999999. This will get you all the pages that contain the numbers in between - that is, 16-digit numbers.
Try entering such ranges in Google Suggest, and you'll see that other people have tried them as well.

That said, I hope Google will censor Suggest a bit, to avoid what you just mentioned. Until they manage to make the censoring algorithm smart enough, they can just remove searches with number tokens in them.
Well, since Google already censors the results for dirty words, it couldn't be too hard to cut out all 16-digit numbers and queries starting with 1ze.
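
The filter suggested in these comments could look roughly like the Python sketch below. It is only an illustration: the function names is_safe_suggestion and filter_suggestions are hypothetical, and the two rules (drop anything starting with "1ze", drop anything containing a run of exactly 16 digits, allowing spaces or dashes between digits) are a guess at what "cut out" might mean, not a description of anything Google does.

```python
import re

# A run of exactly 16 digits, not part of a longer digit string.
SIXTEEN_DIGITS = re.compile(r"(?<!\d)\d{16}(?!\d)")
# A single space or dash sitting between two digits.
DIGIT_SEPARATOR = re.compile(r"(?<=\d)[ -](?=\d)")


def is_safe_suggestion(query):
    """Return False for queries the commenters would want hidden."""
    q = query.strip().lower()
    # Rule 1 (assumed): hide anything that starts like a tracking number.
    if q.startswith("1ze"):
        return False
    # Rule 2 (assumed): hide 16-digit numbers, including ones typed
    # in groups such as "1234 5678 9012 3456" or "1234-5678-...".
    compact = DIGIT_SEPARATOR.sub("", q)
    return SIXTEEN_DIGITS.search(compact) is None


def filter_suggestions(queries):
    """Keep only the queries that pass is_safe_suggestion."""
    return [q for q in queries if is_safe_suggestion(q)]
```

Note that a range query such as 1000000000000000..9999999999999999 is caught by the second rule as well, since each end of the range is a 16-digit number. A blunt filter like this will also hide some harmless queries, which is the trade-off the comments above accept.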