The Montoya Herald — ChristianMontoya.com
One of the big problems that exists with search today is the lack of disambiguation: the process of identifying a search term that has multiple possible meanings and determining which meaning a user is referring to. Let me give you an example: if I type "tex" into the Google, Yahoo!, or MSN search box, I could be referring to "a unit of measure for the linear mass density of fibers," "a typesetting system created by Donald Knuth," "a red and silver robot appearing in some animated THX trailers," or a number of other possibilities. All of these are entirely unrelated and while I can provide some disambiguation on my own by expanding my query with more keywords, it would be nice if I could just choose the specific meaning I am referring to and have my search results refined to apply solely to that meaning.
None of the major search engines provide this functionality. The only service I know of that does is Wikipedia. When you enter a search into Wikipedia, it will give you the specific page for that keyword if it is a unique keyword with only one result. If, however, that keyword refers to multiple possible articles, you get a disambiguation page. For an example, check out the disambiguation page for "tex" on Wikipedia. That's where I got my example meanings from.
What Wikipedia does is nothing short of impressive; it takes only two steps to disambiguate my search term and reach the information I need. Obviously, disambiguation is a lot harder to achieve on search engines. For one thing, there is no human editing involved as there is on Wikipedia, so the process would have to happen automatically and be determined by machines. A search engine has to do a number of things to provide this:
This would be incredibly hard to do, but wouldn't it be cool if, after requesting results for "tex," your search engine returned a page much like Wikipedia's disambiguation page, containing descriptions for each meaning that help you decide which one you want? I think it would be. Until then, we do have search suggestions, which are partly sufficient for accomplishing the same task, though they are based mainly on popularity and not meaning.
That would be basically impossible to achieve. Search engines have a hard enough time finding relevant content (it would seem), never mind applying a meaning to said content and separating the results into a nice definition list for you.
Besides, a search engine's job is to find documents containing x, not to work out the various meanings of x.
Search engine. Not an encyclopedia. (Wikipedia is not a search engine.)
Unrelated note: I've always thought that 'disambiguation' is one hell of an ugly word.
Rich: I know it's impossible for search engines in the here and now, but down the road, as websites become more semantic, it could be very possible. And even today, it is possible to disambiguate some common terms and do some keyword matching on webpages to figure out which meaning a webpage might apply to.
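To make the keyword-matching idea concrete, here is a minimal sketch in Python. It assumes a small, hand-written list of cue words per meaning of "tex"; the `SENSES` table and the `guess_sense` function are hypothetical names made up for illustration, not how any real search engine works.

```python
import re

# Hypothetical sense inventory for "tex": each meaning maps to a few cue
# words picked by hand for this example (an assumption, not real data).
SENSES = {
    "unit of fiber density": {"fiber", "yarn", "textile", "density", "mass"},
    "typesetting system": {"knuth", "typesetting", "latex", "macro", "document"},
    "THX trailer robot": {"thx", "robot", "trailer", "animated", "cinema"},
}


def guess_sense(page_text, senses=SENSES):
    """Return the sense whose cue words overlap most with the page, or None."""
    # Lowercase the page and split it into a set of plain words.
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    # Score each sense by how many of its cue words appear on the page.
    scores = {name: len(cues & words) for name, cues in senses.items()}
    best = max(scores, key=scores.get)
    # If no cue word matched at all, refuse to guess.
    return best if scores[best] > 0 else None


print(guess_sense("Donald Knuth designed TeX as a typesetting system"))
print(guess_sense("The tex measures the linear mass density of yarn"))
```

A real system would need far better evidence than raw word overlap, but even this crude score is enough to separate pages about Knuth's TeX from pages about textiles.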
As for what the job of a search engine is, a search engine's job is to make money. Anything that could improve the effectiveness and usefulness of a search engine is worth considering.
Agreed. Search engines are evolving and it's no longer just a straightforward word search anymore. People want to find what they're looking for, not random results for a word.
It's definitely going to be difficult to implement, but then again, look at where we are today versus years ago when Metacrawler and Altavista were the big search engines.
Umm… how is this impossible? It's been done (to varying extents) numerous times. Heck, I wrote a similar thing as an undergraduate. If you want a search engine just about a particular subject, you can do that (topic-driven search crawlers). If you want a search engine that categorises on ambiguous words, that's achievable from context.
It's not perfect, but then nothing in the natural language processing field is (including human natural language). Why isn't it in Google today? It's got a small computational overhead (not huge, though), and the return on investment is poor (it's an edge case, not worth optimising for).
I don't agree that the return on investment would be poor. It's a feature that would make the search engine more usable and possibly attract users. I think if any of the major search engines implemented this feature they would not be wasting their time.
The biggest problem I can see is actually in the usability. The ultimate goal of all software from a UI perspective is to be invisible, i.e. a user should only be asked to interact when absolutely necessary. By adding a disambiguation page, you get more precise topics at the expense of adding a second piece of interaction. Rather than just typing in a few words, you're asking them to type in some words, read some text on potentially quite a few options, and make an informed decision.
The method that search engines currently use (suggest the most popular, requiring more keywords for other topics) is a better approach from a usability perspective. Less UI, less clutter, fewer interactions.
I get your point: a two-step process could be annoying. On the other hand, putting disambiguation options in the sidebar on the search results page would be unobtrusive and helpful to users, especially when you are saving them the trouble of expanding their query by typing in extra keywords.
An optional side-bar of the form "Did you mean…[X, Y, Z]?" seems like a reasonable compromise. There are probably a few minor issues around how you draw attention to the sidebar when the user is looking to refine without distracting from results when they're not, but that just needs some prototyping.
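As a rough illustration of such a sidebar, the sketch below turns a query and a list of known meanings into "Did you mean" entries, each carrying the expanded query the user would otherwise have typed by hand. The `did_you_mean` function, the `TEX_SENSES` table, and its keywords are all hypothetical and chosen only for this example.

```python
def did_you_mean(query, senses):
    """Build sidebar entries: a label plus the expanded query it would run."""
    options = []
    for label, extra_keywords in senses.items():
        # Expanding the query is just appending the sense's extra keywords.
        expanded = " ".join([query, *extra_keywords])
        options.append({"label": f"{query} ({label})", "query": expanded})
    return options


# Hypothetical meanings of "tex" with the keywords that would narrow the search.
TEX_SENSES = {
    "typesetting system": ["knuth", "typesetting"],
    "unit of fiber density": ["textile", "yarn"],
}

for option in did_you_mean("tex", TEX_SENSES):
    print(f'Did you mean {option["label"]}? -> search for "{option["query"]}"')
```

The sidebar stays optional: the normal results still come from the original query, and a click on an entry simply reruns the search with the longer one.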
The way I see it, if a user is not satisfied with the first handful of results they get, then they have the intention of looking towards the sidebar for better options. It would take some eye-tracking studies to prove this, but it could work.
Surely this can't be too different to Google suggesting alternate spellings? Now, I haven't a clue how this is done because I'm just a code monkey and have no knowledge of algorithms and complex stuff but it sounds similar in my head.
Jem: It's completely different. What I am talking about is when you spell the word correctly and the search engine asks you which meaning of that word you are looking for, such as, did you mean "tex the cartoonist" or "tex the document format"? Same spelling, totally different meanings. That's disambiguation, and it can be really hard.
I know what you're saying Christian, and I know what disambiguation is — I'm not an idiot.
What I'm saying is that if Google is capable of storing alternate spellings (for words that aren't even real, e.g. it'll suggest "jemjabella" to you if you spell it wrong) then surely it would be capable of storing multiple meanings? The hard part would be distinguishing meaning on a page, but even then surely it would just be a case of picking up keywords from the rest of the content?
It was the process of asking the question that I was suggesting is similar, not that spelling and disambiguation are similar.
OH, ok, you had me fooled there. You are right, if Google can suggest alternate spellings then it wouldn't be too hard to suggest alternate meanings, though I'm certain the alternate spellings don't involve storing any information like alternate meanings would.