It's amazing how obvious it is that a text was written only a few years ago. A lot of the things they explain are just intuitively obvious for someone who's used Google for the past 5+ years.
Also, for so many of the issues with programming searches, it's sick how often my brain just jumps to:
"Hasn't someone already done all of this? Isn't it open source? Yes? Okay, sweet. Let's move on!"
Or, perhaps slightly more relevantly:
"Why go through all these guesses about what human beings mean? Why not start with a simple model, and record what search terms people use for Google/Wikipedia/Dictionary.com/whatever, and what page/word/article they ultimately choose? After 100 or 1000 people have done this, and the agreement rate is over, say, 95%, the system adds that association into its indexing relations. No more guesswork!"
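That idea can be sketched in a few lines. This is just my own toy interpretation: the class name, and the `MIN_VOTES`/`MIN_AGREEMENT` thresholds, are made up to match the "100 people, 95% agreement" numbers above.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds, matching the numbers suggested above.
MIN_VOTES = 100
MIN_AGREEMENT = 0.95

class AssociationRecorder:
    def __init__(self):
        # query -> Counter of pages users ultimately chose
        self.votes = defaultdict(Counter)
        # promoted query -> page associations (the "indexing relations")
        self.index = {}

    def record(self, query, chosen_page):
        """Record one user's final choice for a query, then check promotion."""
        counts = self.votes[query]
        counts[chosen_page] += 1
        total = sum(counts.values())
        page, top = counts.most_common(1)[0]
        # Promote once enough users agree strongly on the same page.
        if total >= MIN_VOTES and top / total >= MIN_AGREEMENT:
            self.index[query] = page

    def lookup(self, query):
        """Return the promoted page for a query, or None if no consensus yet."""
        return self.index.get(query)
```

So a query only gets hard-wired once enough people have voted with their clicks; until then the engine falls back on whatever guessing it already does.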
But of course, 4 or 5 years from now, a programmer or information scientist reading that suggestion would probably consider it outdated.
Is there an application available, probably as a browser extension, that would let users tag web sites with whatever labels they think are relevant? I think it would be tremendously useful. I need to think more about that. There must be some inherent problem, or it would already be in popular use, right?
I'm not sure I get the point of biwords and their extensions. I thought that the process was pretty much standardized:
- If it's in quotation marks, then it's a phrase
- If there are commas between the words, treat them as separate words (AFTER the first step, so anything inside the quotation marks wouldn't apply)
- If it's not in quotation marks, first treat it as a phrase and return those results, then afterwards treat them as separate words and return those results, too
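The three rules above can be sketched as a small parser. This is only my reading of them, not how any real engine works; the `parse_query` name and the `(kind, terms)` output format are my own invention.

```python
import re

def parse_query(query):
    """Turn a raw query into an ordered list of ("phrase", text) or
    ("word", text) search passes, following the three informal rules."""
    passes = []
    # Rule 1: quoted spans are exact phrases, pulled out first.
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]+"', ' ', query)
    for p in phrases:
        passes.append(("phrase", p))
    if "," in remainder:
        # Rule 2: commas split the (unquoted) rest into independent words.
        for part in remainder.split(","):
            for word in part.split():
                passes.append(("word", word))
    else:
        words = remainder.split()
        # Rule 3: unquoted text is tried as a whole phrase first...
        if len(words) > 1:
            passes.append(("phrase", " ".join(words)))
        # ...then as separate words, appended after the phrase results.
        for word in words:
            passes.append(("word", word))
    return passes
```

For example, `'"open source" indexing, search'` yields the quoted phrase first, then the comma-separated words, while an unquoted `"web search"` yields a phrase pass followed by word passes.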
Isn't that pretty much agreed upon? I don't think that approach has ever steered me wrong on a popular search engine before.