Search Interface
The search algorithm and search interface are used to find the most relevant documents in the index based on the search query. First, the search engine tries to determine user intent by looking at the words the searcher typed in.
These terms can be stripped down to their root level (e.g., dropping -ing and other suffixes) and checked against a lexical database to see what concepts they represent. Terms that are a near match will help you rank for other similarly related terms. For example, using the word swims could help you rank well for swim or swimming.
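The suffix stripping described above can be sketched in a few lines. This toy stemmer is only an illustration (the function name and suffix list are made up for this example); the actual Porter algorithm, linked at the end of this chapter, handles far more cases.

```python
# Toy suffix stripper -- a simplified illustration of stemming, NOT the
# full Porter algorithm.
def toy_stem(term: str) -> str:
    """Strip a few common English suffixes to approximate a root form."""
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            stem = term[: -len(suffix)]
            # Collapse a doubled final consonant left by -ing/-ed (swimm -> swim).
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return term

# "swims", "swimming", and "swim" all reduce to the same root, so a
# document using one form can rank for queries using another.
print(toy_stem("swims"))     # swim
print(toy_stem("swimming"))  # swim
```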
Search engines can try to match keyword vectors with each of the specific terms in a query. If the search terms occur near each other frequently, the search engine may understand the phrase as a single unit and return documents related to that phrase.
WordNet is the most popular lexical database. At the end of this chapter there is a link to a Porter Stemmer tool if you need help conceptualizing how stemming works.

Searcher Feedback
Some search engines, such as Google and Yahoo!, have toolbars and systems like Google Search History and My Yahoo! that collect information about a user. Search engines can also look at recent searches, or at what the search process was for similar users, to help determine what concepts a searcher is looking for and what documents are most relevant for the user's needs.
As people use such a system, it takes time to build up a search query history and a click-through profile. That profile could eventually be trusted and used to

• aid in search personalization
• collect user feedback to determine how well an algorithm is working
• help search engines determine if a document is of decent quality (e.g., if many users visit a document and then immediately hit the back button, the search engines may not continue to score that document well for that query).
I have spoken with some MSN search engineers and examined a video about MSN search. Both experiences strongly indicated a belief in the importance of user acceptance. If a highly ranked page never gets clicked on, or if people typically press the back button quickly, that page may get demoted in the search results for that query (and possibly for related search queries). In some cases, that may also flag a page or website for manual review.
As people give search engines more feedback, and as search engines collect a larger corpus of data, it will become much harder to rank well using links alone. The more satisfied users are with your site, the better your site will do as search algorithms continue to advance.
Real-Time versus Prior-to-Query Calculations
In most major search engines, a portion of the relevancy calculations are stored ahead of time; others are calculated in real time.
Processes that are computationally expensive and slow, such as calculating overall inter-connectivity (Google calls this PageRank), are done ahead of time.
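A minimal power-iteration sketch shows why this kind of inter-connectivity scoring is precomputed: it requires repeated passes over the whole link graph. The function and three-page graph here are illustrative only, not Google's actual implementation.

```python
# Minimal PageRank-style sketch (power iteration) over a toy link graph.
def pagerank(links: dict, damping: float = 0.85, iters: int = 50) -> dict:
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# A and C both link to B, so B accumulates the most rank.
ranks = pagerank({"A": ["B"], "B": ["C"], "C": ["B"]})
print(max(ranks, key=ranks.get))  # B
```

Because every iteration touches every link, running this at web scale per query would be far too slow, which is why it is computed offline and stored.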
Many search engines have different data centers, and when updates occur, they roll from one data center to the next. Data centers are placed throughout the world to minimize network lag time. Assuming it is not overloaded or down for maintenance, the data center nearest you will usually serve your search results. If that data center is down, or if it is experiencing heavy load, your search query might be routed to a different data center.
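The routing behavior described above can be sketched as picking the nearest healthy data center and falling back to the next nearest. The data, tuple format, and overload threshold are assumptions made for this illustration.

```python
# Hypothetical routing sketch: prefer the lowest-latency data center that
# is not overloaded; fall back to the next nearest otherwise.
def route_query(data_centers: list) -> str:
    """data_centers: (name, latency_ms, load) tuples, load in [0, 1]."""
    for name, latency, load in sorted(data_centers, key=lambda dc: dc[1]):
        if load < 0.9:  # assumed overload threshold
            return name
    raise RuntimeError("all data centers unavailable")

dcs = [("us-east", 20, 0.95), ("eu-west", 90, 0.4), ("asia", 180, 0.2)]
print(route_query(dcs))  # eu-west -- the nearest center is overloaded
```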