Parts of a Search Engine
While there are different ways to organize web content, every crawling search engine has the same basic parts:
• a crawler
• an index (or catalog)
• a search interface
Crawler (or Spider)
The crawler does just what its name implies: it scours the web, following links, updating pages, and adding new pages when it comes across them. Each search engine has periods of deep crawling and periods of shallow crawling. There is also a scheduler mechanism to prevent a spider from overloading servers and to tell the spider which documents to crawl next and how frequently to crawl them.
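The crawl-and-schedule loop described above can be sketched in a few lines. This is a minimal illustration, not any engine's actual crawler: it walks a hypothetical in-memory link graph breadth-first instead of fetching real pages over HTTP, and the politeness delay stands in for a real scheduler's per-server rate limits.

```python
import time
from collections import deque

def crawl(seed, link_graph, fetch_delay=0.0, max_pages=100):
    """Breadth-first crawl over a toy in-memory link graph.

    link_graph maps a URL to the URLs it links to; a real crawler
    would fetch each page over HTTP and extract its links instead.
    """
    frontier = deque([seed])      # URLs scheduled to be crawled next
    visited = set()               # URLs already fetched
    order = []                    # crawl order, for inspection
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        time.sleep(fetch_delay)   # politeness delay between fetches
        for link in link_graph.get(url, []):
            if link not in visited:
                frontier.append(link)
    return order

# Hypothetical toy site: the home page links to two sections,
# and one section links back to the home page.
graph = {
    "/": ["/news", "/about"],
    "/news": ["/", "/news/today"],
}
print(crawl("/", graph))  # → ['/', '/news', '/about', '/news/today']
```

The `visited` set keeps the spider from re-fetching pages it has already seen in this pass, which also makes the back-link from `/news` to `/` harmless.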
Rapidly changing or highly important documents are more likely to get crawled frequently. Crawl frequency typically has little effect on search relevancy; it simply helps the search engines keep fresh content in their index. The home page of CNN.com might get crawled once every ten minutes. A popular, rapidly growing forum might get crawled a few dozen times each day. A static site with little link popularity and rarely changing content might only get crawled once or twice a month.
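One common way to express this kind of frequency-based scheduling is a priority queue keyed by each page's next due time. The sketch below is an assumption about how such a scheduler could work, not a description of any real engine; the page names and recrawl intervals are hypothetical, loosely echoing the examples above.

```python
import heapq

def schedule(pages, horizon):
    """Return (minute, url) crawl events up to `horizon` minutes,
    given per-page recrawl intervals in minutes."""
    # Each heap entry: (next due time, url, recrawl interval)
    heap = [(interval, url, interval) for url, interval in pages.items()]
    heapq.heapify(heap)
    events = []
    while heap and heap[0][0] <= horizon:
        due, url, interval = heapq.heappop(heap)
        events.append((due, url))
        # Reschedule the page one interval later.
        heapq.heappush(heap, (due + interval, url, interval))
    return events

# Hypothetical intervals: a news home page every 10 minutes,
# a busy forum once an hour.
pages = {"cnn.com/": 10, "forum.example/": 60}
for minute, url in schedule(pages, 60):
    print(minute, url)
```

Over one simulated hour, the news page is visited six times and the forum once, which mirrors how a scheduler spends most of its crawl budget on fast-changing documents.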
The biggest benefit of having a frequently crawled page is that you can get new sites, pages, or projects crawled quickly by linking to them from a powerful or frequently updated page.