Wednesday, December 23, 2015

Parts of a Search Engine

While there are different ways to organize web content, every crawling search engine has the same basic parts:

       a crawler

       an index (or catalog)

       a search interface

Crawler (or Spider)

The crawler does just what its name implies. It scours the web by following links, refreshing its copies of pages it already knows and adding new pages as it comes across them. Each search engine has periods of deep crawling and periods of shallow crawling. A scheduler prevents the spider from overloading servers and tells it which documents to crawl next and how frequently to crawl them.
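
The link-following and server-protection behavior described above can be sketched roughly as follows. This is a simulation, not a real crawler: the link graph `LINKS`, the `crawl` function, and the two-second `min_delay` between requests to the same host are all assumptions made for illustration, and time is tracked with a simulated clock rather than real waiting.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph: URL -> links found on that page.
LINKS = {
    "http://a.example/": ["http://a.example/news", "http://b.example/"],
    "http://a.example/news": ["http://a.example/"],
    "http://b.example/": ["http://b.example/faq"],
    "http://b.example/faq": [],
}

def crawl(seed, links, min_delay=2.0):
    """Breadth-first crawl from seed; returns (url, fetch_time) in visit order.

    min_delay is an assumed politeness gap, in simulated seconds, between
    two requests to the same host.
    """
    frontier = deque([seed])   # URLs waiting to be fetched
    seen = {seed}              # URLs already queued, to avoid loops
    next_allowed = {}          # host -> earliest time of its next request
    clock = 0.0
    log = []
    while frontier:
        url = frontier.popleft()
        host = urlparse(url).netloc
        # Wait (in simulated time) until this host may be contacted again.
        clock = max(clock, next_allowed.get(host, 0.0))
        log.append((url, clock))
        next_allowed[host] = clock + min_delay
        # Follow links, queuing any page not seen before.
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return log

for url, fetched_at in crawl("http://a.example/", LINKS):
    print(f"{fetched_at:4.1f}s  {url}")
```

In this sketch the second page on `a.example` is held back until two simulated seconds have passed, while the first page on `b.example` can be fetched immediately because that host has not been contacted yet.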

Rapidly changing or highly important documents are more likely to get crawled frequently. Crawl frequency typically has little effect on search relevancy; it simply helps search engines keep fresh content in their index. The home page of CNN.com might get crawled once every ten minutes. A popular, rapidly growing forum might get crawled a few dozen times each day. A static site with little link popularity and rarely changing content might only get crawled once or twice a month.
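
One way to picture the different crawl frequencies is a priority queue keyed by each page's next due time. The sketch below is only an illustration: the page labels, the `schedule` function, and the exact intervals (10 minutes, 40 minutes, 15 days) are assumptions loosely based on the examples above, not figures from any search engine.

```python
import heapq

# Assumed recrawl intervals in minutes (illustrative values only).
INTERVALS = {
    "news-home": 10,              # roughly "once every ten minutes"
    "busy-forum": 40,             # about three dozen crawls per day
    "static-site": 15 * 24 * 60,  # about twice a month
}

def schedule(intervals, horizon):
    """Return (time, page) crawl events from minute 0 through horizon, in time order."""
    heap = [(0, page) for page in intervals]  # every page is due at time 0
    heapq.heapify(heap)
    events = []
    while heap:
        due, page = heapq.heappop(heap)
        if due > horizon:
            break  # the earliest remaining event is already past the horizon
        events.append((due, page))
        # Re-queue the page for its next visit.
        heapq.heappush(heap, (due + intervals[page], page))
    return events

for minute, page in schedule(INTERVALS, 60):
    print(f"t={minute:3d} min  crawl {page}")
```

Over the first hour, the news home page is visited seven times, the forum twice, and the static site once, which mirrors how frequently changing pages take up most of the crawler's attention.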

The main benefit of having a frequently crawled page is that you can get your new sites, pages, or projects crawled quickly by linking to them from that powerful or frequently changing page.
