Posted by Dan Crow, Product Manager
Reprinted from the Google Chinese Webmaster Blog
Original text:
Controlling How Search Engines Access and Index Your Website
Published: January 26, 2007, 11:36:00 AM
I am often asked how Google, and search engines in general, work. One key question is: how does Google know which parts of a website the site owner wants to appear in search results? Can publishers specify that some parts of a site should not be searchable? The good news is that web publishers have many ways to control which of their pages appear in search results.
The key is a file called robots.txt, which has been an industry standard for many years. It lets a site owner control how search engines access their website. With robots.txt you can control access at several levels: the entire site, individual directories, pages of a specific type, or individual pages. Using robots.txt effectively gives you much better control over how your site is searched, but it is not always obvious how to achieve exactly what you want. This is the first in a series of posts on how to use robots.txt to control access to your website.
What does robots.txt do?
The web is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is. You may think that maintaining a website takes a lot of work, but that is just a drop in the ocean of the web. (With profound apologies to Douglas Adams.)
Search engines such as Google read all of this information and build an index of it. The index lets a search engine take a query from a user and show the pages that match it.
To do this, Google has a set of computers that continually crawl the web. They have a list of all the websites that Google knows about, and they read all the pages on each of those sites. Together, these machines are known as Googlebot. In general, you want Googlebot to access your site so that people searching on Google can find your pages.
However, there may be a few pages on your site that you do not want in Google's index. For example, you might have a directory that contains internal logs, or news articles that can only be accessed after payment. You can tell Google's crawler not to crawl them by creating a text file named robots.txt and placing it in the root directory of your site. The robots.txt file lists the pages that search engines should not access. Creating a robots.txt file is straightforward, and it gives you fine-grained control over how search engines access your website.
Fine-grained control
In addition to the robots.txt file, which lets you give concise instructions for a large number of files on your site, you can use the robots META tag to control access to individual pages precisely. To do this, simply add specific META tags to an HTML page to control how that individual page is indexed. Together, robots.txt and META tags give you the flexibility to express complex access policies relatively easily.
A simple example
Here is a simple example of a robots.txt file:
User-Agent: Googlebot
Disallow: /logs/
The User-Agent line specifies that the instructions that follow apply only to Googlebot. All major search engines read and obey the instructions you put in robots.txt, and you can specify different rules for different search engines if you wish. The Disallow line tells Googlebot not to access the logs subdirectory of your site. The content you put in the logs directory will not show up in Google search results.
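As a rough way to check how a crawler might interpret rules like these, the sketch below feeds the two-line example to Python's standard-library `urllib.robotparser`. The site name `www.example.com`, the file names, the `OtherBot` user agent, and the `can_fetch` helper are placeholders made up for illustration, and this parser is only an approximation of what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /logs/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = "http://www.example.com"  # placeholder site, not from the post
    # Googlebot is blocked from the logs directory...
    print(can_fetch(ROBOTS_TXT, "Googlebot", site + "/logs/today.txt"))  # False
    # ...but may still read the rest of the site.
    print(can_fetch(ROBOTS_TXT, "Googlebot", site + "/index.html"))      # True
    # The rules name only Googlebot, so a hypothetical other crawler
    # is not restricted by them.
    print(can_fetch(ROBOTS_TXT, "OtherBot", site + "/logs/today.txt"))   # True
```

The last call illustrates why a site that wants to keep every crawler out of a directory needs rules that apply to more than one user agent.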
Preventing access to an individual file
If you have an article that only registered users can access, you will want to exclude it from Google's results. To do this, simply add a META tag to the article's HTML file, so that the file starts something like this:
<html>
<head>
<meta name="googlebot" content="noindex">
. . .
This snippet stops Google from indexing the file. META tags are particularly useful if you have permission to edit individual files but not the site-wide robots.txt. They also let you specify complex access-control rules on a page-by-page basis.
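To confirm that a page actually carries the tag, a small check along the following lines may help. It is a minimal sketch using Python's standard-library `html.parser`; the `RobotsMetaParser` class and `is_noindex` function are hypothetical names introduced here, and the check only looks for a META tag whose name matches the crawler you pass in.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip().lower()
        content = attr.get("content") or ""
        if name:
            # A content value may hold several comma-separated directives.
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def is_noindex(html: str, crawler: str = "googlebot") -> bool:
    """Return True if the page has a noindex META tag for the given crawler."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives.get(crawler.lower(), set())


if __name__ == "__main__":
    page = '<html><head><meta name="googlebot" content="noindex"></head></html>'
    print(is_noindex(page))  # True
```

Because attribute names and values are compared in lowercase, a tag written as `<meta Name="Googlebot" content="NOINDEX">` is treated the same way by this check.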
Learn more
You can find more useful information about robots.txt at http://www.robotstxt.org/ and in Google's webmaster help center, including:
How to create a robots.txt file
Descriptions of each user-agent that Google uses
How to use pattern matching
How often we recrawl your robots.txt file
We have also published several posts about robots.txt on our webmaster blog that you may find useful. For example:
Using a robots.txt file
All about Googlebot
There is also a useful list of the robots used by the major search engines: http://www.robotstxt.org/wc/active/html/index.html
Next time...
Coming soon: a post on the details of using robots.txt and META tags, with specific examples for several common cases.