An Introduction to the “Robots.txt Protocol Standard”
Author: Yahoo Search log group
Recently, many webmasters have asked us how to set up the “robots.txt” file correctly. To answer these questions, we have translated the “Robots.txt protocol standard”, and we hope this translation helps everyone understand the “robots.txt” file better.
Robots.txt is a plain text file stored in a site's root directory. Although it is very simple to set up, its effect is powerful. It can direct search engine spiders to crawl only specified content, or prohibit them from crawling part or all of the website.
Its usage is described in detail below:
The robots.txt file should be placed in the website's root directory, and it must be accessible over the Internet.
For example:
If your website address is http://www.yourdomain.com/
then this file must be reachable at http://www.yourdomain.com/robots.txt
so that it can be opened and its contents viewed.
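As a small illustration of the location rule, the following Python sketch derives the robots.txt address from a site address. The helper name `robots_url` is hypothetical, and the sketch assumes the file always sits at the root of the host.

```python
from urllib.parse import urljoin


def robots_url(site_address: str) -> str:
    """Return the robots.txt URL for the host of the given address."""
    # The leading slash makes urljoin resolve against the site root,
    # whatever path the given address happens to contain.
    return urljoin(site_address, "/robots.txt")


print(robots_url("http://www.yourdomain.com/"))
print(robots_url("http://www.yourdomain.com/help/index.html"))
```

Both calls resolve to the same root-level file, which is the only place a spider looks for it.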
Format:
User-agent:
This field gives the name of a search engine spider. If the “robots.txt” file contains several User-agent records, then several search engine spiders are subject to the restrictions of this protocol. The file must contain at least one User-agent record. If the value is set to *, the protocol applies to every search engine spider. Only one “User-agent:*” record may appear in the “robots.txt” file.
Disallow:
This field gives a URL that should not be visited. The URL can be a full path or a partial one; any URL that begins with the Disallow value will not be visited by the robot.
Examples:
Example 1: “Disallow:/help”
This means that search engine spiders may crawl neither /help.html nor /help/index.html.
Example 2: “Disallow:/help/”
This means that search engine spiders may crawl /help.html, but may not crawl /help/index.html.
Example 3: The Disallow record is empty
This means that all pages of the website may be crawled by search engines. The “/robots.txt” file must contain at least one Disallow record. If “/robots.txt” is an empty file, the website is open to all search engine spiders and may be crawled in full.
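The prefix rule behind these three examples can be sketched in a few lines of Python. The function name `is_disallowed` is hypothetical, and the sketch deliberately ignores User-agent matching; it only shows how a path is compared against a single Disallow value.

```python
def is_disallowed(url_path: str, disallow_value: str) -> bool:
    """Return True if url_path begins with the given Disallow value."""
    # An empty Disallow value blocks nothing (Example 3).
    if not disallow_value:
        return False
    return url_path.startswith(disallow_value)


for rule in ("/help", "/help/", ""):
    for path in ("/help.html", "/help/index.html"):
        print(repr(rule), path, is_disallowed(path, rule))
```

The trailing slash is what separates Example 1 from Example 2: “/help” is a prefix of both paths, while “/help/” is a prefix of /help/index.html only.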
#:
The comment character in the Robots.txt protocol.
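To make the record format and the comment character concrete, here is a minimal Python sketch that splits one line into a field and a value. The helper name `parse_record_line` is hypothetical. The sketch assumes that everything from “#” to the end of a line is a comment and that field names are compared without regard to case, as the original standard describes.

```python
from typing import Optional, Tuple


def parse_record_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a robots.txt line into (field, value); None if it holds no record."""
    # Drop the comment part, then surrounding whitespace.
    content = line.split("#", 1)[0].strip()
    if not content or ":" not in content:
        return None
    field, value = content.split(":", 1)
    # Lowercase the field name so "User-agent" and "user-agent" compare equal.
    return field.strip().lower(), value.strip()


print(parse_record_line("User-agent: *   # applies to every spider"))
print(parse_record_line("# a comment-only line"))
print(parse_record_line("Disallow: /tmp/"))
```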
Examples:
Example 1: Use “/robots.txt” to prohibit all search engine spiders from crawling the “/bin/cgi/” directory, the “/tmp/” directory and the /foo.html file. The settings are as follows:
User-agent: *
Disallow: /bin/cgi/
Disallow: /tmp/
Disallow: /foo.html
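One way to check a configuration like Example 1 before publishing it is Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the spider name “AnySpider”, the helper `build_parser` and the test paths are made up, and real spiders may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_ONE = """\
User-agent: *
Disallow: /bin/cgi/
Disallow: /tmp/
Disallow: /foo.html
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content given as a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(EXAMPLE_ONE)
for path in ("/bin/cgi/search", "/tmp/cache.txt", "/foo.html", "/index.html"):
    url = "http://www.yourdomain.com" + path
    print(path, parser.can_fetch("AnySpider", url))
```

Only /index.html should be reported as fetchable; the other three paths begin with one of the Disallow values.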
Example 2: Use “/robots.txt” to grant full access to one search engine only, and restrict all other search engines.
For instance: allow only the search engine spider named “slurp” to crawl everything, and refuse all other search engine spiders access to the content under the “/cgi/” directory. The settings are as follows:
User-agent: *
Disallow: /cgi/

User-agent: Slurp
Disallow:
Example 3: Prohibit every search engine from crawling my website. The settings are as follows:
User-agent: *
Disallow: /
Example 4: Prohibit only one particular search engine from crawling my website.
For instance: prohibit only the search engine spider named “Slurp” from crawling. The settings are as follows:
User-agent: Slurp
Disallow: /
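The per-spider behavior of Examples 2 to 4 can be compared side by side with a short Python sketch built on the standard `urllib.robotparser` module. The names `CONFIGS`, `allowed` and “OtherSpider” are hypothetical, and this parser's matching of spider names is only an approximation of what an actual spider does.

```python
from urllib.robotparser import RobotFileParser

# The three configurations from Examples 2 to 4, as robots.txt text.
CONFIGS = {
    "example 2": "User-agent: *\nDisallow: /cgi/\n\nUser-agent: Slurp\nDisallow:\n",
    "example 3": "User-agent: *\nDisallow: /\n",
    "example 4": "User-agent: Slurp\nDisallow: /\n",
}


def allowed(robots_text: str, agent: str, path: str) -> bool:
    """Report whether the named spider may fetch the path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, "http://www.yourdomain.com" + path)


for name, text in CONFIGS.items():
    for agent in ("Slurp", "OtherSpider"):
        for path in ("/cgi/test.html", "/index.html"):
            print(name, agent, path, allowed(text, agent, path))
```

A record that names a spider takes precedence over the “*” record for that spider, which is why Slurp keeps full access in Example 2 and loses it in Example 4.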
For more information, please refer to the source of this translation: http://www.robotstxt.org/wc/norobots.html