Robots Exclusion Protocol: now with even more flexibility
Posted by Dan Crow, Product Manager
This is the third and last in my series of blog posts about the
Robots Exclusion Protocol (REP). In the first post
(http://googleblog.blogspot.com/2007/01/controlling-how-search-engines-access.html),
I introduced robots.txt and the robots META tags, giving an
overview of when to use them. In the second post
(http://googleblog.blogspot.com/2007/02/robots-exclusion-protocol.html),
I shared some examples of what you can do with the REP. Today, I'll
introduce two new features that we have recently added to the
protocol.
As a product manager, I'm always talking to content providers
to learn what they need from the REP. We are constantly looking for
ways to improve the control you have over how your content is
indexed. These new features give you flexible and convenient
ways to exercise that detailed control with Google.
Tell us if a page is going to expire
Sometimes you know in advance that a page is going to expire in the
future. Maybe you have a temporary page that will be removed at the
end of the month. Perhaps some pages are available free for a week,
but after that you put them into an archive that users pay to
access. In these cases, you want the page to show in Google search
results until it expires, then have it removed: you don't want
users getting frustrated when they find a page in the results but
can't access it on your site.
We have introduced a new META tag that allows you to
tell us when a page should be removed from the main Google web
search results: the aptly named unavailable_after tag. This
one follows a similar syntax to other REP META tags. For example, to
specify that an HTML page should be removed from the search results
after 3pm Eastern Standard Time on 25th August 2007, simply add the
following tag to the <HEAD> section of the page:
<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2007 15:00:00 EST">
The date and time are specified in the RFC 850 format
(http://www.ietf.org/rfc/rfc0850.txt).
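If you generate pages from a template, you could build the tag from the expiry time instead of typing it by hand. The following is only a minimal sketch: the function name `unavailable_after_meta` and its `tz_abbrev` parameter are hypothetical, and the date layout simply mirrors the example tag above rather than any independently verified specification.

```python
from datetime import datetime

# English month abbreviations, spelled out so the result does not
# depend on the locale of the machine running the script.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after_meta(expires: datetime, tz_abbrev: str = "GMT") -> str:
    """Build an unavailable_after META tag (hypothetical helper).

    `expires` is taken to be local time in the zone named by
    `tz_abbrev`; no time zone conversion is performed here.
    """
    stamp = "{:02d}-{}-{:04d} {:02d}:{:02d}:{:02d} {}".format(
        expires.day, _MONTHS[expires.month - 1], expires.year,
        expires.hour, expires.minute, expires.second, tz_abbrev)
    return '<META NAME="GOOGLEBOT" CONTENT="unavailable_after: {}">'.format(stamp)


print(unavailable_after_meta(datetime(2007, 8, 25, 15, 0, 0), "EST"))
```

Called with the date from the example, this sketch reproduces the same tag text.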
This information is treated as a removal request: it will take
about a day after the removal date passes for the page to disappear
from the search results. We currently only support unavailable_after for
Google web search results.
After the removal, the page stops showing in Google search results
but it is not removed from our system. If you need a page to be
excised from our systems completely, including any internal copies
we might have, you should use the existing URL removal tool, which
you can read about on our Webmaster Central blog
(http://googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html).
Meta tags everywhere
The REP META tags
give you useful control over how each webpage on your site is
indexed. But they only work for HTML pages. How can you control
access to other types of documents, such as Adobe PDF files, video
and audio files and other types? Well, now the same flexibility for
specifying per-URL tags is available for all other file
types.
We've extended our support for META tags so they can now
be associated with any file. Simply add any supported META tag to a
new X-Robots-Tag directive in the HTTP header
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) used to
serve the file. Here are some illustrative examples:
Don't display a cache link or snippet for this item in the
Google search results:
X-Robots-Tag: noarchive, nosnippet
Don't include this document in the Google search results:
X-Robots-Tag: noindex
Tell us that a document will be unavailable after 7th July
2007, 4:30pm GMT:
X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT
You can combine multiple directives in the same document. For
example:
Do not show a cached link for this document, and remove it from
the index after 23rd July 2007, 3pm PST:
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 23 Jul 2007 15:00:00 PST
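How you attach these headers depends on your web server. As one illustration only, here is a small sketch using Python's standard `http.server` module. The `ROBOTS_POLICY` table, the `x_robots_headers` helper, and the `RobotsTagHandler` class are all hypothetical names, and the choice of file extensions and directives is an assumption made for the example, not a recommendation.

```python
from http.server import SimpleHTTPRequestHandler

# Hypothetical policy: which directives to send for which file types.
ROBOTS_POLICY = {
    ".pdf": ["noarchive", "nosnippet"],
    ".mp3": ["noindex"],
}


def x_robots_headers(path, policy=ROBOTS_POLICY):
    """Return (name, value) header pairs for the requested path."""
    clean = path.split("?", 1)[0].lower()  # ignore any query string
    for ext, directives in policy.items():
        if clean.endswith(ext):
            return [("X-Robots-Tag", ", ".join(directives))]
    return []


class RobotsTagHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory, adding X-Robots-Tag."""

    def end_headers(self):
        # Add the extra headers just before the header block is closed.
        for name, value in x_robots_headers(self.path):
            self.send_header(name, value)
        super().end_headers()
```

To try it locally, you could pass `RobotsTagHandler` to `http.server.HTTPServer(("", 8000), RobotsTagHandler)` and call `serve_forever()`. On a production server you would more likely set the header in the server's own configuration.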
Our goal for these features is to provide more flexibility for
indexing and inclusion in Google's search results. We hope you
enjoy using them.