Using log data to help keep you safe

Filed under: Official Google Blog. Written by Lees on Sunday, May 4th, 2008 @ 2:29 am

Posted by Niels Provos, Google Security Team

We recently began two new series of posts. The first, which
explains how we harness data for our users, started with this post
(http://googleblog.blogspot.com/2008/03/why-data-matters.html). The
second, focusing on how we secure information and how users can
protect themselves online, began here
(http://googleblog.blogspot.com/2008/03/how-google-keeps-your-information.html).
This post is the second installment in both series. - Ed.

We sometimes get questions about what Google does with server log
data, which records how users interact with our services. We take
great care in protecting this data, and while we've talked
previously about some of the ways it can be useful
(http://googleblog.blogspot.com/2008/03/why-data-matters.html), one
thing we haven't covered yet is how it can help us make Google
products safer for our users.

While the Internet on the whole is a safe place, and most of us
will never fall victim to an attack, there are more than a few
threats out there, and we do everything we can to help you stay a
step ahead of them
(http://googleblog.blogspot.com/2008/03/how-google-keeps-your-information.html).
Any information we can gather on how attacks are launched and
propagated helps us do so.

That's where server log data comes in. We analyze logs for
anomalies or other clues that might suggest malware or phishing
attacks in our search results, attacks on our products and
services, and other threats to our users. And because we have a
reasonably large data sample, with logs stretching back
several months, we're able to perform aggregate, long-term
analyses that can uncover new security threats, provide greater
understanding of how previous threats impacted our users, and help
us ensure that our threat detection and prevention measures are
properly tuned.
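The post deliberately doesn't describe Google's actual detectors, so the following is only a generic illustration of what an aggregate, long-term log analysis for anomalies can look like. It assumes that per-day counts of some query pattern have already been extracted from the logs; the function name `flag_spikes`, the 28-day baseline window, and the threshold `k` are all hypothetical choices, not anything Google has published.

```python
from statistics import mean, pstdev


def flag_spikes(daily_counts, window=28, k=4.0, min_std=1.0):
    """Return the indices of days whose count is anomalously high.

    Hypothetical sketch: a day is flagged when its count exceeds the
    mean of the preceding `window` days by more than `k` standard
    deviations. `min_std` puts a floor under the deviation so that a
    perfectly flat baseline doesn't make every small bump look extreme.
    Days without a full baseline window behind them are not evaluated.
    """
    flagged = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]  # trailing history only
        mu = mean(baseline)
        sigma = max(pstdev(baseline), min_std)
        if daily_counts[day] > mu + k * sigma:
            flagged.append(day)
    return flagged
```

With a flat baseline of 100 queries a day, a single day at 5,000 stands out: `flag_spikes([100] * 30 + [5000] + [100] * 5)` returns `[30]`. Once the spike itself enters the trailing window it inflates the baseline, so the ordinary days after it are not flagged. Having several months of history is what makes a baseline like this meaningful in the first place.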

We can't share too much detail (we need to be careful not to
provide too many clues on what we look for), but we can use
historical examples to give you a better idea of how this kind of
data can be useful. One good example is the Santy search worm
(PDF: http://www.citi.umich.edu/u/provos/papers/search_worms.pdf),
which first appeared in late 2004. Santy used combinations of
search terms on Google to identify and then infect vulnerable web
servers. Once a web server was infected, it became part of a
botnet (http://en.wikipedia.org/wiki/Botnet) and started searching
Google for more
vulnerable servers. Spreading in this way, Santy quickly infected
thousands and thousands of web servers across the Internet.

As soon as Google recognized the attack, we began developing a
series of tools to automatically generate "regular expressions"
(http://en.wikipedia.org/wiki/Regular_expression) that could
identify potential Santy queries
and then block them from accessing Google.com or flag them for
further attention. But because regular expressions like these can
sometimes snag legitimate user queries too, we designed the tools
so they'd test new expressions in our server log databases
first, in order to determine how each one would affect actual user
queries. If it turned out that a regular expression affected too
many legitimate user queries, the tools would automatically adjust
the expression, analyze its performance against the log data again,
and then repeat the process as many times as necessary.
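The post doesn't say how the tools adjusted an expression, so the sketch below assumes one simple strategy to show the test-adjust-retest loop. It starts from a broad pattern built from an ordered list of signature terms and adds one required term at a time until the pattern's false-positive rate on a sample of logged legitimate queries falls under a threshold, while still matching every known attack query. The names `tune_pattern` and `false_positive_rate`, the `max_fp` threshold, and the example terms are all made up for illustration.

```python
import re


def false_positive_rate(pattern, legit_queries):
    """Fraction of known-legitimate logged queries the pattern would catch."""
    if not legit_queries:
        return 0.0
    hits = sum(1 for q in legit_queries if pattern.search(q))
    return hits / len(legit_queries)


def tune_pattern(required_terms, attack_queries, legit_queries, max_fp=0.001):
    """Narrow a regex one required term at a time until it is safe to deploy.

    Hypothetical sketch of the loop described above: build a candidate,
    measure it against the log sample, and tighten it whenever it
    matches too many legitimate queries. Returns None if no candidate
    is both narrow enough and still catches every attack query.
    """
    for n in range(1, len(required_terms) + 1):
        # Each lookahead requires one term to appear anywhere, in any order.
        body = "".join(f"(?=.*{re.escape(t)})" for t in required_terms[:n])
        pattern = re.compile(body, re.IGNORECASE)
        # Adding terms only narrows the pattern, so if it already misses
        # a known attack query, later candidates will miss it too.
        if not all(pattern.search(q) for q in attack_queries):
            return None
        if false_positive_rate(pattern, legit_queries) <= max_fp:
            return pattern
    return None
```

For example, with made-up terms `["vulnapp.php", "inurl:", "id="]`, a first candidate requiring only `vulnapp.php` would also catch a legitimate query such as "vulnapp.php tutorial", so the loop adds `inurl:` and retests before accepting the pattern. A production tool would likely also try other adjustments (reordering, loosening, anchoring), but the accept-only-after-testing-against-real-traffic structure is the part the post describes.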

In this instance, having access to a good sample of log data meant
we were able to refine one of our automated security processes, and
the result was a more effective resolution of the problem. In other
instances, the data has proven useful in minimizing certain
security threats, or in preventing others completely. In the end,
what this means is that whenever you use Google search, or Google
Apps, or any of our other services, your interactions with those
products help us learn more about security threats that could
impact your online experience. And the better the data we have, the
more effectively we can protect all our users.

Copyright © 2007 Google Adsense College.
Powered by GoogleSchool. All Rights Reserved.