Robots and Web Crawlers

So the thing I hate more than anything about web statistics and analytics is that there is always some robot or spider crawling your site. This inflates the traffic statistics and lowers the apparent conversion rate. The spiders also request bogus web pages and trip my error-monitoring emails, and those false alarms train me to ignore the real alarms when they come. Recently I made some major steps toward automatically classifying robots and eliminating them from my web statistics. Up to this point I had been manually adding IPs to an ignore list.

The key was writing a script that requests the company/organization name for each IP address directly from ARIN. When I get back Google, Microsoft, APNIC, RIPE, LACNIC, or one of the few other web spiders I’ve tagged, I can automatically tag the IP as a robot, and the bogus error messages stop! I can also use the same classifications to filter out local traffic and robots, so I get a better count of my actual visitors. It may not be perfect, but it has greatly reduced the number of bogus error emails I receive (I have set up a catch in the application error handling that emails me every time there is an error, with all of the page details, referral page, user name, and IP info).
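Here is a rough sketch of the classification step in Python, not my actual script. The names `ROBOT_ORG_KEYWORDS`, `is_robot_org`, and `classify_ip` are made up for illustration, and the ARIN lookup itself is passed in as a function (`lookup_org`), since how you query ARIN and parse the organization name depends on your setup. It also assumes the returned name contains one of the tagged words as a whole word.

```python
import re
from typing import Callable, Dict

# Hypothetical tag list, taken from the organizations named in this post.
ROBOT_ORG_KEYWORDS = {"google", "microsoft", "apnic", "ripe", "lacnic"}


def is_robot_org(org_name: str) -> bool:
    """Return True if the organization name contains a tagged keyword.

    Matching is on whole words so that, for example, "Stripe" does not
    match the keyword "ripe".
    """
    words = set(re.findall(r"[a-z0-9]+", org_name.lower()))
    return not words.isdisjoint(ROBOT_ORG_KEYWORDS)


def classify_ip(
    ip: str,
    lookup_org: Callable[[str], str],
    cache: Dict[str, bool],
) -> bool:
    """Tag an IP as robot (True) or not (False), looking it up only once.

    lookup_org is an assumed caller-supplied function that returns the
    organization name for an IP, e.g. by querying ARIN.
    """
    if ip not in cache:
        cache[ip] = is_robot_org(lookup_org(ip))
    return cache[ip]
```

Caching the result per IP means each address costs only one lookup, and the same `cache` can be checked both before sending an error email and when counting visitors.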

I’ve paired this auto-classification with my custom 404 error page to do some really cool stuff. I’ll talk about the 404 page tomorrow :)

