Google Proxy Blocker: GPB

Google Rules. Admit it.
Only, the Google Web Accelerator doesn't. Sure, it's a nice idea, but it really just doesn't make sense for the following reasons:
1) Caching. Caching works fine for WEENIE websites, but for your MACHO interactive site you don't want your content cached by a third party. It's completely unacceptable.
2) Issues. There have been a number of reported issues with GWA spidering interactive sites and wreaking havoc as it does so. As it spiders, it 'clicks' every link available on the site. But a link that says "Delete this" or "Remove this" is something you don't want a spider to click -- especially when the spider ignores the JavaScript warning you may have put in place to protect your end users from their own stupidity. Furthermore, on sites where information is personalized or user-specific, you don't want cached content shared among all users.
3) Analytics. If all of your users are going to Google for their content instead of your site, how will you know what kind of traffic you are getting? Again, unacceptable.
4) Nobody wants Google to BE THE INTERNET. We just want it to keep doing the insane job it does of indexing it. Because GWA is effectively just a proxy, it means that all requests by users running this software will be sent into Google -- given Google's penetration, this has some less than cool possibilities.

So, the need to block the Google Proxy exists. That's where the GoogleProxyBlocker comes in. It's an ASP.NET HttpModule, so it's simple to install and configure. It's bloody fast and performant, and you'll never know that it's there -- except GPB tells the Google Proxy Bots to beat it -- meaning that you don't have to worry about them caching your site. Of course, JUST the proxy bots are told to scram. Google will still spider your site for content/ads/etc.

Kudos

Shout out to Carson McComas, who instantly foresaw the problems that the GWA would create and told me I needed to use my ReverseDOS skills to put together a quick, easy-to-install fix.

License

Non-Lawyer-ese license: Use this code as you want. The code has been tested, works fine, and shouldn't cause you any problems. However, by downloading and using this software, you agree that I cannot be held liable for any issues you encounter with it. Oh, and please don't try to take credit for my efforts.

Download

Download it here. Free of charge. No registration or other evil either.

Installation

Installation is very simple. It consists of just copying the downloaded binaries into your /bin/ directory, and then tweaking your web.config to signal ASP.NET to include the HttpModule during request processing.
Basic Installation Instructions:
1) Copy the files to your site's /bin/ directory.
2) The Web.Config has the following structure:

Just insert the following code where directed in the image above:


(If you already have an <httpModules> config section, just add the following code:)

Save your changes to the Web.Config.

That's it. Installation is complete.
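
For orientation, here is a minimal sketch of what registering an HttpModule in a web.config generally looks like. The name, type, and assembly values below are placeholders, not GPB's real identifiers, so swap in the values that ship with the download.

```xml
<configuration>
  <system.web>
    <httpModules>
      <!-- Placeholder values: replace the type and assembly with the ones from the GPB download -->
      <add name="GoogleProxyBlocker"
           type="Your.Namespace.ModuleClass, YourAssemblyName" />
    </httpModules>
  </system.web>
</configuration>
```

If your web.config already has an <httpModules> section, only the <add> element needs to go inside it.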


Advanced Options:
1) Allowing GWA access to some directories but not others: If you do nothing beyond the base install, GWA will be blocked from your entire site. If you'd like to allow it access to some directories and ban it from others, you just need to add a small snippet to your web.config. The snippet is just an AppSettings key-value pair that signals to GPB to allow the Web Accelerator bots access to the current directory.
If you don't already have a web.config in the directory you want to modify, just paste the following code into a new Web.Config file in that directory:


If you already have a web.config, just add the following line:
Setting the value to "true" means that GWA CAN access the directory. With the value set to "false", or not present, GWA will NOT be allowed access.
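
As a rough sketch, a per-directory web.config carrying an AppSettings key-value pair has the shape below. The key name here is purely hypothetical (a stand-in for illustration); use the key documented with the GPB download.

```xml
<configuration>
  <appSettings>
    <!-- Hypothetical key name: replace with the actual GPB key -->
    <!-- "true" lets GWA into this directory; "false" or a missing key keeps it blocked -->
    <add key="HypotheticalGpbAllowKey" value="true" />
  </appSettings>
</configuration>
```

If the directory already has a web.config with an <appSettings> section, only the <add> line needs to be added to it.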

2) Blocking file types (like images and .exes) other than ASP.NET file types: If you need to prevent GWA from caching images, HTML, custom extensions, etc., you just need to bind those extensions to ASP.NET for processing. Doing this, of course, incurs the overhead of having the ASP.NET pipeline serve this content from then on (and may break some custom processing, etc.), but it means requests for that content pass through GPB and can be checked for GWA. To set this up, configure IIS to bind the extension to ASP.NET. Here's how:

  1. Open the Internet Information Services (IIS) MMC on your webserver.
  2. Navigate down into the websites folder, find the site where GPB is installed and right click it, then select Properties.
  3. Select the Home Directory tab, then click Configuration.
  4. You'll see a list of extensions and their mapped executable paths. Find and select the .aspx extension, then click the Edit button.
  5. Select the entire text of the Executable: field, right click it and select Copy. (CTRL+C won't work).
  6. Close the Add/Edit Application Extension Mapping dialogue by clicking Cancel.
  7. Find the extension you wish to map to ASP.NET and click Edit to edit it, or click Add to create it if it doesn't exist (for .gif for example).
  8. Right click the Executable: field and select Paste (CTRL+V won't work either). Specify the Extension: if it's not already specified. (Specify it with the . in front of it e.g. ".gif").
  9. Click OK. The extension is now mapped/bound for this site, and will be served by ASP.NET -- and will filter through the GPB HttpModule.