Sunday, May 24, 2009

How to set up an ad-blocking proxy in 10 minutes and block ads in all your favorite browsers!

For a long time now I've been content using Opera's built-in ad blocker, an example of which you can see below. This approach has worked for me for a while, but it has some major shortcomings: I can't use regular expressions to filter multiple URLs from the same server, and obviously all my hard work blocking ads in Opera does nothing for me in Firefox or Safari.

Opera's Ad Blocker

The best solution to these problems is to set up a proxy that filters requests by URL. I had read about doing something like this a couple of years ago, but couldn't be bothered to figure out all the details of setting it up. It turns out to be remarkably easy, as I'll show you! With my setup files and instructions you can be browsing ad-free in 10 minutes. And trust me, it's worth it!


The benefits of the proxy approach

  • Block ads in every browser with a single configuration file.
  • Filter URLs using Regular Expressions (I've added some methods so you don't have to).
  • Configuration files are in JavaScript for easy scripting and customization.
  • Blocks HTML ads, image ads, flash ads, you name it.
  • Faster browsing, because you don't wait for ads to load (commercial websites often load their ads first, so your pages render sooner).

You're going to need a webserver running that will serve a blank page in place of the blocked ads. Without it, the browser can't reach the proxy, and your webpages will show error messages where the ads used to be: not so pretty to look at. If you're on a Mac, you've already got Apache installed and ready to go. If you're on Windows, you'll have to set that up and then come back here when you're done.

[ Aside: I've been thinking that you don't really need to run a local webserver at all, if you have access to one on the web. That would allow any number of people on any network to benefit from a single filter file. ]

Getting started

Webpage with Ads
Webpage without Ads
First off, here's a little taste of how your new browsing experience will look. On the left is a regular webpage followed by the same page minus the ads. Ad free goodness!

The setup is simple.

  1. You have an Apache virtual host listening on a non-standard port, say 61111. The virtual host does only one thing: it rewrites all requests to point to an empty HTML file, which I have named response.html.
  2. You also have a proxy auto-configuration file, or PAC file, which you configure your browser to use. The browser consults the PAC file for every request: if the URL looks like an advertisement, the PAC file sends the request to our Apache virtual host (which returns a blank page, so no ad is displayed). Otherwise the browser fetches the page directly, in the normal fashion.
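Stripped to its essentials, the decision in step 2 looks something like the sketch below. It is a simplified, hypothetical version: the BLOCKED_DOMAINS list and the plain string comparison are assumptions for illustration, while the full PAC file later in this post uses regular-expression helpers instead. The browser calls FindProxyForURL(url, host) for each request and acts on the string it returns.

```javascript
// Hypothetical list of ad-serving domains, for illustration only.
var BLOCKED_DOMAINS = ["doubleclick.net", "valueclick.com"];

// The local Apache virtual host that answers with a blank page.
var BLANK_PAGE_PROXY = "PROXY 127.0.0.1:61111";

// Browsers call this for every request; "PROXY host:port" routes the
// request through that proxy, "DIRECT" fetches it normally.
function FindProxyForURL(url, host) {
  var lowerHost = host.toLowerCase();
  for (var i = 0; i < BLOCKED_DOMAINS.length; i++) {
    var domain = BLOCKED_DOMAINS[i];
    // Match the domain itself, or any subdomain of it ("ad.doubleclick.net"),
    // but not a different domain that merely ends in the same text.
    if (lowerHost === domain ||
        lowerHost.slice(-(domain.length + 1)) === "." + domain) {
      return BLANK_PAGE_PROXY; // looks like an ad: send it to the blank page
    }
  }
  return "DIRECT"; // everything else: let the browser handle it
}
```

Everything hinges on that one return value, which is why a single file can drive every browser that supports PAC files.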

Setup Apache

You're going to need to modify the paths according to where you put these files and how your machine is set up. These instructions are specific to Mac OS X, but the Windows setup is analogous.

On a Mac, I think the easiest place to put the PAC file and response.html is the Public/ directory under your home directory. Extract this archive (ad-blocker.zip) there; it contains four files: favicon.ico, proxy.conf, response.html and a copy of ad-blocker.conf (the Apache virtual host configuration file).

Configure the virtual host
You should have a directory under /etc/apache2/ called other/. You can define virtual hosts in their own conf files here and they will be read by Apache on startup.

In /etc/apache2/other/ create a file called ad-blocker.conf with the following contents (this file is also contained in ad-blocker.zip):

  ## Use Listen directives and not Port directive if server will handle
  ## requests from multiple ports.
  Listen 127.0.0.1:61111

  <VirtualHost *:61111>
    DocumentRoot "/Users/Karl/Public/ad-blocker"
    <directory "/Users/Karl/Public/ad-blocker">
      Order allow,deny
      Allow from all
      Options +Indexes
    </directory>
    ErrorLog /var/log/apache2/error_log_ads
    SetEnvIf Request_URI .* no-access-log
    CustomLog /dev/null common env=!no-access-log
    RewriteEngine on
    RewriteRule ^(.*) /Users/Karl/Public/ad-blocker/response.html
  </VirtualHost>
Now change the DocumentRoot and <directory ...> paths to the full path of the ad-blocker directory you added to Public, and point the RewriteRule directive at the response.html file in that directory.
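If you'd rather not run Apache (on Windows, say), the virtual host's whole job, answering every request with the same empty page, is small enough to sketch in JavaScript with Node.js's built-in http module. This is a hypothetical substitute, not part of the setup described here; the createBlankPageServer name and the choice to send an empty text/html body directly (instead of reading response.html) are assumptions.

```javascript
var http = require("http");

// Answer every request, whatever URL the browser asks for, with an empty
// HTML page: the same job the RewriteRule above does in Apache.
function createBlankPageServer() {
  return http.createServer(function (req, res) {
    res.writeHead(200, {
      "Content-Type": "text/html",
      "Content-Length": "0"
    });
    res.end();
  });
}
```

To use it, call createBlankPageServer().listen(61111, "127.0.0.1") so it binds only to the loopback address, as the Listen directive does. Ads requested over HTTPS reach a proxy as CONNECT requests; with no 'connect' handler, Node simply closes those connections, so those ads fail to load rather than showing a blank page.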

Start Apache
Mac OSX: Enable Web Sharing
On a Mac, just enable Web Sharing under System Preferences -> Sharing. Apache will start, and if you visit http://127.0.0.1:61111 you should see a blank page (with no errors or anything else). If you get an error like "You don't have permission to access / on this server", you may have a permissions problem on your ad-blocker directory, you may have mistyped the path in your <directory ...> directive, or you may need to enable File Sharing on the directory and give Everyone read permission.

Apache should now be up and running and ready to serve blank pages in place of ads.

Now the easy part.

Configure your browsers

We need to configure the browser to use our proxy configuration file (PAC file). For each browser, open your Preferences and configure the path to the proxy.conf file for your machine using my screenshots as a guide.
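The proxy settings generally expect a URL rather than a bare filesystem path. Because the virtual host rewrites every request to response.html, it can't serve proxy.conf itself, so a file:// URL is the natural thing to enter. Assuming the files sit in /Users/Karl/Public/ad-blocker as in the Apache config above, Node.js's url.pathToFileURL() shows the exact form; pacUrlForPath is a hypothetical helper name. It also percent-encodes characters such as spaces, which are easy to get wrong by hand.

```javascript
var url = require("url");

// Convert an absolute filesystem path into the file:// URL form
// expected by "automatic proxy configuration" fields.
function pacUrlForPath(path) {
  return url.pathToFileURL(path).href;
}

console.log(pacUrlForPath("/Users/Karl/Public/ad-blocker/proxy.conf"));
// file:///Users/Karl/Public/ad-blocker/proxy.conf (on Mac OS X or Linux)
```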


Firefox
Open Preferences->Advanced->Network. Then under Connection, click 'Settings...'
Select 'Automatic proxy configuration URL' and enter the path to the proxy.conf file.

Opera
Open Preferences->Advanced->Network and click on 'Proxy Servers'

Select 'Use automatic proxy configuration' and enter the path to the proxy.conf file.

Safari
Open Preferences->Advanced and click 'Proxies: Change Settings...'

This should bring up your OS X Network->Proxies settings. Select 'Configure Proxies: Using a PAC file' and enter the path to the proxy.conf file.

Alternatively, you can get there from OS X System Preferences->Network: click 'Advanced', then 'Proxies'.

The PAC file

Here's the proxy configuration (PAC) file if you just want to have a look at how it's set up. I've modified the original file that I reference in the comments by adding two really useful JavaScript functions, domainMatch() and subdomainMatch(). These perform regular expression matching on the hostname and can filter requests by domain, e.g. domainMatch(host, 'doubleclick.net'), or by subdomain, e.g. subdomainMatch(host, 'ads'). This makes the file much cleaner and more powerful than calling shExpMatch(host, ...) every time and relying on its limited wildcard matching.
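To make the two matching rules concrete, here is a self-contained sketch of what the helpers accept, so the behavior can be checked outside a browser. It condenses the file's regular expressions: subdomainMatch's two-branch pattern is rewritten into the equivalent (^|\.)label\. form, and escapeRegExp is a simplified stand-in for the file's RegExp.escape; both are assumptions of this sketch.

```javascript
// Escape every regex metacharacter so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Same rule as domainMatch(): the domain itself, or the domain preceded
// by one or more labels ("ad.doubleclick.net"), case-insensitively.
function domainMatch(host, domain) {
  return new RegExp("^([^.]+\\.)*" + escapeRegExp(domain) + "$", "i").test(host);
}

// Same rule as subdomainMatch(): the label appears as a whole subdomain
// at any level, but never as the final label.
function subdomainMatch(host, subdomain) {
  return new RegExp("(^|\\.)" + escapeRegExp(subdomain) + "\\.", "i").test(host);
}

console.log(domainMatch("ad.doubleclick.net", "doubleclick.net")); // true
console.log(domainMatch("notdoubleclick.net", "doubleclick.net")); // false
console.log(subdomainMatch("ads.example.com", "ads"));             // true
console.log(subdomainMatch("img.ads.example.com", "ads"));         // true
console.log(subdomainMatch("myads.example.com", "ads"));           // false
```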

Here's the code...

  /**
   * Proxy configuration file to block ads.
   * 
   * From <http://hydra.nac.uci.edu/indiv/ehood/gems/ad-blocking.html>
   * Host matching adapted from hostname list at
   *    <http://www.ecst.csuchico.edu/~atman/spam/adblock.shtml>
   * Regular-expression matching functions added by Karl Varga
   *    <kjvarga.blogspot.com>
   *
   * The proxy is assumed to be listening on 127.0.0.1:61111.  Change
   * the return "PROXY ..." statement near the end of the file to
   * suit your local configuration.
   */



  /**
   * Add an escape function to the RegExp object.
   * 
   * @see http://simonwillison.net/2006/Jan/20/escape/
   */
  RegExp.escape = function(text) {
    // Build the escaping regex once and cache it on the function itself.
    if (!arguments.callee.sRE) {
      var specials = [
        '/', '.', '*', '+', '?', '|', '^', '$',
        '(', ')', '[', ']', '{', '}', '\\'
      ];
      arguments.callee.sRE = new RegExp(
        '(\\' + specials.join('|\\') + ')', 'g'
      );
    }
    return text.replace(arguments.callee.sRE, '\\$1');
  };

  /**
   * A host matches a domain if it is the domain itself or the domain
   * preceded by one or more subdomains. A similar domain with text
   * prepended (e.g. notdoubleclick.net) does not match.
   */
  function domainMatch(host, domain) {
    var regex = new RegExp('^([^.]+\\.)*' + RegExp.escape(domain) + '$', 'i');
    return regex.test(host);
  }

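  /**
   * Match hosts that contain the given label as a subdomain at any level,
   * e.g. subdomainMatch('x.ads.example.com', 'ads') is true.
   */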
  function subdomainMatch(host, subdomain) {
    var regex = new RegExp('^.*[.]' + RegExp.escape(subdomain) 
        + '[.].*$|^' + RegExp.escape(subdomain) + '[.].*$', 'i');
    return regex.test(host);
  }

  /**
   * Called by the browser to determine whether to proxy the request or not.
   */
  function FindProxyForURL(url, host) {

    if (

      /**
       * Domains.
       */
      domainMatch(host, "PostMasterBannerNet.com") ||
      domainMatch(host, "adbureau.net") ||
      domainMatch(host, "admaximize.com") ||
      domainMatch(host, "admex.com") ||
      domainMatch(host, "alladvantage.com") ||
      domainMatch(host, "avenuea.com") ||
      domainMatch(host, "bizservers.com") ||
      domainMatch(host, "burstnet.com") ||
      domainMatch(host, "click2net.com") ||
      domainMatch(host, "clicktrade.com") ||
      domainMatch(host, "commision-junction.com") ||
      domainMatch(host, "digitalriver.com") ||
      domainMatch(host, "doubleclick.net") ||
      domainMatch(host, "eads.com") ||
      domainMatch(host, "extreme-dm.com") ||
      domainMatch(host, "flycast.com") ||
      domainMatch(host, "focalink.com") ||
      domainMatch(host, "freestats.com") ||
      domainMatch(host, "hitbox.com") ||
      domainMatch(host, "iadnet.com") ||
      domainMatch(host, "imaginemedia.com") ||
      domainMatch(host, "imgis.com") ||
      domainMatch(host, "link4ads.com") ||
      domainMatch(host, "mediaplex.com") ||
      domainMatch(host, "netdirect.nl") ||
      domainMatch(host, "ngadcenter.net") ||
      domainMatch(host, "oneandonlynetwork.com") ||
      domainMatch(host, "preferences.com") ||
      domainMatch(host, "targetshop.com") ||
      domainMatch(host, "teknosurf2.com") ||
      domainMatch(host, "teknosurf3.com") ||
      domainMatch(host, "trix.net") ||
      domainMatch(host, "valueclick.com") ||
      domainMatch(host, "websitefinancing.com") ||
      domainMatch(host, "2mdn.net") ||
      domainMatch(host, "brandreachsys.com") ||
      domainMatch(host, "fastclick.net") ||
      domainMatch(host, "eyewonder.com") ||
      domainMatch(host, "clicktorrent.info") ||
      // domainMatch(host, "yimg.com") ||
      domainMatch(host, "pop6.com") ||
      domainMatch(host, "adinterax.com") ||
      domainMatch(host, "atdmt.com") ||
      domainMatch(host, "fling.com") ||
      domainMatch(host, "serving-sys.com") ||
      domainMatch(host, "fuelbuck.com") ||
      domainMatch(host, "blogads.com") ||
      domainMatch(host, "doublepimp.com") ||
      domainMatch(host, "etology.com") ||
      domainMatch(host, "adshuffle.com") ||
      domainMatch(host, "awempire.com") ||
      domainMatch(host, "adjuggler.com") ||
      domainMatch(host, "edgesuite.net") ||

      /**
       * Subdomains
       */
      subdomainMatch(host, "ads") ||
      subdomainMatch(host, "ads0") ||
      subdomainMatch(host, "ads1") ||
      subdomainMatch(host, "ads2") ||
      subdomainMatch(host, "ads3") ||
      subdomainMatch(host, "ads4") ||
      subdomainMatch(host, "ads5") ||
      subdomainMatch(host, "banners") ||
      subdomainMatch(host, "banner") ||
      subdomainMatch(host, "adcontroller") ||
      subdomainMatch(host, "click") ||

      /**
       * Hostname Patterns
       */
      shExpMatch(host, "*-ad.*") ||
      shExpMatch(host, "*adlink.*") ||
      shExpMatch(host, "ad-*.com") ||
      shExpMatch(host, "ad.*") ||
      shExpMatch(host, "ad0*") ||
      shExpMatch(host, "adcontroller*") ||
      shExpMatch(host, "adcreatives*") ||
      shExpMatch(host, "adex*") ||
      shExpMatch(host, "adforce*") ||
      shExpMatch(host, "adfu.*") ||
      shExpMatch(host, "adimage*") ||
      shExpMatch(host, "adimg*") ||
      shExpMatch(host, "admedia*") ||
      shExpMatch(host, "adpick*") ||
      shExpMatch(host, "adremote*") ||
      shExpMatch(host, "ngads*") ||
      shExpMatch(host, "nsads*") ||
      shExpMatch(host, "ph-ad*") ||
      shExpMatch(host, "realads*") ||

      /**
       * URLs
       */
      shExpMatch(url, "*.weather.com/*/ads/*") ||
      shExpMatch(url, "*/adimages/*") ||
      shExpMatch(url, "*/adsmanager/*") ||

      false   
    ) {

      // Proxy the request
      return "PROXY 127.0.0.1:61111";

    }

    // Let the browser handle it
    return "DIRECT";
  }  
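shExpMatch() isn't part of JavaScript itself; browsers supply it to PAC files along with a few other helpers. If you want to exercise the file outside a browser, for example in Node.js, you need a stand-in. The version below is a hypothetical approximation that handles only the * and ? shell wildcards, not a faithful copy of any browser's implementation.

```javascript
// Approximate the PAC built-in shExpMatch(str, pattern): "*" matches any
// run of characters, "?" matches exactly one, everything else is literal.
function shExpMatch(str, pattern) {
  var regexSource = pattern
    .replace(/[.+^${}()|[\]\\\/]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*")                   // shell * -> regex .*
    .replace(/\?/g, ".");                   // shell ? -> regex .
  return new RegExp("^" + regexSource + "$").test(str);
}

console.log(shExpMatch("ad.example.com", "ad.*"));  // true
console.log(shExpMatch("bad.example.com", "ad.*")); // false
console.log(shExpMatch("http://www.weather.com/x/ads/y.gif",
                       "*.weather.com/*/ads/*"));   // true
```

The pattern must match the whole string (note the ^ and $ anchors), which is why "ad.*" catches ad.example.com but not bad.example.com.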

Friday, May 22, 2009

makePositioned: A jQuery extension function to dynamically position an element near another element

I've recently been working on a dynamic select/auto-complete list (which I'll post about soon) and I had to position the dynamically created div under the input when the user enters some text. jQuery makes it quite easy to position elements in this manner because you have access to an element's position and size, but who wants to go through that hassle every time?

So I've created a jQuery extension function called makePositioned, which is called on the element you want to position and accepts two arguments: the alignment position (top, right, bottom or left) and the element, as a jQuery object, to position it against. You would usually use this for positioning dynamically created content, for instance help popups beside form input fields, Ajax feedback icons, etc.

Let me know if you find this useful, or if you add support for more alignment options. 'top' places the element above the target with their left edges aligned, 'right' places it to the right with their top edges aligned, 'bottom' places it below with their left edges aligned, and 'left' places it to the left with their top edges aligned. It doesn't do any fancy checking to see whether there is room in the viewport, but that would be a nice feature to add.
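The arithmetic behind those four alignments is simple enough to check without a browser. The sketch below pulls it out of makePositioned into a hypothetical, DOM-free helper, computePosition, with plain objects standing in for what jQuery's offset(), outerWidth() and outerHeight() report; it is an illustration, not part of the plugin.

```javascript
// target: { left, top, width, height } of the element to align against.
// size:   { width, height } of the element being positioned.
// Returns the absolute { left, top } for the positioned element.
function computePosition(align, target, size) {
  var left = target.left, top = target.top;
  switch (align) {
    case "bottom": top += target.height; break; // below, left edges aligned
    case "right":  left += target.width; break; // to the right, tops aligned
    case "left":   left -= size.width;   break; // to the left, tops aligned
    case "top":    top -= size.height;   break; // above, left edges aligned
  }
  return { left: left, top: top };
}

var input = { left: 100, top: 50, width: 200, height: 20 };
var popup = { width: 80, height: 30 };
console.log(computePosition("bottom", input, popup)); // { left: 100, top: 70 }
console.log(computePosition("left", input, popup));   // { left: 20, top: 50 }
```

An unrecognized alignment leaves the element at the target's top-left corner, which matches what the switch statement in makePositioned does.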

Check out the code below.

The Code

<script type="text/javascript">
/** 
 * Extend jQuery.
 *
 */
jQuery.fn.extend({

  /**
   * Position the first element in the jQuery list near another element
   * using absolute positioning. The element should already have the
   * proper z-index set.
   *
   * Note: offset() returns coordinates relative to the document, so this
   * assumes the positioned element has no positioned ancestor.
   *
   * @param string align 'bottom' for bottom left, 'right' for top right,
   *    'left' for top left, 'top' for above left.
   * @param element the element to position against (a jQuery object,
   *    DOM element or selector).
   * @return jQuery this, so calls can be chained.
   */
  makePositioned: function(align, element) {
    var first = this.eq(0);
    element = jQuery(element); // normalize to a jQuery object

    var pos = element.offset();
    var height = element.outerHeight(), width = element.outerWidth();
    var left = pos.left, top = pos.top;
    var thisHeight = first.outerHeight(), thisWidth = first.outerWidth();

    switch (align) {
      case 'bottom':  // below, left edges aligned
        top += height;
        break;
      case 'right':   // to the right, top edges aligned
        left += width;
        break;
      case 'left':    // to the left, top edges aligned
        left -= thisWidth;
        break;
      case 'top':     // above, left edges aligned
        top -= thisHeight;
        break;
    }

    first.css({
      top: parseInt(top, 10) + 'px',
      left: parseInt(left, 10) + 'px',
      position: 'absolute'
    });

    return this;
  }

});
</script>

Wednesday, May 6, 2009

Keep using TextMate unregistered, indefinitely (forever, aka eternity)

OK, this one is really easy. I didn't come up with it, but I often forget which file I'm supposed to delete, so I thought I'd put a note here and never forget again...until next month :)

This is a one-step process. Not two, not three. Just one. Delete the following file:

~/Library/Preferences/com.macromates.textmate.plist
Done already? You'll lose your preferences, but if you're too skint to buy this stuff then you don't deserve to have preferences.

If you're wondering what will happen next, you'll get 30 more days to evaluate TextMate. See you in 30 days :)