Sunday, May 24, 2009

How to set up an ad-blocking proxy in 10 minutes and block ads in all your favorite browsers!

For a long time now I've been content using Opera's built-in ad blocker, an example of which you can see below. This approach has worked for me for a while, but has some major shortcomings: I can't use expressions to filter multiple URLs from the same server, and obviously all my hard work blocking ads in Opera doesn't do anything for me in Firefox or Safari.

Opera's Ad Blocker

The best solution to these problems lies in setting up a proxy to filter requests by URL. I had read about doing something like this a couple of years ago, but couldn't be bothered to figure out all the details of setting it up. It turns out to be remarkably easy, as I'll show you! With my setup files and instructions you can be browsing ad-free in 10 minutes. And trust me, it's worth it!


The benefits of the proxy approach

  • Block ads in every browser with a single configuration file.
  • Filter URLs using Regular Expressions (I've added some methods so you don't have to).
  • Configuration files are in JavaScript for easy scripting and customization.
  • Blocks HTML ads, image ads, flash ads, you name it.
  • Faster browsing because you don't wait for ads to load (and ads are usually given first priority in the loading sequence when you visit commercial websites, so your pages render faster).

You're going to need to be running a webserver that will serve a blank page for the blocked ads. Without this your browser will report a 404 error because it can't access the blocked pages, and your webpages will have 404 errors rendered where the ads used to be - not so pretty to look at. If you're on Mac, you've already got Apache installed and ready to go. If you're on Windows, you're going to have to set that up and then come back here when you're done.
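
If setting up Apache is a hurdle, any server that answers every request with an empty page should do the job. As a rough sketch only (this swaps in Node.js, which is not part of my setup, and the function name createBlankServer is hypothetical), something like this could stand in for the Apache virtual host:

```javascript
var http = require('http');

// Build a server that answers every request with the same empty HTML
// page, whatever URL was asked for. (Hypothetical stand-in for the
// Apache virtual host; not part of the original setup.)
function createBlankServer() {
  return http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end('<html><body></body></html>');
  });
}

// To run it on the port used throughout this post:
//   createBlankServer().listen(61111, '127.0.0.1');
```

I haven't run my own browsers against this, so treat it as a starting point rather than a drop-in replacement.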

[ Aside: I've been thinking that you don't really need to run a local webserver at all, if you have access to one on the web. That would allow any number of people on any network to benefit from a single filter file. ]

Getting started

Webpage with Ads
Webpage without Ads
First off, here's a little taste of how your new browsing experience will look. On the left is a regular webpage followed by the same page minus the ads. Ad free goodness!

The setup is simple.

  1. You have an Apache virtual host listening on a non-standard port, say 61111. The virtual host does only one thing: it rewrites all requests to point to an empty HTML file, which I have named response.html.
  2. You also have a proxy auto-configuration (PAC) file, which you configure your browser to use. The browser runs every request through this PAC file. If the request looks like an advertisement, the PAC file sends it to our Apache virtual host (which returns a blank page, so no ad is displayed). Otherwise the browser fetches the request directly, in the normal fashion.
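
To make the decision flow concrete, here is a stripped-down sketch of what a PAC file boils down to. The one-entry block list and the isBlocked helper are made up for illustration; only FindProxyForURL and the "PROXY"/"DIRECT" return strings are what the browser actually expects.

```javascript
// Hypothetical one-entry block list, for illustration only.
var BLOCKED_DOMAINS = ['doubleclick.net'];

// Hypothetical helper: true if host is a blocked domain or a subdomain of one.
function isBlocked(host) {
  host = host.toLowerCase();
  for (var i = 0; i < BLOCKED_DOMAINS.length; i++) {
    var domain = BLOCKED_DOMAINS[i];
    if (host === domain ||
        host.slice(-(domain.length + 1)) === '.' + domain) {
      return true;
    }
  }
  return false;
}

// The browser calls this for every request it is about to make.
function FindProxyForURL(url, host) {
  if (isBlocked(host)) {
    // Looks like an ad: hand it to the blank-page server.
    return 'PROXY 127.0.0.1:61111';
  }
  // Anything else is fetched directly.
  return 'DIRECT';
}
```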

Setup Apache

You're going to need to modify the paths according to where you put these files, and the setup of your machine. These instructions are particular to Mac OS X, but the Windows setup is analogous.

On Mac, I think the easiest place to put our PAC file and the response.html file is in your Public/ directory which is under your home directory. Extract this archive (ad-blocker.zip) which contains four files: favicon.ico, proxy.conf, response.html as well as a copy of ad-blocker.conf (the Apache virtual host configuration file).

Configure the virtual host
You should have a directory under /etc/apache2/ called other/. You can define virtual hosts in their own conf files here and they will be read by Apache on startup.

In /etc/apache2/other/ create a file called ad-blocker.conf with the following contents (this file is also contained in ad-blocker.zip):

  ## Use Listen directives and not Port directive if server will handle
  ## requests from multiple ports.
  Listen 127.0.0.1:61111

  <VirtualHost *:61111>
  DocumentRoot "/Users/Karl/Public/ad-blocker"
  <directory "/Users/Karl/Public/ad-blocker">
    Order allow,deny
    Allow from all
    Options +Indexes
  </directory>
  ErrorLog /var/log/apache2/error_log_ads
  SetEnvIf Request_URI .* no-access-log
  CustomLog /dev/null common env=!no-access-log
  RewriteEngine on
  RewriteRule ^(.*) /Users/Karl/Public/ad-blocker/response.html
  </VirtualHost>
Now modify the DocumentRoot and <directory ...> with the full path to the ad-blocker directory you added to Public, and modify the RewriteRule directive to point to the response.html file.

Start Apache
Mac OSX: Enable Web Sharing
On Mac, just enable Web Sharing under System Preferences -> Sharing. Apache will start, and if you visit http://127.0.0.1:61111 you should see a blank page (with no errors or anything else). If you get an error like "You don't have permissions to access / on this server", you may have permissions problems on your ad-blocker directory, you may have messed up your <directory ...> directive path, or you may need to enable File Sharing on the directory and give Everyone permission to Read.

Apache should now be up and running, ready to serve blank pages in place of ads.

Now the easy part.

Configure your browsers

We need to configure the browser to use our proxy configuration file (PAC file). For each browser, open your Preferences and configure the path to the proxy.conf file for your machine using my screenshots as a guide.


Firefox
Open Preferences->Advanced->Network. Then under Connection, click 'Settings...'
Select 'Automatic proxy configuration URL' and enter the path to the proxy.conf file

Opera
Open Preferences->Advanced->Network and click on 'Proxy Servers'

Select 'Use automatic proxy configuration' and enter the path to the proxy.conf file

Safari
Open Preferences->Advanced and click 'Proxies: Change Settings...'

This should bring up your OS X Network->Proxies settings. Select 'Configure Proxies: Using a PAC file' and enter the path to the proxy.conf file

Alternatively you can get there from OS X System Preferences->Network click on 'Advanced' then 'Proxies'

The PAC file

Here's the proxy configuration (PAC) file if you just want to have a look at how it's set up. I've modified the original file that I reference in the comments by adding two really useful JavaScript functions, domainMatch() and subdomainMatch(). These perform regular expression matching on the hostname and can filter requests by domain, e.g. domainMatch(host, 'doubleclick.net'), or by subdomain, e.g. subdomainMatch(host, 'ads'). This makes it much cleaner and more powerful than calling shExpMatch(host, ...) every time and relying on its sub-par pattern matching.

Here's the code...

  /**
   * Proxy configuration file to block ads.
   * 
   * From <http://hydra.nac.uci.edu/indiv/ehood/gems/ad-blocking.html>
   * Host matching adapted from hostname list at
   *    <http://www.ecst.csuchico.edu/~atman/spam/adblock.shtml>
   * Regular-expression matching functions added by Karl Varga
   *    <kjvarga.blogspot.com>
   *
   * The proxy is assumed to be listening on 127.0.0.1:61111.  Change
   * the return "PROXY ..." statement near the end of the file to
   * suit your local configuration.
   */



  /**
   * Add an escape function to the RegExp object.
   * 
   * @see http://simonwillison.net/2006/Jan/20/escape/
   */
  RegExp.escape = function(text) {
    if (!arguments.callee.sRE) {
      var specials = [
        '/', '.', '*', '+', '?', '|',
        '(', ')', '[', ']', '{', '}', '\\'
      ];
      arguments.callee.sRE = new RegExp(
        '(\\' + specials.join('|\\') + ')', 'g'
      );
    }
    return text.replace(arguments.callee.sRE, '\\$1');
  }

  /**
   * A domain is matched if it is preceded by one or more subdomains,
   * or no subdomains.  A domain does not match a similar domain with text
   * prepended.
   */
  function domainMatch(host, domain) {
    var regex = new RegExp('^([^.]+\\.)*' + RegExp.escape(domain) + '$', 'i');
    return regex.test(host);
  }

  /**
   * Match any subdomain.
   */
  function subdomainMatch(host, subdomain) {
    var regex = new RegExp('^.*[.]' + RegExp.escape(subdomain) 
        + '[.].*$|^' + RegExp.escape(subdomain) + '[.].*$', 'i');
    return regex.test(host);
  }

  /**
   * Called by the browser to determine whether to proxy the request or not.
   */
  function FindProxyForURL(url, host) {

    if (

      /**
       * Domains.
       */
      domainMatch(host, "PostMasterBannerNet.com") ||
      domainMatch(host, "adbureau.net") ||
      domainMatch(host, "admaximize.com") ||
      domainMatch(host, "admex.com") ||
      domainMatch(host, "alladvantage.com") ||
      domainMatch(host, "avenuea.com") ||
      domainMatch(host, "bizservers.com") ||
      domainMatch(host, "burstnet.com") ||
      domainMatch(host, "click2net.com") ||
      domainMatch(host, "clicktrade.com") ||
      domainMatch(host, "commision-junction.com") ||
      domainMatch(host, "digitalriver.com") ||
      domainMatch(host, "doubleclick.net") ||
      domainMatch(host, "eads.com") ||
      domainMatch(host, "extreme-dm.com") ||
      domainMatch(host, "flycast.com") ||
      domainMatch(host, "focalink.com") ||
      domainMatch(host, "freestats.com") ||
      domainMatch(host, "hitbox.com") ||
      domainMatch(host, "iadnet.com") ||
      domainMatch(host, "imaginemedia.com") ||
      domainMatch(host, "imgis.com") ||
      domainMatch(host, "link4ads.com") ||
      domainMatch(host, "mediaplex.com") ||
      domainMatch(host, "netdirect.nl") ||
      domainMatch(host, "ngadcenter.net") ||
      domainMatch(host, "oneandonlynetwork.com") ||
      domainMatch(host, "preferences.com") ||
      domainMatch(host, "targetshop.com") ||
      domainMatch(host, "teknosurf2.com") ||
      domainMatch(host, "teknosurf3.com") ||
      domainMatch(host, "trix.net") ||
      domainMatch(host, "valueclick.com") ||
      domainMatch(host, "websitefinancing.com") ||
      domainMatch(host, "2mdn.net") ||
      domainMatch(host, "brandreachsys.com") ||
      domainMatch(host, "fastclick.net") ||
      domainMatch(host, "eyewonder.com") ||
      domainMatch(host, "clicktorrent.info") ||
      // domainMatch(host, "yimg.com") ||
      domainMatch(host, "pop6.com") ||
      domainMatch(host, "adinterax.com") ||
      domainMatch(host, "atdmt.com") ||
      domainMatch(host, "fling.com") ||
      domainMatch(host, "serving-sys.com") ||
      domainMatch(host, "fuelbuck.com") ||
      domainMatch(host, "blogads.com") ||
      domainMatch(host, "doublepimp.com") ||
      domainMatch(host, "etology.com") ||
      domainMatch(host, "adshuffle.com") ||
      domainMatch(host, "awempire.com") ||
      domainMatch(host, "adjuggler.com") ||
      domainMatch(host, "atdmt.com") ||
      domainMatch(host, "edgesuite.net") ||

      /**
       * Subdomains
       */
      subdomainMatch(host, "ads") ||
      subdomainMatch(host, "ads0") ||
      subdomainMatch(host, "ads1") ||
      subdomainMatch(host, "ads2") ||
      subdomainMatch(host, "ads3") ||
      subdomainMatch(host, "ads4") ||
      subdomainMatch(host, "ads5") ||
      subdomainMatch(host, "banners") ||
      subdomainMatch(host, "banner") ||
      subdomainMatch(host, "adcontroller") ||
      subdomainMatch(host, "click") ||

      /**
       * Hostname Patterns
       */
      shExpMatch(host, "*-ad.*") ||
      shExpMatch(host, "*adlink.*") ||
      shExpMatch(host, "ad-*.com") ||
      shExpMatch(host, "ad.*") ||
      shExpMatch(host, "ad0*") ||
      shExpMatch(host, "adcontroller*") ||
      shExpMatch(host, "adcreatives*") ||
      shExpMatch(host, "adex*") ||
      shExpMatch(host, "adforce*") ||
      shExpMatch(host, "adfu.*") ||
      shExpMatch(host, "adimage*") ||
      shExpMatch(host, "adimg*") ||
      shExpMatch(host, "admedia*") ||
      shExpMatch(host, "adpick*") ||
      shExpMatch(host, "adremote*") ||
      shExpMatch(host, "ngads*") ||
      shExpMatch(host, "nsads*") ||
      shExpMatch(host, "ph-ad*") ||
      shExpMatch(host, "realads*") ||

      /**
       * URLs
       */
      shExpMatch(url, "*.weather.com/*/ads/*") ||
      shExpMatch(url, "*/adimages/*") ||
      shExpMatch(url, "*/adsmanager/*") ||

      false   
    ) {

      // Proxy the request
      return "PROXY 127.0.0.1:61111";

    }

    // Let the browser handle it
    return "DIRECT";
  }  
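
To see what the two helpers accept and reject without loading the file into a browser, here is a small standalone sketch. It restates domainMatch() and subdomainMatch() with a simplified escape helper (escapeRegExp is a hypothetical name, not part of the file above), and the hostnames are made-up examples.

```javascript
// Hypothetical helper: escape characters that are special in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[\/.*+?|()\[\]{}\\^$]/g, '\\$&');
}

// Matches the domain itself or the domain preceded by any subdomains.
function domainMatch(host, domain) {
  var regex = new RegExp('^([^.]+\\.)*' + escapeRegExp(domain) + '$', 'i');
  return regex.test(host);
}

// Matches when the given label appears as a whole label in the host
// and is followed by at least one more label.
function subdomainMatch(host, subdomain) {
  var regex = new RegExp('^(.*[.])?' + escapeRegExp(subdomain) + '[.].*$', 'i');
  return regex.test(host);
}

console.log(domainMatch('ad.doubleclick.net', 'doubleclick.net'));  // true
console.log(domainMatch('notdoubleclick.net', 'doubleclick.net'));  // false
console.log(subdomainMatch('ads.example.com', 'ads'));              // true
console.log(subdomainMatch('roads.example.com', 'ads'));            // false
```

Note that subdomainMatch() only looks at whole labels, which is why roads.example.com is left alone.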

Friday, May 22, 2009

makePositioned: A jQuery extension function to dynamically position an element near another element

I've recently been working on a dynamic select/auto-complete list (which I'll post about soon) and I had to position the dynamically-created div under the input when the user enters some text. jQuery makes it quite easy to position elements in this manner because you have access to an element's position and size, but who wants to go to the hassle every time?

So I've created a jQuery extension function called makePositioned which is called on the element you want to position and accepts two arguments: the alignment position (either top, right, bottom or left) and the element to position it against. You would usually use this for positioning dynamically created content, for instance help popups beside form input fields, ajax feedback icons etc.

Let me know if you find this useful, or if you add support for more alignment options. top aligns above left, right aligns top right, bottom aligns bottom left, and left aligns top left. It doesn't do any fancy checking to see if there is room in the viewport below the element, but that would be a nice feature to add.

Here's a demo. Check out the code below.

Click the buttons to position this div.

The Code

<script type="text/javascript">
/** 
 * Extend jQuery.
 *
 */
jQuery.fn.extend({

  /**
   * Position the first element in the jQuery list near another element 
   * using absolute positioning. The element should already have the 
   * proper z-Index set.
   * 
   * @param string align 'bottom' for bottom left, 'right' for top right,
   *    'left' for top left, or 'top' for above left.
   * @param jQuery element the element to position against.
   */
  makePositioned: function(align, element) {
    var first = this.eq(0);
    var pos = element.offset();
    var height = element.outerHeight();
    var width = element.outerWidth();
    var left = pos.left;
    var top = pos.top;
    var thisHeight = first.outerHeight();
    var thisWidth = first.outerWidth();
    
    switch (align) { 
      case 'bottom':
        top += height;
      break;
      case 'right':
        left += width;
      break;
      case 'left':
        left = left - thisWidth;
      break;
      case 'top':
        top = top - thisHeight;
      break;
    }

    first.css({ 
      top: parseInt(top, 10) + 'px',
      left: parseInt(left, 10) + 'px',
      position: 'absolute'
    });
    
    return this;
  }

});
</script>
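
If you want to play with the alignment arithmetic on its own, it can be pulled out into a plain function that doesn't need jQuery. This is only a sketch: computePosition and the shape of its arguments are hypothetical, and it rounds with Math.round rather than parseInt.

```javascript
// Compute absolute coordinates for an element of size `own` placed next
// to a `target` box, using the same four alignments as makePositioned.
// (Hypothetical helper for illustration; not part of the extension.)
function computePosition(align, target, own) {
  var top = target.top;
  var left = target.left;
  switch (align) {
    case 'bottom': top += target.height; break; // below, left edges aligned
    case 'right':  left += target.width; break; // to the right, tops aligned
    case 'left':   left -= own.width; break;    // to the left, tops aligned
    case 'top':    top -= own.height; break;    // above, left edges aligned
  }
  return { top: Math.round(top), left: Math.round(left) };
}

// A 50x40 popup placed under a 100x30 input located at (10, 20):
var result = computePosition(
  'bottom',
  { left: 10, top: 20, width: 100, height: 30 },
  { width: 50, height: 40 }
);
console.log(result.top, result.left); // 50 10
```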

Wednesday, May 6, 2009

Keep using TextMate unregistered, indefinitely (forever, aka eternity)

Ok, this one is real easy. I didn't come up with it, but I often forget where the file is that I'm supposed to delete so I thought I'd put a note here and never forget again...until next month :)

This is a one-step process. Not two, not three. Just one. Delete the following file:

~/Library/Preferences/com.macromates.textmate.plist
Done already? You'll lose your preferences, but if you're too skint to buy this stuff then you don't deserve to have preferences.

If you're wondering what will happen next, you'll get 30 more days to evaluate TextMate. See you in 30 days :)

Wednesday, April 29, 2009

jQuery extensions to support highlight effects on hidden elements

I was disappointed with the lack of support for show, hide, fade and highlight effects on hidden elements in jQuery. By hidden I mean that they have visibility: hidden and not display: none (so they still take up space in the DOM but are not "visible"). Almost all of jQuery's functions to hide and show elements toggle their states between display: block; and display: none;. This is alright in some cases, but sometimes you want the element to still take up DOM space.

Even the :hidden selector doesn't work on visibility: hidden elements. So I've added an extension function called hidden() which returns a jQuery list of the elements that have visibility: hidden (it checks each element's inline style). To test whether something is hidden you can do if ($('#element').hidden().length) { ... }, and because the result is a regular jQuery list you can chain further calls on it, like so: $('#table td').hidden().highlightShow(1000);.

Because most of the time I want to highlight elements when I show/hide, the extension functions I have written use the jQuery Highlight effect. So you will need that to run this code. You will also need the UI Core (required for all effects) and of course the jQuery library.

Another annoying thing is that the jQuery UI highlight effects don't work on table rows. I suspect it's because table rows have a display mode of display: table-row; as opposed to the usual display: block;. Because I often need to highlight and show/hide table rows, I've added support for them in my extension functions.

There are four highlight methods:

  • highlightShow: highlight and fade in (show) an element or elements.
  • highlightFade: highlight and fade out (to visibility: hidden) an element or elements.
  • highlightHide: highlight, fade out and hide (display: none) an element or elements.
  • highlightRemove: highlight, fade out and remove from the DOM an element or elements.
Let me know if you've found this useful! Enjoy the slick JavaScripty goodness!

The Demo

A Regular DIV

Wrapper
This is the text we will fade in and out.

A Table

Name         Occupation   Height   Remove   Fade
Karl Varga   Programmer   6'
Karl Varga   Programmer   6'
John Edward  Student      10' 5"

The JavaScript code for the table effects uses event delegation to bind events to the checkboxes. This means that you can bind an event handler to a top-level element. When sub-elements receive events they bubble up the DOM to the parent, which handles it. This way, I just have to write a handler for the remove and fade events on the table and when I add new rows to the table those new rows will also support the remove and fade events because the handler is not bound to each individual checkbox, but rather to the parent table. Neat!

It uses the jQuery Listen plugin and here is the JavaScript snippet:

<script type="text/javascript">

$(document).ready(function() {

  $('#demo-table').listen('click', 'input.remove', function(e) {
    $(this).closest('tr').highlightRemove(2000); 
  });
  
  $('#demo-table').listen('click', 'input.fade', function(e) {
    $(this).closest('tr').highlightFade(2000); 
  });
  
  $('#add-row').click(function() {
    var hidden = $('#demo-table tr:hidden'); 
    var clone = hidden.clone(); 
    hidden.after(clone); 
    clone.highlightShow(2000); 
  });
  
});
</script>
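
Event delegation itself doesn't depend on a plugin. Here's a minimal sketch of the idea, written against anything that has a parentNode property so it can be tried outside a browser. The names findDelegateTarget and delegate are hypothetical, and this is not how the Listen plugin is implemented internally.

```javascript
// Walk from the event target up towards the delegating root and return
// the first node accepted by `matches`, or null if none is found.
function findDelegateTarget(target, root, matches) {
  var node = target;
  while (node && node !== root) {
    if (matches(node)) {
      return node;
    }
    node = node.parentNode;
  }
  return null;
}

// Wrap a handler so that it only runs for events that originate at
// (or inside) a matching descendant of root.
function delegate(root, matches, handler) {
  return function (event) {
    var match = findDelegateTarget(event.target, root, matches);
    if (match) {
      handler.call(match, event);
    }
  };
}

// Plain objects standing in for a table, a row and a checkbox.
var table = { tag: 'table', parentNode: null };
var row = { tag: 'tr', parentNode: table };
var checkbox = { tag: 'input', className: 'remove', parentNode: row };

var onTableClick = delegate(
  table,
  function (node) { return node.tag === 'input' && node.className === 'remove'; },
  function () { console.log('remove clicked in a', this.parentNode.tag); }
);

onTableClick({ target: checkbox }); // logs: remove clicked in a tr
onTableClick({ target: row });      // logs nothing
```

Rows added to the table later get the behaviour for free, because the only handler lives on the table.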

The Code

Here are my jQuery extensions. If you are using jQuery in no conflict mode you will have to include this JavaScript before your noConflict call, or do a search-and-replace on the $ function.

<script type="text/javascript">

jQuery.fn.extend({
  
  /**
   * Highlight and fade out an element to visibility: hidden so that it still takes up DOM space.
   * Works for table rows (which use display: table-row as opposed to the usual display: block).
   *
   */
  highlightFade: function(speed) {
    this.each(function() {
      $(this).effect('highlight', { mode: 'hide' }, speed, function() {
        // restore the element's visibility and display type
        this.style.visibility = "hidden";
        if ($(this).is('tr')) {
          this.style.display = 'table-row';
        } else {
          this.style.display = 'block';
        }
      });
    });
    return this;
  },

  /**
   * Highlight and fade out an element to display: none.  Just a wrapper for effect('highlight', ...)
   * for completeness.  Use highlightFade() to highlight and fade but still maintain visibility.
   */
  highlightHide: function(speed) {
    this.effect('highlight', { mode: 'hide' }, speed);
    return this;
  },

  /**
   * Highlight and fade in an element.  Works for visibility: hidden elements as well 
   * as elements with display: none.  jQuery highlight doesn't work on table rows, so
   * we apply the effect to the row cells.
   */
  highlightShow: function(speed) {
    this.each(function() {
      if ($(this).hidden().length) {
        this.style.display = "none";         // highlight only works when display is none
        this.style.visibility = "visible";
      }
      var apply_to = $(this);
      if ($(this).is('tr')) {
        apply_to = apply_to.find('td');
      }
      apply_to.effect('highlight', {}, speed);
    });
    return this;
  },

  /**
   * Highlight fade out and remove.  jQuery highlight doesn't work on table rows, so
   * we apply the effect to the row cells.
   */
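  highlightRemove: function(speed, callback) {
    this.each(function() {
      var original_target = $(this);
      var apply_to = original_target;
      var removed = false;
      if (original_target.is('tr')) {
        apply_to = apply_to.find('td');
      }
      apply_to.effect("highlight", { mode: 'hide' }, speed, function() {
        // For a table row the effect completes once per cell, so only
        // remove the row and invoke the callback the first time.
        if (removed) {
          return;
        }
        removed = true;
        original_target.remove();
        if (callback != undefined) {
          callback.call(this);
        }
      });
    });
    return this;
  },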

  /**
   * Return elements which have visibility: hidden.
   * The jQuery :hidden selector only matches elements with display: none.
   */
  hidden: function() {
    var hidden = [];
    this.each(function() {
      if (this.style.visibility == 'hidden') {
        hidden.push(this);
      }
    });
    return $(hidden);
  }
});

</script>

Wednesday, April 22, 2009

Output JavaScript date in human-readable format (Day Month DD YYYY HH:MM:SS PM)

You often need to display the user's current date and time on a website in a custom format. Unfortunately JavaScript doesn't have (many) date formatting functions built-in so you either have to use someone else's JS date formatting "library" or roll your own.

For simplicity I rolled my own. And since this is a common thing that we all need to do from time to time, I thought I would share my code in case you find it useful.

The Demo

(The format the date appears in is Day Month DD YYYY HH:MM:SS PM.)

The date and time right now is

The Code
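
A minimal sketch of a formatter for the Day Month DD YYYY HH:MM:SS PM layout follows. Treat it as an illustration rather than the definitive listing: the names formatDate and pad are hypothetical, and I'm assuming full day and month names and a zero-padded 12-hour clock.

```javascript
var DAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
            'Thursday', 'Friday', 'Saturday'];
var MONTHS = ['January', 'February', 'March', 'April', 'May', 'June',
              'July', 'August', 'September', 'October', 'November',
              'December'];

// Left-pad a number to two digits.
function pad(n) {
  return n < 10 ? '0' + n : String(n);
}

// Format a Date in the user's local time as
// "Day Month DD YYYY HH:MM:SS PM".
function formatDate(date) {
  var hours = date.getHours();
  var suffix = hours >= 12 ? 'PM' : 'AM';
  var hours12 = hours % 12 === 0 ? 12 : hours % 12; // 0 and 12 both show as 12
  return DAYS[date.getDay()] + ' ' +
    MONTHS[date.getMonth()] + ' ' +
    pad(date.getDate()) + ' ' +
    date.getFullYear() + ' ' +
    pad(hours12) + ':' + pad(date.getMinutes()) + ':' +
    pad(date.getSeconds()) + ' ' + suffix;
}

console.log(formatDate(new Date(2009, 3, 22, 15, 4, 5)));
// Wednesday April 22 2009 03:04:05 PM
```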


Tuesday, April 21, 2009

Simple JavaScript expanding text input

Today my colleague had a requirement for a simple expanding text input on a website form. By expanding I don't mean that it gets bigger as you type, just that when you click/focus into the text input it appears to grow, giving you a larger multi-line textarea in which to post a comment. Unfortunately I couldn't whip out jQuery (which is my new favourite JS library after using Prototype for some time) because we just needed this one bit of DHTML.

Requirements

  • When user clicks/focuses on the input show a larger textarea.
  • The textarea must appear above the other form elements but leave some of the submit button showing.

The Demo

Click in the input to see it grow.

The Code



<style type="text/css">

/**
 * Styles for the expanding / contracting input
 */

#smallInput {
 height: 30px;
 width: 250px;
 overflow: hidden;

}
#largeInput {
 display: none;
 height: 80px;
 width: 350px;
 overflow-x: hidden;
 overflow-y: auto;
 word-wrap: break-word;
 z-index: 1000;
}

</style>
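
The styles only cover half of the behaviour; something has to swap the two fields on focus. A minimal sketch in plain JavaScript could look like this. The function names expandInput and collapseInput are hypothetical, and I'm assuming the page contains elements with the ids smallInput and largeInput used in the stylesheet above, and that the small input is hidden while the large one is showing.

```javascript
// Show the large textarea in place of the small input, carrying over
// whatever text has already been typed.
function expandInput(small, large) {
  large.value = small.value;
  small.style.display = 'none';
  large.style.display = 'block';
  if (typeof large.focus === 'function') {
    large.focus();
  }
}

// Go back to the small input, keeping the text.
function collapseInput(small, large) {
  small.value = large.value;
  large.style.display = 'none';
  small.style.display = '';
}

// Wiring it up in a page (assumes the ids from the stylesheet):
//   var small = document.getElementById('smallInput');
//   var large = document.getElementById('largeInput');
//   small.onfocus = function () { expandInput(small, large); };
//   large.onblur = function () { collapseInput(small, large); };
```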


Tuesday, March 24, 2009

AppleScript / Automator: Import AVI video to iTunes as a TV Show or Movie

This is a sweet AppleScript / Automator Workflow that I created which totally automates the task of importing an AVI video into iTunes.
  • Select one or more video files to import. Works on AVI as well as other formats like M4V.
  • Adds metadata to the AVI files to make them iTunes compatible.
  • Prompts if you would like to import into Movies or TV Shows.
  • Sets iTunes metadata to identify the video as Movie or TV Show depending on your choice.
Here is a screenshot of my workflow and here is the workflow file (in a tarball...workflows actually are directories it seems...).

Installing

  1. To install as a Finder plugin (so you can select files in Finder then secondary-click / control-click and choose Automator -> Import to iTunes), open the .workflow file in Automator and choose Save As Plug-in, then in Plug-in for select Finder and Save.
  2. To install as an application so you can drag and drop files onto the application, open the .workflow file in Automator and choose File -> Save As and in File Format choose Application.

The AppleScript code is here if you want to copy and paste it:

on run {input, parameters}
 set videoType to button returned of (display dialog ("What type of video are you importing?") buttons {"Movie", "TV Show"} default button {"TV Show"})
 repeat with i in input
  try
   tell application "Finder" to set file type of file i to "MooV"
  end try
  tell application "iTunes"
   set newAddition to (add (i as alias))
   if videoType = "TV Show" then
    tell newAddition to set video kind to TV show
   end if
  end tell
 end repeat
 return input
end run

Monday, March 23, 2009

Recursive chmod on files or directories

This is useful when, for example, you need to set execute permissions on every PHP or Ruby file in all directories below your current directory.

Just replace the '*.php' with the name pattern you want to match. Works on Mac OS X 10.5.6.

find . -type f -name '*.php' -exec chmod 755 {} \;

To see which files will be changed use:

find . -type f -name '*.php' -exec echo {} \;

The {} braces are replaced with each filename as find finds each file that matches your pattern. To find directories use -type d.

Friday, December 19, 2008

Using iText to convert TIFF to PDF and to combine multiple PDFs into one PDF

We recently had the requirement to take a Microsoft Excel file as well as one or more PDF "attachments" and 1) convert the Excel file to PDF, and 2) combine all the PDFs into a single PDF.

Unfortunately we were unable to find an open source Java solution for converting the Excel file to PDF.  If you want to extract information from the Excel file you can do so using Apache POI.  You can then generate a PDF file from this data using iText.  This wasn't an appropriate solution for us.  An alternative solution in the interim was to print the Excel file to TIFF using the Microsoft Document Image Writer and then convert the TIFF to PDF using iText.  The results look alright, not ideal, but workable.

Here are two Java methods to do it.  Let me know if you find a free Java solution to converting Excel to PDF.

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.List;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Image;
import com.lowagie.text.PageSize;
import com.lowagie.text.pdf.PRAcroForm;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.pdf.SimpleBookmark;

public class FileConversions {

    /**
     * Convert a TIFF file to a PDF.
     * 
     * @param tiffFile the raw bytes of the TIFF file
     * @return the raw bytes of the generated PDF
     * @throws DocumentException
     * @throws MalformedURLException
     * @throws IOException
     */
    public static byte[] convertTiffToPdf(byte[] tiffFile) 
            throws DocumentException, MalformedURLException, IOException {

        ByteArrayOutputStream outfile = new ByteArrayOutputStream();
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, outfile);
        writer.setStrictImageSequence(true);
        document.open();
        Image tiff = Image.getInstance(tiffFile);
        tiff.scaleToFit(800, 600);
        document.add(tiff);
        document.close();
        outfile.flush();
        return outfile.toByteArray();
    }

    /**
     * Combine multiple PDFs into a single PDF.
     *
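     * @param pdfs the PDF documents to combine, in output order
     * @param combinedPdfFile the file the combined PDF is written to
     * @throws Exception if a PDF cannot be read or the output cannot be written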
     * @see http://itext.ugent.be/library/com/lowagie/examples/general/copystamp/Concatenate.java
     */
    public static void combinePdfFiles(List<byte[]> pdfs, File combinedPdfFile) throws Exception {

        PdfReader reader = null;
        Document document = null;
        PdfCopy  writer = null;
        ArrayList master = new ArrayList();
        int pageOffset = 0;

        for (byte[] pdf : pdfs) {
            reader = new PdfReader(pdf);
            reader.consolidateNamedDestinations();
            int n = reader.getNumberOfPages();
            List bookmarks = SimpleBookmark.getBookmark(reader);
            if (bookmarks != null) {
                if (pageOffset != 0) {
                    SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset, null);
                }
                master.addAll(bookmarks);
            }
            pageOffset += n;

            if (document == null) {
                // step 1: creation of a document-object
                document = new Document(reader.getPageSizeWithRotation(1));
                // step 2: we create a writer that listens to the document
                writer = new PdfCopy(document, new FileOutputStream(combinedPdfFile));
                // step 3: we open the document
                document.open();
            }
            // step 4: we add content
            PdfImportedPage page;
            // PDF page numbers are 1-based.
            for (int i = 1; i <= n; i++) {
                page = writer.getImportedPage(reader, i);
                writer.addPage(page);
            }
            PRAcroForm form = reader.getAcroForm();
            if (form != null) {
                writer.copyAcroForm(reader);
            }
        }
        if (!master.isEmpty()) {
            writer.setOutlines(master);
        }
        if (document != null) {
            document.close();
        }
    }
}

Thursday, December 18, 2008

Simple request benchmarking of a Ruby on Rails application using ApacheBench

You can use ApacheBench (ab), which comes with your default Apache install.  On Windows you can find the ab.exe executable in C:\Program Files\Apache Group\Apache2\bin.

Usage: ab [options] [http://]hostname[:port]/path
Options are:
  -n requests     Number of requests to perform
  -c concurrency  Number of multiple requests to make
  -t timelimit    Seconds to max. wait for responses
  -p postfile     File containing data to POST
  -T content-type Content-type header for POSTing
  -v verbosity    How much troubleshooting info to print
  -w              Print out results in HTML tables
  -i              Use HEAD instead of GET
  -x attributes   String to insert as table attributes
  -y attributes   String to insert as tr attributes
  -z attributes   String to insert as td or th attributes
  -C attribute    Add cookie, eg. 'Apache=1234. (repeatable)
  -H attribute    Add Arbitrary header line, eg. 'Accept-Encoding: gzip'
                  Inserted after all normal header lines. (repeatable)
  -A attribute    Add Basic WWW Authentication, the attributes
                  are a colon separated username and password.
  -P attribute    Add Basic Proxy Authentication, the attributes
                  are a colon separated username and password.
  -X proxy:port   Proxyserver and port number to use
  -V              Print version number and exit
  -k              Use HTTP KeepAlive feature
  -d              Do not show percentiles served table.
  -S              Do not show confidence estimators and warnings.
  -g filename     Output collected data to gnuplot format file.
  -e filename     Output CSV file with percentages served
  -h              Display usage information (this message)

Here are the simple benchmarking tests I ran against my Ruby on Rails website.  I wanted to compare the performance of RoR over CGI, with a new server instance created on each request, against requests proxied to a single long-running mongrel_rails server.  These tests do 10 individual requests, then 100 requests with 5 running concurrently. Results are output in HTML.

ab -n 10 -c 1 -w http://new.varzyfamily.com/ > 10-requests.html
ab -n 100 -c 5 -w http://new.varzyfamily.com/ > 100-5-concurrent-requests.html

For your information I'm running my mongrel_rails using God on port 3000 and I am proxying requests using the standard RoR .htaccess file as follows:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^new.varzyfamily.com$
RewriteRule ^(.*)$ http://127.0.0.1:3000%{REQUEST_URI} [P,QSA,L]

I have to adopt such an arcane setup because my host HostGator only supports RoR over CGI, which is not very performant.

The test results basically confirm what I already knew: never, ever run a RoR app over CGI where you're starting the server on every request! Holy smokes! For 100 requests (5 concurrent) over CGI, the average total request time was 16437 ms (ouch!!!), serving 0.3 reqs/sec. Talking to a proxied mongrel server fared much better, with an average total request time of 565 ms, serving 8.61 reqs/sec.

Here is the output from the latter test.


This is ApacheBench, Version 2.0.41-dev <$Revision: 1.121.2.12 $> apache-2.0
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 2006 The Apache Software Foundation, http://www.apache.org/

..done

Server Software:        Mongrel
Server Hostname:        new.varzyfamily.com
Server Port:            80
Document Path:          /
Document Length:        4174 bytes
Concurrency Level:      5
Time taken for tests:   0.11609 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      457200 bytes
HTML transferred:       417400 bytes
Requests per second:    8.61
Transfer rate:          39.38 kb/s received

Connection Times (ms)
              min   avg   max
Connect:       62   107  3062
Processing:   188   458   703
Total:        250   565  3765