
Writing A Web Analytics Engine From Scratch

April 2nd, 2009

If you’re building a system that needs to track affiliate sales, you’ll need to integrate some form of analytics into your software. Your affiliates will want to see how many visitors hit their link in total, how many of those were unique, how they got there (search terms or referrer URL) and whether they bought anything.

There are a few ways of doing web analytics: writing tracking code right into your application (the affiliate landing page), using JavaScript, or processing web server logs. I’ll be focusing on the JavaScript version, which works similarly to Google Analytics and Mint. I won’t go into log processing here: although logs hold interesting information, they aren’t real-time enough for our use. They are, however, a powerful way to “check” the other methods, or to gather information on spider visits (frequency, times of day, etc.)

Our tracking code will be fairly simple:

  1. Grab any information from the URL (GET parameters) and server data (user agent, remote IP, referer [sic] URL)
  2. Record the information in a database
  3. Continue processing the page
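As a rough illustration of those three steps, here is a server-side sketch in JavaScript for Node.js (not the Kohana code used later). The `recordHit` name and the `store` callback, which stands in for the database insert, are hypothetical; the request is assumed to arrive as a URL path, a headers object with lowercase names (as Node provides them) and the client IP.

```javascript
// Hypothetical sketch of the three tracking steps; store() stands in for the database.
function recordHit(requestUrl, headers, remoteAddr, store) {
  // 1. Grab GET parameters from the URL and server data from the request.
  //    The dummy base lets new URL() parse a bare path such as "/landing?affid=7".
  const url = new URL(requestUrl, 'http://placeholder.invalid');
  const hit = {
    ip: remoteAddr,
    ua: headers['user-agent'] || '-',
    request: url.pathname + url.search,
    referer: headers['referer'] || '-',
    params: Object.fromEntries(url.searchParams),
  };

  // 2. Record the information.
  store(hit);

  // 3. Hand control back so the page can continue processing.
  return hit;
}
```

Passing the storage step in as a callback keeps the sketch independent of any particular database layer.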

Decide on a Tracking Method

If you’re embedding the tracking code directly into your application, it’s a matter of adding some code to your controller and creating a model (and associated tables) to store the visitor data. The reporting backend works exactly the same either way. The pros are that you don’t have to deal with JavaScript or cross-browser problems, and there may be a performance benefit since fewer HTTP requests hit your server. The cons are that any time you want to change the tracking code, you need to change the controller, and you can’t reuse the same tracking code on other sites, including sites that aren’t yours. Typically you set up one application (and domain) for analytics and reporting, fed by multiple websites. If you only have one website and don’t mind running your analytics and reporting there, I’d recommend embedding the tracking code in your controller.

Using JavaScript to record visitor information is relatively simple. We need to write a controller to handle the requests to record visitor information, and a model to do the actual recording. The client side is a small JavaScript snippet that extracts some variables and makes a GET request to our controller. We won’t be using AJAX here, since we need to deploy this code to multiple sites but have only one analytics site (i.e. the code runs on www.domain1.com but the analytics requests hit analytics.anotherdomain.com). That is a cross-domain request, and although we want to allow it in this case, the browser’s same-origin policy blocks XMLHttpRequest calls to another domain. Loading an image from another domain isn’t restricted by that policy, which is why the tracker requests a GIF instead. Pros of this method are the ability to deploy to multiple sites and consolidate analytics/reporting on one server, and the ability to change tracking code without re-deploying your application. Cons are JavaScript browser incompatibilities and increased complexity and load due to many (small) requests.

My Analytics Solution

We’ll be writing a controller and model using the Kohana PHP framework, and the client-side JavaScript without a framework, since all it does is generate a request for a 1×1 pixel GIF. This is the same way Google Analytics and Mint do it. So, on to the code.


Web Analytics Model

Our model will store the time, IP address, user agent, requested URL, referer [sic] URL and whether the hit is unique. Here is the MySQL table:

CREATE TABLE IF NOT EXISTS Hits (
    id              INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    recorded_time   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, -- Time the record was created
    ip              INTEGER UNSIGNED NOT NULL,                   -- IPv4 address as an integer
    ua              VARCHAR( 200 ) NOT NULL DEFAULT '-',         -- User agent
    request         VARCHAR( 200 ) NOT NULL DEFAULT '/',         -- Tracked page URL
    referer         VARCHAR( 200 ) NOT NULL DEFAULT '-',         -- Referer
    is_unique       BOOLEAN NOT NULL DEFAULT TRUE
);

Hopefully there isn’t anything unclear there. I’ve created fields to record the time at which the hit happened, the IP (stored as an integer for compactness), the user agent string, the original request, the referer URL and whether this is a unique visit (i.e. this person hasn’t visited our site recently).
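To make the integer encoding concrete: each of the four octets is one base-256 digit, so 192.168.1.1 becomes 192*256^3 + 168*256^2 + 1*256 + 1 = 3232235777, which fits in the 4-byte INTEGER UNSIGNED column. Here is a JavaScript sketch of both directions; the function names are illustrative and, like the column, it only handles IPv4.

```javascript
// Convert a dotted-quad IPv4 address to the unsigned integer stored in Hits.ip.
// Returns null for anything that isn't four octets in the range 0-255.
function ipToInteger(ip) {
  const octets = ip.split('.');
  if (octets.length !== 4) return null;
  let value = 0;
  for (const part of octets) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) return null;
    // Multiply instead of shifting: JavaScript's << works on signed 32-bit ints.
    value = value * 256 + Number(part);
  }
  return value;
}

// Reverse the conversion when displaying reports.
function integerToIp(n) {
  return [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join('.');
}
```

For example, `ipToInteger('192.168.1.1')` gives 3232235777 and `integerToIp(3232235777)` gives back `'192.168.1.1'`.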

The model is equally simple:

class Hit_Model extends Model {

    function __construct() {
        parent::__construct();
    }

    function record_click( $ip, $ua, $request, $referer, $is_unique=1) {
        $ret = false;
        $row = array();

        // Convert IP to integer
        $ip = $this->_ip_to_integer( $ip );

        // Either 0 or 1
        if( $is_unique > 0 ) {
            $is_unique = 1;
        } else {
            $is_unique = 0;
        }
        if( $ip > 0  ) {
            $row[ 'ip' ] = (int)$ip;
            $row[ 'ua' ] = $ua;
            $row[ 'request' ] = $request;
            $row[ 'referer' ] = $referer;
            $row[ 'is_unique' ] = $is_unique;
            $ret = $this->_create_if_not_exists( 'Hits', $row );
        }
        return $ret;
    }

    /**
     * Converts a text IP address to an integer.
     **/

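    function _ip_to_integer( $ip ) {
        // ip2long() returns false for malformed addresses; map that to 0 so
        // record_click() skips the insert. (split() was removed in PHP 7.)
        // On 64-bit PHP builds ip2long() always returns a non-negative value.
        $long = ip2long( $ip );
        return ( $long === false ) ? 0 : $long;
    }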

    /**
     * Inserts the row if it's new and returns the ID, or just returns the
     * ID if it already exists. The table must have a column called 'id'
     * that is the INTEGER AUTO_INCREMENT PRIMARY KEY style.
     **/

    function _create_if_not_exists( $table, $row ) {
        // Try to insert - if it doesn't exist we'll get an ID of zero
        $columns = join( ',', array_keys( $row ) );
        $placeholders = join( ',', array_fill( 0, count( $row ), '?' ) );
        $q = $this->db->query( "INSERT IGNORE INTO $table ($columns) ".
                               "VALUES ($placeholders)", array_values( $row ) );
        $ret = $q->insert_id();
        if( $ret == 0 ) {
            $q = $this->db->getwhere( $table, $row );
            if( $q->count() > 0 ) {
                $result = $q->result_array( false );
                $ret = $result[ 0 ][ 'id' ];
            }
        }
        return $ret;
    }
}

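The model class is pretty straightforward. Since Kohana doesn’t support “INSERT IGNORE”, I had to roll my own version. Keep in mind that INSERT IGNORE only skips a row that would violate a PRIMARY KEY or UNIQUE index; the Hits table has neither apart from the auto-incremented id, so every hit is inserted, which is what we want for raw hit data. The model only handles inserts; actual reporting and such are left out.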
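The heart of _create_if_not_exists() is turning a row into a column list plus one ? placeholder per value, so the database driver does the quoting. The same idea as a small JavaScript helper (the name buildInsertIgnore is hypothetical):

```javascript
// Build an "INSERT IGNORE" statement with ? placeholders from a row object,
// mirroring what _create_if_not_exists() does in PHP. Only the values are
// bound; table and column names are pasted in, so they must come from your code.
function buildInsertIgnore(table, row) {
  const columns = Object.keys(row);
  const placeholders = columns.map(() => '?').join(',');
  return {
    sql: `INSERT IGNORE INTO ${table} (${columns.join(',')}) VALUES (${placeholders})`,
    values: columns.map((c) => row[c]),
  };
}
```

The returned `sql` and `values` would then be handed to whatever database driver you use for a parameterized query.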


Web Analytics Controller

The controller only does one thing - validate and record the data passed to it, then return a 1×1 pixel GIF:

class Hit_Controller extends Controller {

    private $gif_data = "\x47\x49\x46\x38\x39\x61\x01\x00\x01".
                                "\x00\x80\xFF\x00\xFF\xFF\xFF\x00\x00".
                                "\x00\x2C\x00\x00\x00\x00\x01\x00\x01".
                                "\x00\x00\x02\x02\x44\x01\x00\x3B\x00";

    function __construct() {
        parent::__construct();
    }

    /**
     * Basically grab all the parameters, record in the database and return
     * some content.
     **/

    function index() {
        if( isset( $_GET[ 'ru' ] ) ) {

            $h_model = new Hit_Model();
            $ip = $this->input->server( 'REMOTE_ADDR' );
            $ua = $this->input->server( 'HTTP_USER_AGENT' );
            $h_model->record_click( $ip,
                                    $ua,
                                    $this->_get_elem( $_GET, 'ru' ),
                                    $this->_get_elem( $_GET, 'rf' ),
                                    $this->_get_elem( $_GET, 'u' ) );
        }

        // Return a 1x1 pixel GIF (it's never displayed, so it needn't be transparent)
        header( 'Content-Type: image/gif' );
        echo( $this->gif_data );
    }

    function _get_elem( $a, $k ) {
        $ret = '';
        if( isset( $a[ $k ] ) ) {
            $ret = $a[ $k ];
        }
        return $ret;
    }
}

The only validation we do here is checking that the tracked page URL was passed (the ru variable in the GET string); the referer URL arrives separately as rf.
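For comparison, the controller’s parameter handling can be written as a pure function, which makes the validation easy to test. This is a JavaScript sketch with a hypothetical name, using the same ru, rf and u parameters. It returns null when ru is missing, in which case nothing should be recorded. URLSearchParams, like PHP’s $_GET, decodes + as a space.

```javascript
// Pull the fields the Hit controller reads out of a query string.
// Returns null when `ru` (the tracked page URL) is missing.
function parseHitQuery(queryString) {
  const q = new URLSearchParams(queryString);
  if (!q.has('ru')) return null;
  return {
    request: q.get('ru'),
    referer: q.get('rf') || '-',            // missing or empty referrer -> '-'
    isUnique: Number(q.get('u')) > 0 ? 1 : 0, // normalise to 0 or 1
  };
}
```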


Client-side JavaScript

The JavaScript that acts as our view (although nothing is displayed) and executes in the user’s browser is quite simple. It marshals the required parameters, then munges them into a request for a GIF. To tell the difference between a unique visitor and a repeat pageview, we set a cookie on the first visit and check for it on subsequent pageviews. Here’s our JavaScript:

function track() {
    var days = 7; // Number of days to keep the uniqueness cookie alive
    var ru = document.location.href;
    var rf = document.referrer;

    var rest = '';
    if( ru.length > 0 ) {
        // Direct visits have no referrer; record '-' like the table default
        if( rf == '' ) {
            rf = '-';
        }

        // If there's a query string, grab it and stick all the parameters on the
        // end (ignoring any #fragment).
        var qstring = ru.split( '#' )[ 0 ].split( '?' );
        if( qstring.length > 1 ) {
            rest = qstring[ 1 ];
        }

        // Encode each URL exactly once
        ru = urlencode( ru );
        rf = urlencode( rf );
        // The timestamp also makes each request URL unique, so the GIF isn't served from cache
        var clicked_time = Math.round( new Date().getTime() / 1000 );

        // Build data.
        var d = 'rf=' + rf + '&ru=' + ru;
        if( rest.length > 0 ) {
            d += '&' + rest;
        }
        d += '&ct=' + clicked_time;

        // If the uniqueness cookie already exists, this isn't a unique hit
        var unique = 1;
        var old_cookie = readCookie( 'analytics_unique' );
        if( old_cookie != null && old_cookie != "" ) {
            unique = 0;
        }

        // Set (or refresh) the cookie for uniqueness
        setCookie( 'analytics_unique', 'visited', days, '/' );

        d += '&u=' + unique;
        // Now request the 1x1 pixel gif to record the click.
        (new Image()).src = 'http://your.analytics.site.com/click.gif?' + d;
    }
    return true;
}

function setCookie( name, value, days, path ) {
    var date = new Date();
    date.setTime( date.getTime() + ( days*24*60*60*1000 ) );
    // toUTCString() replaces the deprecated toGMTString()
    var expires = "; expires=" + date.toUTCString();
    document.cookie = name + '=' + value + expires + '; path=' + path;
}

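function readCookie( cookieName ) {
    if( cookieName == "" ) return "";
    // Match "name=" at the start of an entry, not anywhere in the string,
    // so e.g. "xanalytics_unique" doesn't match "analytics_unique"
    var cookies = document.cookie.split( ';' );
    for( var i = 0; i < cookies.length; i++ ) {
        var c = cookies[ i ].replace( /^\s+/, '' );
        if( c.indexOf( cookieName + '=' ) == 0 ) {
            return decodeURIComponent( c.substring( cookieName.length + 1 ) );
        }
    }
    return "";
}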

function deleteCookie( cookieName ) {
    if( readCookie( cookieName ) ) {
        // An expiry date in the past makes the browser drop the cookie
        setCookie( cookieName, '', -1, '/' );
    }
}

function urlencode( str ) {
    // encodeURIComponent() escapes everything except A-Z a-z 0-9 - _ . ! ~ * ' ( )
    // and encodes non-ASCII characters as UTF-8, unlike the deprecated escape()
    return encodeURIComponent( str );
}
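The uniqueness test in track() boils down to “does an analytics_unique cookie already exist?”. Here it is as a standalone sketch (the function name is hypothetical) that takes a document.cookie-style string and matches the full cookie name, so that a cookie such as xanalytics_unique isn’t mistaken for ours:

```javascript
// Decide whether a pageview is a unique visit from a "name=value; name2=value2"
// cookie string: any existing analytics_unique cookie means we've seen this browser.
function isUniqueVisit(cookieString, name = 'analytics_unique') {
  return !cookieString
    .split(';')
    .some((c) => c.trim().startsWith(name + '='));
}
```

Because the cookie lives in the visitor’s browser, clearing cookies or switching browsers makes a returning visitor count as unique again, and since track() refreshes the 7-day cookie on every pageview, “unique” really means “not seen in the last 7 days”.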

There are a few convenience methods for reading/writing cookies and encoding the data so things don’t get screwed up when we request the image. Include the script on every page you want to track and call track() once per pageview. The final piece is to add a rewrite rule so our controller gets hit with any requests to click.gif:

RewriteEngine on
RewriteBase /
# RewriteRule only sees the path (never the query string), and the
# original query string is passed through unchanged to the new URL
RewriteRule ^/?click\.gif$ /hit [L]

The rule above passes every request for click.gif, GET parameters included, to our hit controller, which we know returns a 1×1 pixel GIF.


Extensions

You could extend the above to include more information about the user’s browser, such as platform, Java support, Flash version, JavaScript version or screen resolution. With some post-processing you’d be able to geolocate the user’s IP and extract search-engine keywords or PPC campaign variables from the stored URLs. If you added a little more information to the uniqueness cookie, you’d be able to record bounce rate and time on page.
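As an example of that post-processing, search keywords can be pulled out of the stored referer URL. The hostname-to-parameter table below is an assumption for illustration (Google and Bing use q, Yahoo uses p); check the engines you care about before relying on it.

```javascript
// Hypothetical post-processing step: extract search keywords from a referrer.
// Which query parameter holds the keywords differs per search engine.
const SEARCH_PARAMS = {
  'www.google.com': 'q',
  'www.bing.com': 'q',
  'search.yahoo.com': 'p',
};

function searchTerms(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // '-' (no referrer) or otherwise not a URL
  }
  const param = SEARCH_PARAMS[url.hostname];
  return param ? url.searchParams.get(param) : null;
}
```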

I’ve completely glossed over how the data should be presented to the users (your affiliates.) Most affiliate systems show total clicks, uniques and sales grouped by date, time of day or campaign ID. Of course, the main benefit of writing your own engine from scratch is you can offer affiliates things that other programs don’t show them such as referrer URL, search terms, PPC campaign variables and geographic location.
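A minimal sketch of the “total clicks and uniques grouped by date” report, assuming each hit has already been loaded as an object with a `recorded_time` Date and an `is_unique` flag (the function name and input shape are illustrative). In production you would more likely push this into SQL with GROUP BY DATE(recorded_time).

```javascript
// Group hits by calendar day (UTC) and count clicks and uniques per day.
function dailySummary(hits) {
  const byDay = new Map();
  for (const hit of hits) {
    const day = hit.recorded_time.toISOString().slice(0, 10); // "YYYY-MM-DD"
    const row = byDay.get(day) || { day, clicks: 0, uniques: 0 };
    row.clicks += 1;
    row.uniques += hit.is_unique ? 1 : 0;
    byDay.set(day, row);
  }
  // Oldest day first
  return [...byDay.values()].sort((a, b) => a.day.localeCompare(b.day));
}
```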

Programming

Affiliate Links With No IDs

December 6th, 2008

I sent my previous article on avoiding link juice loss with affiliates to a friend, and he pointed me to linkconnector.com, which offers something they call “Naked Links”: plain links from the affiliate site straight back to the merchant site, with no subdomains or GET parameters. So I started thinking about how to build this, since it seems like an awesome idea. You end up with truly clean links to your merchant index page, or even deeper pages, without any redirects at all.

One way of adding this functionality to your affiliate system (you being the merchant) would be to have your affiliates register their sites in their affiliate accounts. Affiliates can prove they own a site by uploading a file with specific HTML content or by adding a CNAME record to their DNS (a la Google Apps for enterprise.) Then, when a request comes in, your server-side script looks at the HTTP_REFERER header (yes, misspelled, just as in the RFC) and works out which affiliate, if any, should get the credit. The script then simply stores the affiliate ID in a cookie and serves the content with a 200. Affiliate tracking with no IDs or funky links! This method won’t work when the affiliate is doing forum or article marketing, unless the affiliate registers the exact URL and adds an HTML comment or something similar to verify it. Even that won’t work in every case, such as forum posts and blog comments: since anyone can post on those pages, credit would go to whoever posted or commented first and registered the link.
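The referer lookup itself is small. Here is a sketch, assuming verified registrations have been loaded into a Map from bare hostname to affiliate ID; the function name and data layout are hypothetical.

```javascript
// Hypothetical "naked link" attribution: map the referring site's hostname
// to the affiliate who registered (and verified) it.
function affiliateForReferrer(referrer, registeredSites) {
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return null; // no referrer, or not a URL
  }
  // Treat www.example.com and example.com as the same registered site
  host = host.replace(/^www\./, '');
  return registeredSites.has(host) ? registeredSites.get(host) : null;
}
```

Browsers don’t always send a Referer (for example when going from an HTTPS page to an HTTP one), so those visits simply get no affiliate credit.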

If you wanted to do what LinkConnector is doing (running the network rather than being the merchant), the merchant site acts as a middleman: for each visit it sends a request to you to check which affiliate ID, if any, belongs to the referrer URL. Pretty simple, although fault tolerance should be high on the priority list. The merchant site must not end up launching a denial-of-service attack against itself by opening too many remote connections, for example when your lookup service is slow to respond.
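One way to keep the merchant side fault tolerant is to cache answers and put a hard timeout on each remote lookup. The sketch below assumes a caller-supplied async function `fetchAffiliate(host)` that asks the network (a hypothetical API); the default TTL and timeout are arbitrary illustrative values.

```javascript
// Merchant-side lookup against a remote affiliate network, with an in-memory
// cache and a hard timeout so a slow network can't tie up the merchant site.
function makeAffiliateLookup(fetchAffiliate, { ttlMs = 10 * 60 * 1000, timeoutMs = 500 } = {}) {
  const cache = new Map(); // referrer host -> { affid, expires }

  return async function lookup(host) {
    const cached = cache.get(host);
    if (cached && cached.expires > Date.now()) return cached.affid;

    let timer;
    const timeout = new Promise((resolve) => {
      timer = setTimeout(() => resolve(null), timeoutMs);
    });
    let affid;
    try {
      // Whichever settles first wins; a timeout or error means no credit
      affid = await Promise.race([fetchAffiliate(host), timeout]);
    } catch (e) {
      affid = null;
    } finally {
      clearTimeout(timer);
    }
    // Cache every answer, including "no affiliate" and failures, so repeat
    // visits from the same site don't trigger repeated remote calls
    cache.set(host, { affid, expires: Date.now() + ttlMs });
    return affid;
  };
}
```

Caching failures trades a few minutes of possibly missed credit after an outage for protection against hammering a struggling network; you might prefer a shorter TTL for failed lookups.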


Building Sites

Avoiding Link Juice Loss With Affiliates

December 4th, 2008

This post is geared more toward those building or working with larger systems that use affiliate links, but I’m astounded how often I see large sites throw away the link juice their affiliates are giving them. In all honesty, building a simple affiliate system isn’t all that hard; it basically amounts to dropping a cookie based on the parameters of a given link. The parameters are typically an affiliate ID with some optional tracking variables that let your affiliates track specific campaigns and traffic sources (PPC, organic, banner.) The parameters can be embedded however you want, although typically they are in the query string of a GET request, such as:

http://www.somesitesellingstuff.com/affiliate?affid=bigsteve&campaign_id=first&media_id=ppc&product=footsoak

The contrived example above shows affiliate ‘bigsteve’ with a link from his ‘first’ ‘ppc’ campaign for a ‘footsoak’ product. Real links are usually uglier, but you get the picture. The parameters can also be encoded in the hostname itself. Clickbank encodes the affiliate ID and product name as subdomains, plus a single 8-character tracking code in the query string, to form a “hoplink”:

http://bigsteve.footsoak.hop.clickbank.net/?tid=firstppc
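Pulling the variables back out of either kind of link is straightforward. Here is a JavaScript sketch that handles both example formats; the parameter names come from the examples above, while the function names and return shapes are assumptions, since real programs use their own conventions.

```javascript
// Query-string style: /affiliate?affid=...&campaign_id=...&media_id=...&product=...
function parseAffiliateLink(link) {
  const params = new URL(link).searchParams;
  const affid = params.get('affid');
  if (!affid) return null; // not an affiliate link
  return {
    affid,
    campaignId: params.get('campaign_id'),
    mediaId: params.get('media_id'),
    product: params.get('product'),
  };
}

// Hostname style: <affiliate>.<product>.hop.clickbank.net/?tid=<tracking code>
function parseHoplink(link) {
  const url = new URL(link);
  const labels = url.hostname.split('.');
  if (labels.length !== 5 || labels.slice(2).join('.') !== 'hop.clickbank.net') return null;
  return { affid: labels[0], product: labels[1], tid: url.searchParams.get('tid') };
}
```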

The affiliate script simply strips out the appropriate variables, drops a cookie containing them and then returns the appropriate content. Pretty straightforward, except for that last part: delivering the content. Most sites return the content with an HTTP 200, which isn’t a good idea, since the same or very similar content is then served at several different URLs. Don’t forget, most affiliates drop their link directly on their sites, forums and articles, rather than bouncing through a redirect. Serving a 200 at every affiliate URL dilutes all that free link juice!

The solution is to change your affiliate script slightly so instead of setting the cookie and serving the content, it redirects (301 please) to a logical page. In fact, the whole operation can be done using Apache mod_rewrite! A lot of this depends on what you’re selling and your site structure, but here’s an example that should make good sense:

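RewriteEngine on
# %{REQUEST_URI} never includes the query string, so match %{QUERY_STRING}
RewriteCond %{QUERY_STRING} ^productid=(\d+)&affid=(\d+)$
# %1/%2 are the RewriteCond captures; the trailing ? drops the old query string
RewriteRule ^/?affiliate$ http://www.somerandomsite.com/product_detail/%1? [L,R=301,CO=affid:%2:.somerandomsite.com:20160]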

The above rewrite rule checks whether the request is for a product page with an attached affiliate ID. If so, it 301s the user to the appropriate product detail page and sets the affiliate ID cookie at the same time, with a 14-day expiry (mod_rewrite takes the cookie lifetime in minutes, and 14 × 24 × 60 = 20160). If you need anything more complicated, I’d recommend a script: the rule above breaks if the parameters aren’t in the right order, and it does no error checking. You can technically redirect wherever you want, but give the user continuity with an overlay. For example, you might want the link juice to go to your index page while still showing the product page to the person who clicked the link. The simplest way of doing that is to set a cookie naming the ‘real’ page to display, redirect (301!) to the index page, and have a snippet of code in the index page that shows that page in a JavaScript overlay. This is super easy to do with jQuery’s BlockUI plugin.
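Here is what that script version might look like, sketched in JavaScript as a pure function so it can sit behind whatever server you use. It accepts the parameters in any order, validates them, and returns the 301 target plus a Set-Cookie value. The function name, return shape and `Path=/` cookie attribute are assumptions; the host and URL layout mirror the rewrite example. Max-Age is in seconds, so the 20160 minutes become 1,209,600 seconds.

```javascript
// Hypothetical script version of the affiliate rewrite rule.
const COOKIE_MINUTES = 14 * 24 * 60; // 14 days = 20160 minutes

function affiliateRedirect(requestUrl) {
  const url = new URL(requestUrl, 'http://www.somerandomsite.com');
  const productId = url.searchParams.get('productid') || '';
  const affId = url.searchParams.get('affid') || '';
  // Parameter order doesn't matter here, and both IDs must be numeric
  if (url.pathname !== '/affiliate' || !/^\d+$/.test(productId) || !/^\d+$/.test(affId)) {
    return null; // not a valid affiliate link; let the normal site handle it
  }
  return {
    status: 301,
    location: `http://www.somerandomsite.com/product_detail/${productId}`,
    cookie: `affid=${affId}; Domain=.somerandomsite.com; Path=/; Max-Age=${COOKIE_MINUTES * 60}`,
  };
}
```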

If you’re an affiliate and you’re interested in keeping your link juice instead of passing it on, that is possible too. Set up a link on your own site that will always do a 302 redirect to your affiliate link. This can be done very easily using mod_rewrite (example below.) Now if you want to drop a link somewhere, simply use your new link. Not only does it hide your affiliate link from the wandering eye, but it’s shorter to type, and looks more “friendly”. Here’s an example that rewrites any links to http://www.myaffiliatesite.com/footsoak-review/ to the appropriate affiliate link. Just add a line to your .htaccess file and edit accordingly:


RewriteEngine On
RewriteCond %{REQUEST_URI} ^/footsoak-review/$
RewriteRule ^(.*)$ http://www.somerandomsite.com/affiliate?product=footsoak&affid=12345 [L,R=302]

Just remember: if you’re the merchant, you want to direct the link juice to a single page, so focus it with a 301 redirect. If you’re the affiliate, you want to stop the link juice at your own site (any site you own) by using a 302 redirect.

If you need help or have questions, please contact me.


Building Sites