Pelican feed analytics

Fri 01 November 2013

There has always been one annoying thing about RSS/Atom feeds: it's hard to track views or visits. You cannot use Google Analytics directly for feeds because it needs to run JavaScript. Feedburner could be a solution, but it looks unmaintained: the list of readers hasn't changed since Google acquired it. I haven't seen anything new in it for a while, and it looks like it will share the fate of the late Google Reader. It's not particularly good at analytics anyway ...

We could make Google Analytics register page views in one of these ways:

  • Include an <img> tag linking to __utm.gif. The problem is that visitors cannot be tracked: all page views will show up as unique visits.
  • Make the request to Google Analytics yourself. There are libraries for this in Python: pyga and ga_app. Sadly, both of them use the old __utm.gif API.
  • Use a __utm.gif redirector service like nojsstats. It looks like it could work, but it's closed source and neglected. It tries to set cookies to track visitors (so views don't all show up as uniques) but fails, because the cookies it sets are already expired.

Now none of those are really likable solutions, since they all use the old, outdated (and plain horrible) __utm.gif API. If you use Universal Analytics you can use the Measurement Protocol, which is way better. Compare that to those obscure __utm cookies ...

The Measurement Protocol documentation gives POST requests as examples; however, GET requests work just as well, and we can use that to implement a simple, better redirector.
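To make the GET variant concrete, here is a minimal sketch that only builds the URL for one pageview hit and does not send it. The function name `pageview_url` and the sample client ID are made up for illustration; `v`, `tid`, `cid`, `t`, `dp` and `dr` are Measurement Protocol parameter names.

```python
from urllib.parse import urlencode

# Measurement Protocol (v1) collection endpoint.
COLLECT_URL = "https://www.google-analytics.com/collect"


def pageview_url(tracking_id, client_id, page_path, referrer=None):
    """Build a GET URL that records one pageview hit (hypothetical helper)."""
    params = [
        ("v", "1"),            # protocol version
        ("tid", tracking_id),  # Tracking ID, e.g. UA-123456
        ("cid", client_id),    # anonymous client (visitor) ID
        ("t", "pageview"),     # hit type
        ("dp", page_path),     # document path
    ]
    if referrer:
        params.append(("dr", referrer))  # document referrer
    # urlencode percent-encodes every value, so "/" becomes "%2F".
    return COLLECT_URL + "?" + urlencode(params)


# The client ID below is an arbitrary sample value.
print(pageview_url("UA-123456", "35009a79-1a05-49d7-b876-2b884d0f825b", "/path/to/page"))
```

Requesting a URL of this shape from an <img> tag is essentially what a redirector has to arrange.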

I've implemented a simple redirector that sets a unique visitor cookie (the cid, see below) before redirecting. It's hosted here. You would use it like this:

<img src="//t.ionelmc.ro/UA-123456/domain.com?dp=/path/to/page&t=pageview&dr=http%3A//domain.com/referral/path/" width="1" height="1">

Note that the protocol (http: or https:) is missing. This will cause browsers to use the current page's protocol. This is very useful when you don't know what protocol you will use.
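A quick way to see how such a protocol-relative URL resolves is Python's `urljoin`, which follows the same URL resolution rules. The two page URLs below are made-up examples:

```python
from urllib.parse import urljoin

# Protocol-relative tracker URL, as used in the <img> tag above.
PIXEL = "//t.ionelmc.ro/UA-123456/domain.com?t=pageview"

# Hypothetical pages embedding the image, one per scheme.
for page in ("http://domain.com/some/page.html", "https://domain.com/some/page.html"):
    # The scheme is taken from the embedding page; host and path come from PIXEL.
    print(urljoin(page, PIXEL))
```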

The rest of the parameters:

UA-123456
    The Tracking ID.
domain.com
    The domain that you see in your Universal Analytics JavaScript tracking code.
t
    The hit type (pageview here).
dp
    The page URL that should show as viewed. You want this to be the page path that the feed item points to, in other words the URL without the protocol and domain. Don't forget to quote it! [1]
dr
    A referral URL. If you don't specify it, the Referer header is used. Set dr to an empty value if you don't want any referrer. Don't forget to quote it! [1]

None of the GET parameters are really required - you can add any parameters you want - they get passed to the collector API. I've just outlined the minimum set of parameters required for pageviews.
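As a sketch of the quoting step, the helper below builds the same <img> tag as the example above. `tracking_pixel` and `REDIRECTOR` are names invented here; `quote` is the Python 3 location of the function from footnote [1].

```python
from urllib.parse import quote

# The author's instance; point this at your own deployment.
REDIRECTOR = "//t.ionelmc.ro"


def tracking_pixel(tracking_id, domain, page_path, referrer=None):
    """Return an <img> tag for the redirector with quoted dp/dr values."""
    # quote() leaves "/" alone but escapes characters such as ":", "&" and " ".
    query = "dp=%s&t=pageview" % quote(page_path)
    if referrer is not None:
        # An empty string yields "dr=", which suppresses the Referer fallback.
        query += "&dr=%s" % quote(referrer)
    src = "%s/%s/%s?%s" % (REDIRECTOR, tracking_id, domain, query)
    return '<img src="%s" width="1" height="1">' % src


print(tracking_pixel("UA-123456", "domain.com", "/path/to/page",
                     "http://domain.com/referral/path/"))
```

Without quoting, a path or referrer containing `&` or `?` would be split into extra GET parameters.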

If you feel adventurous and want something else, read the reference.

Notes:

  • cid is generated in the app (and saved with cookies).
  • The Tracking ID and domain are not in the arguments because they have a special role: they are the path for the visitor cookie. So each different combination of Tracking ID and domain will have a different cookie.

And please don't use my instance for anything serious. Make your own - the source code is here and simple enough to understand.
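For orientation, the behaviour described in the notes could be sketched roughly as below. This is not the actual source of the hosted app: the function name, the UUID-based cid, the two-year cookie lifetime and sending the domain as `dh` are all assumptions made for the sake of a self-contained example.

```python
import uuid
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode

COLLECT_URL = "https://www.google-analytics.com/collect"


def redirect(path, query_string, cookie_header="", referer_header=None):
    """Return (location, set_cookie) for a request such as
    /UA-123456/domain.com?dp=/path/to/page&t=pageview
    """
    tracking_id, domain = path.strip("/").split("/", 1)

    # Reuse the visitor's cid when the cookie is present, else mint one.
    cookie = SimpleCookie(cookie_header)
    cid = cookie["cid"].value if "cid" in cookie else str(uuid.uuid4())

    # Pass the GET parameters through to the collector.
    params = dict(parse_qsl(query_string, keep_blank_values=True))
    params.update(v="1", tid=tracking_id, cid=cid)
    params.setdefault("dh", domain)  # assumption: domain sent as document host
    if "dr" not in params and referer_header:
        params["dr"] = referer_header  # fall back to the Referer header
    elif params.get("dr") == "":
        del params["dr"]  # an empty dr means "send no referrer"

    # Scope the cookie to this Tracking ID / domain pair.
    out = SimpleCookie()
    out["cid"] = cid
    out["cid"]["path"] = "/%s/%s" % (tracking_id, domain)
    out["cid"]["max-age"] = 2 * 365 * 24 * 3600  # two years, an arbitrary choice
    return COLLECT_URL + "?" + urlencode(params), out["cid"].OutputString()


location, set_cookie = redirect("/UA-123456/domain.com",
                                "dp=/path/to/page&t=pageview",
                                cookie_header="cid=abc")
print(location)
print(set_cookie)
```

A real handler would answer with a 302 status, `location` in the Location header and `set_cookie` in Set-Cookie.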

Pelican integration

Now, because I'm using Pelican I'm going to show how to make it add that little image to each item in the feeds.

The easiest way appears to be using a custom Writer that looks like this:

# tracker.py:
from os.path import sep
from pelican import Pelican
from pelican.writers import Writer, Markup, set_date_tzinfo

class TrackingWriter(Writer):
    def _add_item_to_the_feed(self, feed, item):

        title = Markup(item.title).striptags()
        feed.add_item(
            title=title,
            link='%s/%s' % (self.site_url, item.url),
            unique_id='tag:%s,%s:%s' % (self.site_url.replace('http://', ''),
                                        item.date.date(), item.url),
            description=item.get_content(self.site_url) + """
            <img src="//t.ionelmc.ro/%s/%s?t=pageview&dp=%s&dr=%s" width="1" height="1">
            """ % (
                self.settings['GOOGLE_ANALYTICS_ACCOUNT'],
                self.settings['GOOGLE_ANALYTICS_DOMAIN'],
                '/' + item.url.lstrip('/'),
                self.feed_url.replace(sep, '/'),
            ),
            categories=item.tags if hasattr(item, 'tags') else None,
            author_name=getattr(item, 'author', ''),
            pubdate=set_date_tzinfo(item.date,
                self.settings.get('TIMEZONE', None)))


class TrackingPelican(Pelican):
    def get_writer(self):
        return TrackingWriter(self.output_path, settings=self.settings)

And in the settings we put this:

PELICAN_CLASS = 'tracker.TrackingPelican'
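The writer in tracker.py also reads two settings, GOOGLE_ANALYTICS_ACCOUNT and GOOGLE_ANALYTICS_DOMAIN, so the settings file needs those as well. A sketch with placeholder values (swap in your own Tracking ID and domain):

```python
# Pelican settings file (for example pelicanconf.py)
PELICAN_CLASS = 'tracker.TrackingPelican'

# Read by TrackingWriter when it builds the <img> tag; placeholder values.
GOOGLE_ANALYTICS_ACCOUNT = 'UA-123456'
GOOGLE_ANALYTICS_DOMAIN = 'domain.com'
```

For PELICAN_CLASS to resolve, tracker.py has to be importable when Pelican runs, for instance by sitting next to the settings file in the directory you run Pelican from.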

Tested to work on Pelican 3.2 and 3.3; pageviews show up fine in Google Analytics. Remember that you need Universal Analytics, not Classic Analytics!

[1] http://docs.python.org/2/library/urllib.html#urllib.quote

This entry was tagged as analytics appengine pelican python