Proxying objects in Python

Mon 12 January 2015

A lazy object proxy is an object that wraps a callable but defers the call until the object is actually required, and caches the result of said call.
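To make that definition concrete, here is a minimal sketch. The LazyProxy class, its attribute names and the make_value function are made up for illustration; this is not the implementation discussed later in the post, and it forwards only attribute access and str().

```python
class LazyProxy:
    """Hypothetical minimal lazy proxy: calls `factory` once, on first use."""

    def __init__(self, factory):
        self._factory = factory
        self._target = None
        self._resolved = False

    def _resolve(self):
        # Call the factory the first time, then keep returning the cached result.
        if not self._resolved:
            self._target = self._factory()
            self._resolved = True
        return self._target

    def __getattr__(self, name):
        # Only reached for attributes that are not found on the proxy itself.
        return getattr(self._resolve(), name)

    def __str__(self):
        return str(self._resolve())


calls = []

def make_value():
    calls.append("called")
    return "foobar"

proxy = LazyProxy(make_value)
assert calls == []            # nothing has been called yet
print(str(proxy))             # triggers the call
print(proxy.upper())          # reuses the cached result
assert calls == ["called"]    # the factory ran exactly once
```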

These kinds of objects are useful for resolving various dependency issues. A few examples:

  • Objects that need to hold circular references to each other, but at different stages. To instantiate object Foo you need an instance of Bar. The instance of Bar needs an instance of Foo in some of its methods (but not at construction). Circular imports sound familiar?
  • Performance-sensitive code. You don't know ahead of time what you're going to use, but you don't want to pay for allocating all the resources at the start, as you usually need just a few of them.

There are other examples; I've just made up a couple for context.
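The Foo/Bar scenario from the first bullet could look roughly like this. Everything here (the Lazy stand-in, the registry dict, the class bodies) is an assumption made for the sake of the example.

```python
class Lazy:
    """Hypothetical stand-in for a lazy proxy; it only forwards attribute access."""

    def __init__(self, factory):
        self.__dict__["_factory"] = factory

    def __getattr__(self, name):
        # Reached only for names that are missing from the proxy itself.
        state = self.__dict__
        if "_target" not in state:
            state["_target"] = state["_factory"]()
        return getattr(state["_target"], name)


class Bar:
    def __init__(self, foo):
        self.foo = foo  # may be a lazy proxy; it is not touched during construction

    def describe(self):
        return "Bar using " + self.foo.name


class Foo:
    name = "Foo"

    def __init__(self, bar):
        self.bar = bar


registry = {}
bar = Bar(Lazy(lambda: registry["foo"]))   # Foo does not exist yet
registry["foo"] = Foo(bar)                 # now it does
print(bar.describe())                      # the proxy resolves Foo only here
```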

If you've used Django you may be familiar with SimpleLazyObject. For simple use-cases it's fine, and if you're already using Django the choice is obvious. Unfortunately it's missing many magic methods; the most glaring omissions are __iter__, __getslice__, __call__ etc. It's not too bad, since you can just subclass it and add them yourself.

But what if you need to have __getattr__? The horrors of the infinite recursive call beckon.
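One way the recursion can bite, sketched with two made-up classes (NaiveLazy and SaferLazy are hypothetical, not Django's or wrapt's code): if __getattr__ reads proxy state through normal attribute access and that state isn't set yet, the lookup lands back in __getattr__. Going through the instance __dict__ directly is one way to avoid it.

```python
class NaiveLazy:
    """Hypothetical naive proxy that reads its own state via normal lookups."""

    def __init__(self, factory):
        self._factory = factory  # note: `_target` is never initialised

    def __getattr__(self, name):
        # `self._target` is missing, so this lookup calls __getattr__ again,
        # which looks up `self._target` again, and so on.
        if self._target is None:
            self._target = self._factory()
        return getattr(self._target, name)


class SaferLazy:
    """Same idea, but the proxy's own state is read straight from __dict__."""

    def __init__(self, factory):
        object.__setattr__(self, "_factory", factory)

    def __getattr__(self, name):
        state = object.__getattribute__(self, "__dict__")
        if "_target" not in state:
            state["_target"] = state["_factory"]()
        return getattr(state["_target"], name)


try:
    NaiveLazy(lambda: "foobar").upper()
except RecursionError:
    print("NaiveLazy recursed until the interpreter gave up")

print(SaferLazy(lambda: "foobar").upper())
```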

Meanwhile I've noticed that wrapt has a quite complete object proxy. Unfortunately it's not really amenable to adding lazy behavior in a subclass due to the C extension (I wouldn't make bets on sub-classing the pure-python proxy implementation either without some unwanted overhead :-).

Thus I forked the code and changed everything to have the lazy behavior. You can see the results here:

Part of that is a C extension packaging exercise but that's for another blog-post [2].

I've also done some benchmarks (with pytest-benchmark) [1]:

-- benchmark: min 5 rounds (of min 25.00us), 30.00s max time, timer: time.perf_counter --
Name (time in ns)                Min         Max      Mean    StdDev   Rounds  Iterations
test_perf[slots]            606.8182  26084.0909  627.7139   89.5553  1111112          44
test_perf[cext]              84.7701   2830.4598   86.2741    9.6827  1006712         348
test_perf[simple]           328.9474  11456.5790  334.8236   41.8470  1195220          76
test_perf[django]           409.5238  17969.8413  417.4172   49.9735  1158302          63
test_perf[objproxies]       880.0000  31256.6666  923.1323  106.3637  1111112          30

The slots and cext implementations are based on wrapt's code. I've named the pure Python implementation slots because that is the distinguishing implementation technique. And that was all I had in the beginning. I've wondered why Django's SimpleLazyObject is faster, by a significant margin even.

To find out what exactly is different I've made a primitive tracer:

import sys
import os
import linecache

from lazy_object_proxy.slots import Proxy
from django.utils.functional import SimpleLazyObject

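def dumbtrace(frame, event, args):
    # Print "file:lineno event source-line" for every traced event.
    sys.stdout.write("%015s:%-3s %06s %s" % (
        os.path.basename(frame.f_code.co_filename),
        frame.f_lineno,
        event,
        linecache.getline(frame.f_code.co_filename, frame.f_lineno)
    ))
    return dumbtrace  # "step in"

for Implementation in Proxy, SimpleLazyObject:
    print("Testing %s ..." % Implementation.__name__)
    obj = Implementation(lambda: 'foobar')
    sys.settrace(dumbtrace)
    str(obj)  # the operation being traced
    sys.settrace(None)  # we don't want to trace other stuff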

And from that I've got:

Testing Proxy ...
  call     def __str__(self):
  line         return str(self.__wrapped__)
  call     @property
  line         try:
  line             return __getattr__(self, '__target__')
return             return __getattr__(self, '__target__')
return         return str(self.__wrapped__)
Testing SimpleLazyObject ...
  call     def inner(self, *args):
  line         if self._wrapped is empty:
  line         return func(self._wrapped, *args)
return         return func(self._wrapped, *args)

Essentially, the biggest difference is an extra function call (the __wrapped__ property).

Now I thought to myself: I can do that too. Using the cached property technique I could remove the second function call. But that trick needs a __dict__; it can't work with __slots__. So I proceeded to make an implementation that doesn't use __slots__ (the "simple" one in the previous benchmark table). It was indeed faster, but then the tests started to fail and I finally understood why Graham Dumpleton used __slots__.
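For context, the cached property technique goes roughly like this. It's a hand-rolled sketch; the cached_property descriptor and the WithDict/WithSlots classes are illustrative names, not code from the library. The descriptor stores the computed value in the instance __dict__ under its own name, so later lookups never reach the descriptor again, and that is exactly the part that breaks once __slots__ takes the __dict__ away.

```python
class cached_property:
    """Non-data descriptor: compute once, then store the value in the
    instance __dict__ so later lookups bypass the descriptor entirely."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        value = obj.__dict__[self.name] = self.func(obj)
        return value


class WithDict:
    def __init__(self, factory):
        self.factory = factory

    @cached_property
    def wrapped(self):
        return self.factory()


class WithSlots:
    __slots__ = ("factory",)  # no per-instance __dict__

    def __init__(self, factory):
        self.factory = factory

    @cached_property
    def wrapped(self):
        return self.factory()


a = WithDict(lambda: "foobar")
print(a.wrapped)                 # first access runs the function
print("wrapped" in a.__dict__)   # the result now lives on the instance

try:
    WithSlots(lambda: "foobar").wrapped
except AttributeError as exc:
    print("slots:", exc)         # there is no __dict__ to cache into
```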

Turns out he had replaced the normal __dict__ with a property [3], and proxying vars(obj) relies on having __dict__ as a proxy property. In other words, you can't use vars on an object that has no __dict__ (like instances of most builtin types).
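A rough sketch of that arrangement, assuming a made-up SlotsProxy class (this mirrors the idea, not wrapt's actual source): because the proxy itself has no instance __dict__, a property named __dict__ can hand out the target's namespace, and vars() picks it up.

```python
class SlotsProxy:
    """Hypothetical slots-based proxy that exposes the target's __dict__."""
    __slots__ = ("_target",)

    def __init__(self, target):
        self._target = target

    @property
    def __dict__(self):
        # vars(proxy) looks up `__dict__` and ends up here.
        return self._target.__dict__


class Plain:
    def __init__(self):
        self.answer = 42


print(vars(SlotsProxy(Plain())))   # {'answer': 42}

try:
    vars(object())                 # plain `object` instances have no __dict__
except TypeError as exc:
    print("no __dict__:", exc)
```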

Interestingly enough, the implementation with __slots__ is much faster on PyPy [4]:

-- benchmark: 4 tests, min 5 rounds (of min 25.00us), 30.00s max time, timer: monotonic --
Name (time in ns)                   Min         Max     Mean   StdDev   Rounds  Iterations
test_perf[slots]                 2.1267    139.0987   2.3513   0.4176  1003345       13824
test_perf[simple]               24.0000   9981.7000  29.9561  37.2147  1250001        1000
test_perf[django]               25.1000  10186.4000  29.5746  26.3704  1195220        1000
test_perf[objproxies]           25.6000   9509.6000  30.2238  20.0922  1176471        1000

Now I'm a bit torn about this: which implementation should be the default? Should the slots one be the default on PyPy?

[1]HTML output generated with ansi2html --inline --scheme=xterm. You can capture output with all the ANSI escape codes by running script -c "command" output.txt.
[2]You can take a look at cookiecutter-pylibrary for now.
[3]See: wrapt/
[4]In case you're wondering what's with the different timer, the tests are done on PyPy (not PyPy3). That means no high precision timer, so I had to implement my own using clock_gettime(CLOCK_MONOTONIC) from __pypy__.time.
