Dealing with eval'd code

Thu 08 August 2013

eval and exec are usually frowned upon, for good reason. The main issues people complain about are:

  • Security: user input might end up in the eval'd string, and users are not to be trusted.
  • Speed: eval/exec may need to parse the code before running it.
  • Hard to debug: tracebacks can get weird.

What about those tracebacks? Let's take this naive example:

func = eval('lambda a, b: a + b')

Now if we try func(1, "a") we'll get:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    func(1, "a")
  File "<string>", line 1, in <lambda>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Notice that the traceback does not show the executed code from the lambda function. That's because there's no such file as <string>. Of course, this example is trivial and it's obvious what's wrong, but what if you have more code there and seeing it would be really helpful?

The traceback module uses the linecache module to load source lines. If we prefill the cache, the line will be shown in the traceback. Something like this would work:

import linecache
import traceback
code = 'lambda a, b: a + b'
name = '<string-1>'
func = eval(compile(code, name, 'eval'))
# entry layout: (size, mtime, lines, fullname)
linecache.cache[name] = len(code), None, [code], name

try:
    func(1, "a")
except Exception:
    traceback.print_exc()

Now we get:

Traceback (most recent call last):
  File "test.py", line 25, in <module>
    func(1, "a")
  File "<string-1>", line 1, in <lambda>
    lambda a, b: a + b
TypeError: unsupported operand type(s) for +: 'int' and 'str'

This seems fine. But what if we have many of these functions? The entries in linecache.cache would just leak. We can fix this with a weakref callback:

import weakref
func._cleanup = weakref.ref(func, lambda _: linecache.cache.pop(name, None))
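
To check that the callback really evicts the entry, here is a minimal sketch. The helper name make_cached_lambda and the fake filename <string-demo> are made up for illustration, and the explicit gc.collect() is only there for interpreters that do not free objects as soon as the last reference goes away:

```python
import gc
import linecache
import weakref


def make_cached_lambda(code, name):
    """Compile `code` under the fake filename `name` and register its source."""
    func = eval(compile(code, name, 'eval'))
    linecache.cache[name] = len(code), None, [code + '\n'], name
    # The weakref is stored on the function itself; its callback runs when the
    # function is collected and drops the matching cache entry.
    func._cleanup = weakref.ref(func, lambda _: linecache.cache.pop(name, None))
    return func


func = make_cached_lambda('lambda a, b: a + b', '<string-demo>')
registered = '<string-demo>' in linecache.cache

del func      # drop the last strong reference
gc.collect()  # not needed with plain reference counting, harmless otherwise
evicted = '<string-demo>' not in linecache.cache
```

Note that the callback only closes over name, not over func, so it does not keep the function alive.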

OK, but not all tracebacks are displayed by the traceback module. If an uncaught exception makes the interpreter exit, it is shown by sys.excepthook. The default sys.excepthook uses a C function, _Py_DisplaySourceLine, that just tries to open the file. What now?

You could patch the sys.excepthook function. This is the only solution on Python 2:

import sys
sys.excepthook = traceback.print_exception

But that will disable python-apport (it insists on overriding sys.excepthook for some reason, even if disabled). You probably don't care about that anyway.

Python 3 seems to use io.open to load the file. Since we've been naughty and played with eval, how about we patch io.open?

import io
import tempfile
filecache = {}
filecache[name] = code

def _patched_open(filename, *args, **kwargs):
    if filename in filecache:
        fh = tempfile.TemporaryFile()
        fh.write(filecache[filename].encode('utf8'))
        fh.seek(0)
        return fh
    else:
        return _old_open(filename, *args, **kwargs)

_old_open, io.open = io.open, _patched_open
# this is horrible compared to patching sys.excepthook

Why use tempfile when we could have used an io.StringIO object instead? It turns out that _Py_DisplaySourceLine in Python 3.3 opens the file with io.open and, strangely enough, works with the file descriptor directly instead of calling the right methods on the file object.

I think patching io.open is a bad idea in general. It adds overhead to every file open and it might violate expectations in third-party code such as introspection tools (the replacement has a different function signature, module and file).

Here's the complete code:

import linecache
import traceback
import weakref
import io
import tempfile
import sys
sys.excepthook = traceback.print_exception # poor apport, no longer works now ...

count = 0
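def get_function(code):
    global count
    count += 1
    name = '<string-%s>' % count
    func = eval(compile(code, name, 'eval'))
    # register the source so the traceback module can display it
    linecache.cache[name] = len(code), None, [code], name
    # drop the cache entry once the function is garbage collected
    func._cleanup = weakref.ref(func, lambda _: linecache.cache.pop(name, None))
    return func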

func = get_function('lambda a, b: a + b')

try:
    func(1, "a")
except Exception:
    traceback.print_exc()

func(1, "a")

You could apply the same tricks to code run with exec. But what is this useful for? Say you want to compile something like a MongoDB query to a Python function, so you can run it on a list instead of a Mongo collection without the overhead of processing the query for each row.
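
For the exec case, a rough sketch could look like this. The generated function ratio and the fake filename <generated-ratio> are invented for the example. Since the source has several lines, the cache gets one list item per line:

```python
import linecache
import traceback

SOURCE = (
    "def ratio(a, b):\n"
    "    total = a + b\n"
    "    return a / total\n"
)
NAME = '<generated-ratio>'  # made-up fake filename

namespace = {}
exec(compile(SOURCE, NAME, 'exec'), namespace)
# One list item per source line, with the newlines kept.
linecache.cache[NAME] = len(SOURCE), None, SOURCE.splitlines(True), NAME

ratio = namespace['ratio']
try:
    ratio(1, -1)  # total is 0, so the division on line 3 fails
except ZeroDivisionError:
    report = traceback.format_exc()
```

The formatted traceback in report should now include the failing source line from the generated function instead of just the fake filename and line number.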

You will notice that there's a similar issue with zipimport: you won't see the source code when the traceback is displayed by the default sys.excepthook. However, it will show up when displayed with the traceback module, because the linecache module uses the PEP 302 import hooks that zipimport implements.
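
To see that loader path in isolation, here is a small sketch with a stand-in loader. SourceOnlyLoader is a hypothetical class that implements only get_source, and the filename is a made-up path that is assumed not to exist on disk:

```python
import linecache
from importlib.machinery import ModuleSpec


class SourceOnlyLoader:
    """Hypothetical stand-in for a PEP 302 loader; only get_source is implemented."""

    def __init__(self, source):
        self._source = source

    def get_source(self, fullname):
        return self._source


FILENAME = 'no_such_dir/generated_adder.py'  # assumed not to exist on disk
loader = SourceOnlyLoader('add = lambda a, b: a + b\n')
module_globals = {
    '__name__': 'generated_adder',
    '__loader__': loader,
    '__spec__': ModuleSpec('generated_adder', loader),
}

# linecache cannot stat the file, so it falls back to the loader's get_source.
first_line = linecache.getline(FILENAME, 1, module_globals)
```

This is the same mechanism the traceback module relies on when it passes a frame's globals to linecache, which is why zipimported modules show their source there.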
