Notes on debugging

Thu 18 February 2016

Debugging is a favourite subject of mine. It's an incredibly interesting area of programming but unfortunately not always a trivial one. It's also very important - it plays a significant part in a programmer's productivity. No matter how many QA tools you have at your disposal, and no matter how hard you try to create bug-free code, you will need to do debugging. It's just one of the intractable facets of programming.

Here's my attempt to define some concepts and guidelines. What are the tenets of debugging?

Find the cause first *

This is the most important, for many reasons. Yes, some issues can be quite hard to understand, but with time, as you build up knowledge, it gets easier. If you just try random changes (shotgun debugging) in the hope that they will fix the problem, then you're depriving yourself of both knowledge and the assurance that you actually fixed the problem. It may seem easier, especially if you get lucky, but overall you'll waste inordinate amounts of time with that "technique". Focus on finding the cause first.

Down the rabbit hole *

Unfortunately you have to accept the harsh reality that there are no shortcuts:

  • You'll have to read lots of code to understand what's going on. Yes, that includes other people's code. You're using a framework? You'll have to read the code.

    Lots of people avoid this because it's hard and many frameworks are complicated. But understanding your libraries will pay off.

  • You won't see the problem by merely looking at the code [1] so you'll have to inspect the execution of the code and reason about it.

  • Not understanding the cause can be grueling and frustrating; it can take a while. But once you figure it out it's pleasant. It's totally worth the pain, plus next time it gets easier - the accumulated knowledge helps. Don't give up!

Rooting it out *

Assuming you've got the right tools, there are various "techniques" you can use.

Simpler programs are easier to understand than complicated programs with lots of moving parts.

One of the best approaches is to isolate the issue. Some call it divide & conquer. The idea is to reduce the amount of code you need to comprehend. It's similar to making a very good bug report:

  • Remove code or don't run code that's unnecessary or unrelated. If you're not sure, take it out. If the issue doesn't reproduce anymore then you know what to focus on. Keep removing half of the code until you see where the problem is.

    For example, you'd do this when execution stops at some unknown point and you need to figure out where that is. Having to resort to it may indicate that you don't have good tooling (like an adequate tracer).

  • If you're having a regression you can use git bisect (or hg bisect) to find the offending commit. That tool will run a command of your choosing till the first commit that makes it fail is found.

  • Isolate any environment issues. Try a different OS, library or Python version. Note that this can be misleading - if the problem doesn't reproduce with a different Python it doesn't mean that the issue is in the Python interpreter - it may very well be an issue with your code (example: misusing some API).
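As a rough sketch of the bisect idea: git bisect run decides whether a commit is good or bad from the exit code of the command you give it (0 means good, 125 means "can't test this commit, skip it", most other non-zero codes mean bad). The function names and the check itself are hypothetical placeholders; you'd put your own reproduction in check_regression.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes that `git bisect run` understands


def bisect_exit_code(check):
    """Map the outcome of `check()` to an exit code for `git bisect run`."""
    try:
        check()
    except AssertionError:
        return BAD   # the bug reproduces on this commit
    except ImportError:
        return SKIP  # this commit can't be tested (e.g. broken build)
    return GOOD


def check_regression():
    # Hypothetical reproduction of the bug; replace with your own.
    assert sorted([3, 1, 2]) == [1, 2, 3]


# As a script, finish with:
#     raise SystemExit(bisect_exit_code(check_regression))
```

Assuming you saved that as check.py (a made-up name), you would start with git bisect start <bad> <good> and then let git bisect run python check.py do the walking.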

A more risky way to approach things is "proving the hypothesis": you make some theory about what might be wrong and then test it. There are some pitfalls:

  • It's only good if you have some intuition in the problem space. Otherwise you'd practically be doing shotgun debugging.
  • If you find a way to make the bug disappear but don't understand why, do not stop there. Understanding the cause is paramount.

Another way is the "secret police" method: assume everything is bad, vet any component, including external ones. Usually this is a "last resort" thing for crazy issues.

Plan for disaster *

Everything is easier when you're prepared. When you write code, or choose what code to reuse have in mind that one fateful day you will have to deal with the choices you've made. There are two aspects you should consider.

Tooling *

Think about what tooling and information you'll have available when you need to:

  • Do post-mortem analysis. Things like logs, core dumps, stacktraces and other details you might not have access to otherwise.
  • Inspect process state (memory, object, files, sockets and other resources).
  • Trace or step through execution.
  • Quickly change code and run it again.
  • Inspect code (jump to definition, usage search)
  • Look at logs. Reconfigure logging easily to get the firehose [2] when you need it.
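To illustrate the last point, here's a minimal sketch of logging that can be switched to the firehose without touching code. The APP_LOG_LEVEL variable and the configure_logging helper are names I made up for the example, and force=True needs Python 3.8 or newer.

```python
import logging
import os


def configure_logging(env=os.environ):
    """Set the root log level from the (hypothetical) APP_LOG_LEVEL variable."""
    level_name = env.get("APP_LOG_LEVEL", "WARNING").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.WARNING  # unknown name: fall back to the default
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers installed by an earlier call
    )
    return level


# Passing a dict here stands in for running with APP_LOG_LEVEL=debug set.
configure_logging({"APP_LOG_LEVEL": "debug"})
logging.getLogger("myapp.db").debug("firehose is on")
```

The point is only that the level comes from outside the code; a config file or a signal handler would do just as well.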

Some design choices make some tools harder to use. Examples:

  • Highly dynamic code makes inspecting code harder. Not to mention reasoning about it ...
  • Some tools don't work with threads or other concurrency implementations like gevent or eventlet.
  • Some tools may depend on threads, which may interfere with other libraries (like gevent or eventlet).
  • Hardcoded configuration for logging.
  • Availability of debug builds for various things. Example: can you deploy your Python app with a debug build of the interpreter?
  • Can you install tools or dependencies easily?
  • Can you elevate permissions for debugging purposes? Various tools may need root access or special permissions.

Code *

Most certainly you'll have to read code, possibly not your own code. Factor that in when writing code or when choosing a framework or library, along with other factors like documentation quality, number of users and how well it is maintained.

Code you don't understand might as well be technical debt!

Have these ideas in mind when writing/reviewing code, or when choosing what library to use:

  • If it's big, complicated and hard to understand it will cost you. Plan the time you'll need to spend understanding it. It's surprising how much time you'll waste on a library that's advertised to save time.

    It's not just the effort to understand the internals. Inordinate amounts of time are often wasted on shoehorning slightly different (but unplanned for) use cases. Big fancy frameworks optimize for typical use-cases and make everything else hard to accomplish. There are exceptions of course, but rare.

  • A community can help a lot. Having a place to ask for help means that you don't need that much upfront knowledge.

  • Code quality. Fewer bugs means less debugging, right?

    There are many practices but here are a few:

    • Code reviews.

    • Good exception handling. Don't discard or cripple exceptions. Give descriptive errors that indicate what needs to be fixed.

      Good error handling makes debugging more predictable, and less of a bottomless hole to sink your time in.
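A small sketch of what "don't cripple exceptions" can look like in Python. ConfigError and parse_config are hypothetical names; the idea is to raise a descriptive error while chaining the original one with "from", so the underlying traceback stays available.

```python
import json


class ConfigError(Exception):
    """Raised when the (hypothetical) application config cannot be loaded."""


def parse_config(text, source="<string>"):
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Say what failed, where, and what to fix; `from exc` keeps the
        # original exception attached as __cause__ instead of discarding it.
        raise ConfigError(
            f"{source}: invalid JSON at line {exc.lineno}, column {exc.colno}: "
            f"{exc.msg}. Fix the syntax and retry."
        ) from exc
```

Compare that to a bare "except: pass" or to raising a fresh error without the cause: in both cases whoever debugs it later has to rediscover what actually went wrong.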

Plan for disaster, but live dangerously *

Safety doesn't present the same opportunities for acquiring knowledge. It's often necessary to do more tricky things, at least for the sake of getting the information you need while debugging, if not for figuring out why that metaclass-using ORM doesn't work as you expect.

For example, you wouldn't normally do monkey-patching in your code, but maybe it's an option when otherwise you'd have to patch up lots of files.
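Here's one way such a debugging-only monkey-patch might look: a context manager that temporarily wraps a function to report every call and then restores the original. The traced helper is made up for this example, and it's only meant for plain module-level functions; unittest.mock.patch from the standard library covers similar ground.

```python
import contextlib
import functools
import json


@contextlib.contextmanager
def traced(owner, name, log=print):
    """Temporarily wrap `owner.name` so every call is reported via `log`."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        log(f"{name} called with args={args!r} kwargs={kwargs!r}")
        return original(*args, **kwargs)

    setattr(owner, name, wrapper)
    try:
        yield
    finally:
        setattr(owner, name, original)  # always undo the patch


calls = []
with traced(json, "dumps", log=calls.append):
    json.dumps({"a": 1})
```

One patch in one place tells you who calls json.dumps and with what, without editing every call site.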

On the other hand, experimentation is prone to failure so there's the open question of where's a good place to experiment ...

Ask for help but understand *

You've got a bug, you ask your colleague and 5 minutes later it's sorted out. Don't leave it at that. Ask how and why. Understand the process used to find the cause. Identify best practices if possible.

Integration tests pay off *

The interesting thing about tests is not that they make debugging easier, it's that they force you to resolve bugs early. They give you a warning early so you can fix things at a convenient time. Compare that to a production issue that you have to fix on a Friday night.

To make an analogy: it's like owning a car - you'd want to get warning indicators and service it at an opportune time rather than having it break down in the middle of nowhere.

Integration tests, and not unit tests, because they reach into those areas between components (the interactions) where the nasty bugs live. Simply put, production won't run just one function, it will run a combination thereof. The debate about the right balance of unit tests vs functional tests still rages on, but you need to be aware that you can't simply go for a pure unit test suite and hope for the best.
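A contrived sketch of the kind of bug that lives between components. Every name here is invented for the illustration: each function passes its own unit test, yet the combination is wrong because one side speaks seconds and the other milliseconds.

```python
def parse_timeout(text):
    """Component A: parse '30s' into a number of seconds."""
    return int(text.rstrip("s"))


def deadline(now_ms, timeout_ms):
    """Component B: compute a deadline; expects milliseconds."""
    return now_ms + timeout_ms


# Unit tests: each component is correct in isolation.
assert parse_timeout("30s") == 30
assert deadline(1_000, 500) == 1_500


def schedule(now_ms, timeout_text):
    """The combination that production actually runs."""
    # Bug: seconds are passed where milliseconds are expected.
    return deadline(now_ms, parse_timeout(timeout_text))


# Integration check: only the combination exposes the unit mismatch.
expected = 1_000 + 30_000
actual = schedule(1_000, "30s")
integration_ok = actual == expected  # False: the deadline is far too early
```

No amount of unit tests on parse_timeout or deadline alone would have flagged this.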

Other people's misery is gold *

Reading what issues other people had [3] is pure gold. There is simply no way you can accumulate all that knowledge only from the software you develop yourself. It's a huge boost to your intuition as well. Plus you won't feel that bad when you have a problem - there's always someone who had it worse.

If you have the time you can also watch videos or read other people's blogs. Every so often there's something describing an interesting problem or various obscure but useful details.

[1] If you do code reviews, that is ...
[2] firehose, noun: Something that carries a crushing stream of DEBUG level logging messages.
[3] Make sure you look over the other lists too: https://github.com/danluu/post-mortems#other-lists-of-postmortems
