Performance bottlenecks in Lintian

Thanks to a heads-up from Bastian Blank, I learned that Lintian 2.5.7 and 2.5.8 were horribly slow on the Linux kernel binary packages.  Bastian had already identified the issue, and 2.5.9 fixed the performance regression.

In light of that, I decided to look at a few other bottlenecks.  First, I added simple benchmark support to Lintian 2.5.10 (enabled with -dd) that prints the approximate run time of each collection.  For example, running lintian -dd on lintian 2.5.10 gives output like:

N: Collecting info: unpacked for source:lintian/2.5.10 ...
N: Collection script unpacked for source:lintian/2.5.10 done (0.699s)
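Per-step timing of this kind boils down to reading a monotonic clock before and after each step and printing the difference. Here is a minimal sketch in Python rather than Lintian's Perl; the `timed_step` and `format_done` names are hypothetical, and only the message format is copied from the output above. Because it measures wall-clock time, steps running concurrently each report their own elapsed time, which is why the figures are approximate and do not simply add up.

```python
import time
from contextlib import contextmanager


def format_done(kind, name, target, elapsed):
    """Build a line in the style of the -dd output shown above."""
    return f"N: {kind} {name} for {target} done ({elapsed:.3f}s)"


@contextmanager
def timed_step(kind, name, target):
    """Time the enclosed block with a monotonic clock and report it on exit.

    Hypothetical helper for illustration; it is not part of Lintian.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        print(format_done(kind, name, target, time.perf_counter() - start))


if __name__ == "__main__":
    with timed_step("Collection script", "unpacked", "source:lintian/2.5.10"):
        time.sleep(0.2)  # stand-in for the actual collection work
```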

When run on linux-image, the three slowest steps with 2.5.10 are (in order of appearance):

N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)
N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)
N: Finished check: binaries (5.911s)

(Your mileage, and the order, will probably vary a bit.)
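On a real run the -dd output is long, so picking out the slowest steps by eye gets tedious. A small Python sketch that extracts the timings and sorts them; the regular expression is an assumption based only on the two line shapes shown above (`... done (12.333s)` and `Finished check: binaries (5.911s)`), so other debug lines may need a tweak.

```python
import re

# Matches timing lines from lintian -dd such as
#   N: Collection script strings for binary:... done (12.333s)
#   N: Finished check: binaries (5.911s)
TIMING = re.compile(r"^N: (?P<what>.+?)(?: done)? \((?P<secs>\d+(?:\.\d+)?)s\)$")


def slowest(lines, n=3):
    """Return the n slowest (seconds, description) pairs, slowest first."""
    found = []
    for line in lines:
        match = TIMING.match(line.strip())
        if match:
            found.append((float(match.group("secs")), match.group("what")))
    return sorted(found, reverse=True)[:n]


if __name__ == "__main__":
    sample = [
        "N: Collecting info: unpacked for source:lintian/2.5.10 ...",
        "N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)",
        "N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)",
        "N: Finished check: binaries (5.911s)",
    ]
    for secs, what in slowest(sample):
        print(f"{secs:8.3f}s  {what}")
```

To use it on a real run, save the debug output to a file and pass the open file object to `slowest()`. Since it is not stated here whether these messages go to stdout or stderr, capturing both streams (`2>&1`) is the safe choice.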

These three steps make up about 22 seconds of a total running time of approximately 28-30s on my machine.  Now, if you are wondering how 12, 16 and 6 become 22, the answer is “parallelization”: strings and objdump-info run in parallel, so (with multiple processing units) only the more expensive of the two counts in practice, giving roughly 16 + 6 = 22 seconds.
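The arithmetic above is a tiny critical-path calculation: with enough cores, a step can start as soon as everything it depends on has finished, so the wall-clock time is the longest chain of steps rather than the sum. The timings in this Python sketch are the ones printed above; the assumption that the binaries check waits for both collections is made for illustration.

```python
from functools import lru_cache

# Run times in seconds, taken from the -dd output above.
DURATIONS = {
    "strings": 12.333,
    "objdump-info": 15.915,
    "binaries": 5.911,
}

# Assumption: the "binaries" check waits for both collections to finish.
DEPENDS_ON = {
    "strings": (),
    "objdump-info": (),
    "binaries": ("strings", "objdump-info"),
}


@lru_cache(maxsize=None)
def finish_time(step):
    """Earliest finish time of a step, assuming unlimited parallel workers."""
    start = max((finish_time(dep) for dep in DEPENDS_ON[step]), default=0.0)
    return start + DURATIONS[step]


serial = sum(DURATIONS.values())
parallel = max(finish_time(step) for step in DURATIONS)
print(f"one after another: {serial:.3f}s")
print(f"in parallel:       {parallel:.3f}s")
```

This also explains the “with multiple processing units” caveat: on a single core the two collections compete for the same CPU, so the total drifts towards the serial sum.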

The version of linux-image I have been testing (3.2.20-1, amd64) contains over 2800 ELF binaries (kernel modules), which makes the run time of strings and objdump-info far more dominant than in “your average package”.  For the fun of it, I did a small informal benchmark of various Lintian versions on this package.

I used the following command lines:

# time is the bash built-in, not /usr/bin/time
$ time lintian -EvIL +pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null
# Used only for versions that did not accept -L +pedantic
$ time lintian -EvI --pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null

With older versions of Lintian (<= 2.5.3), Perl emits warnings; these have been filtered out manually.  I used Lintian from the git repository (i.e. I didn’t install the packages, but checked out the relevant git tags).  I had libperlio-gzip-perl installed, which affects the 2.5.10 run.

Most results are from a single run, though I ran the first version twice (hoping the kernel would cache the .deb for the following runs). The results are:

2.5.10

real    0m28.836s
user    0m36.982s
sys     0m3.280s

2.5.9

real    1m9.378s
user    0m33.702s
sys     0m11.177s

2.5.8

real    4m54.492s
user    4m0.631s
sys     0m30.466s

2.5.7 (not tested, but probably about the same as 2.5.8)

Older versions (before 2.5.7)

real    1m20s   - 1m22s
user    0m19.0s - 0m20.7s
sys     0m5.1s  - 0m5.6s
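The `time` figures are easier to compare as plain numbers. This Python sketch converts them to seconds and computes two ratios: CPU time (user + sys) per wall-clock second, where a value above 1.0 means more than one core was busy on average, and the speed-up relative to the slowest run. It assumes the three fully measured result sets are listed newest first (2.5.10, 2.5.9, 2.5.8), which the 28-30s figure and the 2.5.7 note suggest; the version labels in the code reflect that assumption.

```python
import re


def seconds(value):
    """Convert a bash `time` value such as '4m54.492s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# (real, user, sys) for the three fully measured runs; labels are assumed.
RUNS = {
    "2.5.10": ("0m28.836s", "0m36.982s", "0m3.280s"),
    "2.5.9": ("1m9.378s", "0m33.702s", "0m11.177s"),
    "2.5.8": ("4m54.492s", "4m0.631s", "0m30.466s"),
}

baseline = seconds(RUNS["2.5.8"][0])  # slowest run, used as the reference
for version, (real, user, sys_time) in RUNS.items():
    real_s = seconds(real)
    cpu_s = seconds(user) + seconds(sys_time)
    print(f"{version:>6}: real {real_s:7.3f}s  cpu/real {cpu_s / real_s:4.2f}  "
          f"speed-up vs 2.5.8 {baseline / real_s:4.1f}x")
```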

I think Bastian’s complaint was warranted for 2.5.{7,8}.  🙂

While it would be easy to attribute the performance gain in 2.5.10 to the new parallelization improvements, that is simply not the case. Those improvements only apply to running collections when checking multiple packages.  For a single package, the degree of parallelization on my machine is effectively limited by the dependencies between the collections.
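One way to see why dependencies set the limit: group the collections into waves, where each wave holds everything whose dependencies are already done. The widest wave caps how many collections can usefully run at once for one package, however many cores are available. The dependency graph in this Python sketch is hypothetical (it reuses collection names from this post, but the edges are illustrative, not Lintian's actual metadata); it uses `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph for one binary package:
# each key maps to the collections it must wait for.
DEPENDS_ON = {
    "unpacked": set(),
    "file-info": {"unpacked"},
    "strings": {"file-info"},
    "objdump-info": {"file-info"},
}


def waves(graph):
    """Group nodes into successive batches that could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)  # mark the whole wave finished before the next one
    return result


for number, wave in enumerate(waves(DEPENDS_ON), 1):
    print(f"wave {number}: {', '.join(wave)}")
```

With this graph the widest wave has two entries, so a third core would sit idle for this package, whereas checking several packages at once gives the scheduler independent graphs to interleave.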

Instead, the improvements come from reducing the number of system(3) (or fork+exec) calls Lintian makes, mostly by using xargs more, even though that meant slightly more complex code.  In addition, libperlio-gzip-perl shaved a couple of seconds off the “binaries” check.
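With roughly 2800 kernel modules, starting one process per file adds thousands of fork+exec calls; passing many files to a single invocation, as xargs does, cuts that to a handful. The Python sketch below contrasts the two patterns. It is not Lintian's Perl code: it uses a no-op Python interpreter as a stand-in for strings(1)/objdump(1) so it runs anywhere, and the module paths are made up and never opened. The `max_chars` budget is an assumed value standing in for the command-line length limit that xargs respects.

```python
import subprocess
import sys
import time


def batches(args, max_chars=100_000):
    """Split args into groups whose combined length stays below max_chars,
    similar in spirit to how xargs keeps command lines under the system limit."""
    batch, size = [], 0
    for arg in args:
        cost = len(arg) + 1  # +1 for the NUL terminator of each argument
        if batch and size + cost > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(arg)
        size += cost
    if batch:
        yield batch


def run_per_file(cmd, files):
    """One fork+exec per file; returns the number of processes started."""
    for path in files:
        subprocess.run(cmd + [path], check=True)
    return len(files)


def run_batched(cmd, files, max_chars=100_000):
    """xargs-style: many files per process; returns the number of processes."""
    count = 0
    for group in batches(files, max_chars):
        subprocess.run(cmd + group, check=True)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in for strings(1)/objdump(1): a Python process that does nothing.
    cmd = [sys.executable, "-c", "pass"]
    # Hypothetical module paths; they are passed as arguments, never opened.
    files = [f"kernel/drivers/mod{i:04d}.ko" for i in range(50)]
    for label, runner in (("per file", run_per_file), ("batched", run_batched)):
        start = time.perf_counter()
        spawned = runner(cmd, files)
        print(f"{label:8}: {spawned:3d} processes, {time.perf_counter() - start:.2f}s")
```

The batched variant is a little more involved (it has to respect the argument-length budget and, in real code, map the combined output back to individual files), which matches the trade-off described above.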

But as I said, linux-image is “not your average package”.  Most of the improvements mentioned here are hardly visible on other packages.   So let’s look at some other bottlenecks.  In my experience, the following are the “worst offenders”:

  • unpacked (collection)
    • Seen with the wesnoth-1.9 source. Here the bottleneck seems to be tar+bzip2, so there is not really a lot to do on the Lintian side, though feel free to prove me wrong. 🙂
  • file-info (collection)
    • Seen with the eclipse/eclipse-cdt sources. file(1) appears to spend a lot of time classifying some source files. For eclipse-cdt, I see a speed-up of approximately 10 seconds (from 40s to 30s) if file is recompiled with -O2 (that would be #659355).  However, even with file compiled with -O2, the file-info collection is still the dominant factor.
  • manpages (check)
    • Running man on manpages can be the dominant factor for certain documentation packages. This is #677874, and suggestions for fixing it are more than welcome.

But enough Lintian for now… time to fix some RC bugs!

This entry was posted in Debian, Lintian.

One Response to Performance bottlenecks in Lintian

  1. Pingback: Niels Thykier: Performance bottlenecks in Lintian
