Getting out of the way

I have decided to step down as main maintainer of Lintian and pass the baton to Bastien Roucariès.  This is actually “fairly” old news, since I announced this almost a month ago.  However, I was not very vocal about it until now (so do not be surprised if you had not heard of this before).

In the past month, I never once regretted having stepped down.  If anything, I should probably have done it a bit earlier.  Beyond the lack of motivation, I also realised that I had become an “all talk and no do”-maintainer.  The kind that I have been advocating against publicly.  This may sound weird to a lot of people, who knows me as “the Lintian guy” or “Mr. Lintian” (or whatever Lintian-related nickname people gave me).  But the fact is that Bastien has done more for Lintian in the past month than I have in the past two.

Despite stepping down as main developer, I have not left Lintian completely.  I am still around for helping/explaining, uploading and keeping running.

Posted in Debian, Lintian | Leave a comment

Wheezy was brought to you by …

During the Wheezy freeze, the Debian release team deployed 3254 hints[1].  This number may include some duplicates (i.e. where two members of the team hinted the same package), it certainly does not include a lot of rejected requests <insert more disclaimers here>.

The top hinter was *drum roll*… Adam, who did 1799 hints(That is 55% of all hints during the freeze).  For comparison, the second and third runner ups added together did 1023 hints (or 31.4%).  Put in a different way, on average Julien Cristau and I would both add about 1.5 hints each day and Adam would on his own add 5.6 hints a day.

Of course, this is not intended to diminish the work other the rest of the team.  Reviewing and unblocking packages is not all there is to a release.  Particularly, a great thanks to Cyril Brulebois for his hard work on the Debian Installer (without which Debian Wheezy could not be released at all).


[1] Determined by:

  egrep -c 'done 201(3|2-?(07|08|09|10|11|12)) $HINT_FILE

It does not count hints, but the little “header” we add above our hints.  One header can apply to multiple hints (which is good, because udeb unblocks are usually paired with a regular unblock and we don’t want to count them as two separate hints).

Posted in Debian, Release-Team | 4 Comments

Getting space for more packages

In 2011, I wrote about how small files could consume a lot of space. I meant to do a follow-up on the savings but I forgot about it until now.

In 2.5.7, we started compressing some of the collected data files. Some of these are ridiculously compressable (#664794).  Even better, compressing them is sometimes faster than writing them directly to the disk, so in some cases it is a pure win/win.  For lintian.d.o, we also see a vast size reduction in overall size of the laboratory.

I have taken a few samples occasionally. The samples were done with du(1):

$ du -csh [--apparent-size] laboratory/*

Version/date du -csh –apparent-size
N/A – around 20 Mar 2012 (#664794) 16G 13G
2.5.6 (Fri Apr 27 2012) 14GB N/A
2.5.6 (Mon Jun 04 2012)) N/A 12G (Fri Sep 21 2012) 12G 8.3G
2.5.11 (Wed Jan 2 2013) 10G 6.1G

And the most awesome part of this? The comparison is quite biased against the 2.5.11 entry, which is the only entry to also process experimental (approx. 10% extra packages).  Some of the early entries (2.5.6 and “older”) might also have suffered from the “too many links” issue[1].  I only wish I had been better at collecting data points, so I could have made a proper graph of it.  :)

It sounds almost too good to be true, but if you look at the size of one of the linux-image packages[2], the space usage dropped from 27M to 15M between 2.5.5 to 2.5.9.   Currently it is squeezed down to 14M (tested with head of the git master branch).

[1] I believe is about 5-10% less binary packages processed for those runs.

[2] linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb

Posted in Debian, Lintian | Leave a comment

Performance bottlenecks in Lintian

Thanks to a heads up from Bastian Blank, I learned that Lintian 2.5.7 and 2.5.8 were horribly slow on the Linux binaries.  Bastian had already identified the issue and 2.5.9 fixed the performance regression.

But in light of that, I decided to have a look at a couple of other bottlenecks.  First, I added a simple benchmark support to Lintian 2.5.10 (enabled with -dd) that prints the approximate run time of a given collection.  As an example, when running lintian -dd on lintian 2.5.10, you can see something like:

N: Collecting info: unpacked for source:lintian/2.5.10 ...
N: Collection script unpacked for source:lintian/2.5.10 done (0.699s)

When done on linux-image, the slowest 3 things with 2.5.10 are (in order of appearance):

N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)
N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)
N: Finished check: binaries (5.911s)

(The mileage (and order) probably will vary a bit.)

These 3 things makes up about 22 seconds of a total running time on approximately 28-30s on my machine.  Now if you wondering how 12, 16 and 6 becomes 22 the answer is “parallelization”.  strings and objdump-info are run in parallel so only the “most
expensive” of the two counts in practise (with multiple processing units).

The version of linux-image I have been testing (3.2.20-1, amd64) has over 2800 ELF binaries (kernel modules).  That makes the runtime of strings and objdump-info much more dominating than in “your average package”.   For the fun of it – I have done a small informal benchmark of various Lintian versions on the binary.

I have used the command line:

# time is the bash shell built-in and not /usr/bin/time
$ time lintian -EvIL +pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null
# This was used with only versions that did not accept -L +pedantic
$ time lintian -EvI --pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null

With older versions of Lintian (<= 2.5.3) Perl starts to emit warnings; these have been manually filtered out.  I used lintian from the git repository (i.e. I didn’t install the packages, but checked out the relevant git tags).  I had libperlio-gzip-perl installed (affects the 2.5.10 run).

Most results are only from a single run, though I ran it twice on the first version (hoping my kernel would cache the deb for the next run). The results are:

real    0m28.836s
user    0m36.982s
sys     0m3.280s

real    1m9.378s
user    0m33.702s
sys     0m11.177s

real    4m54.492s
user    4m0.631s
sys     0m30.466s

2.5.7 (not tested, but probably about same as 2.5.8)

real    1m20s   - 1m22s
user    0m19.0s - 0m20.7s
sys     0m5.1s  - 0m5.6s

I think Bastian’s complaint was warranted for 2.5.{7,8}.  :)

While it would have been easy to attribute the performance gain in 2.5.10 on the new parallelization improvements, it is simply not the case. These improvements only apply to running collections when checking multiple packages.  On my machine, the parallelization limit for a package is effectively determined by the dependencies between the collections on my machine.

Instead the improvements comes from reducing the number of system(3) (or fork+exec) calls Lintian does.  Mostly through using xargs more, even if it meant slightly more complex code.  But also, libperlio-gzip-perl shaved off a couple of seconds on “binaries” check.

But as I said, linux-image is “not your average package”.  Most of the improvements mentioned here are hardly visible on other packages.   So let’s have a look at some more other bottlenecks.  In my experience the following are the “worst offenders”:

  • unpacked (collection)
    • Seen on wesnoth-1.9 source. Here the problem seems to be tar+bzip2, so there is not really a lot to do (on the Lintian side). Though feel free to prove me wrong. :)
  • file-info (collection)
    • Seen in eclipse/eclipse-cdt source. file(1) appears to spend a lot of time classifying some source files. For eclipse-cdt, I experience an approx. 10 second speed up (from 40s to 30s) if file are recompiled with -O2. (That would be #659355).  However, even if file is compiled with -O2, the file-info collection is still the dominating  factor.
  • manpages (check)
    • Running man on manpages can be a dominating factor in certain doc packages. This is #677874 and suggestions for fixing it are more than welcome.

But enough Lintian for now… time to fix some RC bugs!

Posted in Debian, Lintian | 1 Comment

Kudos to Jakub Adam, Miguel Landaeta and James Page

Credit where it is due and I believe it is due for Jakub Adam for packaging eclipse packages.  If you use any of the eclipse packages provided the apt repositories for Wheezy or sid, it is very likely you have Jakub Adam to thank for it.

I also believe that Miguel Landaeta and James Page deserve praise for their work.  Miguel is to thank for the removal of libservlet2.4-java and updating its reverse dependencies – not in that order ;-).  James Page, on the other hand, has been introducing and updating a lot of packages, noticeably the jenkins packages.

Thank you and keep up the good work.

Posted in Debian | 1 Comment

Some sponsors are “evil and pedantic”

If you want to enable all Lintian tags, just remember the phrase:

Some sponsors are “evil and pedantic”

Or on the command-line:

 $ lintian -EvIL +pedantic ...

It works for Lintian 2.5.5 (and newer), which handles “pedantic” like other severities.  If you need help understanding the tags, you can add an extra “i” (-i) to “evil”.  That being said, remember that experimental (-E) and pedantic (-L +pedantic) tags are what they are for a reason. Also quite a few people will probably find verbose (-v) too noisy. However, leaving any of them out would have ruined the mnemonic. :)

Of course, you can also ask Lintian to enable all tags via your lintianrc file.  Here is a quick-start:

display-info = yes # or no
display-experimental = yes # or no
pedantic = yes # or no
verbose = yes # or no

Posted in Debian, Lintian | 2 Comments

Testing testing migration

If you have been following Lintian’s development closely, you will probably have noticed that I have not really done anything there for the past week. Instead I have turned my focus on our testing migration script, britney2. First, I have created a minimal test suite[1]. It started as 4 simple tests and by now it contains about 30 tests.

The size of each test is rather small; the largest tests are about 1600 binary packages in total[2], but most are 2-20 binary packages in total. Thus the test suite is rather fast compared to a “live data sample”, which easily takes more than 10 minutes for a single run. Unfortunately, hand-crafting the test data is somewhat annoying and easy to get wrong.

The test suite has a somewhat unfair focus on “auto-hint”[3] cases, so the current britney2 fails up to 14 tests. Some of these appears to fail because the auto-hinter (for some reason) receives incomplete information about the situation. To my knowledge we not been able to debug the situation, but Adam has a refactor branch that does not seem to have this issue. Personally I am hoping it will soon be merged into the master branch, especially because it seems to simplify a lot of common operations.

Joachim Breitner (who has been working on a SAT-solver based britney) also contributed a couple of test cases[4]. Allegedly, SAT-britney does rather well on the test suite, failing only 2 tests as far as I can tell[5]. On the other hand, it does solve a some of the more interesting cases britney2 does not solve.

On a more mathematical note, the britney2 implementation behaves like a function[6] with an attractive fixed point[7]. This is interesting, because for some cases it may take britney2 a couple of iterations to reach the right solution. This fixed point is somewhat simple to find by using the following steps (pseudo-code):

// Runtime complexity O(n * br * diff), where "n" is the number of iterations until
// a fixed point is reached, "br" is the complexity of "run_britney" and "diff" is
// the runtime of the "last != current" comparison.
function find_fixed_point(initial);
    last = run_britney(initial)
    current = run_britney(last)
    while last != current ; do
        last = current
        current = run_britney(last)
    return current

This gives us a simple way to test if britney will eventually solve the issue herself (and when she will do it). Currently britney2 is automatically run twice a day, so for every 2 iterations (beyond the first) roughly translates to a 24-hours delay. So far the test suite does not have a lot of problems that requires more than one iteration. Personally I would be pleased if it turned out to stay that way as the test suite coverage grows.

If you are interested in playing around with this, you can get sources from:

  • britney2
    • Currently only works in stable (i.e. requires python2.5 and python-apt < 0.8 or so)
    • See the INSTALL file for instructions
    • Adam’s branch
      • use the “p-u” branch.
  • SAT-britney
    • I haven’t tested this one and I do not know the requirements here
  • britney-tests
    • See the README file for instructions



[2] These tests are auto-generated, so it is merely an “up-scaled pattern”.

[3] Basically if two (or more) packages needs to migrate into testing at the exact same time, they need to be hinted in.

[4] Not to mention all the copy-waste errors he pointed out in mine. Apparently, SAT-britney has stricter requirements to the data than britney2. :P

[5] I assume the test called “sat-britney-death” (created by Joachim) was named that way for a reason. The second failure is caused by SAT britney not reading hints (yet?), so the “approve tpu package” test case should fail.

[6] A function that maps an “archive” into another “archive”… erh, I mean, it maps a set of packages into another set of packages… :P


Assuming my claim to be true, the function will have more than one fixed point. The obtained fixed point depends on the initial state of testing.

As an example:
– y depends on x
– x in testing has RC bugs

If x is not in testing, it cannot migrate to testing (due to its RC bugs). If x is not in testing, then y cannot migrate into testing. But if x starts in testing, then y may be able to migrate. This can happen if x migrated to testing before an RC bug was filed against it.

(Dis-)Proving my claim is an exercise left for the reader.

Posted in Debian, Release-Team | 3 Comments

Upcoming changes to Lintian (in 2.5.22)

The next version of Lintian, 2.5.22, is long overdue – mostly because 2.5.21 FTBFS in sid. Besides fixing test failures, 2.5.22 also fixes:

  • False-negative for most/all JAR related tags caused by file(1) changing its output for these files.
  • A performance regression in checks/ For large source packages, this could severely affect performance of Lintian.  This is #738342.
  • A regression where lintian would not parse options appearing after file arguments.
  • Some false-positives in the “GFDL with invariants” check.

Besides bug fixes, we also added a couple of new features/nice changes:

  • The –color option now defaults to “auto” instead of “never”.
  • The homepage now links to the generated API-docs and thus, also the developer tutorial.
    • Patches extending/improving either/both are more than welcome.
  • The test-suite runner now caches test artifacts meaning that subsequent test runs become vastly faster.
  • The test-suite runner can now run tests under “Devel::Cover“, so we can now collect statistics about how much of the code is covered by the test suite.
  • The “html_reports” tool (used e.g. on lintian.d.o) has been optimised. In particular, it only calls gnuplot twice, instead of once per data file (of which there was basically one per tag).
  • The html-pages generated by the “html_reports” tool now uses “content”-based names for resources (e.g. images and the .css file). This allows them to be cached vastly longer, since any content changes will rename the resource and force browser to refetch them.
    • All resources handled this way are now stored in “resources/”.  As a side-effect, this means that images files are moved from “images/” to “resources/”.
    • A notable exception to this is the auto-generated (.svg) graphs.

We just need to do some final tweaks and get the tag descriptions reviewed before we are ready to release the next version.

Posted in Debian, Lintian | Leave a comment

Release architecture meeting 2014-01-26

Today, we held an one-hour IRC-meeting to debate the status of the current architectures in sid.  During that one hour, we brought up all 13 architectures.  We made a decision on 9 of the architectures.  The remaining 4 will be revisited within 2 months.  As for the actual results, we will announce these on debian-devel-announce with our next bits mail.  I suspect the announcement to happen within a couple of days.

I am quite pleased that we managed to keep the meeting to one hour.  Especially considering that it was planned less than a week ago.  One of the things, that I believe helped us, was that we had summarised the status for each architecture prior to the meeting.  In my view, the summary allowed us to finish at least half of the architectures within their allocated 5 minutes.  In fact, we had 20 minutes for the last 2 architectures – partly because amd64 + i386 combined took about a minute.

A second thing that helped us stay on time was cutting debates short.  Obviously, with less than 5 minutes per architecture, there was little time for debate and, I admit, that was by design.  This meant that we had to defer 4 decisions for later.  We are currently waiting for information, which we hope to have soon.  More on that in the announcement later.
I would like to thank the attendees for contributing to a productive and short meeting.  It was a pleasure and I hope we can repeat the success next time. :)


Posted in Debian, Release-Team | Leave a comment

Jessie finally has less than 500 RC bugs

I am very pleased to say that RC bug counter for Jessie finally dropped below 500 RC bugs.  For comparison, we broke the 700 RC bug curve in the start of November, so that is about 200 RC bugs in about 7-8 weeks (or about 25-28 RC bugs per week).

Today, we have about 162 RC bugs on the auto-removal list.  Though, I suspect many of the affected packages have reverse dependencies, so their removal from testing may be up 30 days away.  Nevertheless, by this time in January, we could be looking at no more than 350 RC bugs left for Jessie.


Posted in Debian, Release-Team | 5 Comments

Automated reprocessing of packages on

Yesterday, I have managed to finish up an old pet peeve of mine.  I wrote a series of patches to have harness reprocess all packages, which were last checked by an older version of Lintian than the current.  It is not completed to the level I would like, but it is good enough for now.

The implementation works by processing up to 1024 groups after each daily run.  It also have a timer to skip this, if the run has taken more than 4 hours.  Packages to be reprocessed are sorted by the version they were last processed – should a new version of Lintian be upload before all packages have been reprocessed, the oldest results are refreshed first.  :)

A rough estimate suggests than in about 3-4 weeks, the entire archive will have been reprocessed with Lintian 2.5.20.  It is by no means faster than a full run (which is about a week), but it is automatic and removes the “human” factor in keeping Lintian results up to date.

Posted in Debian, Lintian | 2 Comments

Breaking the 700 RC bug threshold for Jessie

It looks like we have dropped (or will drop) below 700 RC bugs affecting Jessie today[1].  :)

On a related note, I got my eye on libfso-glib + fso-gsmd, samba + -samba4, mono and haskell.  By my count, each of the four groups will (once they migrate) bring at least 4 RC bug fixes with them to testing.

Besides that, we still have ~97 RC bugs in the autoremoval queue[2], so we could be looking at 600 RC bugs in a week or two.

Please keep up the steady flow of RC bug fixes; but also, please make sure your packages will actually migrate.  If you need help interpreting the excuses page, do not hesistate to ask – I will not bite.

If you want to help, please consider having a look at the “cleaner” or the “Release Team“-view on the UDD bug tracker.  Sadly not everything there contains actionable problems[3].

[1] says 700 while the IRC bot in #debian-devel-changes says we are at 692.



[3] E.g. the kfreebsd-8 bugs appear because kfreebsd-8 is listed in “Built-Using” for d-i, but the package has already been removed from testing.  So there is nothing to do with that (except possibly waiting for a new version of d-i).


Posted in Debian, Release-Team | 1 Comment

Breaking the RC bug curve

Early this month, the first batch of packages were automatically removed from testing due to RC bugs.  Personally, I find that the results stand out pretty well on the RC bug graph[1].  There has been a very rapid decline in the number of RC bugs affecting testing – a week or two ago, we had over 1000 RC bugs affecting testing and now it has dropped to something like 760.

There has also been quite a few RC bug fixes in this time period.  I spent a bit of time finding and aging some of the RC bug fixes in sid, so they would migrate to testing faster[2].  On the first batch, we saw something like 40 RC bugs disappear and we still have more RC bugs fixes in the sid->testing pipeline.

While looking for packages I could age, I also noticed that we have quite a few RC bug fixes blocked by “out of date” binaries (which is usually either build issues or uncoordinated transitions).  I have been filing a couple of RC bugs for these, but it does not scale do this kind of work manually.  If you have uploaded a package recently, please take the 5 minutes to check that it actually built and it has no “out of date” binaries issues in its excuses.


[2] The aging was equivalent to an urgency=medium, so reduced the requirement from 10 days to 5 days.


Posted in Debian, Release-Team | Leave a comment