Lintian 2.5.7 and 2.5.8

The new version of Lintian (2.5.8) can pretty much be summed up as:

It's like 2.5.7, only with fewer false positives and no FTBFS.

People annoyed by the hardening flags in particular will hopefully find that 2.5.8 greatly reduces the number of false positives.  I believe this is best demonstrated with an example:

$ lintian --print-version && lintian -q -C binaries amarok_2.5.0-1_i386.deb | wc -l 
2.5.7
94
[...]
$ lintian --print-version && lintian -q -C binaries amarok_2.5.0-1_i386.deb | wc -l
2.5.8
4

However, nothing comes for free.  We dropped hardening-no-stackprotector from the default profile (and demoted it to an “I” tag).  For hardening-no-fortify-functions we made a “false positive -> false negative” trade-off by ignoring binaries if their only unprotected function is memcpy.  For more information, please refer to #673112.
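To make the memcpy trade-off concrete, here is a minimal Python sketch of the filtering rule described above. It is not Lintian's (Perl) code. The names `IGNORED_UNPROTECTED` and `flag_no_fortify` are hypothetical, and the sketch assumes we already know which functions a binary uses without their fortified counterparts.

```python
# Hypothetical illustration of the 2.5.8 rule, not Lintian's actual code.
IGNORED_UNPROTECTED = {"memcpy"}


def flag_no_fortify(unprotected_functions):
    """Return True if hardening-no-fortify-functions should be emitted.

    A binary whose only unprotected function is memcpy is ignored
    (the "false positive -> false negative" trade-off).
    """
    return bool(set(unprotected_functions) - IGNORED_UNPROTECTED)
```

The cost is visible in the set difference: a binary with genuinely unprotected memcpy calls and nothing else is no longer reported.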

hardening-no-stackprotector is still available and can be used via the debian/extra-hardening profile (or via the --tags argument).  For the 2.5.7 behaviour of hardening-no-fortify-functions, you have to use hardening-check directly.

But 2.5.7 had other changes besides the myriad of false positives:

  • Around 19 redundant tags were removed.

A consequence of this is that we no longer warn if you use things like “dpkg --assert-working-epoch” or your postinst creates a “usr/doc transition symlink”.

  • The last of Lintian’s (Perl) modules have been moved into the Lintian namespace.

With all the Lintian modules under the Lintian namespace, we can install them in the standard Perl @INC path.  Admittedly, I am not certain we are ready to commit to the current API in these modules, which is one of the reasons why they are still installed in /usr/share/lintian.  But in a couple of releases, things may look different. 🙂

  • Lazy loading of data files

Lintian 2.5.7 ships over 50 data files of various kinds (usually white- or blacklists of some kind).  Before 2.5.7, all data files, with a few exceptions, would be loaded eagerly (i.e. as soon as the check was run for the first time).  A few data files had been special-cased with “manual laziness”.

In 2.5.7, Lintian::Data was updated to lazily load the data.  So depending on the packages being checked, Lintian may now load fewer data files.
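The idea behind lazy loading fits in a few lines. This Python sketch only illustrates the pattern. It is not Lintian::Data's (Perl) API, and `LazyData` and its `loader` callable are hypothetical names.

```python
class LazyData:
    """A data file that is only read the first time it is queried."""

    def __init__(self, path, loader):
        self._path = path
        self._loader = loader  # callable: path -> set of entries
        self._entries = None   # None means "not loaded yet"

    def _entries_loaded(self):
        # Load on first use, then reuse the cached result.
        if self._entries is None:
            self._entries = self._loader(self._path)
        return self._entries

    def known(self, key):
        return key in self._entries_loaded()
```

If a check never calls `known`, the loader never runs. That is where the savings come from when the packages being checked do not trigger every check.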

  • Create proper data files for tables embedded within checks.

While this is not something exclusive to 2.5.7, we have separated quite a few checks from their data tables by now.  My personal favorites are the table of known interpreters and (from 2.5.8) the table of known versioned interpreters.

  • Support for vendor-specific data files.

I am certainly biased here.  But this is probably the most awesome feature in Lintian 2.5.7.  It is now possible for vendors to extend or simply “shadow” the core Lintian data files.  Allow me to demonstrate how this can be used:

$ lintian --tags build-depends-on-obsolete-package a2ps_4.14-1.1.dsc 
E: a2ps source: build-depends-on-obsolete-package build-depends: dpatch
$ cat ~/.lintian/profiles/local/main.profile 
Profile: local/main
Extends: debian/main
$ cat  ~/.lintian/vendors/local/main/data/fields/obsolete-packages 
@include-parent
@delete dpatch
bison
$ lintian --profile local/main --tags build-depends-on-obsolete-package a2ps_4.14-1.1.dsc
E: a2ps source: build-depends-on-obsolete-package build-depends: bison
$

It is a toy example, but I believe it is a good demonstration of the feature.
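Here is a rough Python model of how the two directives in the example could be resolved. It assumes the lines are processed top to bottom and supports only `@include-parent` and `@delete`. The real format, documented in the Lintian User manual, may have more features and different edge cases. `resolve_data` is a hypothetical name.

```python
def resolve_data(lines, parent):
    """Resolve a vendor data file against its parent's entries (sketch).

    '@include-parent' copies every parent entry, '@delete X' removes X,
    and any other non-empty line is added as a plain entry.
    """
    entries = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "@include-parent":
            entries |= set(parent)
        elif line.startswith("@delete "):
            entries.discard(line[len("@delete "):].strip())
        else:
            entries.add(line)
    return entries
```

If the parent list contains only dpatch, the local file from the example resolves to just bison. That matches the tag output in the transcript.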

The full documentation of vendor-specific data files can be found in the Lintian User manual (in lintian/2.5.7 or newer).  It will also be on lintian.debian.org when we find time to update lintian there.  🙂

 

Posted in Debian, Lintian

Kudos to Jakub Adam, Miguel Landaeta and James Page

Credit where credit is due, and I believe it is due to Jakub Adam for packaging Eclipse.  If you use any of the Eclipse packages provided by the APT repositories for Wheezy or sid, it is very likely you have Jakub Adam to thank for it.

I also believe that Miguel Landaeta and James Page deserve praise for their work.  Miguel is to thank for the removal of libservlet2.4-java and updating its reverse dependencies – not in that order ;-).  James Page, on the other hand, has been introducing and updating a lot of packages, notably the jenkins packages.

Thank you and keep up the good work.

Posted in Debian

Some sponsors are “evil and pedantic”

If you want to enable all Lintian tags, just remember the phrase:

Some sponsors are “evil and pedantic”

Or on the command-line:

 $ lintian -EvIL +pedantic ...

It works for Lintian 2.5.5 (and newer), which handles “pedantic” like other severities.  If you need help understanding the tags, you can add an extra “i” (-i) to “evil”.  That being said, remember that experimental (-E) and pedantic (-L +pedantic) tags are what they are for a reason. Also quite a few people will probably find verbose (-v) too noisy. However, leaving any of them out would have ruined the mnemonic. 🙂

Of course, you can also ask Lintian to enable all tags via your lintianrc file.  Here is a quick-start:

display-info = yes # or no
display-experimental = yes # or no
pedantic = yes # or no
verbose = yes # or no


Posted in Debian, Lintian

Python optimization and profiling

For the past 5 days, I have worked on replacing a part of Britney (our beloved testing migration tool).  The part I am trying to replace is her “installability tester”, which is probably one of the most used parts of Britney.

I know premature optimization is the root of all evil, but I felt compelled to be a bit “evil” this time.  The reason is that a naïve algorithm could easily spend decades computing the result of our most complete tests (each consists of roughly 2 × 35 000 packages).

Within a couple of days, I felt I had a reasonable candidate to use on the large tests.  On the large tests, it ran with a noticeable speed regression and it produced one different result.  The result difference was about 20-40 source packages and their binaries.

You may have noticed I used “difference” and not “regression”.  In the given test, our current implementation at some point gives up with an “AIEEE” error.  So I decided to turn my attention to the speed regression first.

Like any “good” programmer, I decided to use a profiler (cProfile) to determine my bottlenecks.  This turned out to be a good idea, as the bottleneck was not quite where I thought it was.  I wish I could say I just patched out the issue, but… unfortunately not.
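For readers who have not used it, a minimal cProfile session in Python looks roughly like this. `slow_sum` is just a stand-in workload, and `profile_call` is a hypothetical helper, not something from Britney.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive loop so that it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Show the 5 entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buf.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum, 100_000)
    print(result)
    print(report)
```

Sorting by cumulative time is usually the quickest way to see which top-level call the time is actually spent under. That time is not always where you expected it.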

I played around with various changes, such as using “implied” for loops rather than explicit ones.  Sometimes I would manage to cut the runtime in half, only to be unable to reproduce it later.  Yesterday I finally realized what caused this.  I was working in 3 different chroots: two 64-bit sid chroots[1] and a 32-bit stable chroot.

Turns out that my replacement is only slow in the stable chroot.  As soon as I moved to a sid chroot, the runtime was more or less cut in half.  I have not bothered to check if it is the 32 vs 64 bit, the stable vs sid part or maybe the python profile in stable is just slow[2].  I simply moved to my sid chroots for now and continued working.

With that taken care of, I figured I would remind myself of what I was up against.  On the largest test I have available, the profiler put my implementation at around 4:30 to 5:00 minutes.  Regardless of my changes, I always got times in that range.  I did manage to fix a bug in my code that reduced my only diff to 10 source packages (plus their binaries), at the price of 30 seconds.

So I was back to 5 minutes according to the profiler, but I noticed that my test runner disagreed.  According to my test runner, my implementation had a runtime of approx. 3 minutes and 40 seconds.  For comparison, my test runner puts the original implementation at 3 minutes and 20 seconds for that test.

In short, I have spent most of my day trying to optimize my code to compensate for the +20% overhead the profiler introduced.
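A quick way to see the profiler's overhead for yourself is to time the same function with and without cProfile. This is a sketch, and `workload` is a stand-in. The overhead you measure depends heavily on the code and the Python version, so it will not necessarily come out near 20%. Many small function calls make it worse.

```python
import cProfile
import time


def workload(n):
    # Lots of tiny function calls: the pattern where profiler overhead shows.
    def square(x):
        return x * x
    return sum(square(i) for i in range(n))


def wall_time(func, *args):
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def profiled_wall_time(func, *args):
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.runcall(func, *args)
    return time.perf_counter() - start


if __name__ == "__main__":
    plain = wall_time(workload, 200_000)
    profiled = profiled_wall_time(workload, 200_000)
    print(f"plain: {plain:.3f}s, profiled: {profiled:.3f}s, "
          f"overhead: {profiled / plain - 1:+.0%}")
```

The lesson from the post applies here too: trust a plain wall-clock measurement (or your test runner) for absolute numbers. Use the profiler to find out *where* the time goes.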

Admittedly, my replacement still needs some speed improvement on some other tests, where it still has a factor 2 runtime regression.  I also need to fix the last diff, which I now suspect is caused by my replacement.

[1] In case you were wondering, yes those chroots are on two different machines.

[2] Of course, it could be a combination of those as well…  Anyhow, I am too lazy to research it at the moment.

Posted in Debian, Release-Team

Testing migration and package relations

While looking at the dpkg Breaks-field[0]…

$ aptitude show dpkg
Package: dpkg
[...]
Version: 1.16.1.2
[...]
Breaks: apt (< 0.7.7), aptitude (< 0.4.7-1), dpkg-dev (< 1.15.8), libdpkg-perl (< 1.15.8), pinfo (< 0.6.9-3.1), tkinfo (< 2.8-3.1)

… it occurred to me that most (all?) of these relations were irrelevant to Britney when she migrated dpkg 1.16.1.2 to testing.  Right now, at least the APT relations are only relevant if you are doing something like a distribution upgrade from Lenny/Squeeze to Wheezy.  Similarly, the version constraints in a lot of dependency relations (e.g. “libc6 (>= 2.11)”) are satisfied in both testing and unstable.

Removing the version constraints on dependencies is a fairly small gain, as it is basically a constant-time optimization on each dependency check.  However, removing an entire clause reduces the “problem size” a bit.  In particular, Conflicts/Breaks relations tend to be expensive for us.

The first task was to identify the relations that can (or cannot) be simplified.  In a Britney run, we have at most 4 versions of the same package per architecture, though usually only 1 or 2[1].  I devised a small set of rules to simplify the relations.  These rules are applied to each package in the relation (each atomic proposition, if you will).

  1. If the relation is versioned and it involves a virtual package in any suite, then do not change the relation in any way.  Rationale: A virtual package cannot satisfy any versioned relation (Policy Manual §7.5).
  2. If the relation is a dependency (i.e. Pre-Depends or Depends) and the package is not available in any suite, then do not change the relation in any way.
  3. If the relation is versioned and the relation is satisfied in all suites (where the “relationed-on” package is available), then remove the version constraint.  Rationale: If all (present) versions satisfy the relation, then the version constraint does not change the semantics.
  4. If the relation is a conflict (i.e. Conflicts or Breaks) and the relation is unsatisfiable in all suites, then remove the relation.  Rationale: If none of the (present) versions satisfy the conflict-relation, then there is no conflict[2].
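The four rules can be sketched in Python for a single atom of a relation. This is not the code from my branch. `simplify` is a hypothetical name, and versions are modelled as plain tuples of ints, which is a big simplification of real Debian version comparison (epochs, "~", revisions…). The sketch also does not track providers, so it conservatively leaves *any* relation on a virtual package alone, not just versioned ones. For conflicts, rule 4 is checked before rule 3, so a conflict on a package that is absent from every suite is dropped.

```python
import operator

# Debian relation operators; versions here are simplified to int tuples.
OPS = {"<<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">>": operator.gt}


def simplify(kind, relation, suites, virtuals):
    """Apply the four rules to one atom of a relation.

    kind:     "depends" (Pre-Depends/Depends) or "conflicts" (Conflicts/Breaks)
    relation: (package, op, version); op and version are None if unversioned
    suites:   list of dicts, one per suite, mapping package -> version
    virtuals: names that are virtual packages in any suite
    Returns the relation unchanged, an unversioned copy, or None (drop it).
    """
    pkg, op, version = relation
    # Rule 1 (extended to unversioned relations to stay conservative).
    if pkg in virtuals:
        return relation
    present = [suite[pkg] for suite in suites if pkg in suite]
    if kind == "depends":
        if not present:                      # Rule 2
            return relation
        if op is not None and all(OPS[op](v, version) for v in present):
            return (pkg, None, None)         # Rule 3
        return relation
    # Conflicts/Breaks: which present versions would trigger the conflict?
    hits = present if op is None else [v for v in present if OPS[op](v, version)]
    if not hits:                             # Rule 4
        return None
    if op is not None and len(hits) == len(present):
        return (pkg, None, None)             # Rule 3
    return relation
```

A Breaks on `apt (<< 0.7.7)` is dropped by rule 4 when every suite carries a newer apt. A Depends on `libc6 (>= 2.11)` loses its version constraint when every suite already satisfies it.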

The rules are rather conservative in some cases and there is room for improvement.  However, one has to remember that removing too few relations costs a bit in runtime, while removing too many breaks testing… and possibly a lot.  Obviously, I prefer the former to the latter (especially because I will be a part of the “clean-up” crew).

I tested my implementation of the four rules above against the current master branch.  In short, it produces the same result as the master branch in all the tests so far.  The tests in the hand-made test suite generally do not have any superfluous relations, so there it is slightly slower, though usually within 0.1 seconds of the master branch.

On the other hand, it does vastly better on the live-data samples I have collected so far.  For the longest run (sample from 2011-12-13), it reduces the runtime by ~70 seconds (from ~215 to ~145 seconds).  In the other runs, it reduces the total runtime by ~35 and ~2 seconds, respectively.  In these samples, only the amd64 and i386 packages are considered (and human hints are ignored).

For those interested, the code is available in my branch.  🙂

[0] There is a perfectly valid reason for doing that.

I might get back to that in a later post.

[1] One in unstable, testing, testing-proposed-updates and proposed-updates.  The latter may seem a bit weird, but… [0]

[2] This is the rule that prunes relations like the ones in the dpkg Breaks-field.

Edit: 2011-01/09, clarified that we have at most 4 versions per architecture.

 

Posted in Debian, Release-Team

Britney in 5 minutes

About 26-28 hours ago in #debian-release on IRC:

<nthykier> damn, a britney run in 5 minutes
<adsb> they happen
<adsb> you've been spoilt by never seeing b1 at her "finest" *cough*
<aba> you mean, running for more than a day?
<Ganneff> adsb wants night-long runs?
<aba> I can remember runs where we had to block certain packages to
      make sure the run could actually end *sometimes*
<adsb> I'm quite happy with just the memory of that sort of run,
       thanks :P
<nthykier> I don't mind being spoiled if it stays at 5 minutes :P

I took the liberty of collecting the resulting data for the Britney test suite.  In its reduced state[1], it runs in 30 seconds on my machine.  It is already my favourite live data sample in the test suite.  😀

[1] Only i386 and amd64 are considered, manual hints are ignored etc.

Posted in Debian, Release-Team

Handling transitions with smooth updates

Lately, I have been working more on release stuff. It all started when Julien convinced me to do the gpsd transition. It was a small, simple transition, though I had to bounce obdgpslogger from testing (#648495). After that I picked up the zita-convolver transition (finished yesterday) and gssdp/gupnp{,-igd} (currently blocked by #653131 and #652783). I also have the mono, libindicator+libdbusmenu+libindicate and libarchive transitions on my to-do list.

The mono transition is going to be the most interesting and challenging of these. It involves a bit over 100 packages and the binNMU order (for the 30ish packages that are not arch:all) is non-trivial. Thankfully, Iain Laney appears to have that part covered and will be helping me get it right.

I am also very happy with Britney2’s transition assistance. Unlike her retired older sister, Britney2 “smooth updates” libraries, which allows us to break the transition into smaller steps.

Normally when Britney2 migrates a source package, she will throw out all the binaries from the old (version of the) source package. Then she moves the new source package and its binaries into testing. But in a smooth update, she will keep the old library binary packages around (if they are still in use).
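The difference between a normal migration and a smooth update can be sketched in Python. This is a toy model, not Britney2's code. `smooth_update` is a hypothetical name, `is_library` stands in for however Britney2 decides which binaries may be kept (the test uses a simple "starts with lib" rule), and "ir.lv2" stands in for whatever binary the ir.lv2 source builds. It also ignores chains where one kept library is only used by another kept library.

```python
def smooth_update(testing, old_binaries, new_binaries, is_library):
    """Migrate a source package, keeping old libraries that are still used.

    testing:      dict binary -> set of binaries it depends on
    old_binaries: binaries built by the old source version in testing
    new_binaries: dict binary -> dependencies, for the new source version
    is_library:   predicate deciding which binaries may be kept around
    """
    # Normal migration: drop all old binaries, add the new ones.
    result = {b: deps for b, deps in testing.items() if b not in old_binaries}
    result.update(new_binaries)
    for binary in sorted(old_binaries):
        if binary not in testing or binary in new_binaries:
            continue  # nothing to keep, or the new version replaces it
        if not is_library(binary):
            continue
        # Smooth update: keep the old library if something still uses it.
        if any(binary in deps for deps in result.values()):
            result[binary] = testing[binary]
    return result
```

In the zita-convolver case, libzita-convolver2 stays in testing next to libzita-convolver3 for as long as something still depends on it. Without users, it is dropped as usual.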

In a concrete example, during the zita-convolver transition, we transitioned from libzita-convolver2 to libzita-convolver3. On the 24th of December[1], Britney migrated zita-convolver 3.1.0-1 to testing with libzita-convolver3, but kept libzita-convolver2 in testing as well. This is because ir.lv2 was not ready to migrate at that time.

With Britney1, zita-convolver 3.1.0-1 would have had to wait until all of its reverse dependencies were ready to migrate. For a small transition like zita-convolver (with 3 or so reverse dependencies), it would have been easy. But having to “keep” 100+ packages “migration ready” for the mono transition… that is where handling a transition becomes an art.

I may still need some hinting to finish the mono transition, but most likely it will be a lot easier than it would have been with Britney1. 🙂

[1] The PTS says the 25th. This is because it uses the day it receives the “migration”-email from trille, which was sent the day after.

 

Posted in Debian, Release-Team

Status on build-arch target goal – Nov 13th

We have been working on adding build-arch support for about a week now and I figured a little status update would be in order. 🙂

According to UDD, we had 506 packages to fix when we started and after today's update it has dropped to 485.  That is a little less than the “4 packages a day” needed to ensure that they are all fixed in Wheezy, but I think it has been a good start.  🙂

On a related note, I am very pleased to see the progress on the general build-arch fixes.  According to the statistics collected by Lintian on lintian.d.o[1], maintainers in Debian have fixed a total of 67 packages since we started.  To put that into perspective, we fixed about 36 packages the week before that.

[1] Currently only DD-accessible on lintian.debian.org

/srv/lintian.debian.org/history/tags/debian-rules-missing-recommended-target.dat

Posted in Debian

build-arch for everyone

The other day, I was asked if it was really possible to get build-arch support in Wheezy (as in, buildds using build-arch) by adding these optional targets.

Let's have a look at the data available to us:

  • At least ~500 packages must be fixed (the “reduced set“)
  • The Wheezy freeze is expected in June (if I recall correctly; I may be off)
    • NMUs can take quite a while, so let's reduce it to May.

To simplify my calculation, I will assume we can fix packages over a course of 150 days (which is 5 months of ~30 days, or every day from December to April). So 500/150 = 10/3 ≈ 3.3 packages from the reduced set should be fixed every day.  Erring on the side of caution, we should make that 4 packages every day.
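The same back-of-the-envelope calculation in Python also shows how much slack the 4-a-day rate leaves. The sketch assumes the 150-day window and the 500-package estimate given here, and the function names are hypothetical.

```python
import math


def daily_target(packages, days):
    """Packages that must be fixed per day, rounded up to a whole package."""
    return math.ceil(packages / days)


def slack_days(packages, days, per_day):
    """Days left over in the window if we fix `per_day` packages every day."""
    return days - math.ceil(packages / per_day)


print(500 / 150)                # roughly 3.33 packages per day
print(daily_target(500, 150))   # 4
print(slack_days(500, 150, 4))  # 25
```

125 days of fixing leaves about 25 days, roughly the one month of slack for dpkg and buildd support.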

So if we fix 4 packages from the reduced set every day, we will definitely fix all of them before Wheezy.  But the reduced set contains only the source packages that could possibly “benefit” from having a build-arch target (those that build both arch:all and non-arch:all packages).  There could (and probably will) still be sources building non-arch:all packages without a build-arch target.  Furthermore, at a rate of 4 a day, we will only have a month left to get dpkg and buildd support…

In short, no.  I do not expect us to get archive-wide build-arch support on buildds for Wheezy.  But I will do my best to ensure that option #4 “flip the switch” becomes very attractive early in the “Wheezy + 1” development schedule.

I hope you will join us in this endeavour.  Most of the time it is a trivial 3-4 line fix and often you can even throw in some hardening flags to spice it up a bit.  The easiest way to help is to fix your (team’s) packages listed in the “reduced set“.   Once you are done with that you can look at the “rest” of your (team’s) packages (see the full dd-list).

The most important thing to remember is that Build-Depends-Indep is still broken!  That is, you cannot rely on Build-Depends-Indep being installed on a buildd in the “build” target.  So any existing workarounds have to stay for now (i.e. checking for certain commands or deferring indep work until binary-indep is called).  If you are in doubt about how to fix a certain package, feel free to ask for help.

Note: I pull the data directly from lintian.d.o (full set) and UDD (reduced set).  I try to remember to refresh it daily.  The full set is basically (z)grep | cut | uniq on the lintian log; the reduced set is found using the UDD query from this script (based on a query done by Jakub Wilk).  Since all data is (in)directly based on lintian.d.o, only packages that are in sid are considered.
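The (z)grep | cut | uniq step could be approximated in Python roughly as follows. This is not the pipeline I actually use. It assumes log lines in Lintian's usual "X: package source: tag extra" shape, and the function names are hypothetical. Unlike plain `uniq`, it removes all duplicates, not just adjacent ones.

```python
import gzip


def packages_with_tag(lines, tag):
    """Return the sorted, de-duplicated packages that have `tag`.

    Expects lines such as
    "W: foo source: debian-rules-missing-recommended-target build-arch".
    """
    found = set()
    for line in lines:
        try:
            _code, rest = line.split(": ", 1)
            who, details = rest.split(": ", 1)
        except ValueError:
            continue  # not a tag line
        if details.split()[:1] == [tag]:
            found.add(who.split()[0])  # "foo source" -> "foo"
    return sorted(found)


def read_log(path, tag="debian-rules-missing-recommended-target"):
    # The "z" in (z)grep: transparently read gzip-compressed logs.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return packages_with_tag(fh, tag)
```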

 

Posted in Debian

Testing testing migration

If you have been following Lintian’s development closely, you will probably have noticed that I have not really done anything there for the past week. Instead, I have turned my focus to our testing migration script, britney2. First, I created a minimal test suite[1]. It started as 4 simple tests and by now it contains about 30 tests.

The size of each test is rather small; the largest tests are about 1600 binary packages in total[2], but most are 2-20 binary packages in total. Thus the test suite is rather fast compared to a “live data sample”, which easily takes more than 10 minutes for a single run. Unfortunately, hand-crafting the test data is somewhat annoying and easy to get wrong.

The test suite has a somewhat unfair focus on “auto-hint”[3] cases, so the current britney2 fails up to 14 tests. Some of these appear to fail because the auto-hinter (for some reason) receives incomplete information about the situation. To my knowledge, we have not been able to debug the situation, but Adam has a refactor branch that does not seem to have this issue. Personally, I am hoping it will soon be merged into the master branch, especially because it seems to simplify a lot of common operations.

Joachim Breitner (who has been working on a SAT-solver based britney) also contributed a couple of test cases[4]. SAT-britney reportedly does rather well on the test suite, failing only 2 tests as far as I can tell[5]. Moreover, it solves some of the more interesting cases that britney2 does not solve.

On a more mathematical note, the britney2 implementation behaves like a function[6] with an attractive fixed point[7]. This is interesting because, in some cases, it may take britney2 a couple of iterations to reach the right solution. This fixed point is fairly simple to find using the following steps (written as Python, with run_britney passed in as a function):

# Runtime complexity O(n * (br + diff)), where "n" is the number of iterations
# until a fixed point is reached, "br" is the complexity of "run_britney" and
# "diff" is the runtime of the "last != current" comparison.
def find_fixed_point(initial, run_britney):
    last = run_britney(initial)
    current = run_britney(last)
    # Keep running britney until a run no longer changes the result.
    while last != current:
        last = current
        current = run_britney(last)
    return current

This gives us a simple way to test whether britney will eventually solve the issue herself (and when she will do it). Currently britney2 is automatically run twice a day, so every 2 iterations (beyond the first) translate roughly to a 24-hour delay. So far, the test suite does not have many problems that require more than one iteration. Personally, I would be pleased if it stayed that way as the test suite coverage grows.
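Here is a toy Python model of the iteration count and the resulting delay. `toy_run` is hypothetical and much dumber than britney2. In it, a package can only migrate once all of its dependencies were already in testing at the start of a run, so a chain of three packages needs three runs.

```python
def iterate(step, initial):
    """Apply `step` until the state stops changing.

    Returns (fixed_point, changing_runs), where changing_runs counts the
    runs that changed the state before the final, confirming run.
    """
    state, changing_runs = initial, 0
    while True:
        new_state = step(state)
        if new_state == state:
            return state, changing_runs
        state, changing_runs = new_state, changing_runs + 1


def delay_hours(changing_runs, runs_per_day=2):
    """Extra delay beyond the first run, with `runs_per_day` runs a day."""
    return max(changing_runs - 1, 0) * 24 / runs_per_day


# Hypothetical chain: c depends on b, b depends on a.
DEPENDS = {"a": set(), "b": {"a"}, "c": {"b"}}


def toy_run(testing):
    # A package migrates if its dependencies were in testing before this run.
    ready = {p for p, deps in DEPENDS.items() if deps <= testing}
    return frozenset(testing | ready)


if __name__ == "__main__":
    state, runs = iterate(toy_run, frozenset())
    print(sorted(state), runs, delay_hours(runs))
```

Under these assumptions, the chain reaches its fixed point after 3 state-changing runs. With 2 runs a day, the 2 runs beyond the first correspond to roughly 24 hours.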

If you are interested in playing around with this, you can get sources from:

  • britney2
    • Currently only works in stable (i.e. requires python2.5 and python-apt < 0.8 or so)
    • See the INSTALL file for instructions
    • Adam’s branch
      • use the “p-u” branch.
  • SAT-britney
    • I haven’t tested this one and I do not know the requirements here
  • britney-tests
    • See the README file for instructions

Footnotes:

[1] http://lists.debian.org/debian-release/2011/10/msg00178.html

[2] These tests are auto-generated, so it is merely an “up-scaled pattern”.

[3] Basically, if two (or more) packages need to migrate into testing at the exact same time, they need to be hinted in.

[4] Not to mention all the copy-waste errors he pointed out in mine. Apparently, SAT-britney has stricter requirements on the data than britney2. 😛

[5] I assume the test called “sat-britney-death” (created by Joachim) was named that way for a reason. The second failure is caused by SAT-britney not reading hints (yet?), so it is expected to fail the “approve tpu package” test case.

[6] A function that maps an “archive” into another “archive”… erh, I mean, it maps a set of packages into another set of packages… 😛

[7] http://en.wikipedia.org/wiki/Fixed_point_%28mathematics%29

Assuming my claim is true, the function will have more than one fixed point. The obtained fixed point depends on the initial state of testing.

As an example:
– y depends on x
– x in testing has RC bugs

If x is not in testing, it cannot migrate to testing (due to its RC bugs). If x is not in testing, then y cannot migrate into testing. But if x starts in testing, then y may be able to migrate. This can happen if x migrated to testing before an RC bug was filed against it.
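The x/y example can be checked with a toy model in Python. The migration rule in `toy_run` is a hypothetical simplification: packages already in testing are never removed, and a candidate migrates if it has no RC bugs and its dependencies are already in testing. It shows the same function reaching two different fixed points depending on the initial state. It does not prove the claim for britney2 itself.

```python
DEPENDS = {"x": set(), "y": {"x"}}
RC_BUGGY = {"x"}


def toy_run(testing):
    # Packages in testing stay; bug-free candidates with satisfied deps migrate.
    migrating = {p for p, deps in DEPENDS.items()
                 if p not in RC_BUGGY and deps <= testing}
    return frozenset(testing | migrating)


def fixed_point(step, state):
    # Plain fixed-point iteration: repeat until step(state) == state.
    new_state = step(state)
    while new_state != state:
        state, new_state = new_state, step(new_state)
    return state


if __name__ == "__main__":
    print(sorted(fixed_point(toy_run, frozenset())))       # x not in testing
    print(sorted(fixed_point(toy_run, frozenset({"x"}))))  # x already in testing
```

Starting without x, nothing ever migrates. Starting with x in testing, y follows, and the two runs end in different fixed points.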

(Dis-)Proving my claim is an exercise left for the reader.

Posted in Debian, Release-Team