Wheezy was brought to you by …

During the Wheezy freeze, the Debian release team deployed 3254 hints[1].  This number may include some duplicates (i.e. where two members of the team hinted the same package), and it certainly does not include a lot of rejected requests <insert more disclaimers here>.

The top hinter was *drum roll*… Adam, who did 1799 hints (that is 55% of all hints during the freeze).  For comparison, the second and third runners-up added together did 1023 hints (or 31.4%).  Put a different way, on average Julien Cristau and I would each add about 1.5 hints a day, while Adam on his own would add 5.6 hints a day.

Of course, this is not intended to diminish the work of the rest of the team.  Reviewing and unblocking packages is not all there is to a release.  In particular, a great thanks to Cyril Brulebois for his hard work on the Debian Installer (without which Debian Wheezy could not be released at all).

Enjoy!

[1] Determined by:

  egrep -c 'done 201(3|2-?(07|08|09|10|11|12))' "$HINT_FILE"

It does not count hints, but the little “header” we add above our hints.  One header can apply to multiple hints (which is good, because udeb unblocks are usually paired with a regular unblock and we don’t want to count them as two separate hints).
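For readers without egrep at hand, the same count can be sketched in Python. Only the regular expression is taken from the command above; the sample hint file contents are made up for illustration, and the real header layout may well differ.

```python
import re

# Same pattern as the egrep command: "done" dates in 2013, or in
# July-December 2012 (with or without a dash between year and month).
DONE_HEADER = re.compile(r"done 201(3|2-?(07|08|09|10|11|12))")


def count_done_headers(text):
    """Count the lines that match the pattern, like `egrep -c` does."""
    return sum(1 for line in text.splitlines() if DONE_HEADER.search(line))


# Hypothetical hint file: the header layout is an assumption for this sketch.
SAMPLE_HINTS = """\
# done 20120620
unblock foo/1.0-2
# done 20120702
unblock bar/2.1-1
unblock-udeb bar/2.1-1
# done 2013-01-15
unblock baz/0.9-3
"""

print(count_done_headers(SAMPLE_HINTS))
```

In this sample the June 2012 header is skipped because it falls outside the months the pattern accepts, and the paired unblock/unblock-udeb lines are counted once because only their shared header matches.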

Posted in Debian, Release-Team | 4 Comments

“Das Lintian-overrider 2000” vs “unjustified overrides”

In January, I did a “TV-shop ad”-style post on a little script called “lintian-overrider” and it prompted Simon to ask:

That’s a great tool, but don’t you fear it makes unjustified overrides too easy ?

In my experience, people sometimes have issues writing overrides (justified or not).  In fact it sometimes leaves them really frustrated with Lintian.  If that frustration eventually leads them to reject Lintian, then we are doing the project a disservice.  I will quote Russ Allbery, as I believe he covered it when he wrote:

I care most about all of the regular Debian developers [...] continuing to use Lintian, so that Lintian can stay as effective as it is now at getting people to make archive-wide changes. [...] This only works if we can get nearly everyone uploading packages to run Lintian all the time.

If a handful of people add overrides for tags they should not have, then we can fix that (e.g. by filing a bug against their packages).  But it is unlikely that we will ever make them run Lintian again once they have boycotted it.

 

Posted in Debian, Lintian | Leave a comment

Introducing “Das Lintian-overrider 2000”

Have you ever tried to add a Lintian override only to get it wrong?  Fret not, with the “Lintian-overrider 2000” such problems are a thing of the past!  Simply feed the tag emitted by Lintian to the Lintian-overrider 2000 and it will show you the correct format for the override plus the file to put said override in.  Furthermore, it will show you variants that you may (or may not) want to use instead.

$ echo "W: login: setuid-binary bin/su 4755 root/root" | \
     lintian-overrider --alternative-forms
  --8<-- debian/login.lintian-overrides --8<--
# If you want to override all (present and future) variants
# of this tag, use:
#  setuid-binary
setuid-binary bin/su 4755 root/root
# Alternative forms...
#   login: setuid-binary bin/su 4755 root/root
#   login binary: setuid-binary bin/su 4755 root/root
# For architecture specific overrides, use one of:
#   login [i386-any amd64-any other-archs] binary: setuid-binary bin/su 4755 root/root
#   login [!i386-any !amd64-any !other-archs] binary: setuid-binary bin/su 4755 root/root
  --8<-- End of debian/login.lintian-overrides --8<--

No more fiddling with that stupid syntax. Just feed it to the Lintian-overrider 2000 and instantly you will get the overrides you want!

If you sometimes make a copy-waste mistake or just feel life is too short to manually update those files, the Lintian-overrider 2000 is the tool for you! With its --source-dir command line option, your Lintian overrides are updated automatically!

The Lintian-overrider 2000 can also automatically maintain your Lintian overrides for you! It is simple, just do:

$ lintian -o <path/to/your/changes-file.changes> | \
      lintian-overrider --there-are-no-issues --source-dir <path/to/unpacked/source-tree>

Alioth is ready to take your git clone (or HTTP GET) request, so go order your copy now!

</tv-shop-ad>

Posted in Debian, Lintian | 9 Comments

Getting space for more packages

In 2011, I wrote about how small files could consume a lot of space. I meant to do a follow-up on the savings but I forgot about it until now.

In 2.5.7, we started compressing some of the collected data files. Some of these are ridiculously compressible (#664794).  Even better, compressing them is sometimes faster than writing them directly to the disk, so in some cases it is a pure win/win.  For lintian.d.o, we also see a vast reduction in the overall size of the laboratory.

I have taken a few samples occasionally. The samples were done with du(1):


$ du -csh [--apparent-size] laboratory/*

Version/date                         du -csh   du -csh --apparent-size
N/A, around 20 Mar 2012 (#664794)    16G       13G
2.5.6 (Fri Apr 27 2012)              14G       N/A
2.5.6 (Mon Jun 04 2012)              N/A       12G
2.5.10.2 (Fri Sep 21 2012)           12G       8.3G
2.5.11 (Wed Jan 2 2013)              10G       6.1G

And the most awesome part of this? The comparison is quite biased against the 2.5.11 entry, which is the only entry to also process experimental (approx. 10% extra packages).  Some of the early entries (2.5.6 and “older”) might also have suffered from the “too many links” issue[1].  I only wish I had been better at collecting data points, so I could have made a proper graph of it.  :)

It sounds almost too good to be true, but if you look at the size of one of the linux-image packages[2], the space usage dropped from 27M to 15M between 2.5.5 and 2.5.9.  Currently it is squeezed down to 14M (tested with head of the git master branch).
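To put the samples into relative terms, here is a small sketch that turns the before/after sizes quoted in this post into percentage reductions. The numbers are copied from the text; the helper name is made up for the example, and the laboratory figures understate the saving because the 2.5.11 run also processed experimental.

```python
def reduction_percent(before, after):
    """Relative size reduction in percent (units cancel out)."""
    return (before - after) / before * 100


# (before, after) pairs copied from the du(1) samples quoted in this post.
samples = {
    "laboratory, du -csh (G)": (16, 10),
    "laboratory, --apparent-size (G)": (13, 6.1),
    "linux-image, 2.5.5 to 2.5.9 (M)": (27, 15),
    "linux-image, 2.5.5 to git master (M)": (27, 14),
}

for label, (before, after) in samples.items():
    print(f"{label}: {reduction_percent(before, after):.1f}% smaller")
```

By this measure the laboratory shrank by a bit over a third on disk and by roughly half in apparent size, while the linux-image entry lost a little under half of its footprint.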

[1] I believe it is about 5-10% fewer binary packages processed for those runs.

[2] linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb

Posted in Debian, Lintian | Leave a comment

Performance bottlenecks in Lintian

Thanks to a heads up from Bastian Blank, I learned that Lintian 2.5.7 and 2.5.8 were horribly slow on the Linux binaries.  Bastian had already identified the issue and 2.5.9 fixed the performance regression.

But in light of that, I decided to have a look at a couple of other bottlenecks.  First, I added simple benchmark support to Lintian 2.5.10 (enabled with -dd) that prints the approximate run time of a given collection.  As an example, when running lintian -dd on lintian 2.5.10, you can see something like:

N: Collecting info: unpacked for source:lintian/2.5.10 ...
[...]
N: Collection script unpacked for source:lintian/2.5.10 done (0.699s)

When done on linux-image, the slowest 3 things with 2.5.10 are (in order of appearance):

[...]
N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)
N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)
[...]
N: Finished check: binaries (5.911s)
[...]

(The mileage (and order) probably will vary a bit.)

These 3 things make up about 22 seconds of a total running time of approximately 28-30s on my machine.  Now if you are wondering how 12, 16 and 6 become 22, the answer is “parallelization”.  strings and objdump-info are run in parallel, so only the “most expensive” of the two counts in practice (with multiple processing units).
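The arithmetic can be checked with a short sketch that pulls the timings out of the -dd lines quoted above. The regular expression is my own guess at the line format based on those three samples, not something Lintian documents, and the "parallel pair plus check" model is a simplification.

```python
import re

# Matches the two kinds of timing lines seen in the -dd output above.
TIMING_RE = re.compile(
    r"^N: (?:Collection script (?P<coll>\S+) for \S+ done"
    r"|Finished check: (?P<check>\S+)) \((?P<secs>[0-9.]+)s\)$"
)


def parse_timings(output):
    """Map collection/check name to its reported run time in seconds."""
    timings = {}
    for line in output.splitlines():
        m = TIMING_RE.match(line)
        if m:
            name = m.group("coll") or m.group("check")
            timings[name] = float(m.group("secs"))
    return timings


SAMPLE = """\
N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)
N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)
N: Finished check: binaries (5.911s)
"""

t = parse_timings(SAMPLE)
# strings and objdump-info overlap, so only the slower one adds wall-clock time.
wall = max(t["strings"], t["objdump-info"]) + t["binaries"]
print(f"{wall:.3f}s")
```

With the sample lines this gives 15.915 + 5.911 = 21.826 seconds, i.e. the "about 22 seconds" mentioned above.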

The version of linux-image I have been testing (3.2.20-1, amd64) has over 2800 ELF binaries (kernel modules).  That makes the runtime of strings and objdump-info much more dominating than in “your average package”.   For the fun of it – I have done a small informal benchmark of various Lintian versions on the binary.

I have used the command line:

# time is the bash shell built-in and not /usr/bin/time
$ time lintian -EvIL +pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null
# This was used only with versions that did not accept -L +pedantic
$ time lintian -EvI --pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null

With older versions of Lintian (<= 2.5.3) Perl starts to emit warnings; these have been manually filtered out.  I used lintian from the git repository (i.e. I didn’t install the packages, but checked out the relevant git tags).  I had libperlio-gzip-perl installed (affects the 2.5.10 run).

Most results are only from a single run, though I ran it twice on the first version (hoping my kernel would cache the deb for the next run). The results are:

2.5.10
real    0m28.836s
user    0m36.982s
sys     0m3.280s

2.5.9
real    1m9.378s
user    0m33.702s
sys     0m11.177s

2.5.8
real    4m54.492s
user    4m0.631s
sys     0m30.466s

2.5.7 (not tested, but probably about same as 2.5.8)

2.5.{0..6}
real    1m20s   - 1m22s
user    0m19.0s - 0m20.7s
sys     0m5.1s  - 0m5.6s

I think Bastian’s complaint was warranted for 2.5.{7,8}.  :)
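For a rough sense of scale, the "real" times above can be converted to seconds and compared against 2.5.10. This is only a sketch over single-run numbers, so the ratios are indicative at best; the helper name is made up.

```python
import re


def to_seconds(stamp):
    """Convert bash `time` output such as '4m54.492s' to seconds."""
    m = re.fullmatch(r"(\d+)m([0-9.]+)s", stamp)
    if m is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# "real" values copied from the results above.
real = {"2.5.10": "0m28.836s", "2.5.9": "1m9.378s", "2.5.8": "4m54.492s"}

baseline = to_seconds(real["2.5.10"])
slowdown = {version: to_seconds(stamp) / baseline for version, stamp in real.items()}

for version, factor in slowdown.items():
    print(f"{version}: {factor:.1f}x the wall-clock time of 2.5.10")
```

On these numbers 2.5.8 took roughly ten times as long as 2.5.10, and 2.5.9 a bit under two and a half times as long.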

While it would have been easy to attribute the performance gain in 2.5.10 to the new parallelization improvements, it is simply not the case. These improvements only apply to running collections when checking multiple packages.  On my machine, the parallelization limit for a single package is effectively determined by the dependencies between the collections.

Instead the improvements come from reducing the number of system(3) (or fork+exec) calls Lintian does.  Mostly through using xargs more, even if it meant slightly more complex code.  But also, libperlio-gzip-perl shaved off a couple of seconds on the “binaries” check.
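To illustrate why batching helps, here is a minimal sketch of the xargs idea. It is not Lintian's code (Lintian is written in Perl): the module paths are invented, and the batch size of 1000 is an arbitrary stand-in for the command-line length limit that the real xargs uses. It only builds the command lines, so the saving shows up as the number of processes that would be spawned.

```python
from itertools import islice


def batch_commands(tool, paths, max_args):
    """Yield argv lists that run `tool` on up to max_args paths each."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, max_args))
        if not chunk:
            return
        yield [tool, *chunk]


# Hypothetical file names; the count mirrors the ~2800 kernel modules above.
modules = [f"lib/modules/mod{i}.ko" for i in range(2800)]

one_per_file = [["file", path] for path in modules]    # one fork+exec per file
batched = list(batch_commands("file", modules, 1000))  # a handful of fork+execs

print(len(one_per_file), "processes versus", len(batched))
```

Each argv list could be handed to subprocess.run(); with these assumptions the batched variant needs 3 processes where the naive one needs 2800.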

But as I said, linux-image is “not your average package”.  Most of the improvements mentioned here are hardly visible on other packages.  So let’s have a look at some other bottlenecks.  In my experience the following are the “worst offenders”:

  • unpacked (collection)
    • Seen on wesnoth-1.9 source. Here the problem seems to be tar+bzip2, so there is not really a lot to do (on the Lintian side). Though feel free to prove me wrong. :)
  • file-info (collection)
    • Seen in eclipse/eclipse-cdt source. file(1) appears to spend a lot of time classifying some source files. For eclipse-cdt, I experience an approx. 10 second speed up (from 40s to 30s) if file is recompiled with -O2. (That would be #659355).  However, even if file is compiled with -O2, the file-info collection is still the dominating factor.
  • manpages (check)
    • Running man on manpages can be a dominating factor in certain doc packages. This is #677874 and suggestions for fixing it are more than welcome.

But enough Lintian for now… time to fix some RC bugs!

Posted in Debian, Lintian | 1 Comment

Kudos to Jakub Adam, Miguel Landaeta and James Page

Credit where it is due, and I believe it is due to Jakub Adam for packaging the eclipse packages.  If you use any of the eclipse packages provided in the apt repositories for Wheezy or sid, it is very likely you have Jakub Adam to thank for it.

I also believe that Miguel Landaeta and James Page deserve praise for their work.  Miguel is to thank for the removal of libservlet2.4-java and updating its reverse dependencies – not in that order ;-) .  James Page, on the other hand, has been introducing and updating a lot of packages, notably the jenkins packages.

Thank you and keep up the good work.

Posted in Debian | 1 Comment

Some sponsors are “evil and pedantic”

If you want to enable all Lintian tags, just remember the phrase:

Some sponsors are “evil and pedantic”

Or on the command-line:

 $ lintian -EvIL +pedantic ...

It works for Lintian 2.5.5 (and newer), which handles “pedantic” like other severities.  If you need help understanding the tags, you can add an extra “i” (-i) to “evil”.  That being said, remember that experimental (-E) and pedantic (-L +pedantic) tags are what they are for a reason. Also quite a few people will probably find verbose (-v) too noisy. However, leaving any of them out would have ruined the mnemonic. :)

Of course, you can also ask Lintian to enable all tags via your lintianrc file.  Here is a quick-start:

display-info = yes # or no
display-experimental = yes # or no
pedantic = yes # or no
verbose = yes # or no


Posted in Debian, Lintian | 2 Comments

Testing testing migration

If you have been following Lintian’s development closely, you will probably have noticed that I have not really done anything there for the past week. Instead I have turned my focus to our testing migration script, britney2. First, I have created a minimal test suite[1]. It started as 4 simple tests and by now it contains about 30 tests.

The size of each test is rather small; the largest tests are about 1600 binary packages in total[2], but most are 2-20 binary packages in total. Thus the test suite is rather fast compared to a “live data sample”, which easily takes more than 10 minutes for a single run. Unfortunately, hand-crafting the test data is somewhat annoying and easy to get wrong.

The test suite has a somewhat unfair focus on “auto-hint”[3] cases, so the current britney2 fails up to 14 tests. Some of these appear to fail because the auto-hinter (for some reason) receives incomplete information about the situation. To my knowledge we have not been able to debug the situation, but Adam has a refactor branch that does not seem to have this issue. Personally I am hoping it will soon be merged into the master branch, especially because it seems to simplify a lot of common operations.

Joachim Breitner (who has been working on a SAT-solver based britney) also contributed a couple of test cases[4]. Allegedly, SAT-britney does rather well on the test suite, failing only 2 tests as far as I can tell[5]. On the other hand, it does solve some of the more interesting cases britney2 does not solve.

On a more mathematical note, the britney2 implementation behaves like a function[6] with an attractive fixed point[7]. This is interesting, because for some cases it may take britney2 a couple of iterations to reach the right solution. This fixed point is somewhat simple to find by using the following steps (pseudo-code):

// Runtime complexity O(n * (br + diff)), where "n" is the number of iterations until
// a fixed point is reached, "br" is the complexity of "run_britney" and "diff" is
// the runtime of the "last != current" comparison.
function find_fixed_point(initial)
    last = run_britney(initial)
    current = run_britney(last)
    while last != current ; do
        last = current
        current = run_britney(last)
    od
    return current
end

This gives us a simple way to test if britney will eventually solve the issue herself (and when she will do it). Currently britney2 is automatically run twice a day, so every 2 iterations (beyond the first) roughly translate to a 24-hour delay. So far the test suite does not have a lot of problems that require more than one iteration. Personally I would be pleased if it turned out to stay that way as the test suite coverage grows.
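The pseudo-code can be turned into a runnable sketch as follows. The run_britney stand-in here is a toy of my own making (a package migrates once all of its dependencies were already in testing at the start of the run); it has nothing to do with britney2's real logic and only serves to produce a case that needs several iterations.

```python
def find_fixed_point(run_britney, initial):
    """Re-run run_britney on its own output until the result stops changing.

    Returns the fixed point and the number of runs performed, including
    the final run that merely confirms nothing changed.
    """
    last = run_britney(initial)
    current = run_britney(last)
    runs = 2
    while last != current:
        last = current
        current = run_britney(last)
        runs += 1
    return current, runs


# Toy dependency chain (hypothetical packages): c needs b, b needs a.
DEPENDS = {"a": set(), "b": {"a"}, "c": {"b"}}


def toy_run(testing):
    """One toy 'britney run': admit packages whose deps were already in testing."""
    admitted = {pkg for pkg, deps in DEPENDS.items() if deps <= testing}
    return frozenset(testing | admitted)


result, runs = find_fixed_point(toy_run, frozenset())
print(sorted(result), runs)
```

Starting from an empty testing, the toy needs three runs to get a, b and c in, plus a fourth to confirm that nothing changes any more.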

If you are interested in playing around with this, you can get sources from:

  • britney2
    • Currently only works in stable (i.e. requires python2.5 and python-apt < 0.8 or so)
    • See the INSTALL file for instructions
    • Adam’s branch
      • use the “p-u” branch.
  • SAT-britney
    • I haven’t tested this one and I do not know the requirements here
  • britney-tests
    • See the README file for instructions

Footnotes:

[1] http://lists.debian.org/debian-release/2011/10/msg00178.html

[2] These tests are auto-generated, so it is merely an “up-scaled pattern”.

[3] Basically if two (or more) packages need to migrate into testing at the exact same time, they need to be hinted in.

[4] Not to mention all the copy-waste errors he pointed out in mine. Apparently, SAT-britney has stricter requirements to the data than britney2. :P

[5] I assume the test called “sat-britney-death” (created by Joachim) was named that way for a reason. The second failure is caused by SAT britney not reading hints (yet?), so the “approve tpu package” test case should fail.

[6] A function that maps an “archive” into another “archive”… erh, I mean, it maps a set of packages into another set of packages… :P

[7] http://en.wikipedia.org/wiki/Fixed_point_%28mathematics%29

Assuming my claim to be true, the function will have more than one fixed point. The obtained fixed point depends on the initial state of testing.

As an example:
– y depends on x
– x in testing has RC bugs

If x is not in testing, it cannot migrate to testing (due to its RC bugs). If x is not in testing, then y cannot migrate into testing. But if x starts in testing, then y may be able to migrate. This can happen if x migrated to testing before an RC bug was filed against it.
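The example can be played through with a toy model. This is a sketch under my own simplifying assumptions (an RC-buggy package may stay in testing but may not newly enter it, and nothing is ever removed); it is not how britney2 is implemented.

```python
# Toy model of the example above: y depends on x, and x has RC bugs.
DEPENDS = {"x": set(), "y": {"x"}}
RC_BUGGY = {"x"}


def toy_run(testing):
    """One toy migration run over the current contents of testing."""
    result = set(testing)
    for pkg, deps in DEPENDS.items():
        if pkg in RC_BUGGY and pkg not in testing:
            continue  # RC-buggy packages may not newly enter testing
        if deps <= testing:
            result.add(pkg)
    return frozenset(result)


def fixed_point(state):
    """Iterate toy_run until the state stops changing."""
    nxt = toy_run(state)
    while nxt != state:
        state, nxt = nxt, toy_run(nxt)
    return state


without_x = fixed_point(frozenset())
with_x = fixed_point(frozenset({"x"}))
print(sorted(without_x), sorted(with_x))
```

In this toy, an empty testing is already a fixed point, whereas a testing that starts with x ends up containing both x and y: two different fixed points of the same function, depending only on the initial state.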

(Dis-)Proving my claim is an exercise left for the reader.

Posted in Debian, Release-Team | 3 Comments

Wheezy release progress (February)

About 5-6 weeks ago, I wrote about the Wheezy release progress, so it is about time for another update.  According to UDD, we are down to 204 RC bugs (down from 249 since my last post).  It is not quite the 2.4 RC bugs per day – actually it is about 1.1 RC bugs a day.

Unfortunately, we do not appear to be fixing bugs faster than we are reporting them at the moment.  If you look at Richard Hartmann’s post from last week, then we had 206 RC bugs left.  Even worse, we appear to have regressed between week 7 and 8 of this year (194 to 206).  If you want the Wheezy release to happen soon, please consider helping us by providing bug fixes that comply with the Wheezy freeze policy.

If the pace of RC bug fixes does not pick up, the alternative is that the release team “deals” with the bugs.  Note that “deals” generally falls into one of 2 categories.  Either we defer/ignore the problem for Wheezy[1] or we remove affected packages from Wheezy/testing.  Particularly, if we have to remove packages, they may take reverse dependencies with them as collateral damage.

I do not like these tools any more than you do.  But if the RC bug fixes are not coming in, they are the only two tools we have left.

[1] Meaning that at best the bug fix will occur in a Wheezy point release… at worst, not at all.

Posted in Debian, Release-Team | 6 Comments

Wheezy release progress (January)

In December, I wrote a post on the progress of the Wheezy release.  Back then, we had 348 RC bugs in testing and today there are about 249 left.  A bit of simple math puts us at 99 RC bugs fixed since last time and an average of about 2.4 RC bugs fixed per day (up from 1.8).  Assuming a constant rate, we would be able to release in about 100 days or 3 months (plus a week or two) from now.  If you want the release earlier than that, fix RC bugs even faster! :)
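The “simple math” is just two divisions, sketched below. The bug counts and the 2.4 bugs/day rate come from the text; the 41-day interval between the two posts is an assumption I back-derived from that rate, since the exact dates are not given here.

```python
def fix_rate(bugs_before, bugs_after, days):
    """Net RC bugs closed per day over the interval."""
    return (bugs_before - bugs_after) / days


def days_until_zero(bugs_left, rate_per_day):
    """Days until the count hits zero, assuming the rate stays constant."""
    return bugs_left / rate_per_day


ASSUMED_DAYS_SINCE_LAST_POST = 41  # assumption, not stated in the post

rate = fix_rate(348, 249, ASSUMED_DAYS_SINCE_LAST_POST)
projection = days_until_zero(249, 2.4)

print(f"{rate:.1f} RC bugs/day, about {projection:.0f} days to go")
```

The projection comes out at roughly 104 days, which is where the “about 100 days” estimate comes from. Like the figure in the post, it is a net rate, so it ignores bugs that are filed and fixed within the interval.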

Though, the data we are looking at might be inaccurate.  We filed a bug against UDD, which claims that #639407 affects Wheezy while it is in fact fixed as far as the BTS is concerned.  If UDD is only wrong in this direction, we can hope for a positive surprise when the UDD bug is fixed.  On the other hand, we may also be unpleasantly surprised if not.

While talking about the UDD, I would like to mention a new feature in its bug search script.  It is now possible to filter based on whether or not a package is unblocked.  If you take this into account (and assume that all unblocked packages will migrate in a timely manner), our RC bug count drops to about 219.  It also gives us a much better view of how many packages could actually use an unblock, which is now down to about 61.

In case you have been wondering what is up with the “removal of RC buggy leaf packages”: we have still been looking at those, but the last couple of times there has been at most one candidate for a given run (obviously not the same package each time). It felt a bit excessive to send an email to debian-devel for just one RC bug.

Posted in Debian, Release-Team | 5 Comments

Wheezy release progress (December)

Wheezy has been frozen for 5-and-(almost-)a-half months now.

Last month, Neil sent a mail to d-d-a stating that there were 403 RC bugs left and today UDD claims there are 348 left.  My “pre-coffee^Wtea” math gets this to an average of approximately 1.8 RC bugs a day.

This average is only the change in “total” RC bugs between the 8th of Nov and today.  It does not account for RC bugs being filed and fixed between these two days.

However, there is another number that worries me.  Namely the number of RC bugs fixed in sid, but not Wheezy.  Today it is 139. According to Richard’s weekly updates, that number has generally remained in the interval 143-148 for 6 weeks (except week 45 where it went down to 134).

Those 139 are generally “just” an unblock (and a couple of days) from reaching Wheezy (or a tpu upload).  It does also include “unblocked, but not old enough to migrate” packages.  Unfortunately the 112 “unread” emails in my d-release inbox suggest most of these 139 still need attention.

Most of these “unread” emails are (unfortunately) unblock/tpu requests that none of us (to my knowledge) has had time to respond to.  So, if you cannot find an RC bug to fix, I hope you will consider doing a bit of peer review and help us bring down the number of unanswered unblock requests.

DDs can access the same tools we use (See Neil’s mail). Otherwise, the diff is just a:

 

$ dget -d http://.../pkg_testing-version_arch.dsc
$ dget -d http://.../pkg_sid-version_arch.dsc
$ debdiff pkg_testing-version_arch.dsc pkg_sid-version_arch.dsc

 

Posted in Debian, Release-Team | 5 Comments