Wheezy was brought to you by …

During the Wheezy freeze, the Debian release team deployed 3254 hints[1].  This number may include some duplicates (i.e. where two members of the team hinted the same package), but it certainly does not include the many requests that were rejected.

The top hinter was *drum roll*… Adam, who did 1799 hints (that is 55% of all hints during the freeze).  For comparison, the second and third place added together did 1023 hints (or 31.4%).  Put differently, on average Julien Cristau and I would each add about 1.5 hints a day, while Adam would on his own add 5.6 hints a day.
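
A quick sanity check of these percentages, as a small Python sketch (the variable and function names are made up for illustration; only the three totals come from the post):

```python
# Totals quoted in the post.
total_hints = 3254
adam_hints = 1799
runners_up_hints = 1023  # second and third place combined


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


print(f"Adam: {share(adam_hints, total_hints)}%")
print(f"2nd + 3rd: {share(runners_up_hints, total_hints)}%")
```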

Of course, this is not intended to diminish the work of the rest of the team.  Reviewing and unblocking packages is not all there is to a release.  In particular, a great thanks to Cyril Brulebois for his hard work on the Debian Installer (without which Debian Wheezy could not be released at all).

Enjoy!

[1] Determined by:

  egrep -c 'done 201(3|2-?(07|08|09|10|11|12))' "$HINT_FILE"

It does not count hints, but the little “header” we add above our hints.  One header can apply to multiple hints (which is good, because udeb unblocks are usually paired with a regular unblock and we don’t want to count them as two separate hints).
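
For readers who prefer to see the counting spelled out, here is a rough Python equivalent of that egrep call. The sample lines are invented for illustration; real hint files may well look different, and the helper name `count_headers` is hypothetical.

```python
import re

# Same pattern as the egrep call: "done 2013..." or "done 2012" followed by
# an optional "-" and a month from 07 through 12.
HEADER_RE = re.compile(r"done 201(3|2-?(07|08|09|10|11|12))")


def count_headers(lines):
    """Count lines that match the pattern (what `egrep -c` reports)."""
    return sum(1 for line in lines if HEADER_RE.search(line))


# Invented sample data; real hint files may look different.
sample = [
    "# 20120705; done 20120706",
    "unblock foo/1.0-2",
    "unblock-udeb foo/1.0-2",
    "# done 2012-06-30",
    "# 20130110; done 20130111",
    "unblock bar/2.1-1",
]
print(count_headers(sample))  # 2 headers, although there are 3 hints
```

The June header is skipped because the pattern only accepts months 07 through 12 for 2012, and the paired unblock/unblock-udeb lines count once because only their shared header matches.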

Posted in Debian, Release-Team

Wheezy release progress (February)

About 5-6 weeks ago, I wrote about the Wheezy release progress, so it is about time for another update.  According to UDD, we are down to 204 RC bugs (down from 249 at my last post).  That is not quite the 2.4 RC bugs per day from last time; actually it is about 1.1 RC bugs a day.

Unfortunately, we do not appear to be fixing bugs faster than we are reporting them at the moment.  If you look at Richard Hartmann’s post from last week, then we had 206 RC bugs left.  Even worse, we appear to have regressed between week 7 and 8 of this year (194 to 206).  If you want the Wheezy release to happen soon, please consider helping us by providing bug fixes that comply with the Wheezy freeze policy.

If the pace of RC bug fixes does not pick up, the alternative is that the release team “deals” with the bugs.  Note that “deals” generally falls into one of 2 categories.  Either we defer/ignore the problem for Wheezy[1] or we remove affected packages from Wheezy/testing.  In particular, if we have to remove packages, they may take reverse dependencies with them as collateral damage.
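
To illustrate the collateral damage, here is a minimal Python sketch that computes everything dragged along by a removal. It is not how the release team's tools work; the package names are hypothetical, and real dependency handling (alternatives, Provides, build-dependencies) is ignored.

```python
from collections import deque


def removal_set(to_remove, depends):
    """Return every package that has to go if `to_remove` is removed.

    `depends` maps a package to the packages it depends on.
    """
    # Invert the mapping: package -> packages that depend on it.
    rdeps = {}
    for pkg, deps in depends.items():
        for dep in deps:
            rdeps.setdefault(dep, set()).add(pkg)

    removed = set(to_remove)
    queue = deque(to_remove)
    while queue:
        pkg = queue.popleft()
        for rdep in rdeps.get(pkg, ()):
            if rdep not in removed:
                removed.add(rdep)
                queue.append(rdep)
    return removed


# Hypothetical packages, not real archive data.
depends = {
    "app": ["libfoo"],
    "plugin": ["app"],
    "libfoo": [],
    "unrelated": [],
}
print(sorted(removal_set(["libfoo"], depends)))
```

Removing the single library here also takes out the application and its plugin, while the unrelated package stays.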

I do not like these tools any more than you do.  But if the RC bug fixes are not coming in, they are the only two tools we have left.

[1] Meaning that at best the bug fix will arrive in a Wheezy point release… at worst, not at all.

Posted in Debian, Release-Team

“Das Lintian-overrider 2000” vs “unjustified overrides”

In January, I did a “TV-shop ad”-style post on a little script called “lintian-overrider” and it prompted Simon to ask:

That’s a great tool, but don’t you fear it makes unjustified overrides too easy ?

In my experience, people sometimes have issues writing overrides (justified or not).  In fact it sometimes leaves them really frustrated with Lintian.  If that frustration eventually leads them to reject Lintian, then we are doing the project a disservice.  I will quote Russ Allbery, as I believe he covered it when he wrote:

I care most about all of the regular Debian developers […] continuing to use Lintian, so that Lintian can stay as effective as it is now at getting people to make archive-wide changes. […] This only works if we can get nearly everyone uploading packages to run Lintian all the time.

If a handful of people add overrides for tags they should not have, then we can fix that (e.g. by filing a bug against their packages).  But it is unlikely that we will ever make them run Lintian again once they have boycotted it.

Posted in Debian, Lintian

Wheezy release progress (January)

In December, I wrote a post on the progress of the Wheezy release.  Back then, we had 348 RC bugs in testing and today there are about 249 left.  A bit of simple math puts us at 99 RC bugs fixed since last time and an average of about 2.4 RC bugs fixed per day (up from 1.8).  Assuming a constant rate, we would be able to release in about 100 days, or 3 months (plus a week or two), from now.  If you want the release earlier than that, fix RC bugs even faster! 🙂
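
The projection is simple enough to redo by hand; as a sketch in Python (variable names are illustrative, and the 2.4 bugs/day rate is taken as given rather than recomputed from dates):

```python
import math

bugs_december = 348
bugs_now = 249
fix_rate = 2.4  # RC bugs per day, as quoted above

fixed = bugs_december - bugs_now
days_to_zero = bugs_now / fix_rate

print(f"fixed since December: {fixed}")
print(f"days until zero at a constant rate: {math.ceil(days_to_zero)}")
```

This lands a little above the rounded "about 100 days", which is where the extra week or two comes from.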

Though, the data we are looking at might be inaccurate.  We filed a bug against UDD, which claims that #639407 affects Wheezy while it is in fact fixed as far as the BTS is concerned.  If UDD is only wrong in this direction, we can hope for a positive surprise when the UDD bug is fixed.  On the other hand, we may also be unpleasantly surprised if not.

While talking about UDD, I would like to mention a new feature in its bug search script.  It is now possible to filter based on whether or not a package is unblocked.  If you take this into account (and assume that all unblocked packages will migrate in a timely manner), our RC bug count drops to about 219.  It also gives us a much better view of how many packages could actually use an unblock, which is now down to about 61.

In case you have been wondering what is up with the “removal of RC buggy leaf packages”: we have still been looking at those, but for the last couple of times there has been at most one candidate for a given run (obviously not the same package each time). It felt a bit excessive to send an email to debian-devel for just one RC bug.

Posted in Debian, Release-Team

Introducing “Das Lintian-overrider 2000”

Have you ever tried to add a Lintian override only to get it wrong?  Fret not, with the “Lintian-overrider 2000” such problems are a thing of the past!  Simply feed the tag emitted by Lintian to the Lintian-overrider 2000 and it will show you the correct format for the override plus the file to put said override in.  Furthermore, it will show you variants that you may (or may not) want to use instead.

$ echo "W: login: setuid-binary bin/su 4755 root/root" | \
     lintian-overrider --alternative-forms
  --8<-- debian/login.lintian-overrides --8<--
# If you want to override all (present and future) variants
# of this tag, use:
#  setuid-binary
setuid-binary bin/su 4755 root/root
# Alternative forms...
#   login: setuid-binary bin/su 4755 root/root
#   login binary: setuid-binary bin/su 4755 root/root
# For architecture specific overrides, use one of:
#   login [i386-any amd64-any other-archs] binary: setuid-binary bin/su 4755 root/root
#   login [!i386-any !amd64-any !other-archs] binary: setuid-binary bin/su 4755 root/root
  --8<-- End of debian/login.lintian-overrides --8<--

No more fiddling with that stupid syntax. Just feed it to the Lintian-overrider 2000 and instantly you will get the overrides you want!
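
For the curious, the basic transformation can be sketched in a few lines of Python. This is not the lintian-overrider code; it is a guess that only reproduces the sample above, it handles just the `<code>: <package>: <tag> <extra>` form shown there, and the function name `override_for` is made up.

```python
import re

# Matches the tag line shown above: "<code>: <package>: <tag> [<extra>]".
# Other forms Lintian can emit (e.g. with a package type) are not handled.
TAG_LINE_RE = re.compile(r"^[A-Z]: (?P<pkg>\S+): (?P<tag>\S+)(?: (?P<extra>.*))?$")


def override_for(tag_line):
    """Return (overrides file, override line) for one emitted tag line."""
    match = TAG_LINE_RE.match(tag_line.strip())
    if match is None:
        raise ValueError(f"unrecognised tag line: {tag_line!r}")
    override = match["tag"]
    if match["extra"]:
        override += " " + match["extra"]
    return f"debian/{match['pkg']}.lintian-overrides", override


path, line = override_for("W: login: setuid-binary bin/su 4755 root/root")
print(path)
print(line)
```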

If you sometimes make a copy-waste mistake or just feel life is too short to manually update those files, the Lintian-overrider 2000 is the tool for you! With its --source-dir command line option, your Lintian overrides are updated automatically!

The Lintian-overrider 2000 can also automatically maintain your Lintian overrides for you! It is simple, just do:

$ lintian -o <path/to/your/changes-file.changes> | \
      lintian-overrider --there-are-no-issues --source-dir <path/to/unpacked/source-tree>

Alioth is ready to take your git clone (or HTTP GET) request, so go order your copy now!

</tv-shop-ad>

Posted in Debian, Lintian

Getting space for more packages

In 2011, I wrote about how small files could consume a lot of space. I meant to do a follow-up on the savings but I forgot about it until now.

In 2.5.7, we started compressing some of the collected data files. Some of these are ridiculously compressible (#664794).  Even better, compressing them is sometimes faster than writing them directly to the disk, so in some cases it is a pure win/win.  For lintian.d.o, we also see a vast reduction in the overall size of the laboratory.
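
To get a feel for why such files compress so well, here is a small Python sketch. The file content is a made-up stand-in for a collection data file (many near-identical lines), not real Lintian output, and it only looks at size, not at speed.

```python
import gzip

# Made-up stand-in for a collection data file: many near-identical lines.
lines = [f"usr/share/doc/pkg/file{i}.txt regular 0644 root/root\n" for i in range(5000)]
raw = "".join(lines).encode("utf-8")
packed = gzip.compress(raw)

print(f"raw:    {len(raw)} bytes")
print(f"packed: {len(packed)} bytes")
print(f"ratio:  {len(raw) / len(packed):.1f}x")
```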

I have taken a few samples occasionally. The samples were done with du(1):


$ du -csh [--apparent-size] laboratory/*

Version/date                          du -csh   du -csh --apparent-size
N/A – around 20 Mar 2012 (#664794)    16G       13G
2.5.6 (Fri Apr 27 2012)               14G       N/A
2.5.6 (Mon Jun 04 2012)               N/A       12G
2.5.10.2 (Fri Sep 21 2012)            12G       8.3G
2.5.11 (Wed Jan 2 2013)               10G       6.1G

And the most awesome part of this? The comparison is quite biased against the 2.5.11 entry, which is the only entry to also process experimental (approx. 10% extra packages).  Some of the early entries (2.5.6 and “older”) might also have suffered from the “too many links” issue[1].  I only wish I had been better at collecting data points, so I could have made a proper graph of it.  🙂

It sounds almost too good to be true, but if you look at the size of one of the linux-image packages[2], the space usage dropped from 27M to 15M between 2.5.5 and 2.5.9.   Currently it is squeezed down to 14M (tested with head of the git master branch).

[1] I believe it is about 5-10% fewer binary packages processed for those runs.

[2] linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb

Posted in Debian, Lintian

Wheezy release progress (December)

Wheezy has been frozen for 5-and-(almost-)a-half months now.

Last month, Neil sent a mail to d-d-a stating that there were 403 RC bugs left and today UDD claims there are 348 left.  My “pre-coffee^Wtea” math gets this to an average of approximately 1.8 RC bugs a day.

This average is only the change in “total” RC bugs between the 8th of Nov and today.  It does not account for RC bugs being filed and fixed between these two days.

However, there is another number that worries me.  Namely the number of RC bugs fixed in sid, but not Wheezy.  Today it is 139. According to Richard’s weekly updates, that number has generally remained in the interval 143-148 for 6 weeks (except week 45 where it went down to 134).

Those 139 are generally “just” an unblock (and a couple of days) from reaching Wheezy (or a tpu upload).  It does also include “unblocked, but not old enough to migrate” packages.  Unfortunately the 112 “unread” emails in my d-release inbox suggest most of these 139 still need attention.

Most of these “unread” emails are (unfortunately) unblock/tpu requests that none of us (to my knowledge) has had time to respond to.  So, if you cannot find an RC bug to fix, I hope you will consider doing a bit of peer review and help us bring down the number of unanswered unblock requests.

DDs can access the same tools we use (see Neil’s mail). Otherwise, the diff is just a:

$ dget -d http://.../pkg_testing-version.dsc
$ dget -d http://.../pkg_sid-version.dsc
$ debdiff pkg_testing-version.dsc pkg_sid-version.dsc

Posted in Debian, Release-Team

Lintian-NG – Accessing Lintian’s collection data with 90 lines

 $ LINTIAN_ROOT=. perl lintian-ng ../lintian_2.5.10.dsc
 Successfully unpacked 1 packages
 Did you know that source:lintian/2.5.10 contains 3745 files or/and directories (excl. root dir)
 This useless information was brought to you by Lintian-NG
 $ git describe
 2.5.10-61-g7670427
 $ wc -l lintian-ng
 90 lintian-ng

This includes everything from creating a temporary lab, adding the packages to it and running all of Lintian’s collections on it (parallelised, of course).

Writing this script yesterday would probably have required 250-300 lines of code to achieve the same. That said, 90 lines are still a bit much to copy/waste into this blog, so I won’t do a “$ cat lintian-ng”. 🙂

If you are interested in how it looks you can (for now) fetch it from http://people.debian.org/~nthykier/lintian-ng/lintian-ng

Oh yeah, “files or/and directories” is actually any (extractable) entry in the tarball…

Posted in Debian, Lintian

Performance bottlenecks in Lintian

Thanks to a heads up from Bastian Blank, I learned that Lintian 2.5.7 and 2.5.8 were horribly slow on the Linux binaries.  Bastian had already identified the issue and 2.5.9 fixed the performance regression.

But in light of that, I decided to have a look at a couple of other bottlenecks.  First, I added simple benchmark support to Lintian 2.5.10 (enabled with -dd) that prints the approximate run time of a given collection.  As an example, when running lintian -dd on lintian 2.5.10, you can see something like:

N: Collecting info: unpacked for source:lintian/2.5.10 ...
[...]
N: Collection script unpacked for source:lintian/2.5.10 done (0.699s)

When done on linux-image, the slowest 3 things with 2.5.10 are (in order of appearance):

[...]
N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)
N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)
[...]
N: Finished check: binaries (5.911s)
[...]

(The mileage (and order) probably will vary a bit.)

These 3 things make up about 22 seconds of a total running time of approximately 28-30s on my machine.  Now if you are wondering how 12, 16 and 6 become 22, the answer is “parallelization”.  strings and objdump-info are run in parallel, so only the “most expensive” of the two counts in practice (with multiple processing units).
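
The arithmetic can be replayed from the log lines themselves. The Python sketch below parses the three timing lines quoted above and adds them up under one assumption that is not spelled out in the output: the binaries check only starts once both collections have finished. The regular expression is a guess fitted to these three lines, not a description of Lintian's log format in general.

```python
import re

# The three timing lines quoted above.
log = """\
N: Collection script strings for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (12.333s)
N: Collection script objdump-info for binary:linux-image-3.2.0-2-amd64/3.2.20-1/amd64 done (15.915s)
N: Finished check: binaries (5.911s)
"""

TIMING_RE = re.compile(r"^N: (?:Collection script|Finished check:) (\S+) .*\((\d+\.\d+)s\)$")

timings = {}
for line in log.splitlines():
    match = TIMING_RE.match(line)
    if match:
        timings[match.group(1)] = float(match.group(2))

# strings and objdump-info overlap, so only the slower one counts;
# assumption: the binaries check only starts after both have finished.
wall_clock = max(timings["strings"], timings["objdump-info"]) + timings["binaries"]
print(f"{wall_clock:.1f}s")
```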

The version of linux-image I have been testing (3.2.20-1, amd64) has over 2800 ELF binaries (kernel modules).  That makes the runtime of strings and objdump-info much more dominating than in “your average package”.   For the fun of it – I have done a small informal benchmark of various Lintian versions on the binary.

I have used the command line:

# time is the bash shell built-in and not /usr/bin/time
$ time lintian -EvIL +pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null
# This was used only with versions that did not accept -L +pedantic
$ time lintian -EvI --pedantic linux-image-3.2.0-2-amd64_3.2.20-1_amd64.deb >/dev/null

With older versions of Lintian (<= 2.5.3) Perl starts to emit warnings; these have been manually filtered out.  I used lintian from the git repository (i.e. I didn’t install the packages, but checked out the relevant git tags).  I had libperlio-gzip-perl installed (affects the 2.5.10 run).

Most results are only from a single run, though I ran it twice on the first version (hoping my kernel would cache the deb for the next run). The results are:

2.5.10
real    0m28.836s
user    0m36.982s
sys     0m3.280s

2.5.9
real    1m9.378s
user    0m33.702s
sys     0m11.177s

2.5.8
real    4m54.492s
user    4m0.631s
sys     0m30.466s

2.5.7 (not tested, but probably about same as 2.5.8)

2.5.{0..6}
real    1m20s   - 1m22s
user    0m19.0s - 0m20.7s
sys     0m5.1s  - 0m5.6s

I think Bastian’s complaint was warranted for 2.5.{7,8}.  🙂

While it would have been easy to attribute the performance gain in 2.5.10 to the new parallelization improvements, that is simply not the case. These improvements only apply to running collections when checking multiple packages.  On my machine, the parallelization limit for a single package is effectively determined by the dependencies between the collections.

Instead the improvements come from reducing the number of system(3) (or fork+exec) calls Lintian does.  Mostly through using xargs more, even if it meant slightly more complex code.  But also, libperlio-gzip-perl shaved off a couple of seconds on the “binaries” check.

But as I said, linux-image is “not your average package”.  Most of the improvements mentioned here are hardly visible on other packages.   So let’s have a look at some other bottlenecks.  In my experience the following are the “worst offenders”:
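
The idea behind the xargs change can be sketched as follows. This is Python rather than Lintian's Perl, the "tool" is just a stand-in that prints how many arguments it got, and the file names are invented; the point is only that batching turns many fork+exec calls into a few.

```python
import subprocess
import sys

# Stand-in for an external tool: prints how many arguments it received.
TOOL = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]


def run_one_by_one(paths):
    """Spawn one process per path; returns the number of spawns."""
    for path in paths:
        subprocess.run(TOOL + [path], check=True, capture_output=True)
    return len(paths)


def run_batched(paths, batch_size=100):
    """Spawn one process per batch of paths, similar in spirit to xargs."""
    spawns = 0
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        subprocess.run(TOOL + batch, check=True, capture_output=True)
        spawns += 1
    return spawns


paths = [f"module{i}.ko" for i in range(10)]
print(run_one_by_one(paths), "spawns one by one")
print(run_batched(paths), "spawn(s) batched")
```

With a couple of thousand kernel modules the per-process overhead adds up quickly, which is why batching pays off on a package like linux-image and much less on small ones.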

  • unpacked (collection)
    • Seen on wesnoth-1.9 source. Here the problem seems to be tar+bzip2, so there is not really a lot to do (on the Lintian side). Though feel free to prove me wrong. 🙂
  • file-info (collection)
    • Seen in eclipse/eclipse-cdt source. file(1) appears to spend a lot of time classifying some source files. For eclipse-cdt, I experience an approx. 10 second speed up (from 40s to 30s) if file is recompiled with -O2. (That would be #659355).  However, even if file is compiled with -O2, the file-info collection is still the dominating factor.
  • manpages (check)
    • Running man on manpages can be a dominating factor in certain doc packages. This is #677874 and suggestions for fixing it are more than welcome.

But enough Lintian for now… time to fix some RC bugs!

Posted in Debian, Lintian

Parsing bash/shell

I have been avoiding #629247 for quite a while. Not because I think we couldn’t use a better shell parser, but because I dreaded having to write the parser. Of course, #629247 blocks about 16 bugs and that number will only increase, so “someone” has to solve it eventually… Unfortunately, that “someone” is likely to be “me”.  So…

I managed to scribble down the following Perl snippet. It does a decent job at getting lines split into “words” (which may or may not contain spaces, newlines, quotes etc.). It currently tokenizes the “<<EOF”-constructs (heredocs?).  Also it does not allow one to distinguish between “EOF” and ” EOF” (the former ends the heredoc, the latter doesn’t).

Other defects include that it does not tokenize all operators (like “>&”).  Probably all I need is a list of them and all the “special cases” (Example: “>&” can optionally take numbers on both sides, like “>&2” or “2>&1”).

It does not always appear to terminate (I think EOF + unclosed quote triggers this).  If you try it out and notice something funny, please let me know.

You can also find an older version of it in bug #629247, along with the output it produced at that time (that version used " instead of - as the token marker).

#!/usr/bin/perl

use strict;
use warnings;

use Text::ParseWords qw(quotewords);
my $opregex;

{
    my $tmp = join( "|", map { quotemeta $_ } qw (&& || | ; ));
    # Match & but not >& or <&
    # - Actually, it should eventually match those, but not right now.
    $tmp .= '|(?<![\>\<])\&';
    $opregex = qr/$tmp/ox;
}
my @tokens = ();
my $lno;
while (my $line = <>) {
    chomp $line;
    next if $line =~ m/^\s*(?:\#|$)/;
    $lno = $. unless defined $lno;
    while ($line =~ s,\\$,,) {
        $line .= "\n" . <>;
        chomp $line;
    }
    $line =~ s/^\s++//;
    $line =~ s/\s++$//;
    # Ignore empty lines (again, via "$empty \ $empty"-constructs)
    next if $line =~ m/^\s*(?:\#|$)/;

    my @it = quotewords ($opregex, 'delimiters', $line);
    if (!@it) {
        # This happens if the line has unbalanced quotes, so pop another
        # line and redo the loop.
        $line .= "\n" . <>;
        redo;
    }

    foreach my $orig (@it) {
        my @l;
        $orig =~ s,",\\\\",g;
        @l = quotewords (qr/\s++/, 1, $orig);
        pop @l unless defined $l[-1] && $l[-1] ne '';
        shift @l if $l[0] eq '';
        push @tokens, map { s,\\\\",",g; $_ } @l;
    }
    print "Line $lno: -" . join ("- -", map { s/\n/\\n/g; $_ } @tokens ) . "-\n";
    @tokens = ();
    $lno = undef;
}

Here is a little example script and the “tokenization” of that script (no, the example script is not supposed to be useful).

$ cat test
#!/bin/sh

for p in *; do
    if [ -d "$p" ];then continue;elif
    [ -f "$p" ]
    then echo "$p is a file";fi
done
$ ./test.pl test
Line 3: -for- -p- -in- -*- -;- -do-
Line 4: -if- -[- --d- -"$p"- -]- -;- -then- -continue- -;- -elif-
Line 5: -[- --f- -"$p"- -]-
Line 6: -then- -echo- -"$p is a file"- -;- -fi-
Line 7: -done-
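
For comparison, a similar word/operator split can be had from Python's standard shlex module. This is a different technique from the Perl snippet above, not a port of it: it needs Python 3.8 or later for this combination of options, it strips the quotes instead of keeping them, and it knows nothing about heredocs either. The helper name `tokenize` is made up.

```python
import shlex


def tokenize(line):
    """Split one shell line into words and operators."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


print(tokenize("for p in *; do"))
print(tokenize('if [ -d "$p" ];then continue;elif'))
```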

Posted in Debian, Lintian