auto-decrufter in top 5 after 10 months

About 10 months ago, we enabled an auto-decrufter in dak.  Then after 3 months it had become the top 11th “remover”.  Today, there are only 3 humans left that have removed more packages than the auto-decrufter… impressively enough, one of them is not even an active FTP-master (anymore).  The current score board:

 5371 Luca Falavigna
 5121 Alexander Reichle-Schmehl
 4401 Ansgar Burchardt
 3928 DAK's auto-decrufter
 3257 Scott Kitterman
 2225 Joerg Jaspert
 1983 James Troup
 1793 Torsten Werner
 1025 Jeroen van Wolffelaar
  763 Ryan Murray

For comparison, here is the number removals by year for the past 6 years:

 5103 2011
 2765 2012
 3342 2013
 3394 2014
 3766 2015  (1842 removed by auto-decrufter)
 2845 2016  (2086 removed by auto-decrufter)

Which tells us that in 2015, the FTP masters and the decrufter performed on average over 10 removals a day.  And by the looks of it, 2016 will surpass that.  Of course, the auto-decrufter has a tendency to increase the number of removed items since it is an advocate of “remove early, remove often!”.:)

 

Data is from https://ftp-master.debian.org/removals-full.txt.  Scoreboard computed as:

  grep ftpmaster: removals-full.txt | \
   perl -pe 's/.*ftpmaster:\s+//; s/\]$//;' | \
   sort | uniq -c | sort --numeric --reverse | head -n10

Removals by year computed as:

 grep ftpmaster: removals-full.txt | \
   perl -pe 's/.* (\d{4}) \d{2}:\d{2}:\d{2}.*/$1/' | uniq -c | tail -n6

(yes, both could be done with fewer commands)

Posted in Debian | Leave a comment

Putting Debian packages in labelled boxes

Lintian 2.5.44 was released the other day and (to most) the most significant bug fix was probably that Lintian learned about Policy 3.9.8.  I would like to thank Axel Beckert for doing that.  Notably it also made me update the test suite so to make future policy releases less painful.

For others, it might be the fact that Lintian now accepts (valid) versioned provides (which seemed prudent now that Britney accepts them as well).  Newcomers might appreciate that we are giving a much more sensible warning when they have extra spaces in their changelog “sign off” line (rather than pretending it is an improper NMU).  But I digress…

 

What I am here to talk about is that Lintian 2.5.44 started classifying packages based on various “facts” or “properties”, we can determine.  Therefore:

  • Every package will have at least one tag now!
  • These labels are known as something called “classification tags”.
  • The tags are not issues to be fixed!  (I will repeat this later to ensure you get this point!)

Here are some of the “labelled boxes” your packages will be put into[0]:

The tags themselves are (as mentioned) mere classifications and their primary purpose is to classify or measure certain properties.  With them any body can download the data set and come with some bold statement about Debian packages (hopefully without relying too much on “lies, damned lies and statistics“).  Lets try that immediately!

  • Almost 75% of all Debian packages do not need to run arbitrary code doing installation[2]!
  • The “dh-sequencer” with cdbs is the future![3]

In the next release, we will also add tracking of auto-generated snippets from dh_*-tools.  Currently unversioned, but I hope to add versioning to that so we can find and rebuild packages that have been built with buggy autoscripts (like #788098)

If you want to see the classification tags for your package, please run lintian with like this:

# Add classification tags
$ lintian -L +classification <pkg-or-changes>
# Or if you want only classification tags$ lintian -L =classification <pkg-or-changes>

Please keep in mind that classification tags (“C”) are not issues in themselves. Lintian is simply attempting to add a visible indicator about a given “fact” or “property” in the package – nothing more, nothing less.

 

Future work – help (read: patches) welcome:

 

[0] Mind you, the reporting framework’s handling of these tags could certainly be improved.

[1] Please note how it distinguishes 1.0 into native and non-native based on whether the package has a diff.gz.  Presumably that can be exploited somehow …

[2] Disclaimer: At the time of writing, only ~80% of the archive have been processed.  This is computed as: NS / (NS + WS), where NS and WS are the number of unique packages with the tags “no-ctrl-scripts” and “ctrl-script” respectively.

[3] … or maybe not, but we got two packages classified as using both CDBS and the dh-sequencer.  I have not looked at it in detail. For the curious: libmecab-java and ctioga2.

Posted in Debian, Lintian | Leave a comment

Easter patching of Britney

I decided to take a couple of days of vacation next to Easter and obviously ended up with tons of time.  I ended up channelling most of the (productive) time into improving Britney.

In raw results:

  • I wrote about 35 patches during my (extended) Easter holiday + reviewed and merged/cherry-picked 2 patches from others.
    • Today, the “britney-fixes-2016-03” branch had 48 commits not yet in master (8 or so written before Easter).
  • I submitted 33 of the patches for review with the intention of merging them into master soon.
    • The rest will be bundled for a later round.

The most “exciting” items in the patch series are probably:

  • Support for versioned provides (#786803)
    • Admittedly, there is a complete punt on multi-arch’ified provides.
  • Avoid cruft re-entering testing to satisfy dependencies of other packages
  • First step towards supporting packages being read from a standard (dak-built) mirror.
    • Britney still assumes some data files are stored in the “mirror”.  Though it will hopefully work for derivatives/users that disables (read: patches out) the aging and RC bugs policies.
    • Future work include”partial” suite support and self-contained components.
  • A crash fix (#815995) that only occurs with package “hijacking” (i.e. multiple source packages building the same binary).

 

Once reviewed, these will be merged into master and we will have versioned provides support (in Britney).:)

Posted in Debian, Release-Team | 1 Comment

Performance tuning of lintian, take 3

About 7 months ago, I wrote about we had improved Lintian’s performance. In 2.5.41, we are doing another memory reduction, where we primarily reduce the memory consumption of data about ELF binaries.  Like previously, memory reductions follows the “less is more” pattern.

My initial test subject was linux-image-4.4.0-trunk-rt-686-pae_4.4-1~exp1_i386.deb. It had a somewhat unique property that the ELF data made up a little over half the cache.

  • We do away with a lot of unnecessary default values [f4c57bb, 470875f]
    • That removed about ~3MB (out of 10.56MB) of that ELF data cache
  • Discard section information we do not use [3fd98d9]
    • This reduced the ELF data cache to 2MB (down from the 7MB).
  • Then we stop caching the output of file(1) twice [7c2bee4]
    • While a fairly modest reduction (only 0.80MB out of 16MB total), it also affects packages without ELF binaries.

At this point, we had reduced the total memory usage from 18.35MB to 8.92MB (the ELF data going from 10.56MB to 1.98MB)[1]. At this point, I figured that I was happy with the improvement and discarded my test subject.

While impressive, the test subject was unsurprisingly a special case.  The improvement in “regular” packages[2] (with ELF binaries) were closer to 8% in total.  Not being satisfied with that, I pulled one more trick.

  • Keep only “UND” and “.text” symbols [2b21621]
    • This brought coreutils (just the lone deb) another 10% memory reduction in total.

In the grand total, coreutils 8.24-1 amd64 went from 4.09MB to 3.48MB.  The ELF data cache went from 3.38MB to 2.84MB.  Similarly, libreoffice/4.2.5-1 (including its ~170 binaries) has also seen a 8.5% reduction in total cache size[3] and is now down to 260.48MB (from 284.83MB).

 

[1] If you are wondering why I in 3fd98d9 wrote “The total “cache” memory usage is approaching 1/3 of the original for that package”, then you are not alone.  I am not sure myself any more, but it seems obviously wrong.

[2] FTR: The sample size of “regular packages” is 2 in this case.  Of which one of them being coreutils…

[3] Admittedly, since “take 2” and not since 2.5.40.2 like the rest.

Posted in Debian, Lintian | 1 Comment

Lintian 2.5.40 – now with less output

You have probably tried to run lintian (-EIL +pedantic) on your packages only to watch lintian drown your terminal.  If you have, you would certainly not be the first.

A concrete example with lintian 2.5.40.2:

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb | wc -l
85

Notably, at least 45 of these appeared in 2.5.40 (the hardening-no-bindnow tag):

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb \
  --tags hardening-no-bindnow | wc -l
45

In a single release, we have over doubled the number of tags in the given package.  I very much doubt this is the first time such a thing happened. Therefore, we have implemented a “per package” tag filter in 2.5.40.

The filter is applied automatically when stdout is a tty and restricts lintian to emitting no more than 3 concrete instances of a given tag per package.  If a fourth tag would have been emitted, the filter replaces it with a “how to see all instances” message and suppresses further instances in that package.

Accordingly, lintian “only” emits 25 lines (instead of 85) for the example package.  It looks something like this:

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb 
I: 389-ds-base: spelling-error-in-binary usr/bin/dbscan-bin conents contents
X: 389-ds-base: hardening-no-bindnow usr/bin/dbscan-bin
X: 389-ds-base: hardening-no-bindnow usr/bin/dsktune-bin
X: 389-ds-base: hardening-no-bindnow usr/bin/infadd-bin
X: 389-ds-base: hardening-no-bindnow ... use --no-tag-display-limit to see all (or pipe to a file/program)
I: 389-ds-base: spelling-error-in-binary usr/lib/x86_64-linux-gnu/dirsrv/libns-dshttpd.so.0.0.0 occured occurred
[...]

With this very simple filter in place, the entire lintian output for that single binary now fits on my screen.  I am pretty sure the filter could do with additional smarts, but I believe it is a good start.

 

Posted in Debian, Lintian | 3 Comments

Tor enabled MTA

As I posted earlier, I have migrated to use tor on my machine.  Though I had a couple of unsolved issues back then.  One of them being my Mail Transport Agent (MTA) did not support tor.

A regular user might not have a lot of use for a MTA on their laptop.  However, it is needed for a lot of Debian development scripts (bts, mass-bug, nmudiff), if they are to file/manipulate bugs for you.

I have some requirements for my MTA

  • tor support (or at least “torsocks”-able)
  • support end-to-end encryption with my provider (STARTTLS)
  • verify that it is talking to my provider.
  • rewrite my “From” if it is not correct (otherwise the mail will just be rejected)
  • support the auth mechanisms of my provider
  • it should be simple to configure

I also have some non-requirements:

  • Local mail delivery is not required
  • The MTA will not be used as a general mail relay.
    • One target relay
    • No relaying from other hosts
  • Mail delivery queue is nice to have but not a strict requirement.

Originally, I used postfix, which supported most of these requirements.  Except:

  • My attempt to make it use tor failed.  The best suggestion I found was to divert its smtp handler and then replace it with a torsocks call to the original handler.  Sadly, it just seg. faulted.
  • While postfix is almost certainly able to verify it is talking with my provider, I never got it configured to do that.  In the end, postfix was to complicated for what I was ready to put up with.

 

Per suggestion of Jakub Wilk, I tried msmtp, which turned out do what I wanted.

  • There is a trivial config file example to start with.  I did not need to read any manuals or extended documentation to figure out what they were doing.
  • You probably also want to specify tls_priorities (assuming msmtp is linked against gnutls)
    • A code dive suggests it defaults to “NORMAL:-VERS-SSL3.0″ if not set.  It is probably not too bad, but could be better.:)
    • From a quick look at the gnutls manual “PFS:%PROFILE_<name>” seems like decent value (requires gnutls >= 3.2.4 and that your provider has decent/modern SSL setup).
    • You probably want to have a look at the values for the %PROFILE_<name> before deciding on one.
  • The msmtp program supports connecting through SOCKS proxies and even has a sample config snippet for using it with tor.
    • Of course, by the time I had discovered that I had already been using “torsocks /usr/sbin/sendmail” a couple of times. :)

The only feature I will probably miss is having a local queue, which can be rate limited.  But all in all, I am quite happy with it so far.:)

Posted in Debian | 2 Comments

“dput change-all-of-debian.changes”

Lucas Nussbaum recently did a blog post called “Debian is still changing“.  I found it a very welcome continuation of his previous blog post on the same topic.  I find the graphs very interesting and was very happy to learn that he included relative graphs this time.

Now I can with relatively ease say that 69% of all Debian packages are using a dh-style build (source).  We have another 15% using classic debhelper, which means that at least 84% of all packages uses debhelper directly.  Assuming all CDBS based packages rely on the “debhelper class”, we are at 99%!  The latter is certainly an assumption, although I suspect it is probably pretty accurate[1].

 

Now, it is very cute to have “world dominance”, but that is not my primary interest in these numbers.  Instead, we can use these numbers to determine that:

  • We can deploy changes to up to 99% of all source packages via existing debhelper tools
  • We can deploy changes to up to 84% of all sources packages via debhelper + CDBS if it requires a new debhelper tool.

Such as automatic dbgsym packages, indexable build-id from dbg(sym) packages via Packages files[2], and replacing maintscripts with ldconfig triggers. All of these changes happen to be changes that could be trivially deployed with very little risk and very high efficiency[3].  Notably, none of them required a compat bump (or a new debhelper tool).

Of course, I do not intend to say that every change can (or should) be deployed via debhelper and much less into an existing “dh_cmd”-tool.  Notably, dh_strip is reaching its breaking point for content.  And if we were to require a compat bump for your change, we can now at least see the adoption rate via lintian.:)

Nevertheless, it is nice to know that (politics aside) there is some agility in the Debian build system!:)

 

[1] I would very much love to see numbers to (dis)prove my assumption about CDBS + debhelper.  In fact, an absolute number of packages not using debhelper (indirectly) in Debian would be very intriguing.

[2] New fields by default end up the Packages file.  See e.g. the Packages.xz file on the debug mirror or your apt-cache via:

apt-cache show mscgen-dbgsym | grep ^Build-Ids

The latter assumes that you have the debug mirror in your sources list.

[3] Efficiency here being features people rarely override/disable.

Posted in Debhelper, Debian | Leave a comment