Another Britney patchset

I just submitted another patch series to improve Britney for review.  If accepted, the patches will probably be merged into master within 2 weeks. The changes this time are probably most exciting for people who run/maintain Britney.  Key highlights include:

  • Britney will be able to use a regular mirror (without partial suites) as data source
    • Previously you would have to decompress and merge the Packages/Sources for each component.
    • Partial suite support is still not added, but I hope to add it eventually.  I know it is a feature used by at least Ubuntu.
    • This change implies renaming some input files around (Dates, Urgency and BugsV files) as Britney expected these next to the Packages files.
  • More machine parsable facts added to “excuses.yaml”.  It will cover almost all excuses currently in use.
  • Britney will support two use cases for “faux packages” natively.
    • I hope to use this to eliminate our need for “injecting” fake packages into Britney’s data source.
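For reference, the manual "decompress and merge" step that older setups needed can be sketched roughly as follows. This is only an illustration: it assumes the standard mirror layout `dists/<suite>/<component>/binary-<arch>/Packages.gz`, and the function name `merge_packages` and the output file name are made up for this example rather than anything Britney mandates.

```shell
#!/bin/sh
# Sketch: merge the per-component Packages files of one suite and
# architecture into a single file.
# Assumption: gzip-compressed indices in the standard mirror layout.
merge_packages() {
    suite_dir=$1   # e.g. /srv/mirror/dists/unstable
    arch=$2        # e.g. amd64
    out=$3         # merged output file (name chosen by the caller)

    : > "$out"     # start from an empty file
    for f in "$suite_dir"/*/binary-"$arch"/Packages.gz; do
        [ -e "$f" ] || continue      # the glob matched nothing
        gzip -dc "$f" >> "$out"      # stanzas already end in a blank line
    done
}
```

With the patch series, this step should no longer be necessary for a regular mirror.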

I would like to dwell a moment on the “faux packages”.  We have had a helper script generate and inject fake packages into the list of packages (called “faux packages”).  They generally serve two purposes, which Britney will support:

  1. Whitelist of fake packages to satisfy dependencies of other packages.
    • These are generally stand-ins for non-free machine configuration packages, where the end-user system would also fetch packages from the vendor’s repository.
    • Packages relying on “faux packages” are generally not in “main” as Debian’s main component is required to be self-contained.
    • These are (still) called “faux packages” in/after the patch series.
  2. Ensuring that certain packages are present and installable in testing.
    • We have a lot of d-i related packages here to avoid accidental breakage of d-i.
    • These are now referred to as a “constraint” (assuming there is no bike-shedding over the name).

Since Britney will now distinguish between these two use cases, I also make Britney enforce the second use case slightly better.  Mind you, it can still be overruled by force-hints and BREAK_ARCHES, so there is still enough rope to hang yourself.

 

The other exciting part of this patch set (for me, at least) is that Britney will hopefully become simpler to deploy. No doubt there are still some missing features and paper cuts left, but I suspect we are not far from a:

  1. Fill out a template config file pointing Britney to your mirror
  2. Run britney -c britney.conf
  3. Make your archive kit update your target suite based on Britney’s output.
  4. Put step 2+3 in crontab/jenkins/task scheduler of choice
  5. Profit
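Steps 2 and 3 could be glued together along these lines. This is a sketch under assumptions: only `britney -c britney.conf` comes from the list above; the function name `britney_cycle`, the `BRITNEY` override variable and the idea of passing the archive kit's update command as arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: run Britney, then let the archive kit update the target suite.
# Usage: britney_cycle <config> <update-command> [args...]
# BRITNEY is a hypothetical override for the britney executable.
britney_cycle() {
    conf=$1
    shift
    # Step 2: stop here if Britney fails, so the target suite is never
    # updated from stale or partial output.
    "${BRITNEY:-britney}" -c "$conf" || return 1
    # Step 3: whatever your archive kit uses to apply Britney's output.
    "$@"
}
```

Step 4 then amounts to calling `britney_cycle britney.conf <your update command>` from cron, Jenkins or a similar scheduler.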

There will certainly be some features that require extra steps.  An example is the “anti rc-bugs regression” feature, which requires you to feed Britney with the list of RC bugs for your source and target suite. But even without it, Britney would still protect your target suite from most installability issues.

Posted in Debian, Release-Team | 1 Comment

auto-decrufter in top 5 after 10 months

About 10 months ago, we enabled an auto-decrufter in dak.  After 3 months, it had become the 11th top “remover”.  Today, there are only 3 humans left that have removed more packages than the auto-decrufter… impressively enough, one of them is not even an active FTP-master (anymore).  The current score board:

 5371 Luca Falavigna
 5121 Alexander Reichle-Schmehl
 4401 Ansgar Burchardt
 3928 DAK's auto-decrufter
 3257 Scott Kitterman
 2225 Joerg Jaspert
 1983 James Troup
 1793 Torsten Werner
 1025 Jeroen van Wolffelaar
  763 Ryan Murray

For comparison, here is the number of removals by year for the past 6 years:

 5103 2011
 2765 2012
 3342 2013
 3394 2014
 3766 2015  (1842 removed by auto-decrufter)
 2845 2016  (2086 removed by auto-decrufter)

This tells us that in 2015, the FTP masters and the decrufter performed on average over 10 removals a day.  And by the looks of it, 2016 will surpass that.  Of course, the auto-decrufter has a tendency to increase the number of removed items since it is an advocate of “remove early, remove often!” :)
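The averages can be checked with a little arithmetic on the table above. The helper names `per_day` and `share` are just for illustration.

```shell
#!/bin/sh
# Removals per day, assuming a 365-day year unless told otherwise.
per_day() {
    awk -v n="$1" -v days="${2:-365}" 'BEGIN { printf "%.1f\n", n / days }'
}

# Percentage of "total" that "part" accounts for.
share() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

per_day 3766       # 2015: prints 10.3
share 1842 3766    # auto-decrufter share in 2015: prints 48.9
share 2086 2845    # auto-decrufter share in 2016 so far: prints 73.3
```

So the auto-decrufter accounted for roughly half of the removals in 2015 and almost three quarters of the 2016 removals recorded so far.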

 

Data is from https://ftp-master.debian.org/removals-full.txt.  Scoreboard computed as:

  grep ftpmaster: removals-full.txt | \
   perl -pe 's/.*ftpmaster:\s+//; s/\]$//;' | \
   sort | uniq -c | sort --numeric --reverse | head -n10

Removals by year computed as:

 grep ftpmaster: removals-full.txt | \
   perl -pe 's/.* (\d{4}) \d{2}:\d{2}:\d{2}.*/$1/' | uniq -c | tail -n6

(yes, both could be done with fewer commands)
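As a sketch of the "fewer commands" variant, the grep and perl steps can be folded into one sed call each. This assumes the header lines look like `[Date: Thu, 04 Jan 2001 17:04:07 +0000] [ftpmaster: Some Name]`, which is what the expressions above imply; the function names are made up.

```shell
#!/bin/sh
# Top 10 removers: print only the name captured after "ftpmaster:".
scoreboard() {
    sed -n 's/.*ftpmaster:[[:space:]]*\(.*\)\]$/\1/p' "$@" |
        sort | uniq -c | sort -rn | head -n10
}

# Removals per year for the last 6 years in the file. Like the original,
# this relies on the file being in chronological order.
removals_by_year() {
    sed -n '/ftpmaster:/s/.* \([0-9]\{4\}\) [0-9:]\{8\}.*/\1/p' "$@" |
        uniq -c | tail -n6
}
```

Both read from the files given as arguments (or from stdin), e.g. `scoreboard removals-full.txt`.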

Posted in Debian | Leave a comment

Putting Debian packages in labelled boxes

Lintian 2.5.44 was released the other day and (to most) the most significant bug fix was probably that Lintian learned about Policy 3.9.8.  I would like to thank Axel Beckert for doing that.  Notably, it also made me update the test suite so as to make future policy releases less painful.

For others, it might be the fact that Lintian now accepts (valid) versioned provides (which seemed prudent now that Britney accepts them as well).  Newcomers might appreciate that we are giving a much more sensible warning when they have extra spaces in their changelog “sign off” line (rather than pretending it is an improper NMU).  But I digress…

 

What I am here to talk about is that Lintian 2.5.44 started classifying packages based on various “facts” or “properties” that we can determine.  Therefore:

  • Every package will have at least one tag now!
  • These labels are known as “classification tags”.
  • The tags are not issues to be fixed!  (I will repeat this later to ensure you get this point!)

Here are some of the “labelled boxes” your packages will be put into[0]:

The tags themselves are (as mentioned) mere classifications and their primary purpose is to classify or measure certain properties.  With them, anybody can download the data set and come up with some bold statement about Debian packages (hopefully without relying too much on “lies, damned lies and statistics”).  Let's try that immediately!

  • Almost 75% of all Debian packages do not need to run arbitrary code during installation[2]!
  • The “dh-sequencer” with cdbs is the future![3]

In the next release, we will also add tracking of auto-generated snippets from dh_*-tools.  These are currently unversioned, but I hope to add versioning so we can find and rebuild packages that have been built with buggy autoscripts (like #788098).

If you want to see the classification tags for your package, please run lintian like this:

# Add classification tags
$ lintian -L +classification <pkg-or-changes>
# Or if you want only classification tags
$ lintian -L =classification <pkg-or-changes>

Please keep in mind that classification tags (“C”) are not issues in themselves. Lintian is simply attempting to add a visible indicator about a given “fact” or “property” in the package – nothing more, nothing less.

 

Future work – help (read: patches) welcome:

 

[0] Mind you, the reporting framework’s handling of these tags could certainly be improved.

[1] Please note how it distinguishes 1.0 into native and non-native based on whether the package has a diff.gz.  Presumably that can be exploited somehow …

[2] Disclaimer: At the time of writing, only ~80% of the archive has been processed.  This is computed as: NS / (NS + WS), where NS and WS are the number of unique packages with the tags “no-ctrl-scripts” and “ctrl-script” respectively.

[3] … or maybe not, but we got two packages classified as using both CDBS and the dh-sequencer.  I have not looked at it in detail. For the curious: libmecab-java and ctioga2.

Posted in Debian, Lintian | Leave a comment

Easter patching of Britney

I decided to take a couple of days of vacation next to Easter and obviously ended up with tons of time.  I ended up channelling most of the (productive) time into improving Britney.

In raw results:

  • I wrote about 35 patches during my (extended) Easter holiday + reviewed and merged/cherry-picked 2 patches from others.
    • Today, the “britney-fixes-2016-03” branch had 48 commits not yet in master (8 or so written before Easter).
  • I submitted 33 of the patches for review with the intention of merging them into master soon.
    • The rest will be bundled for a later round.

The most “exciting” items in the patch series are probably:

  • Support for versioned provides (#786803)
    • Admittedly, there is a complete punt on multi-arch’ified provides.
  • Avoid cruft re-entering testing to satisfy dependencies of other packages
  • First step towards supporting packages being read from a standard (dak-built) mirror.
    • Britney still assumes some data files are stored in the “mirror”.  However, it will hopefully work for derivatives/users that disable (read: patch out) the aging and RC bugs policies.
    • Future work includes “partial” suite support and self-contained components.
  • A crash fix (#815995) that only occurs with package “hijacking” (i.e. multiple source packages building the same binary).

 

Once reviewed, these will be merged into master and we will have versioned provides support (in Britney). :)

Posted in Debian, Release-Team | 1 Comment

Performance tuning of lintian, take 3

About 7 months ago, I wrote about how we had improved Lintian’s performance. In 2.5.41, we are doing another memory reduction, where we primarily reduce the memory consumption of data about ELF binaries.  Like previously, the memory reductions follow the “less is more” pattern.

My initial test subject was linux-image-4.4.0-trunk-rt-686-pae_4.4-1~exp1_i386.deb. It had a somewhat unique property that the ELF data made up a little over half the cache.

  • We do away with a lot of unnecessary default values [f4c57bb, 470875f]
    • That removed about 3MB (out of 10.56MB) of that ELF data cache
  • Discard section information we do not use [3fd98d9]
    • This reduced the ELF data cache to 2MB (down from the 7MB).
  • Then we stop caching the output of file(1) twice [7c2bee4]
    • While a fairly modest reduction (only 0.80MB out of 16MB total), it also affects packages without ELF binaries.

At this point, we had reduced the total memory usage from 18.35MB to 8.92MB (the ELF data going from 10.56MB to 1.98MB)[1]. I figured that I was happy with the improvement and discarded my test subject.

While impressive, the test subject was unsurprisingly a special case.  The improvement in “regular” packages[2] (with ELF binaries) was closer to 8% in total.  Not being satisfied with that, I pulled one more trick.

  • Keep only “UND” and “.text” symbols [2b21621]
    • This brought coreutils (just the lone deb) another 10% memory reduction in total.

In total, coreutils 8.24-1 amd64 went from 4.09MB to 3.48MB.  The ELF data cache went from 3.38MB to 2.84MB.  Similarly, libreoffice/4.2.5-1 (including its ~170 binaries) has also seen an 8.5% reduction in total cache size[3] and is now down to 260.48MB (from 284.83MB).
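The relative reductions quoted here can be recomputed from the before/after figures with a one-line helper. The name `reduction` is only for this illustration.

```shell
#!/bin/sh
# Percentage saved when going from "before" to "after" (same unit).
reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", 100 * (before - after) / before }'
}

reduction 18.35 8.92      # linux-image test subject: prints 51.4
reduction 4.09 3.48       # coreutils total: prints 14.9
reduction 284.83 260.48   # libreoffice total: prints 8.5
```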

 

[1] If you are wondering why I in 3fd98d9 wrote “The total “cache” memory usage is approaching 1/3 of the original for that package”, then you are not alone.  I am not sure myself any more, but it seems obviously wrong.

[2] FTR: The sample size of “regular packages” is 2 in this case, one of them being coreutils…

[3] Admittedly, since “take 2” and not since 2.5.40.2 like the rest.

Posted in Debian, Lintian | 1 Comment

Lintian 2.5.40 – now with less output

You have probably tried to run lintian (-EIL +pedantic) on your packages only to watch lintian drown your terminal.  If you have, you would certainly not be the first.

A concrete example with lintian 2.5.40.2:

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb | wc -l
85

Notably, at least 45 of these appeared in 2.5.40 (the hardening-no-bindnow tag):

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb \
  --tags hardening-no-bindnow | wc -l
45

In a single release, we have more than doubled the number of tags in the given package.  I very much doubt this is the first time such a thing happened. Therefore, we have implemented a “per package” tag filter in 2.5.40.

The filter is applied automatically when stdout is a tty and restricts lintian to emitting no more than 3 concrete instances of a given tag per package.  If a fourth instance would have been emitted, the filter replaces it with a “how to see all instances” message and suppresses further instances in that package.

Accordingly, lintian “only” emits 25 lines (instead of 85) for the example package.  It looks something like this:

$ lintian -EIL +pedantic 389-ds-base_1.3.4.5-2_amd64.deb 
I: 389-ds-base: spelling-error-in-binary usr/bin/dbscan-bin conents contents
X: 389-ds-base: hardening-no-bindnow usr/bin/dbscan-bin
X: 389-ds-base: hardening-no-bindnow usr/bin/dsktune-bin
X: 389-ds-base: hardening-no-bindnow usr/bin/infadd-bin
X: 389-ds-base: hardening-no-bindnow ... use --no-tag-display-limit to see all (or pipe to a file/program)
I: 389-ds-base: spelling-error-in-binary usr/lib/x86_64-linux-gnu/dirsrv/libns-dshttpd.so.0.0.0 occured occurred
[...]

With this very simple filter in place, the entire lintian output for that single binary now fits on my screen.  I am pretty sure the filter could do with additional smarts, but I believe it is a good start.
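The filtering rule itself is easy to mimic outside lintian. The following is a rough sketch of the idea as an awk filter over lintian-style output lines; it is not lintian's actual implementation (lintian is written in Perl), and the function name `limit_tags` is made up.

```shell
#!/bin/sh
# Sketch: allow at most N (default 3) instances of each tag per package.
# Input lines are expected to look like "X: <package>: <tag> <details>".
limit_tags() {
    awk -v limit="${1:-3}" '
    {
        key = $2 " " $3          # package + tag
        seen[key]++
        if (seen[key] <= limit) {
            print                # within the limit: pass through
        } else if (seen[key] == limit + 1) {
            # first suppressed instance: replace it with the hint
            print $1, $2, $3, "... use --no-tag-display-limit to see all (or pipe to a file/program)"
        }
        # any further instances are dropped silently
    }'
}
```

Feeding it the unfiltered output (e.g. `lintian ... | cat | limit_tags`) should give roughly the same shape as the listing above.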

 

Posted in Debian, Lintian | 3 Comments

Tor enabled MTA

As I posted earlier, I have migrated to using tor on my machine, though I had a couple of unsolved issues back then.  One of them was that my Mail Transport Agent (MTA) did not support tor.

A regular user might not have a lot of use for an MTA on their laptop.  However, it is needed for a lot of Debian development scripts (bts, mass-bug, nmudiff), if they are to file/manipulate bugs for you.

I have some requirements for my MTA:

  • tor support (or at least “torsocks”-able)
  • support end-to-end encryption with my provider (STARTTLS)
  • verify that it is talking to my provider.
  • rewrite my “From” if it is not correct (otherwise the mail will just be rejected)
  • support the auth mechanisms of my provider
  • it should be simple to configure

I also have some non-requirements:

  • Local mail delivery is not required
  • The MTA will not be used as a general mail relay.
    • One target relay
    • No relaying from other hosts
  • Mail delivery queue is nice to have but not a strict requirement.

Originally, I used postfix, which supported most of these requirements.  Except:

  • My attempt to make it use tor failed.  The best suggestion I found was to divert its smtp handler and then replace it with a torsocks call to the original handler.  Sadly, it just segfaulted.
  • While postfix is almost certainly able to verify it is talking with my provider, I never got it configured to do that.  In the end, postfix was too complicated for what I was ready to put up with.

 

Per suggestion of Jakub Wilk, I tried msmtp, which turned out to do what I wanted.

  • There is a trivial config file example to start with.  I did not need to read any manuals or extended documentation to figure out what they were doing.
  • You probably also want to specify tls_priorities (assuming msmtp is linked against gnutls)
    • A code dive suggests it defaults to “NORMAL:-VERS-SSL3.0” if not set.  It is probably not too bad, but could be better. :)
    • From a quick look at the gnutls manual, “PFS:%PROFILE_<name>” seems like a decent value (requires gnutls >= 3.2.4 and that your provider has a decent/modern SSL setup).
    • You probably want to have a look at the values for the %PROFILE_<name> before deciding on one.
  • The msmtp program supports connecting through SOCKS proxies and even has a sample config snippet for using it with tor.
    • Of course, by the time I had discovered that I had already been using “torsocks /usr/sbin/sendmail” a couple of times. :)
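To give an idea of how little configuration is involved, here is a sketch that writes a starting config. Everything provider-specific (`smtp.example.org`, `me@example.org`, the account name, port 587) is a placeholder, the CA bundle path is the usual Debian location, and 9050 is tor's default SOCKS port. The helper name `write_msmtprc` is made up, and `tls_priorities` is deliberately left for you to pick.

```shell
#!/bin/sh
# Sketch: write a minimal msmtp config to the given file.
write_msmtprc() {
    target=$1
    cat > "$target" <<'EOF'
# Placeholder provider/account details; replace before use.
defaults
tls            on
tls_starttls   on
# Verify the provider's certificate against the system CA bundle.
tls_trust_file /etc/ssl/certs/ca-certificates.crt
# tls_priorities is not set here; decide on a priority string first.

account        provider
host           smtp.example.org
port           587
auth           on
user           me@example.org
# Envelope sender. No password is stored in this file.
from           me@example.org
# Connect through the local tor SOCKS proxy.
proxy_host     127.0.0.1
proxy_port     9050

account default : provider
EOF
    chmod 600 "$target"    # keep the file private
}
```

For example, `write_msmtprc ~/.msmtprc` followed by editing the placeholders.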

The only feature I will probably miss is having a local queue, which can be rate limited.  But all in all, I am quite happy with it so far. :)

Posted in Debian | 2 Comments